Compare commits

...

1609 Commits

Author SHA1 Message Date
493f9e0b73 fix common test refs 2025-10-20 14:35:50 +02:00
d3a3cbd657 more convert_slow models 2025-10-20 13:11:39 +02:00
48eeb50c32 more models --> fast only 2025-10-17 15:26:28 +02:00
d43412a306 apply feedback - delete bert duplicates 2025-10-16 13:48:20 +02:00
0c3caff0fa update all berttokenizer based models 2025-10-15 23:23:10 +02:00
b2c320c297 new toks 2025-10-15 17:13:03 +02:00
f4d956a292 always load blank, then set _tokenizer if we have it 2025-10-14 18:10:14 +02:00
790c092390 fix docstring 2025-10-14 15:36:28 +02:00
c80dd1dba2 update legacy 2025-10-14 15:33:03 +02:00
c4f045c4c5 update legacy 2025-10-14 15:29:36 +02:00
36bc3ef6a5 add _model 2025-10-14 15:10:47 +02:00
ab77f57b39 added renaming 2025-10-14 15:04:21 +02:00
9136d3c801 rm legacy from llama 2025-10-14 14:44:41 +02:00
0e5dbdf4a5 refactor 2025-10-14 14:34:57 +02:00
51e62e1fbd gemma test fix 2025-10-10 18:51:29 +02:00
5fe5666cc1 fixes missed 2025-10-10 18:31:11 +02:00
a9263d1da8 rm pickle tests 2025-10-10 14:52:53 +02:00
19c9b09805 rm specialtokenmixin and stale functions 2025-10-10 14:49:35 +02:00
4980a2fdd6 revert _pad 2025-10-10 13:53:38 +02:00
14d2a8ca48 speed up added tokens 2025-10-09 12:02:12 +02:00
82653f78e8 cleaned up base to be more more abstract for other backends to implement 2025-10-08 19:31:37 +02:00
e0a260d529 cut base down 2025-10-08 17:27:02 +02:00
3ee3525318 spiece tests 2025-10-07 14:21:12 +02:00
d411492d05 rm functions dedicated for batched input 2025-10-07 12:38:47 +02:00
193684d9eb load PreTrainedSentencePieceTokenizer fallback 2025-10-07 12:18:19 +02:00
6c25f26f14 split up tests and remove common ones that shoudl not be run for each model 2025-10-03 13:26:38 +02:00
ec13e3986d gemma 2025-09-30 16:53:15 +02:00
19138cbeb7 cohere 2025-09-30 11:55:05 +02:00
db8923c299 rm prepare_for_model 2025-09-30 11:14:17 +02:00
21433e1878 rm call_one and batch_encode_plus 2025-09-30 11:07:08 +02:00
7fb3d7727f qwen2 2025-09-30 10:54:41 +02:00
42d4e798a1 rm slow qwen2 tok 2025-09-29 17:13:33 +02:00
117ce1dcc3 add qwen2 2025-09-29 15:16:56 +02:00
ba3a0a4654 llama refactored test - mixin temporary 2025-09-29 11:57:10 +02:00
e4b29559bb rm old common tests 2025-09-26 18:08:32 +02:00
26e0887437 save tests 2025-09-26 12:09:13 +02:00
cacf09e854 handle blank tok 2025-09-25 16:06:22 +02:00
f2022400a4 simplify test 2025-09-25 14:01:39 +02:00
dc0611f719 llama 2025-09-25 13:30:58 +02:00
d5e56bbd2c move update post processor and add bos eos properties 2025-09-25 13:30:58 +02:00
87cfea8a20 create_fast_tokenizer file 2025-09-25 13:30:58 +02:00
73be8c48b2 rm protobuf dependency 2025-09-25 13:30:58 +02:00
d7af5a54ed rm slow 2025-09-25 13:30:58 +02:00
44682e7131 Adapt and test huggingface_hub v1.0.0 (#40889)
* Adapt and test huggingface_hub v1.0.0.rc0

* forgot to bump hfh

* bump

* code quality

* code quality

* relax dependency table

* fix has_file

* install hfh 1.0.0.rc0 in circle ci jobs

* repostiryo

* push to hub now returns a commit url

* catch HfHubHTTPError

* check commit on branch

* add it back

* fix ?

* remove deprecated test

* uncomment another test

* trigger

* no proxies

* many more small changes

* fix load PIL Image from httpx

* require 1.0.0.rc0

* fix mocked tests

* fix others

* unchange

* unchange

* args

* Update .circleci/config.yml

* Bump to 1.0.0.rc1

* bump kernels version

* fix deps
2025-09-25 11:13:50 +00:00
750dd2a401 Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)
Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts
2025-09-25 10:33:46 +00:00
7258ea44bc Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-24 16:44:42 +02:00
2c4caa19e7 dummy commit (#41133)
* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-24 16:31:46 +02:00
6d1875924c Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head
2025-09-24 13:13:18 +01:00
3ca43d34b1 Fixed MXFP4 model storage issue (#41118) 2025-09-24 12:11:51 +00:00
b33cb70097 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
2025-09-24 11:54:55 +00:00
b0c7034d58 Remove self-assignment (#41062)
* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-24 12:43:17 +01:00
04a0bb569c Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:34:12 +00:00
071c7b1423 Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:27:37 +00:00
80f20e0ff8 [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-24 11:18:27 +00:00
1d81247b0c [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors

* enable torchao safetensors support

* add more version checking
2025-09-24 12:32:47 +02:00
b533cec74d Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-24 10:17:41 +00:00
65dcd66cc8 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 12:01:27 +02:00
43a613c8da Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-09-24 06:37:21 +00:00
f64354e89a Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 16:20:01 -07:00
99b0995138 Remove bad test skips (#41109)
* remove bad skips

* remove more

* fix inits
2025-09-23 20:39:28 +02:00
00f3d90720 Fix _get_test_info for inherited tests (#41106)
* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 19:35:24 +02:00
cfa022e719 [tests] gpt2 + CausalLMModelTester (#41003)
* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder
2025-09-23 18:07:06 +01:00
869735d37d 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma
2025-09-23 16:20:00 +00:00
71717ce91c docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
2025-09-23 09:18:49 -07:00
946e5f95ea fix wrong height and width when read video use torchvision (#41091) 2025-09-23 12:35:44 +00:00
870add3daf Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:43:17 +00:00
ae60692821 Remove unused arguments (#40916)
* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:40:51 +00:00
f682797866 Fix typing (#40788)
* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:36:02 +00:00
f4a6c65951 Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:27:04 +00:00
89e0f472f4 Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:14:11 +00:00
62ce6fcb60 Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script

* Adjust vars
2025-09-23 13:05:27 +02:00
257fe5eea8 Switch to python:3.10-slim for CircleCI docker images (#41067)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 12:48:48 +02:00
0ec0325781 Minor addition, no split modules for VideoMAEE (#41051)
* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-23 11:53:51 +02:00
577fa6f167 fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2025-09-23 09:50:23 +00:00
03c92884b5 Update team member list for some CI workflows (#41094)
* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 09:48:40 +00:00
cbb290ec23 Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files
2025-09-22 10:36:20 -07:00
8048c614bf [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions
2025-09-22 09:51:39 -07:00
aa30e0642e Update quantization CI (#41068)
* fix

* new everything

* fix
2025-09-22 18:10:16 +02:00
1bb69cce82 Fix CI jobs being all red 🔴 (false positive) (#41059)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 16:51:00 +02:00
f15258dec2 Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 14:29:50 +00:00
2ec37649e2 Ci utils (#40978)
* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License
2025-09-22 16:16:19 +02:00
b9d337b6f3 Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload

* Address review comments

* Address review comments
2025-09-22 14:13:46 +00:00
646ff51d1a Simplify unnecessary Optional typing (#40839)
Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:50 +00:00
c9939b3ab6 Remove repeated import (#40937)
* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:13 +00:00
4f36011545 [testing] Fix seed_oss (#41052)
* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-22 14:54:30 +02:00
2b8a7e82b5 Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
2025-09-22 13:42:34 +01:00
226667ec2f Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 13:42:26 +01:00
6eff44bb8d Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:38:07 +00:00
9ff47a71e4 Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-09-22 12:21:38 +00:00
ae9ef2e151 docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 13:21:15 +01:00
f3c481ed87 Use torch.autocast (#40975)
* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:18:24 +00:00
37152f8446 Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:31:46 +00:00
8a52288dba Remove optax (#41030)
Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:30:39 +00:00
5f891b36cd Fix typing of tuples (#41028)
* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:29:07 +00:00
c05f9d2f0e [testing] Fix qwen2_audio (#41018)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 10:45:31 +00:00
55a1eaf6f0 Fix Qwen video tests (#41049)
fix test
2025-09-22 12:28:11 +02:00
db802aafa4 Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-09-22 10:06:59 +00:00
8a2f24a321 Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved

* redundant if condition fix
2025-09-22 09:47:34 +00:00
ebbcf00ad1 Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-09-21 23:46:27 +02:00
67097bf340 Fix benchmark runner argument name (#41012) 2025-09-20 10:53:56 +02:00
8076e755e5 Update after #41007 (#41014)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 21:55:46 +02:00
022c882e14 Fix Glm4v test (#41011)
fix
2025-09-19 18:54:26 +02:00
966b3dbcbe Fix PhimoeIntegrationTest (#41007)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:43:46 +00:00
04bf4112f2 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-19 16:41:22 +01:00
dfc230389c 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point

* update references to transformers-cli
2025-09-19 14:40:27 +00:00
8010f5d1d9 Patch more unittest.case.TestCase.assertXXX methods (#41008)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:38:12 +02:00
5bf633b32a [tests] update test_left_padding_compatibility (and minimize overwrites) (#40980)
* update test (and overwrites)

* better test comment

* 0 as a default for
2025-09-19 15:36:26 +01:00
df12617914 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
2025-09-19 14:36:12 +00:00
2a538b2ed4 fix dict like init for ModelOutput (#41002)
* fix dict like init

* style
2025-09-19 16:14:44 +02:00
96a3e898cd RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:50:26 +00:00
98c8523434 Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree

* fix new models
2025-09-19 09:47:28 -04:00
767f8a4c75 Fix typoes in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:18:38 +00:00
9d9c4d24c5 Make EfficientLoFTRModelTest faster (#41000)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 12:51:05 +00:00
b4ba4e1da0 [RMSNorm] Fix rms norm init for models that center around 1 (#40796)
* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
2025-09-19 12:15:36 +00:00
fce746512b [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
2025-09-19 12:04:12 +01:00
ddfa3d4402 blt wip (#38579)
* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
2025-09-19 11:55:55 +02:00
46ea7e613d [testing] test num_hidden_layers being small in model tester (#40992)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 11:45:07 +02:00
ebdc17b8e5 ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
2025-09-19 10:39:21 +01:00
e2dbde280f Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs

* more
2025-09-19 11:28:34 +02:00
155f7e2e62 🔴[Attention] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style
2025-09-19 11:23:58 +02:00
61eff450d3 Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description
2025-09-19 08:54:49 +00:00
5f6e278a51 Remove set_model_tester_for_less_flaky_tests (#40982)
remove
2025-09-18 18:56:10 +02:00
4df2529d79 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oups forgot conflict

* continue

* still grinding

* always more

* in tje zone

* never stop

* should fix doc

* fic

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
d9d7f6a6b9 Revert change in compile_friendly_resize (#40645)
fix
2025-09-18 16:25:45 +01:00
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
dd7ac4cd59 [tests] Really use small models in all fast tests (#40945)
* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency
2025-09-18 15:24:12 +02:00
2ce35a248f Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
2025-09-18 13:22:19 +00:00
6e51ac31ef [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling
2025-09-18 14:09:08 +01:00
9378f874c1 [Trainer] Fix DP loss (#40799)
* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-18 13:07:20 +00:00
7cf1f5ced0 Use skip_predictor=True in vjepa2 get_vision_features (#40966)
use skip_predictor in vjepa2 `get_vision_features`
2025-09-18 11:51:45 +00:00
f6104189fd Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 11:49:14 +00:00
c532575795 Add new model LFM2-VL (#40624)
* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
2025-09-18 11:01:58 +00:00
564fde14f1 FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-18 09:57:21 +00:00
5748352c27 Update expected values for one more test_speculative_generation after #40949 (#40967)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 11:47:14 +02:00
438343d93f Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 09:05:50 +00:00
449da6bb30 Add FlexOlmo model (#40921)
* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`
2025-09-18 09:04:06 +00:00
3bb1b4867c Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models

* PR review
2025-09-18 08:45:04 +00:00
58e13b9f12 Update expected values for some test_speculative_generation (#40949)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 20:50:38 +02:00
529d3a2b06 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 19:53:59 +02:00
a2ac4de8b0 Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports
2025-09-17 13:34:30 -04:00
8e837f6ae2 Consistent naming for images kwargs (#40834)
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print
2025-09-17 18:40:25 +02:00
eb04363a0d Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning

* add timm

* remove
2025-09-17 18:23:37 +02:00
ecc1d778ce Fix Glm4vMoeIntegrationTest (#40930)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 18:21:18 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
14f01aee39 docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941) 2025-09-17 08:48:38 -07:00
26b65fb516 Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-17 15:42:30 +00:00
66f97d3f64 [models] remove unused import torch.utils.checkpoint (#40934) 2025-09-17 16:37:56 +01:00
3853bfe4d5 [DOC] Add missing dates in model cards (#40922)
add missing dates
2025-09-17 11:17:06 -04:00
6cade29278 Add LongCat-Flash (#40730)
* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism
2025-09-17 14:48:10 +02:00
48a5565179 Add support for Florence-2 training (#40914)
* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-17 11:49:56 +00:00
89949c5d2d Minor fix for #40727 (#40929)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 11:42:13 +02:00
c830fc1207 Adding activation kernels (#40890)
* first commit

* add mode

* revert modeling

* add compile

* rm print
2025-09-17 11:36:09 +02:00
f6999b00c3 [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 11:20:50 +02:00
8428c7b9c8 Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 09:15:55 +00:00
ddd4caf066 [Llama4] Remove image_sizes arg and deprecate vision_feature_layer (#40832)
* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment
2025-09-17 09:14:13 +00:00
b82cd1c240 Processor load with multi-processing (#40786)
push
2025-09-17 09:46:49 +02:00
6e50a8afb2 [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-16 11:31:28 -07:00
cccef4be91 Fix dtype in Paligemma (#40912)
* fix dtypes

* fix copies

* delete unused attr
2025-09-16 16:07:56 +00:00
beb09cbd5a 🔴Make center_crop fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
2025-09-16 16:01:38 +00:00
d4af0d9f03 [generate] misc fixes (#40906)
misc fixes
2025-09-16 15:18:06 +01:00
3b3f6cd0c1 [gemma3] Gemma3ForConditionalGeneration compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase
2025-09-16 15:08:48 +01:00
88ba0f107e disable test_fast_is_faster_than_slow (#40909)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:34:04 +02:00
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
df03fc1f9c Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-16 13:11:48 +00:00
96bc19bcdf remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-16 12:56:11 +00:00
d0af4269ec Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test
2025-09-16 13:28:23 +02:00
65f9ede359 Set seed for Glm4vIntegrationTest (#40905)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 13:01:51 +02:00
0c1839d609 [cache] Only use scalars in get_mask_sizes (#40907)
* remove tensor ops

* style

* style
2025-09-16 12:48:58 +02:00
3688a977d0 Harmonize CacheLayer names (#40892)
* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert
2025-09-16 12:14:12 +02:00
087775d10e [cache] Merge static sliding and static chunked layer (#40893)
* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle
2025-09-16 11:41:20 +02:00
1aff033ec9 Fix flaky Gemma3nAudioFeatureExtractionTest::test_dither (#40902)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 11:00:07 +02:00
65adc3aaa3 Fix getter regression (#40824)
* test things

* style

* move tests to a sane place
2025-09-16 10:57:13 +02:00
8e1a12bbee Fixing the call to kernelize (#40628)
* fix

* style

* overload train and eval

* add getter and setter
2025-09-16 10:50:54 +02:00
21c8379fb0 Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 10:21:48 +02:00
5af248b3e3 [generate] remove docs of a feature that no longer exists (#40895) 2025-09-15 19:22:31 +01:00
20ee3a73f0 🌐 [i18n-KO] Translated imageprocessor.md to Korean (#39557)
* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:07:16 -07:00
2141a5b764 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:06:57 -07:00
2a83792165 Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 17:38:13 +02:00
04d1c8f3d4 Fix deta loading & dataclass (#40878)
* fix

* fix 2
2025-09-15 17:23:13 +02:00
ff26fe8302 Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-15 15:03:43 +00:00
6254bb4a68 Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:54:14 +00:00
e674e9dadb Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:51:22 +00:00
0957999f7f 🔴 Move variable output controls to _prepare_generation_config (#40715)
* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches
2025-09-15 11:08:00 +00:00
5e9ec59d0c Fix modular consistency (#40883)
* reapply modular

* add missing one
2025-09-15 13:07:08 +02:00
3442b2f300 [VaultGemma] Update expectations in integration tests (#40855)
* fix tests

* style
2025-09-15 12:46:30 +02:00
c0dbe095b0 Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-09-15 12:46:18 +02:00
fc5f9105da [Qwen3 Next] Use numerically stable rsqrt (#40848)
use numerically stable inverse
2025-09-15 12:45:13 +02:00
96d3795cfc Update model tags and integration references in bug report (#40881) 2025-09-15 12:08:29 +02:00
f5e1641857 fix: XIELU act parameters not being casted to correct dtype (#40812) 2025-09-15 11:05:55 +02:00
ada64ce452 fix florence kwargs (#40826) 2025-09-15 11:05:47 +02:00
93f810e6fa [docstrings / type hints] Update outdated annotations for past_key_values (#40803)
* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes
2025-09-15 10:52:32 +02:00
c65fea0b92 [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-15 10:46:32 +02:00
9c804f7ec4 Redirect MI355 CI results to dummy dataset (#40862) 2025-09-14 18:42:49 +02:00
02ea2b3433 Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-14 15:35:42 +00:00
d42e96a2a7 Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-13 00:49:19 +00:00
6eb3255842 [generate] Always use decoder config to init cache (#40772)
* mega derp

* fix

* always use the decoder
2025-09-12 18:24:22 +02:00
e682f90f60 [tests] move generative tests away from test_modeling_common.py (#40854)
move tests
2025-09-12 16:12:27 +00:00
8d8459132a [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve
2025-09-12 18:07:48 +02:00
291772b6b5 add: differential privacy research model (#40851)
* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 17:36:03 +02:00
8502b41bf1 [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
2025-09-12 14:33:28 +00:00
f384bb8ad5 [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular
2025-09-12 14:21:22 +00:00
4cb41ad2a2 [tests] re-enable aria fast tests (#40846)
* rise from the dead

* test
2025-09-12 15:14:54 +01:00
ef053939ca Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style
2025-09-12 15:35:31 +02:00
98a8078127 Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-12 14:08:01 +02:00
77aa35ee9c Replace image classification loss functions to self.loss_function (#40764) 2025-09-12 12:59:37 +01:00
797859c9b8 Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 10:44:57 +00:00
6e69b60806 Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit
2025-09-12 12:22:25 +02:00
827b65c42c Add VideoProcessors to auto-backend requirements (#40843)
* add it

* fix existing ones

* add perception to auto_mapping...
2025-09-12 12:21:12 +02:00
5e2e77fb45 Improve torch_dtype checks (#40808)
* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-12 09:57:59 +00:00
c81f426f9a 🌐 [i18n-KO] Translated clipseg.md to Korean (#39903)
* docs: ko: model_doc/clipseg.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-11 17:07:24 -07:00
cf084f5b40 [Jetmoe] Fix RoPE (#40819)
* fix

* remove prints

* why was this there...
2025-09-11 18:41:11 +02:00
dfae7dd98d Push generation config along with checkpoints (#40804) 2025-09-11 17:33:16 +02:00
c264c0ee7e add general hub test for Fast Image Processors in test_image_processing_utils (#40086)
* build unittest for ViTImageProcessorFast

* remove redundant test case

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-09-11 14:31:37 +00:00
895b3ebe41 Fix typos in src (#40782)
Fix typoes in src

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-11 13:15:15 +01:00
6d369124ad Align torch implementation of Gated DeltaNet in Qwen3-Next with fla library. (#40807)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-11 13:10:15 +02:00
0f1b128d33 ⚠️ 🔴 Add ministral model (#40247)
* add ministral model

* docs, tests

* nits

* fix tests

* run modular after merge

* opsie

* integration tests

* again

* fff

* dtype

* rerun modular

* arthur review

* ops

* review
2025-09-11 10:30:39 +02:00
02f1d7c091 Fix config dtype parsing for Emu3 edge case (#40766)
* fix emu3 config

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* address comment

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add comments

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-11 08:26:45 +00:00
de01a22aff Fix edge case for tokenize (#36277) (#36555)
* Fix edge case for tokenize (#36277)

* Fix tokenizing dtype for float input cases

* add test for empty input string

* deal empty list of list like [[]]

* add tests for tokenizer for models with input that is not plain text
2025-09-11 09:57:30 +02:00
ec532f20fb feature: Add robust token counting with padding exclusion (#40416)
* created robust token counting by using existing include_num_input_tokens_seen variable and kept bool for backward compatibility and added string also to ensure everything goes well and kept default as is. also robust test cases are created

* some codebase mismatched in my local and remote, commiting to solve it and also solved code quality issue

* ci: retrigger tests

* another attemp to trigger CI for checks
2025-09-11 09:16:06 +02:00
df67cd35f0 Fix DeepSpeed mixed precision precedence over Accelerate defaults (#39856)
* Fix DeepSpeed mixed precision precedence over Accelerate defaults

Resolves issue where Accelerate would default to bf16 mixed precision
when a DeepSpeed config specifies fp16, causing a ValueError. The fix
ensures DeepSpeed config takes precedence over TrainingArguments defaults
while preserving explicit user settings.

Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

Fixes #39849

* Add tests for DeepSpeed mixed precision precedence fix

- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR #39856 mixed precision precedence fix
2025-09-11 09:12:15 +02:00
549ba5b8b6 [Docs] Add missing class documentation for optimizer_schedules (#31870, #23010) (#40761)
* Add missing class documentation for optimizer_schedules (#31870, #23010)

* Add section level header to the optimizer schedules
2025-09-10 14:58:21 -07:00
dae1ccfb98 fix_image_processing_fast_for_glm4v (#40483)
* fix_image_processing_fast_for_glm4v

* fix(format): auto-ruff format

* add test image processing glm4v

* fix quality

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-10 21:05:27 +00:00
7d57b31e16 Remove use_ipex option from Trainer (#40784)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 17:00:15 +00:00
3378e7dabf Move num_items_in_batch to correct device before accelerator.gather (#40773)
add device
2025-09-10 18:49:42 +02:00
e5ecb03c92 Fix the issue that csm model cannot work with pipeline mode. (#39349)
* Fix the issue that csm model cannot work with pipeline mode.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove batching inference

Signed-off-by: yuanwu <yuan.wu@intel.com>

* csm output is list of tensor

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/pipelines/text_to_audio.py

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Use different waveform key for different model

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix make style errors

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add csm tests

Signed-off-by: yuanwu <yuanwu@habana.ai>

* Update src/transformers/models/auto/tokenization_auto.py

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuanwu@habana.ai>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-10 16:17:35 +00:00
abbed7010b Fix dotted model names (#40745)
* Fix module loading for models with dots in names

* quality check

* added test

* wrong import

* Trigger CI rerun after making test model public

* Update src/transformers/dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Move test

* make fixup

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-09-10 14:34:56 +00:00
75202b0928 Read config pattern for Qwen3Next (#40792)
read it
2025-09-10 15:18:51 +02:00
7401cfa57c Use functools.cached_property (#40607)
* cached_property is avaiable in functools

Signed-off-by: cyy <cyyever@outlook.com>

* Remove cached_property

Signed-off-by: cyy <cyyever@outlook.com>

* Fix docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:15:40 +00:00
8ab2448707 Fix invalid PipelineParallel member (#40789)
Fix invalid enum member

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:06:36 +00:00
6c9f412105 Fix typos in tests and util (#40780)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 11:45:40 +00:00
0997c2f2ab Fix doc for PerceptionLMForConditionalGeneration forward. (#40733)
* Fix doc for PerceptionLMForConditionalGeneration forward.

* fix last nit

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-09-10 11:57:19 +02:00
a72e5a4b9d 🚨 Fix Inconsistant input_feature length and attention_mask length in WhisperFeatureExtractor (#39221)
* Update feature_extraction_whisper.py

* Reformat

* Add feature extractor shape test

* reformat

* fix omni

* fix new failing whisper test

* Update src/transformers/models/whisper/feature_extraction_whisper.py

* make style

* revert omni test changes

* add comment

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-09-10 09:38:47 +00:00
a5ecd94a3f Enable ruff on benchmark and scripts (#40634)
* Enable ruff on benchmark and scripts

Signed-off-by: cyy <cyyever@outlook.com>

* Cover benchmark_v2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* correct

* style

* style

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-10 11:38:06 +02:00
08edec9f7d [processors] Unbloating simple processors (#40377)
* modularize processor - step 1

* typos

* why raise error, super call check it also

* tiny update

* fix copies

* fix style and test

* lost an import / fix copies

* fix tests

* oops deleted accidentally
2025-09-10 10:37:19 +02:00
c52889bd51 Remove reference of video_load_backend and video_fps for processor (#40719)
* Remove reference of video_load_backend and video_fps for processor

Signed-off-by: cyy <cyyever@outlook.com>

* Restore changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-10 08:37:11 +00:00
3340ccbd40 Fix gpt-oss router_indices in EP (#40545)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix router indice

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix mod

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix masking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add safety cheking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable 1 expert per rank

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix skip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add ep plan in config

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add update ep plan

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm ep_plan and add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-10 10:30:55 +02:00
b9282355be Adding Support for Qwen3-Next (#40771)
* Add Qwen3-Next.

* fix

* style

* doc

* simplify

* fix name

* lazy cache init to allow multi-gpu inference

* simplify

* fix config to support different hybrid ratio.

* remove last commit (redundant)

* tests

* fix test

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-09 23:46:57 +02:00
79fdbf2a4a [docs] CPU install (#40631)
* init

* feedback
2025-09-09 12:51:54 -07:00
37c14430c9 [pipeline] ASR pipeline kwargs are forwared to generate (#40375)
* tmp commit

* add test

* PR suggestion
2025-09-09 17:29:25 +00:00
d09fdf5e52 Fix crash when executing MambaCache sample code (#40557)
* Fix the sample code of MambaCache

* Update automatically generated code

* Fix FalconMambaCache documents

* minor doc fixes

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2025-09-09 16:44:49 +00:00
d33c189e5a [RoPE] run RoPE tests when the model uses RoPE (#40630)
* enable rope tests

* no manual rope test parameterization

* Apply suggestions from code review

* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py

* PR comment: use generalist torch code to find the rope layer
2025-09-09 17:11:02 +01:00
71ac7ea048 [tests] update test_past_key_values_format and delete overwrites (#40701)
* tmp

* rm some overwrites
2025-09-09 16:40:04 +01:00
7aaef98cbe rm src/transformers/convert_pytorch_checkpoint_to_tf2.py (#40718)
* rm src/transformers/convert_pytorch_checkpoint_to_tf2.py

* doctest skip
2025-09-09 16:34:54 +01:00
de5cbe8b79 [deprecations] Remove generate-related deprecations up to v4.56 (#40729)
remove generate-related deprecations up to v4.56
2025-09-09 16:32:41 +01:00
1cdbbb3e9d Support sliding window in CB (#40688)
* CB example: better compare feature

* Cache managers, still issue w/ effective length

* WIP -- fix for effective length

* Renames

* Wroking, need better parity checks, we mind be missing 1 token

* Small fixes

* Fixed wrong attn mask and broke cache into pieces

* Warmup is slowing down things, disabling it

* Cache was too big, fixed

* Simplified index objects

* Added a profile option to the example

* Avoid calls to memory reporing tools

* Restore full attention read indices for better latency

* Adressed some TODOS and style

* Docstrings for cache managers

* Docstrings for Schedulers

* Refactor scheudlers

* [Important] Cache fix for sliding window, check with small sw size

* Updated doc for cache memory compute and cache as a whole

* Moved a todo

* Nits and style

* Fix for when sliding window is smaller than max batch per token

* Paged interface update

* Support for FLash in new API

* Fix example CB

* Fix bug in CB for paged

* Revert example

* Style

* Review compliance

* Style

* Styleeeee

* Removed NO_SLIDING_WINDOW

* Review #2 compliance

* Better art

* Turn cum_seqlens_k in a dict

* Attn mask is now a dict

* Update examples/pytorch/continuous_batching.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Adressed McPatate pro review

* Style and fix

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-09-09 15:51:11 +02:00
ed100211cb [generate] PromptLookupCandidateGenerator won't generate forbidden tokens (#40726)
* no longer flaky :)

* PR comments

* any token-blocking logits processor works

* ?

* default

* -_-

* create fake tensors once
2025-09-09 11:04:01 +00:00
82d66e5dd0 Fix: swanlab public.cloud.experiment_url api error (#40763)
fix
2025-09-09 09:28:13 +00:00
a871f6f58d Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing (#40215)
* Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing

* Fix fast processor output format and add comprehensive tests

* Fix trailing whitespace in test file

* Apply ruff formatting to test file

* simplify pair validation logic

* add superglue tests to fast image processor

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-08 21:08:02 +00:00
aee5000f16 Fix Bark failing tests (#39478)
* Fix vocab size for Bark generation.

* Fix Bark processor tests.

* Fix style.

* Address comments.

* Fix formatting.

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-08 20:24:51 +02:00
126264d015 🌐 [i18n-KO] Translated 'xclip.md' to Korean (#39594)
* feat: nmt draft

* fix: manual edits

* docs: ko: xclip.md

* feat: nmt draft

* fix: manual edits

* fix: Modify _toctree.yml file to reflect review

* fix: Modify _toctree.yml file to reflect review

* jungnerd_suggestion_modified_01 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* jungnerd_suggestion_modified_02 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-09-08 11:19:10 -07:00
5a468e56b7 Fix continue_final_message in apply_chat_template to prevent substring matching issues (#40732)
* Fix continue_final_message parameter in apply_chat_template

* after run fixup

* Handle trim in the template

* after fixup

* Update src/transformers/utils/chat_template_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-08 17:25:12 +00:00
e8db153599 Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs (#39364) 2025-09-08 10:01:44 -07:00
fd2a29d468 Fix more typos (#40627)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 16:05:40 +00:00
bb8e9cd675 Remove unnecessary tildes from documentation (#40748) 2025-09-08 08:56:35 -07:00
a9b313a0c2 docs: add continuous batching to serving (#40758)
* docs: tmp

* docs: add continuous batching to serving

* docs: reword after @lysandrejik review
2025-09-08 15:50:28 +00:00
2077f17547 feat: err when unsupported attn impl is set w/ --continuous_batching (#40618)
* feat: err when unsupported attn impl is set w/ `--continuous_batching`

* refactor: move defaults and support list to CB code

* feat: add action item in error msg

* fix(serve): add default attn implementation

* feat(serve): add log when `attn_implementation` is `None`

* feat: raise Exception when attn_implementation is not supported by CB
2025-09-08 14:31:49 +00:00
dc262ee6f5 remove FSDP prefix when using save_pretrained with FSDP2 (#40207)
* remove FSDP prefix when using save_pretrained with FSDP2

* Fix: use removeprefix correctly

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 14:52:31 +02:00
9ab6078323 remove gemmas eager training warning (#40744)
* removed warning

* removed remaining warnings
2025-09-08 14:41:52 +02:00
2a1eb5b508 Add BF16 support check for MUSA backend (#40576)
add musa bf16 supported

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-08 12:39:14 +00:00
7b8d40ea7a Set accepts_loss_kwargs to False for ConvNext(|V2)ForImageClassification (#40746) 2025-09-08 14:25:43 +02:00
def7558f74 Fix np array typing (#40741)
Fix typing

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 11:30:40 +00:00
44b3888d2a Fix order of mask functions when using and/or_mask_function (#40753)
fix order
2025-09-08 12:31:42 +02:00
3f7bda4209 [Continous Batching] fix do_Sample=True in continuous batching (#40692)
* fix do_Sample=True in continous batching

* added test

* fix top_p

* test

* Update examples/pytorch/continuous_batching.py
2025-09-08 10:30:15 +02:00
bb45d3631e refactor(serve): move request_id to headers (#40722)
* refactor(serve): move `request_id` to headers

* fix(serve): typo in middleware fn name

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-05 17:50:04 +02:00
12b8e10dbf Skip VitMatteImageProcessingTest::test_fast_is_faster_than_slow (#40713)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 17:36:20 +02:00
6b232618b6 Keypoint matching docs (#40541)
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
2025-09-05 17:24:56 +02:00
948bc0fa34 [Gemma Embedding] Fix SWA (#40700)
* fix gemma embedding flash attention

* fix sdpa

* fix atttempt number 2

* alternative gemma fix

* fix modular
2025-09-05 17:12:00 +02:00
828044cadb Add Optional typing (#40686)
* Add Optional typing

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Format

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 15:05:51 +00:00
e9d6a6907b [tests] remove overwrites of removed test (#40720)
rm tests from method moved to hub
2025-09-05 16:04:22 +01:00
96a5774f2e [serve] re-enable tests (#40717)
run tests
2025-09-05 15:15:34 +01:00
c76387e580 Fix arguments (#40605)
* Fix invalid arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self and other fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 13:50:04 +00:00
21f09032db 🔴 Update Glm4V to use config values (#40712)
* update to use config

* just fix it

* fixup want this to be reformatted
2025-09-05 13:19:50 +00:00
b62e5b6051 Fix parent classes of AllKwargsForChatTemplate (#40685)
Fix parent classes of AllKwargsForChatTemplate because the *Kwargs are members

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 11:08:51 +00:00
313effa7ad [onnx] use logical or for grounding dino mask (#40625)
* change |= operator to use torch logical or for friendly export to different backends

* change |= operator to use torch logical or for friendly export to different backends in grounding dino model

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
2025-09-05 10:55:20 +00:00
f3211b5db7 [moduar] Add missing self in post-process methods (#40711) 2025-09-05 10:49:52 +00:00
a2a8a3ca1e [tests] fix blip2 edge case (#40699) 2025-09-05 11:35:29 +01:00
4e195f1949 🚨 Allow check_model_inputs in core VLMs (#40342)
* allow `check_model_inputs` in core VLMs

* address comments

* fix style

* why this didnt fail prev?

* chec for Noneness instead

* batch update vlms

* fix some tests

* fix copies

* oops delete

* fix efficientloftr

* fix copies

* i am stupid, fix idefics

* fix GC

* return type and other comments

* we shouldn't manually change attention anymore

* fix style

* fix copies

* fix the test
2025-09-05 10:05:56 +00:00
93df343def Fix parent classes of ProcessingKwargs (#40676)
FIx parent classes of ProcessingKwargs

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 10:01:16 +00:00
89e103c15e feat(serve): add healthcheck test (#40697) 2025-09-05 11:56:34 +02:00
a2fffa505d Fetch more test data with hf_hub_download (#40710)
[test-all] tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 09:49:31 +00:00
4a88e81532 Add Fast Image Processor for ImageGPT (#39592)
* initial commit

* initial setup

* Overiding imageGPT specific functions

* imported is_torch_available and utilized it for importing torch in imageGPT fast

* Created init and ImageGPTFastImageProcessorKwargs

* added return_tensors, data_format, and input_data_format to ImageGPTFastImageProcessorKwargs

* set up arguments and process and _preprocess definitions

* Added arguments to _preprocess

* Added additional optional arguments

* Copied logic over from base imageGPT processor

* Implemented 2nd draft of fast imageGPT preprocess using batch processing

* Implemented 3rd draft of imageGPT fast _preprocessor. Pulled logic from BaseImageProcessorFast

* modified imageGPT test file to properly run fast processor tests

* converts images to torch.float32 from torch.unit8

* fixed a typo with self.image_processor_list in the imagegpt test file

* updated more instances of image_processing = self.image_processing_class in the test file to test fast processor

* standardized normalization to not use image mean or std

* Merged changes from solution2 branch

* Merged changes from solution2 test file

* fixed testing through baseImageGPT processor file

* Fixed check_code_quality test. Removed unncessary list comprehension.

* reorganized imports in image_processing_imagegpt_fast

* formatted image_processing_imagegpt_fast.py

* Added arg documentation

* Added FastImageProcessorKwargs class + Docs for new kwargs

* Reformatted previous

* Added F to normalization

* fixed ruff linting and cleaned up fast processor file

* implemented requested changes

* fixed ruff checks

* fixed formatting issues

* fix(ruff after merging main)

* simplify logic and reuse standard equivalenec tests

---------

Co-authored-by: Ethan Ayaay <ayaayethan@gmail.com>
Co-authored-by: chris <christine05789@gmail.com>
Co-authored-by: Ethan Ayaay <98191976+ayaayethan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-04 22:45:06 +00:00
9db11b728b Fetch one missing test data (#40703)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 23:05:23 +02:00
acd820561f Align assisted generate for unified signature in decoding methods (#40657)
* Squashed previous branch

* unify assisted generate to common decoding method signature

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review
2025-09-04 22:47:44 +02:00
16b821c542 Avoid T5GemmaModelTest::test_eager_matches_sdpa_inference being flaky (#40702)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 20:44:40 +00:00
519c2524af Fix broken Llama4 accuracy in MoE part (#40609)
* Fix broken Llama4 accuracy in MoE part

Llama4 accuracy is broken by a bug in
https://github.com/huggingface/transformers/pull/39501 . It forgot to
transpose the router_scores before applying it to routed_in, causing
Llama4 to generate garbage output.

This PR fixes that issue by adding back the transpose() and adding some
comments explaining why the transpose() is needed.

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>

* remove comment

---------

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-04 22:14:44 +02:00
586dc5d06e [Glm4.5V] fix vLLM support (#40696)
* fix

* add a test case
2025-09-04 22:09:20 +02:00
ad2da3ea83 Fix self.dropout_p is not defined for SamAttention/Sam2Attention (#40667)
Fix dropout_p is not defined for SamAttention/Sam2Attention
2025-09-04 19:32:39 +02:00
e39f222096 Fix backward compatibility with accelerate in Trainer (#40668) 2025-09-04 18:15:15 +02:00
d8f670583e Change docker image to preview for the MI355 CI (#40693)
* Change docker image to preview for the MI355 CI

* Use pushed image
2025-09-04 17:23:09 +02:00
4cbca0d1af Fixing bug in Voxtral when merging text and audio embeddings (#40671)
* Fixing bug when replacing text-audio token placeholders with audio embeddings

* apply changes

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-04 15:11:23 +00:00
9a6c6568db feat: support request cancellation (#40599)
* feat: support request cancellation

* test: add cancellation test

* refactor: use exisitng fn to check req cancellation

* feat(cb): make cancellation thread safe

* refactor(serve): update test to use `requests` instead of `httpx`
2025-09-04 17:01:29 +02:00
87f38dbfce add: embedding model (#40694)
* Gemma 3 for Embeddings

* Style fixes

* Rename conversion file for consistency

* Default padding side emb vs gen

* Corrected 270m config

* style fixes

* EmbeddingGemma config

* TODO for built-in prompts

* Resolving the sentence similarity bug and updating the architecture

* code style

* Add query prompt for SentenceTransformers

* Code quality

* Fixing or_mask_function return types

* Adding placeholder prompts for document and passage

* Finalizing prompt templates

* Adding Retrieval ro preconfigured prompts

* Add Gemma 3 270M Config

* Correcting num_linear_layers flag default

* Export Sentence Transformer in correct dtype

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
2025-09-04 16:16:15 +02:00
5b0c01b5e2 Final test data cache - inside CI docker images (#40689)
* run

* build

* build

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 13:12:49 +00:00
1f3cc935cc Load a tiny video to make CI faster (#40684)
* load a tiny video to make CI faster

* add video in url_to_local_path
2025-09-04 14:49:00 +02:00
669230a86f fix broken offline mode when loading tokenizer from hub (#40669)
* fix broken offline mode when loading tokenizer from hub

* formatting

* make quality

* fix import order
2025-09-04 12:15:56 +00:00
91b34be9cf Add codebook_dim attribute to DacVectorQuantize for DacResidualVectorQuantize.from_latents() (#40665)
* Add instance attribute to DacVectorQuantize for use in DacResidualVectorQuantize.from_latents

* add from_latent tests

* style fix

* Fix style for test_modeling_dac.py
2025-09-04 11:29:53 +00:00
25b4a0d8ae Add sequence classification support for small Gemma 3 text models (#40562)
* add seq class for gemma3 text model

* add Gemma3TextForSequenceClassification to modeling file

* After run make fixup

* let's just check

* thiis is why it was crashing, tests were just failing...

* skip it, tested only for seq clf

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-04 09:44:59 +00:00
30a4b8707d CircleCI docker images cleanup / update / fix (#40681)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:42:18 +02:00
7f92e1f91a Mark Aimv2ModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels as flaky (#40683)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:30:14 +02:00
ca9b36a9c1 Avoid night torch CI not run because of irrelevant docker image failing to build (#40677)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 09:06:37 +02:00
d40e7ea52d Skip more fast v.s slow image processor tests (#40675)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 06:35:44 +02:00
34595cf296 Even more test data cached (#40636)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 21:20:37 +00:00
f22ec7f174 Benchmarking V2: framework impl (#40486)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments

* Benchmarking v2

* Fix llama bench parameters

* Working checkpoint

* Readme touchups

* Remove unnecessary test

* Massage the framework a bit

* Small cleanup

* Remove unnecessary flushes

* Remove references to mock benchmark

* Take commit ID from CLI

* Address review comments

* Use Events for thread comms

* Tiny renaming
2025-09-03 22:26:32 +02:00
459c1fa47a refactor: use tolist instead of list comprehension calling .item() (#40646) 2025-09-03 19:25:29 +02:00
afd1393df1 Remove overwritten GitModelTest::test_beam_search_generate (#40666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:55:45 +02:00
68b9cbb7f5 Skip test_prompt_lookup_decoding_matches_greedy_search for qwen2_audio (#40664)
* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:43:35 +02:00
55676d7d4c Fix warning for output_attentions=True (#40597)
* Fix attn_implementation for output_attentions

* remove setting attention, just raise warning

* improve message

* Update src/transformers/utils/generic.py
2025-09-03 16:25:13 +00:00
b67608f587 Skip test_fast_is_faster_than_slow for Owlv2ImageProcessingTest (#40663)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:49:10 +02:00
30d66dc3bc Update check_determinism inside test_determinism (#40661)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:30:39 +02:00
3f40ebf620 Allow custom args in custom_generate Callables and unify generation args structure (#40586)
* Squashed commit of the following:

commit beb2b5f7a04ea9e12876696db66f3589fbae10c5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 16:03:25 2025 +0200

    also standardize _get_stopping_criteria

commit 15c25663fa991e0a215a7f3cdcf13a9d3a989faa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 15:48:38 2025 +0200

    watch super.generate() usages

commit 67dd845be2202d191a54b2872f1cb3f71b74b7d6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:44:32 2025 +0200

    ops

commit 4655dfa28fd59d5dc083a41d8396de042d99858c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:41:36 2025 +0200

    wrong merge

commit 46478143994e7b27d51c972a7881e0fea3cb6e3c
Merge: a72c2c4b2f 8564e210ca
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:36:15 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into fix-custom-gen-from-function2

commit a72c2c4b2f9c0e09fe6ec7992d4d02bfa279da2a
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:04:59 2025 +0200

    ops5

commit e72f91411b961979bb3d271810f57905cee5b577
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 12:06:19 2025 +0200

    ops4

commit 12ca97b1078a42167143e0243036f6ef87d5fdac
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:58:59 2025 +0200

    ops3

commit 8cac6c60a318dd381793d4bf1ef3775823f3c95b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:43:03 2025 +0200

    ops2

commit 4681a7d5dc6c8b96a515d9d79f06380c096b9a9f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:40:51 2025 +0200

    ops

commit 0d72aa6cbd99a5933c5a95a39bea9088ee21e50f
Merge: e0d47e980e 5bb6186b8e
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:37:28 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 5bb6186b8efbd5fdb8e3464a22f958343b9c450c
Merge: 44973dac7d b0db5a02f3
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:36:30 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit 44973dac7df4b4e2111c71f5fac918be21f3de52
Merge: 1ddab4bee1 893d89e5e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:29:48 2025 +0200

    Merge commit '893d89e5e6fac7279fe4292bfa3b027172287162' into remove-constrained-bs

commit e0d47e980e26d32b028c2b402ccb71262637a7a7
Merge: 88128e4563 1ddab4bee1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:52:50 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 88128e4563c0be583728e1d3c639bc93143c4029
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:44:38 2025 +0200

    fix custom generate args, refactor gen mode args

commit 1ddab4bee159f6c20722e7ff5cd41d5041fab0aa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Sun Aug 31 21:03:53 2025 +0200

    fix

commit 6095fdda677ef7fbeb06c05f4f914a11b45257b4
Merge: 4a8b6d2ce1 04addbc9ec
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:49:16 2025 +0200

    Merge branch 'remove-constrained-bs' of github.com:manueldeprada/transformers into remove-constrained-bs

commit 4a8b6d2ce18b3a8b52c5261fea427e2416f65187
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:48:25 2025 +0200

    restore and deprecate beam obkects

commit 04addbc9ec62dd4f59d15128e8cd9499e2cda3bb
Merge: e800c7841e becab2c601
Author: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Date:   Thu Aug 28 14:38:29 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit e800c7841e5c46ce5698fc9be309d0808f85d23c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:38:10 2025 +0200

    tests gone after green

commit 33971d21ac40aef76a7e1122f4a98ef28beadbe8
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:07:11 2025 +0200

    tests green, changed handling of deprecated methods

commit ab303835c184d0a87789da7aed7d8de5ba85d867
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:58:01 2025 +0200

    tests fix

commit ec74274ca52a6aa0b5f300374fda838609680506
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:32:05 2025 +0200

    ops

commit 0fb19004ccd285dcad485fce0865b355ce5493e0
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:45:16 2025 +0200

    whoops

commit c946bea5e45aea021c8878c57fcabc2a13f06fe5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:35:36 2025 +0200

    testing...

commit 924c0dec6d9ea6b4890644fe7f711dc778f820bb
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:22:46 2025 +0200

    sweeep ready for tests

commit b05aa771d3994b07cd460cda74b274c9e4f315e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:13:01 2025 +0200

    restore and deprecate constraints

commit 9c7962d10efa7178b69d3c99e69663756e1cd979
Merge: fceeb383f9 c17bf304d5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 20:44:21 2025 +0200

    Merge branch 'remove-group-bs' into remove-constrained-bs

commit c17bf304d5cf33af7f34f9f6057915d5f5821dae
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 17:00:50 2025 +0200

    fix test

commit d579aeec6706b77fcc24c1f6806cd7277d7db56e
Merge: 822efd8c3c ed5dd2999c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 16:04:31 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 822efd8c3cf475d079e64293aa06e4ab59740fd7
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 15:59:51 2025 +0200

    aaand remove tests after all green!!

commit 62cb274a4acb9f24201902242f1b0dc4e46daac1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:48:19 2025 +0200

    fix

commit c89c892e7b24a7d71831f2b35264456005030925
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:45:20 2025 +0200

    testing that hub works the same

commit fceeb383f99e4a836679d67b1d2a8520152eaf49
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 20:06:59 2025 +0200

    draft

commit 6a9b384078f3798587ba865ac7ddfefc9a79e41c
Merge: 8af3af13ab 58cebc848b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 15:00:05 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 8af3af13abb85ca60e795d0390832f398a56c34f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 11:55:45 2025 +0200

    Squashed commit remove-constrastive-search

* ops

* fix

* ops

* review

* fix

* fix dia

* review
2025-09-03 17:30:09 +02:00
a8f400367d Avoid attention_mask copy in qwen2.5 (#40658)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-03 15:17:22 +00:00
57f5668d0b Fix Metaclip modular conversion (#40660)
* Fix Metaclip modular conversion

* manually run check_copies
2025-09-03 16:13:50 +01:00
238a8274b4 feat(serving): add healthcheck (#40653) 2025-09-03 16:43:12 +02:00
f2416b4fd2 fix pipeline dtype (#40638)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 16:05:48 +02:00
5ea5c8179b Mark LongformerModelTest::test_attention_outputs as flaky (#40655)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 13:19:02 +00:00
fe1a9e0dba Remove TF/Flax examples (#40654)
* Remove TF/Flax examples

* Remove check_full_copies

* Trigger CI
2025-09-03 14:15:57 +01:00
5e2e496149 fix MetaCLIP 2 wrong link & wrong model names in the docstrings (#40565)
* fix MetaCLIP 2 wrong link & wrong model names in the documentation and docstrings

* ruff reformatted

* update files generated by modular

* update meta_clip2 to metaclip_2 to match the original

* _supports_flash_attn = False

---------

Co-authored-by: Yung-Sung Chuang <yungsung@meta.com>
2025-09-03 13:53:56 +01:00
03708ccf6f add DeepseekV3ForTokenClassification (#40641)
* add DeepseekV3ForTokenClassification

* fix typo

---------

Co-authored-by: json.bourne <json.bourne@kakaocorp.com>
2025-09-03 12:30:09 +00:00
c485c52db4 Skip test_prompt_lookup_decoding_matches_greedy_search for voxtral (#40643)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 11:45:29 +00:00
2bbf98a83d Fix: PIL image load in Processing utils apply_chat_template (#40622) 2025-09-03 13:06:05 +02:00
acc968c581 [CP] Add attention_mask to the buffer when the mask is causal (#40619)
Fix attention mask validation for context parallelism

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 10:19:35 +00:00
cb54ce4ec6 [auto-model] propagate kwargs (#40491)
propagate kwargs
2025-09-03 09:59:20 +00:00
ye
0f5e45a6d1 fix: gas for gemma fixed (#40591)
* fix: gas for gemma fixed

* feat: run fix-copies

* feat: added issue label
2025-09-03 08:44:14 +00:00
e690fe61e8 Fix too many requests in TestMistralCommonTokenizer (#40623)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 05:05:03 +02:00
00a8364271 🌐 [i18n-KO] Translated deepseek_v3.md to Korean (#39649)
* docs: ko: deepseek_v3.md

* feat: nmt draft

* fix: manual edits

* fix: glossary edits

* docs : 4N3MONE recommandced modified contents

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* add_toctree.yml

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-02 13:35:56 -07:00
ed49376a42 Remove random flag (#40629)
remove flag
2025-09-02 19:10:02 +02:00
d47ad91c3c Support TF32 flag for MUSA backend (#33187)
* Support MUSA (Moore Threads GPU) backend in transformers
Add accelerate version check, needs accelerate>=0.33.0

* Support TF32 flag for MUSA backend

* fix typo
2025-09-02 16:27:10 +00:00
a470f21396 Enable more ruff UP rules (#40579)
* Import Sequence from collections.abc

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff UP rules

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 17:29:59 +02:00
37103d6f22 Fix invalid typing (#40612)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 13:10:22 +00:00
4f542052b9 Remove unnecessary pillow version check (#40604)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 12:59:22 +00:00
8c60a7c385 Add collated reports job to Nvidia CI (#40470)
* Add collated reports job to Nvidia CI

* machine_type

* Move collated reports job to model_jobs

* Propagate repo id variable

* assifgn runner_type is self-scheduled-caller
2025-09-02 14:25:22 +02:00
97266dfd50 Fix flaky JambaModelTest.test_load_balancing_loss (#40617)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 13:58:16 +02:00
91be12bdc6 Avoid too many request caused by AutoModelTest::test_dynamic_saving_from_local_repo (#40614)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 12:08:52 +02:00
bbd8085b0b Fix processor chat template (#40613)
fix tests
2025-09-02 10:59:48 +02:00
b2b1c30b1b fix: continuous batching in transformers serve (#40479)
* fix: continuous batching in `transformers serve`

* fix: short circuit inner gen loop when prepare_next_batch prepared nothing

* docs: add comment explaining FastAPI lifespan

* test: add CB serving tests

* refactor: remove gen cfg max new tokens override bc unnecessary

* docs: add docstring for `ServeCommand::run`

* feat: use new `DecodeStream` API
2025-09-02 10:45:05 +02:00
8a091cc07c Disable cache for TokenizerTesterMixin temporarily (#40611)
* try no cache

* try no cache

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 08:40:04 +02:00
514b3e81b7 Multiple fixes to FA tests in AMD (#40498)
* Expectations for gemma3

* Fixes for Qwen2_5_VL tests

* Added expectation but underlying pb is still there

* Better handling of mrope section for Qwen2_5_vl

* Fixes for FA2 tests and reformat batch test for Qwen2_5_Omni

* Fix multi-device error in qwen2_5_omni

* Styel and repo-consistency

* Removed inherited test because fix in common

* slow tests fixes

* Style

* Fixes for qwen2_5_vl or omni for FA test
2025-09-01 20:49:50 +02:00
b3655507bb Pin torchcodec to 0.5 in AMD docker (#40598) 2025-09-01 20:39:55 +02:00
4da03d7f57 Reduce more test data fetch (#40595)
* example

* fix

* fix

* add to fetch script

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 18:07:18 +02:00
abf5900a76 [Tests] Fixup duplicated mrope logic (#40592)
cleanup duplicated logic
2025-09-01 17:22:34 +02:00
3beac9c659 Fix quite a lot of FA tests (#40548)
* fix_rope_change

* fix

* do it dynamically

* style

* simplify a lot

* better fix

* fix

* fix

* fix

* fix

* style

* fix
2025-09-01 16:42:50 +02:00
21e708c8fd Fix for missing default values in encoder decoder (#40517)
* Added default_value for is_updated and type check

* Forgot one

* Repo consistency
2025-09-01 16:11:23 +02:00
c99d43e6ec Fix siglip flaky test_eager_matches_sdpa_inference (#40584)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:17:25 +02:00
3c3dac3c12 Add Copilot instructions (#40432)
* Add copilot-instructions.md

* Fix typo

* Update .github/copilot-instructions.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-01 14:09:54 +01:00
2b71c5b7a6 Fix inexistent imports (#40580)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 13:05:00 +00:00
8e0b2c8baf Skip TvpImageProcessingTest::test_slow_fast_equivalence (#40593)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:03:34 +02:00
a543095c99 Fix typos (#40585)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 12:58:23 +00:00
8564e210ca 🚨 Remove Constrained Beam Search decoding strategy (#40518)
* Squashed remove-constrastive-search

* sweeep ready for tests

* testing...

* whoops

* ops

* tests fix

* tests green, changed handling of deprecated methods

* tests gone after green

* restore and deprecate beam obkects

* restore and deprecate constraint objects

* fix ci

* review
2025-09-01 12:34:48 +00:00
564be6d895 Support batch size > 1 image-text inference (#36682)
* update make nested image list

* fix make flat list of images

* update type anno

* fix image_processing_smolvlm

* use first image

* add verbose comment

* fix images

* rollback

* fix ut

* Update image_processing_smolvlm.py

* Update image_processing_idefics3.py

* add tests and fix some processors

* fix copies

* fix after rebase

* make the test cover chat templates

* sjip udop, no point in fixing it

* fix after rebase

* fix a few more tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-09-01 12:26:07 +00:00
3bccb02616 🚨 Remove Group Beam Search decoding strategy (#40495)
* Squashed remove-constrastive-search

* testing that tests pass using hub

* fix

* aaand remove tests after all green!!
2025-09-01 13:42:48 +02:00
90953d5bc1 Fix custom generate relative imports (#40480) 2025-09-01 13:38:56 +02:00
2537ed4477 Update get_*_features methods + update doc snippets (#40555)
* siglip

* clip

* aimv2

* metaclip_2

* align

* align fixup

* altclip

* blip2 (make consistent)

* chineese clip

* clipseg

* flava

* groupvit

* owlv2

* owlvit

* vision_encoder

* clap

* x_clip

* fixup

* fix siglip2

* blip2

* fix blip2 tests (revert to original)

* fix docs
2025-09-01 12:37:43 +01:00
48ebae975e Fix llava image processor (#40588)
fix
2025-09-01 13:32:57 +02:00
db6821b79c Allow remi-or to run-slow (#40590)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 12:30:53 +02:00
6546f288a1 Fix CircleCI step passes in the case of pytest worker crash at test collection time (#40552)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:33:23 +02:00
cfed99d310 Fix test_eager_matches_sdpa_inference not run for CLIP (#40581)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:21:56 +02:00
1d742644c0 [qwen-vl] fix position ids (#40490)
* fix position ids

* fixup

* adjust tests since they are failing on main as well

* add a comment to make it clear
2025-09-01 09:10:41 +00:00
0b24507379 processor tests - use dummy videos (#40537)
* use dummy videos

* failing on main, new model merged had conflicts
2025-09-01 09:04:47 +00:00
b0db5a02f3 Set test_all_params_have_gradient=False for DeepseekV2ModelTest (#40566)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 22:46:31 +02:00
1363fceeec remove the redundant non maintained jieba and use rjieba instead (#40383)
* porting not maintained jieba to rjieba

* Fix format

* replaced the line with rjieba instead of removing it

* cut_all is not included as a parameter. cut_all is a seperate function rjieba

* rev

* jieba remove installation

* Trigger tests

* Update tokenization_cpm.py

* Update tokenization_cpm_fast.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-30 13:28:52 +02:00
36fddebcee pin pytest-rerunfailures<16.0 (#40561)
ping pytest-rerunfailures<16.0

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 12:58:44 +02:00
2d3b8863e8 Fix collated reports upload filename (#40556) 2025-08-30 09:35:51 +02:00
ce48e9cac0 Dev version 2025-08-29 20:17:34 +02:00
155fd926d2 Fix GptOssModelTest::test_assisted_decoding_matches_greedy_search_1_same (#40551)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-08-29 15:53:53 +00:00
1067577ad2 fix gpt-oss out shape (#40535)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* reset gpt-oss modeling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copies

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 15:20:33 +00:00
7efb4c87ca Flaky CI is annoying (#40543)
* mark flaky

* and the non batch one
2025-08-29 16:47:44 +02:00
828a27fd32 Fix gpt-oss rope warning (#40550)
* fix

* fix print

* rm

* real fix

* fix

* style
2025-08-29 14:40:33 +00:00
74a24217f5 Add bfloat16 support detection for MPS in is_torch_bf16_gpu_available() (#40458)
* Add bfloat16 support detection for MPS (Apple Silicon) in is_torch_bf16_gpu_available

bfloat16 seems to have been supported for a few years now in Metal and torch.mps.

Make sure to allow it and not throw on bf16 usage with "Your setup doesn't support bf16/gpu." from TrainingArguments.

* Check bf16 support for MPS using torch method

Actually seems method exists: 5859edf113/torch/_dynamo/device_interface.py (L519)

It simply checks if you are on MacOs 14 or higher.

* Document Metal emulation for bf16 support

Add note about Metal emulation for bf16 support on M1/M2.

* Update bf16 support check for MPS backend

is_bf16_supported() not exposed even if defined on MPSInterface, use same approach as in accelerate pr.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 14:37:15 +00:00
ffdd10fced Allow compression on meta device (#39039)
* disable gradient calculation for int weights

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* Update src/transformers/quantizers/quantizer_compressed_tensors.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* updated model procession before/after weight loading

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* reformat

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

---------

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
2025-08-29 15:49:15 +02:00
f0e778112f Clean-up kernel loading and dispatch (#40542)
* clean

* clean imporrts

* fix imports

* oups

* more imports

* more imports

* more

* move it to integrations

* fix

* style

* fix doc
2025-08-29 14:14:38 +02:00
f68eb5f135 Redundant code removal (#40534)
redundant code
2025-08-29 11:30:23 +00:00
d888bd435d Fix typos (#40511)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-29 11:25:33 +00:00
11a6b95553 Oupsy (#40544)
fix bump!
2025-08-29 12:59:49 +02:00
b07144ac27 tokenizers bump tokenizers version (#40540)
* bump tokenizers version

* use rc0

* ?

* fml

* update
2025-08-29 12:34:41 +02:00
008c0ba8e2 Fix SeamlessM4Tv2ModelWithTextInputTest::test_retain_grad_hidden_states_attentions (#40532)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 23:30:59 +02:00
89ef1b6e0b Set test_all_params_have_gradient=False for HunYuanMoEV1ModelTest (#40530)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 22:32:51 +02:00
2e0f1d6a37 [Qwen Omni/VL] Fix fa tests (#40528)
* fix

* style

* flaky flaky

* flaky flaky

* oopsie, we need the out of place for sure

* flaky flaky

* flaky flaky
2025-08-28 21:07:22 +02:00
68013c505a Improve Gemma3n model and tests (#39764) 2025-08-28 20:25:42 +02:00
ffcb344612 Lazy import torchcodec (#40526)
* lazy import

* parse version

* omg, we need to guard version parse as well
2025-08-28 18:57:14 +02:00
8c7f685079 Fix typo: 'casual' to 'causal' (#40374)
fix typo: 'casual' to 'causal'

Co-authored-by: demo <vamshika0210@gamil.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-28 09:17:37 -07:00
d61fab1549 skip some padding_matches_padding_free_with_position_ids for FA2 (#40521)
skip 1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 17:20:07 +02:00
31336ab750 Fix mistral3 tests after "[Kosmos 2.5] Rename checkpoints" (#40523)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 16:29:54 +02:00
851b8f281d [kernels] If flash attention2 is not installed / fails to import (cc on our cluster) default to kernels (#40178)
* first step if flash not installed but you set to use it

* try importing

* now default to using it

* update our tests as well

* wow yesterday I was not awake

* fixup

* style

* lol the fix was very very simple

* `RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
` for updated dockers

* push review comments

* fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-28 16:20:25 +02:00
de9e2d7a2e Skip some flex attn tests (#40519)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 15:43:38 +02:00
7e1aee4db6 [FA] Remaining Cleanup (#40424)
* fa cleanup

* flaky tests

* readd removed test and changeup comments to reflect the purpose

* flaky tests
2025-08-28 15:01:19 +02:00
893d89e5e6 [omni modality] support composite processor config (#38142)
* dump ugly option to check again tomorrow

* tiny update

* do not save as nested dict yet!

* fix and add tests

* fix dia audio tokenizers

* rename the flag and fix new model Evolla

* fix style

* address comments

* broken from different PRp

* fix saving layoutLM

* delete print

* delete!
2025-08-28 14:40:27 +02:00
becab2c601 Use the config for DynamicCache initialization in all modelings (#40420)
* update all

* remove the most horrible old code

* style
2025-08-28 14:32:30 +02:00
8acbbdcadf [serve] fix request_id unexpected (#40501)
* fix request-id in serving

* style

* fix
2025-08-28 14:16:28 +02:00
2300be3b41 sped up gguf tokenizer for nemotron test (#40509)
sped up tokenizer for nemotron test
2025-08-28 12:10:49 +00:00
b2b654afbf correct kes to keys. (#40489)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-08-28 12:00:22 +00:00
476cd7bab1 [vision] Improve keypoint-matching models docs (#40497)
fix options and add inference_mode
2025-08-28 12:31:21 +01:00
1499f9e356 [Kosmos 2.5] Rename checkpoints (#40338) 2025-08-28 13:30:41 +02:00
10ddfb0be5 Add more missing arguments (#40354)
Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-28 12:21:51 +02:00
d10603f701 Add Apertus (#39381)
* init swissai model

* AutoModelForCausalLM

* AutoModelForCausalLM mapping

* qk norm and post ln optional

* fix wrong shape of qk norm: megatron uses head_dim

* automodel fixes

* minor fix in forward

* fix rope validation to accept llama3 scaling

* `SwissAIForTokenClassification` support

* Align `SwissAI` to v4.52.4

* Align `SwissAI` to v4.53.1

* Init CUDA xIELU

* `SwissAI*`->`Apertus*`

* ci fix

* check_docstring ignore ApertusConfig

* Licensing and placeholder tests

* Placeholder doc

* XIELU syntax

* `_xielu_python` optimization

* Fix xIELU

* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint

* Modular `Apertus`

* CUDA xIELU logging

* ci fix

* ci fix

* ci fix

* Update license

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update tests/models/apertus/test_modeling_apertus.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* `.utils.import_utils.is_torchdynamo_compiling`

* `Apertus` class ordering

* `past_key_value{->s}`, `make fix-copies`

* ci fix

* Remove unused configuration parameters

* `{beta,eps}` saved in checkpoint

* `{beta,eps}` Temporarily on CPU

* Suggestions

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* ci fix

* remove fx_compatible (deprecated)

* remove `rotary_embedding_layer`

As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.

* fully removing `Mask4DTestHard` class

Not needed (for now)

* switch to `dtype` instead of `torch_dtype`

Following this:
https://github.com/huggingface/transformers/pull/39782

* remove unused imports

* remove `cache_implementation="static"`

* +Apertus to `docs/source/en/_toctree.yml` for the doc builder

---------

Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
2025-08-28 11:55:43 +02:00
f9b9a5e884 Update quantization overview for XPU (#40331)
* update xpu quantization overview

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix aqlm tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gguf support

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gguf tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu gguf precision error

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* replace deprecated models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix import org

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu ggml tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert wrong change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* xpu optimum-quanto goes green

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-28 09:52:59 +00:00
b824f4986f fix typo (#40484)
* fix typo

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* csm & qwen omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* format

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* Apply style fixes

* omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

---------

Signed-off-by: guochenxu <guochenxu@modelbest.cn>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-28 08:31:25 +00:00
c9ff166718 Various AMD expectations (#40510)
* AMD expectations for qwen2

* Added more detailled excpectation to smolvlm

* Added AMD expectations to TableTransformer

* Style
2025-08-28 10:15:21 +02:00
721d4aee81 Include machine type in collated reports filename (#40514) 2025-08-28 09:28:12 +02:00
98289c5546 [modular] Classes can now be defined and referenced in arbitrary order (without bringing unwanted dependencies) (#40507)
* remove future class from dependency graph

* convert all
2025-08-27 23:06:10 +02:00
e3d8fd730e docs(pixtral): Update Pixtral model card to new format (#40442)
* docs(pixtral): Update Pixtral model card to new format

* docs(pixtral): Change cuda into auto for device_map

* docs(pixtral): Apply suggestions from review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review, changing mistral-community into Mistral AI

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review [!TIP] part

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Finalize model card with tested code examples

This commit finalizes the update for the Pixtral model card.

* Fix the hfoption by the right one

* @BryanBradfo docs(pixtral): Changing the redirection of bitsandbytes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Add of ` to highlight the tokens

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Move image block per final review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-27 11:38:51 -07:00
821384d5d4 Fix the CI workflow of merge to main (#40503)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 18:35:12 +02:00
304225aa15 Collated reports: no need to upload artifact (#40502)
No need to upload collated reports as gh artifact
2025-08-27 18:31:55 +02:00
3c343c6601 [Whisper] Add rocm expected results to certain tests (#40482)
* Add rocm expected results to certain tests

* Specify rocm version in expectations so we know origin. Improved var names

* Update test var names
2025-08-27 16:19:11 +00:00
6350636964 Fix qwen2_moe tests (#40494)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 16:22:04 +02:00
52aaa3f500 [EfficientLoFTR] dynamic image size support (#40329)
* fix: reverted efficientloftr embeddings computation to inference time with lru cache

* fix: added dtype and device for torch ones and zeros creation

* fix: fixed embed height and width computation with aggregation

* fix: make style

* fix error message

* fix fa2 tests

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-27 15:05:08 +01:00
ed5dd2999c [ESM] support attention API (#40370)
* ESM supports attention API

* supports flags

* fix tests

* fix copiees

* another fixup needed after fixing tests

* fix tests and make sure Evolla copied everything

* fix

* order

* forgot about "is_causal" for fa2

* cross attention can't be causal
2025-08-27 15:39:04 +02:00
8b804311ba [modular] Remove ambiguity in all calls to parent class methods + fix dependency graph (#40456)
* fix in modular

* remove leftover print

* fix everything except when it's in assignment

* fix assignment as well

* more general

* better

* better

* better comment

* docstring

* cleaner

* remove base

* doc
2025-08-27 14:51:28 +02:00
a3afebbbbe [modular] Use multi-processing + fix model import issue (#40481)
* add mp and simplify a bit

* improve

* fix

* fix imports

* nit
2025-08-27 14:51:12 +02:00
75d6f17de6 Validate GptOssConfig rope config after it's fully initialized (#40474)
* Validate GptOssConfig rope config after it's fully initialized

Fixes #40461

* Remove whitespaces
2025-08-27 10:16:58 +01:00
80f4c0c6a0 CI when PR merged to main (#40451)
* up

* up

* up

* up

* up

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 10:56:18 +02:00
ff8b88a948 Fix nightly torch CI (#40469)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 22:02:15 +02:00
74ad608a2b Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467) 2025-08-26 20:53:24 +02:00
c8c7623f20 Update SegFormer model card (#40417)
* Update SegFormer model card

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the segformer model card

* Remove quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-26 08:27:25 -07:00
78f32c3917 [pipeline] Add Keypoint Matching pipeline (#39970)
* feat: keypoint-matcher pipeline

* docs: added keypoint-matcher pipeline in docs

* fix: added missing statements for repo consistency

* docs: updated SuperGlue, LightGlue and EfficientLoFTR docs

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* test: fixed run_pipeline_test

* update pipeline typing and docs

* update tests

* update docs snippets

* Fix import error

* fix: pipeline init

* pt framework

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-26 15:26:57 +01:00
6451294f6f [RoPE] explicit factor > implicit factor in YaRN (#40320)
explicit factor > implicit factor
2025-08-26 14:58:28 +01:00
5a8ba87ecf [fast_image_processor] fix image normalization for resize (#40436) 2025-08-26 13:49:51 +00:00
VED
0ce6709e70 deci gguf support (#38669)
* deci gguf support

* make style

* tests for deci

* try except removed

* style

* try except removed
2025-08-26 13:43:17 +00:00
263d06fedc Fix extra template loading (#40455)
* Fix extra template loading

* Reformat

* Trigger tests
2025-08-26 14:01:01 +01:00
58cebc848b flash_paged: s_aux may not exist (#40434)
Some implementations (i.e.,
https://huggingface.co/kernels-community/vllm-flash-attn3) support an
`s_aux` arg for attention sinks, but others
(https://huggingface.co/kernels-community/flash-attn) do not. If s_aux
is present in the kwargs, we forward it, otherwise we don't.

The user will still get an error if they use a model like gpt-oss-20b
with an implementation that does not support `s_aux`, but models that
don't use it won't error out. For example, [this is currently
failing](399cd5c04b/examples/pytorch/continuous_batching.py (L16))
because we are sending `s_aux: None` in the dict.
2025-08-26 13:15:59 +02:00
34108a2230 Continuous batching refactor (#40426)
* Rework of the CB example

* Further rework of CB example

* Refactor PA cache, slice on tokens, add debug prints -- WIP

* Slice cache -- WIP

* Added a mechanism to check batched outputs in CB script

* Less logging, debug flag for slice, !better reset! -- WIP

* QOL and safety margins

* Refactor and style

* Better saving of cb example

* Fix

* Fixes and QOL

* Mor einformations about metrics

* Further logging

* Style

* Licenses

* Removed some comments

* Add a slice input flag

* Fix in example

* Added back some open-telemetry deps

* Removed some aux function

* Added FA2 option to example script

* Fixed math (all of it)

* Added a simple example

* Renamed core to classes

* Made allocation of attention mask optionnal

* Style
2025-08-26 13:01:42 +02:00
49e168ff08 🚨 Remove Contrastive Search decoding strategy (#40428)
* delete go brrr

* fix tests

* review
2025-08-26 12:31:46 +02:00
b8184b7ce9 Make cache_config not mandatory (#40316)
* Relaxed assumptions on cache_config

* Review compliance

* Style

* Styyyle

* Removed default and added args

* Rebase mishapfix

* Propagate args to TorchExportableModuleForDecoderOnlyLM

* Fix the test I wanted  fixed in this PR

* Added some AMD expectation related to cache tests
2025-08-26 12:06:17 +02:00
32fcc24667 rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-26 09:56:35 +00:00
f690a2a1e0 [video processors] decode only sampled videos -> less RAM and faster processing (#39600)
* draft update two models for now

* batch update all VLMs first

* update some more image processors

* update

* fix a few tests

* just make CI green for now

* fix copies

* update once more

* update

* unskip the test

* fix these two

* fix torchcodec audio loading

* maybe

* yay, i fixed torchcodec installation and now can actually test it

* fix copies deepseek

* make sure the metadata is returrned when users request it

* add docs

* update

* fixup

* Update src/transformers/audio_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/glm4v/video_processing_glm4v.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update

* what if we set some metadata attr to `None`

* fix CI

* fix one test

* fix 4 channel test

* fix glm timestemps

* rebase gone wrong

* raise warning once

* fixup

* typo

* fix copies

* ifx smolvlm test

* this is why torch's official benchmark was faster, set threads to `0`

* Apply style fixes

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-26 11:38:02 +02:00
64ae6e6b1d fix qwen25-vl grad acc (#40333)
* fix qwen25—vl grad acc

* fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs

* fix ci

* fix ci

* fix typo

* fix CI
2025-08-26 09:30:06 +00:00
6d2bb1e04d [Trainer] accelerate contextparallel support in trainer (#40205)
* initial context_parallel_size support in trainer

* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens

* use parallelism_config.cp_enabled

* add parallelism_config to trainer state

* warn when auto-enabling FSDP

* fix some reviews

* WIP: somewhat matching loss

* Feat: add back nested_gather

* Feat: cleanup

* Fix: raise on non-sdpa attn

* remove context_parallel_size from TrainingArguments

* if we have parallelism_config, we defer to get_state_dict from accelerate

* fix form review

* Feat: add parallelism config support

* Chore: revert some unwanted formatting changes

* Fix: check None

* Check none 2

* Fix: remove duplicate import

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fin

* require accerelate 1.10.1 and higer

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-26 09:28:48 +00:00
63caaea1fb Refactor ViT-like models (#39816)
* refactor vit

* fix

* fixup

* turn off FX tests

* AST

* deit

* dinov2

* dinov2_with_registers

* dpt

* depth anything (nit)

* depth pro (nit)

* ijepa

* ijepa (modular)

* prompt_depth_anything (nit)

* vilt (nit)

* zoedepth (nit)

* videomae

* vit_mae

* vit_msn

* vivit

* yolos

* eomt

* vitpose

* update auto backbone

* disable `fx` and export tests (dnov2, dpt, ijepa, vit, vitpose)

* fix kwargs for backbone

* fix

* convnext

* fixup

* update convnext layernorm

* fix-copies layer_norm

* convnextv2

* explicit output_hidden_states for models with backbones

* explicit hidden states collection for dinov2

* tests fixed

* fix DPT as well

* fix dinov2 with registers

* add comment
2025-08-26 11:14:06 +02:00
922e65b3fc Fix non FA2 tests after FA2 installed in CI docker image (#40430)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 10:36:50 +02:00
e68146fbe7 Fix collated reports model name entry (#40441) 2025-08-25 20:36:01 +00:00
8ce633cc75 InternVL MI325 test expectations (#40387)
* Adjust ROCm expectations

* MI355

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-25 22:00:35 +02:00
7637d298b3 Fix collated reports uploading (#40440) 2025-08-25 21:49:59 +02:00
fa59cf9c9f Fix https://github.com/huggingface/transformers/issues/40292 (#40439)
* Fix https://github.com/huggingface/transformers/issues/40292

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-25 20:12:57 +01:00
f0e87b436d Fix collated reports model directory traversal (#40437)
Fix model dir traversal
2025-08-25 18:01:58 +00:00
ef406902bf Gemma3 text fixes: Add expectations for MI325 (#40384)
* Add expectations for MI325

* Ruff

* Adjust CUDA expectations as well

* Another attempt for CUDA expectations
2025-08-25 19:57:50 +02:00
c81723d31b 🌐 [i18n-KO] Translated models.md to Korean (#39518)
* docs: ko: models.md

* feat: nmt draft

* fix: manual edits

* Resolved _toctree.yaml conflict during merge from main

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

* fix: update toctree

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-25 09:17:08 -07:00
6b5eab70e4 Remove working-dir from collated reports job (#40435) 2025-08-25 18:14:35 +02:00
1763ef2951 [docs] remove last references to transformers TF classes/methods (#40429)
* halfway through tasks

* complete

* Update utils/check_docstrings.py
2025-08-25 16:30:59 +01:00
eac4f00bdf Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349) (#40408)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:55 +00:00
d8f2edcc46 Add tokenizer_kwargs argument to the text generation pipeline (#40364)
* Add `tokenizer_kwargs`  arg to text generation pipeline.

* chore: re-run CI

* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline

* Fix `tokenizer_encode_kwargs` doc string.

* Fix note related to `tokenizer _kwargs` in text generation pipeline

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:19 +00:00
1a35d07f56 Update collated reports working directory and --path (#40433) 2025-08-25 15:18:26 +00:00
399cd5c04b Fix modular for modernbert-decoder (#40431)
* fix the modular

* CI
2025-08-25 16:50:49 +02:00
ea8d9c8f06 🚨 Remove DoLa decoding strategy (#40082)
* remove dola generation strategy

* add fast test
2025-08-25 16:33:27 +02:00
6bf6f8490c [Mxfp4] Add a way to save with a quantization method (#40176)
* add a test

* tempdir

* fix import issue[

* wow I am tired

* properly init

* i am not super familiar with quantizer api :|

* set to TRUE fro now

* full support

* push current changes

* will clean this later but the imports are a shitshow here

* this correctly saves the block and scales but forward seems broken

* quanitze was not correct

* fix storage

* why were bias even included

* finally!

* style

* fix style

* remove print

* lazy import

* up

* not sure what happens this works now?

* holy molly it was not so far

* okay this seems to work!

* workings!!!

* allow save_pretrained to create PR

* Apply suggestions from code review

* fixup

* add deqyabtze fakse as wek

* working new

* fix

* rm swizzle and unswizzle during saving

* rm print

* Update src/transformers/modeling_utils.py

* fix

* style

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
2025-08-25 16:27:19 +02:00
04c2bae3a8 Fix label smoothing incompatibility with multi-label classification (#40296)
* Fix label smoothing incompatibility with multi-label classification (#40258)

* Improve label smoothing multi-label check based on reviewer feedback

- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
2025-08-25 14:23:31 +00:00
3b5b9f6518 Fix processing tests (#40379)
* fix tests

* skip failing test in generation as well

* grounding dino was overwritten

* one more overwritten code

* clear comment
2025-08-25 14:50:54 +02:00
a0a37b3250 Gpt oss optim (#40304)
* enable fast index selecting

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gpt-oss tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check tensor

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-08-25 14:36:33 +02:00
d73181b3fc Fix UnboundLocalError in WER metric computation (#40402)
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.

Co-authored-by: pranam-gf <pranam@goodfin.com>
2025-08-25 12:02:22 +00:00
11e12a715a Fix typo: 'seperator' to 'separator' in variable names (#40389)
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py

These typos were in variable names used for parsing path components in weight conversion scripts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-25 11:56:30 +00:00
40299134a8 Fix CI (hunyuan moe does not support fullgraph) (#40423)
fix flag
2025-08-25 12:01:28 +02:00
a2b37bfd58 Fix typo: 'casual' -> 'causal' in code and documentation (#40371) (#40407) 2025-08-25 09:32:15 +00:00
0031c044f8 [docs] flax/jax purge (#40372)
flax/jax purge
2025-08-25 10:25:00 +01:00
14b89fed24 fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194)
* fix to the typings which are unmatched to FA function signature

cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching

It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`

* format changes by ruff

* Update src/transformers/integrations/flash_paged.py

unused function arg in `PagedAttentionCache.update`

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* revert continuous_batching signiture, which is more meaningful

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-25 11:00:13 +02:00
ba095d387d 🧹 🧹 🧹 Get set decoder cleanup (#39509)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* this should be explicit

* fix Nth ESM exception

* tryout decoder

* this as well

* revert again

* 🧠

* aaah ESM has two modelings aaah

* broom broom

* format

* wrong copies

* copies

* modular cleanups

* format

* modularities

* wrong mergefix

* seriously

* align with new model

* new model
2025-08-25 10:57:56 +02:00
2c55c7fc94 Reactivate a lot of tests skipped for no reason anymore (#40378)
* reactivate all the tests

* some tests still failing
2025-08-25 10:44:43 +02:00
4f9b4e62bc Run FA2 tests in CI (#40397)
up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-23 12:30:18 +02:00
28ca27cb2b HF papers in doc (#40381)
* HF papers

* clean

* Update src/transformers/models/gemma3n/configuration_gemma3n.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:07:08 -07:00
7d88f57fc6 Update README_zh-hans.md (#40380)
Fix a typo.
2025-08-22 18:22:26 +00:00
29ddcacea3 Rework the Cache documentation (#40373)
* start working the doc

* remove gemma2

* review
2025-08-22 17:06:28 +02:00
dab66f15a1 Chat Template Doc Fixes (#40173)
* draft commit

* draft commit

* Fixup chat_extras too

* Update conversations.md

* Update the toctree and titles

* Update the writing guide!

* Use @zucchini-nlp's suggestion

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:48:33 +01:00
0a21e870c7 Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352)
* bug fix - return_lse dynamically set

* addressed compatibility with return type - flex_attention_forward

* rename variables

* revert changes to commits
2025-08-22 13:49:26 +00:00
894b2d84b6 Add GptOssForTokenClassification for GPT-OSS models (#40190)
* Add GptOssForTokenClassification for GPT-OSS models

* After run make fixup
2025-08-22 15:14:46 +02:00
56d68c6706 Addiing ByteDance Seed Seed-OSS (#40272)
add seed oss
2025-08-22 14:54:28 +02:00
8a6908c10d fix(example): align parameter names with the latest function definition for gdino (#40369) 2025-08-22 12:27:58 +00:00
7db228a92a [configuration] allow to overwrite kwargs from subconfigs (#40241)
allow to overwrite kwargs from subconfigs
2025-08-22 13:31:25 +02:00
19ffe0219d [processor] move commonalities to mixin (#40339)
* move commonalities to mixin

* revert - unrelated

* fix copies

* fix style

* comments
2025-08-22 13:04:43 +02:00
d8f6d3790a ⚠️⚠️ Use dtype instead of torch_dtype everywhere! (#39782)
* update everywhere

* style

* pipelines

* switch it everywhere in tests

* switch it everywhere in docs

* switch in converters everywhere

* update in examples

* update in model docstrings

* style

* warnings

* style

* Update configuration_utils.py

* fix

* Update configuration_utils.py

* fixes and add first test

* add pipeline tests

* Update test_pipelines_common.py

* add config test

* Update test_modeling_common.py

* add new ones

* post rebase

* add new

* post rebase adds
2025-08-22 12:34:16 +02:00
9c25820978 [pipelines] add support to skip_special_tokens in the main text generation pipelines (#40356)
* add support to skip_special_tokens in pipelines

* add test

* rm redundant
2025-08-22 10:12:46 +00:00
5c40e7a225 Change multimodal data links to HF hub (#40309)
change multimodal data links to HF hub
2025-08-22 11:50:04 +02:00
e018b77c89 wav2vec2 fixes (#40341)
* Changed datasets to avoid a datasets error

* Changed back split to test
2025-08-22 11:32:29 +02:00
d7fe3111ff Fix idefics3 vision embeddings indices dtype (#40360)
fix idefics3 vision embeddings

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 11:10:45 +02:00
cf487cdf1f HunYuan opensource (#39606)
* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused paramter&add licence

* add licence

* remove unused paramter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (#5)

* fix attention

* use llamamodel

* fix code

* Fix qk (#6)

* fix qk_norm

* fix

* fix modual

* Fix moe (#7)

* fix some moe code

* fix einsum

* try top1

* use top1

* Fix rotary (#8)

* fix rotary

* fix modeling

* fix modular

* fix testcode

* remove A13B unit test

* Fix moe v1 (#9)

fix moe & gate

* Fix gate norm (#10)

* add norm_topk_prob

* Fix testcase (#11)

* fix&skip test

* Fix testcase (#12)


* skip testcase

* Fix norm topk (#13)

* hardcode norm_topk_prob

* fix testcase

---------

Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Mingji Han <mingjihan@tencent.com>
2025-08-22 07:59:58 +00:00
8365f70e92 DOCS: Clarification on the use of label_names as an argument to TrainingArguments (#40353)
* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/ warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-21 17:19:04 -07:00
7c1169e21f [4/N]more docs to device agnostic (#40355)
* more docs to device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 1

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-08-21 10:22:26 -07:00
9568b506ed [generate] handle support for cache classes when num enc layers != num dec layers (#40277)
* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o
2025-08-21 17:35:18 +01:00
7f38068ae0 Qwen2.5-VL test fixes for ROCm (#40308) 2025-08-21 18:13:07 +02:00
cb1df4d26a [FA] Fix some model tests (#40350)
* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit
2025-08-21 18:08:21 +02:00
f46f29dd7c Remove more PyTorch 2.2 compatible code (#40337)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 15:19:53 +00:00
128f42d370 [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300)
fix: use consistent dtype for sine positional embeddings
2025-08-21 15:49:56 +01:00
2121d09239 [serve] add cors warnings (#40112)
* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-21 14:32:36 +01:00
b40b834ab1 Clean up XCodec and other codecs (#40348)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-21 15:32:00 +02:00
75aa7c7252 [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991)
* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion
2025-08-21 15:16:03 +02:00
04b751f07d Fix attention vizualizer (#40285)
* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans
2025-08-21 13:13:35 +00:00
cyn
1e1db12304 (small) fix conditional for input_ids and input_embeds in marian (#40045)
* (small) fix conditional for input_ids and input_embeds in marian

* address comment
2025-08-21 15:13:14 +02:00
7f2f53424e Update test_spm_converter_bytefallback_warning (#40284)
fff

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-21 14:09:28 +02:00
11a49dd9e3 T5 test and target device fixes (#40313)
* Fix cache setup related issues

* Fix target-device-related issues

* Ruff

* Address review comments
2025-08-21 14:07:29 +02:00
c4513a9fe6 Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310)
* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
2025-08-21 11:42:53 +00:00
c7e6f9a485 Fix an infinite loop bug in recursive search of relative imports (#40326)
Fix bug in recursive search of relative imports
2025-08-21 11:39:43 +00:00
e95441bdb5 add type hints (#40319)
* add basic type hints to import module

* run make fixup

* remove optional

* fixes

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-21 12:19:59 +01:00
5c88d8fbcc Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322)
* Only call Trainer.align_special_tokens if model has "config" attribute

* Add efficient test for training a model without model.config

* Reformat
2025-08-21 12:06:42 +01:00
c031f6f994 [docs] remove TF references from /en/model_doc (#40344)
* models up to F

* models up to M

* all models
2025-08-21 11:53:21 +01:00
7b060e5eb7 Add missing arguments to class constructors (#40068)
* Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 10:22:38 +00:00
6ad7f29461 Fix deprecation warning version (#40343)
fix
2025-08-21 12:18:23 +02:00
adf84aec21 Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200)
* Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification

* After run make fixup
2025-08-21 12:01:33 +02:00
1e2e28f3c8 Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066)
* Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch

Signed-off-by: cyy <cyyever@outlook.com>

* subclass RMSNorm

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 11:58:35 +02:00
022af24fcc Fix qwen-omni processor text only mode (#40336)
* Fix qwen-omni processor text only mode

* remove try except

---------

Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>
2025-08-21 11:57:32 +02:00
c99ed492c7 [docs] remove flax references from /en/model_doc (#40311)
* 1st commit

* all models up to D

* all models up to G

* all models up to M

* all remaining models
2025-08-21 10:52:54 +01:00
c2e3cc24e0 Fix chunked attention mask with left-padding (#40324)
* add fix

* add test

* raise proper warning for older versions

* fix

* fix and add 2nd test

* fix for flex and torch 2.5
2025-08-21 10:52:49 +02:00
242bb2cafc One cache class to rule them all (#40276)
* remove all classes

* fix generate

* start replacing everywhere

* finish removing everywhere

* typo

* typo

* fix

* typo

* remove num_layers=1

* CI

* fix all docstrings

* review

* style
2025-08-20 19:36:11 +02:00
1054494dd6 Update notification service amd_daily_ci_workflows definition (#40314) 2025-08-20 17:49:46 +02:00
139cd91713 Fix: Apply get_placeholder_mask in Ovis2 (#40280)
* Refactor special image mask

* Refactor get_placeholder_mask method

* Revert "Refactor special image mask"

This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85.

* Fix

* Revert "Refactor get_placeholder_mask method"

This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.
2025-08-20 17:12:10 +02:00
5d906740d2 Update CI with nightly torch workflow file (#40306)
* fix nightly ci

* Apply suggestions from code review

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-20 16:59:00 +02:00
4977ec2ae8 [GPT OSS] Refactor the tests as it was not properly checking the outputs (#40288)
* it was long due!

* use the official kernel

* more permissive

* update the kernel as well

* mmm should it be this?

* up pu

* fixup

* Update test_modeling_gpt_oss.py

* style

* start with 20b
2025-08-20 16:47:41 +02:00
3b7230124b No more natten (#40287)
get rid off natten

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 16:10:15 +02:00
2df0c323cb byebye torch 2.1 (#40317)
* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa
2025-08-20 15:03:46 +01:00
c50f140be2 Add back _tp_plan attribute (#39944)
* Update modeling_utils.py

* make sure we update with the module's plan

* use public api

* oups

* update

* fix failing test

* Update src/transformers/integrations/tensor_parallel.py

* Update src/transformers/integrations/tensor_parallel.py

* fix

* make the API more friendly!

* fix tests

* fix styling

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-20 15:29:55 +02:00
a97213d131 Qwen2.5-Omni test fixes (#40307)
Updated expectations, and mp tests
2025-08-20 14:48:30 +02:00
ca543f822f Add support for Florence-2 (#38188)
* init

* add modular

* fixup

* update configuration

* add processing file

* update auto files

* update

* update modular

* green setup_and_quality ci

* it works

* fix some tests

* commit florence2

* update test

* make test cases done - 16 left

* style

* fix few test cases

* fix some tests

* fix init test

* update florence2 vision style

* hope is green

* fix init test

* fix init

* update modular

* refactor vision module

* fix: channel attention use dynamic scale

* update modular

* update

* update attention mask

* update

* fix naming

* Update src/transformers/models/florence2/processing_florence2.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* spatial block works

* more beautiful

* more more beautiful

* merge main

* merge main and fixup

* fix typing hint

* update modeling

* fix eager matches sdpa

* fix style

* fix compile test - all green

* remove florence2 language

* remove Florence2LanguageModel things

* fix style

* update florence2 model

* override prepare encoder_decoder for generation

* add weight conversion script

* rewrite channel attention to use sdpa

* eleminate 1 tranpose op

* support fa2

* fix quality check

* chore: reformat `test_modeling_florence2.py`

* some refactor for processor

* some refactor for processor

* update naming convention and remove BC

* make it pass the test

* fix: correct Embedding Cosine

* update comments and docstring

* support input_embeds

* support input embeds ideally

* fix style

* fix style

* fix style again :D

* add test prcoessor

* refactor processor and add test for processor

* reformat test processor

* make fixup

* fix schema check

* remove image_token

* ensure image token in tokenizer and fix integration tests

* fix processor test

* add more integration tests for large model and rename test_processor to test_processing

* test_assisted_decoding_sample should pass

* update doc and make model work with image text to text pipeline

* docs: add sdpa bagde

* resolve cyril's comments

* fix import torch error

* add helper get_placeholder_mask

* inherit from llava

* florence2 may not _supports_attention_backend because of bart ...

* move florence2 model card to multimodal

* let base model always return_dict

* fix style

* tiny update doc

* set   _checkpoint_conversion_mapping = {}

* fix code quality

* support flex and compile graph and move external func to internal func

* remove condition because it always true

* remove window funcs

* move post processor config out

* fix ci

* new intro to trigger test

* remove `kernel_size` argument

---------

Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 14:28:06 +02:00
959239debc Remove unnecessary contiguous calls for modern torch (#40315) 2025-08-20 12:24:14 +00:00
7d2aa5d6e6 🚨 [Flash Attention] Fix sliding window size (#40163)
* swa fix

* add comment, make fix symmetrical

* modify fa inference test to force swa correctness check

* fixup comment
2025-08-20 14:23:14 +02:00
3128db6927 chore: fix typo in find_executable_batch_size to match new 0.9 ratio (#40206) 2025-08-20 12:18:06 +00:00
ca0aaa8c74 [fix] Pass adamw optimizer parameters to StableAdamW (#40184)
* fix: pass adamw optimizer parameters to StableAdamW

* add test for stable_adamw initialization with trainer arguments

* address copilot suggestion

* fix: update weight_decay handling in stable_adamw kwargs

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-20 11:52:23 +00:00
a01f38b364 Fix GOT-OCR2 and Cohere2Vision image processor patches caculation (#40312)
fix got-ocr patches caculation

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-20 13:13:58 +02:00
a5f0b505a0 Remove OTel SDK dependencies (#40305) 2025-08-20 12:31:44 +02:00
d0f1a6ec36 Clean up X-Codec. (#40271)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.
2025-08-20 12:16:28 +02:00
da9452a592 [docs] delete more TF/Flax docs (#40289)
* delete some TF docs

* update documentation checks to ignore tf/flax

* a few more removals

* nit

* Update utils/check_repo.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 10:44:14 +01:00
a4e1fee44d [FA] Fix dtype in varlen with position ids (#40295)
fix
2025-08-20 11:15:55 +02:00
126bc03b4e Allow to be able to run torch.compile tests with fullgraph=True (#40164)
* fix

* address comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 10:42:33 +02:00
1d46091737 Add MetaCLIP 2 (#39826)
* First draft

* Make fixup

* Use eos_token_id

* Improve tests

* Update clip

* Make fixup

* Fix processor tests

* Add conversion script

* Update docs

* Update tokenization_auto

* Make fixup

* Use check_model_inputs

* Rename to lowercase

* Undo CLIP changes

* Address comment

* Convert all checkpoints

* Update auto files

* Rename checkpoints
2025-08-20 09:25:43 +02:00
0f9c9088d0 [3/3] make docs device agnostic, all en docs for existing models done (#40298)
docs to device agnostic cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-19 21:01:27 -07:00
eaa48c81e9 make model docs device agnostic (2) (#40256)
* doc cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* more models

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mixtral.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 13:10:03 -07:00
42fe769928 SmolVLM test fixes (#40275)
* Fix SmolVLM tests

* Add the proper CUDA expectations as well

* Split 'A10 and A100 expectations

* Ruff

---------

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-19 21:22:06 +02:00
4c017465bd Adjust ROCm test output expectations (#40279)
Adjust ROCm output expectations
2025-08-19 21:21:45 +02:00
0f9ce43687 Standardize BertGeneration model card (#40250)
* Standardize BertGeneration model card: new format, usage examples, quantization

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback: update code examples

* Add missing code example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:22:13 -07:00
6ceb13fb22 SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121)
* Ensure pixel values are converted to the correct dtype for fp16/bf16

* add to modular
2025-08-19 10:39:08 -07:00
92f40da608 Update model card for gpt neox japanese (#39862)
* Update GPT-NeoX-Japanese model card

* Apply suggestions from code review

* Update gpt_neox_japanese.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 09:18:46 -07:00
3a4b2756cf docs: Update TrOCR model card to new format (#40240)
* docs: Update TrOCR model card to new format

* Updated Sugegestions
2025-08-19 09:17:45 -07:00
46d38546f3 Standardize RAG model card (#40222)
* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig
2025-08-19 09:16:10 -07:00
bd96e1e1cc docs(layoutlm): add missing id=usage to <hfoptions> tag in LayoutLM model card (#40273)
docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card
2025-08-19 09:14:43 -07:00
8636b309e6 Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232)
* Fix chat CLI GPU loading and request_id validation issues (#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message

* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming
2025-08-19 15:33:44 +00:00
bebeccb06a fix which routing method (#40283) 2025-08-19 16:35:13 +02:00
249d7c6929 Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252)
* Update image_processing_perception_lm_fast.py

Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.

* Update processing_perception_lm.py to match kwargs vision input type

* Update image_processing_perception_lm_fast.py kwargs to signature args
2025-08-19 11:41:27 +00:00
r0
57bb6db6ee Skipping pytree registration in case fsdp is enabled (#40075)
* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Added pytree registration inside dynamic cache class

* Making ci/cd lords happy

* Adding a check if DynamicCache is already a leaf

* Adding try/catch for multiple initializations of DynamicCache in test suites

* Moving dynamic cache pytree registration to executorch

* Adding try catch back
2025-08-19 11:58:05 +02:00
5b3b7ea472 Add Kosmos-2.5 (#31711)
Add Microsoft Kosmos-2.5

---------

Co-authored-by: kirp@umich.edu <tic-top>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:56:03 +02:00
c93594e286 [detection] fix correct k_proj weight and bias slicing in D-FINE (#40257)
Fix: correct k_proj weight and bias conversion in D-FINE
2025-08-19 09:44:37 +00:00
2f1a8ad4ba Fix setting attention for multimodal models (#39984)
* fix

* use non-explicit `None`

* keep previously set attn if exists
2025-08-19 11:35:11 +02:00
a2e76b908b 🚨🚨 Switch default compilation to fullgraph=False (#40137)
* switch default

* docstring

* docstring

* rework tests and remove outdated restrictions

* simplify

* we need a check for static cache

* fix

* rename var

* fix

* revert

* style

* rename test
2025-08-19 11:26:22 +02:00
2b59207a72 Fix slow static cache export tests (#40261) 2025-08-19 11:24:07 +02:00
56c44213b3 [detection] fix attention mask for RT-DETR-based models (#40269)
* Fix get_contrastive_denoising_training_group attention

* Add bool attention_mask conversion
2025-08-19 09:15:56 +00:00
5d9a715e30 set inputs_embeds to None while generate to avoid audio encoder forward in generation process (#40248)
* set inputs_embeds to None while generate to avoid audio encoder forward in generation process

* set input_features to none instead

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-08-19 08:45:57 +00:00
28746cdc7b Remove MI300 CI (#40270)
Remove MI300 CI (in history if we need it back)
2025-08-19 08:23:39 +00:00
debc92e60a Skip broken tests (#40157)
skip these tests
2025-08-19 10:04:08 +02:00
6b5bd11723 docs: Update OLMo model card (#40233)
* Updated OLMo model card

* Update OLMo description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli example

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add bitsandbytes info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 13:35:39 -07:00
e472efb9ac Fix benchmark workflow (#40254)
Correct init_db.sql path

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-18 18:14:16 +00:00
59862209ca Correct typo and update notes in docs Readme (#40234)
* Correct typo and update notes in docs readme

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:31:12 -07:00
a7eabf1dde Model card for NLLB (#40074)
* initializing branch and draft PR

* updated model card .md file

* minor

* minor

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* resolving comments + adding visuals

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* NllbTokenizerFast and NllbTokenizer added

* endline

* minor

* Update nllb.md

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:05:59 -07:00
01c03bf4ee fix: Catch correct ConnectionError for additional_chat_templates (#39874)
* fix: Catch correct ConnectionError for additional_chat_templates

* fix: don't catch timeout

* fix: formatting
2025-08-18 17:25:47 +01:00
2bcf9f6c7e Fixes for EncoderDecoderCache (#40008)
* Add expectation to t5 for rocm 9.4

* Made EncoderDecoderCache compatible with nn.DataParallel

* Fixed t5gemma EncoderDecoderCache

* Added todos in autoformer

* Ruff

* Init is self-contained

* Review compliance

* Fixed kwargs init of EncoderDecoderCache
2025-08-18 17:51:05 +02:00
aa45824919 [CI] Fix repo consistency (#40249)
* fix

* doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 17:32:17 +02:00
d6fad86d23 [serve] guard imports (#39825)
guard imports
2025-08-18 16:28:10 +01:00
MQY
7a0ba0d7d8 [typing] fix type annotation error in DepthPro model image processor (#40238)
* fix type annotation error in DepthPro model image processor

* fix

* run make fix-copies
2025-08-18 15:42:13 +01:00
00b4dfb786 Add chat_template (jinja2) as an extra dependency (#40128)
* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 14:31:40 +00:00
f417a1aad4 remove transpose_for_scores call in ESM-2 (#40210)
* remove transpose_for_scores call

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* fix copied evolla code

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-08-18 14:28:59 +00:00
a36d51e801 🚨 Always return Cache objects in modelings (to align with generate) (#39765)
* watch the world burn

* fix models, pipelines

* make the error a warning

* remove kwargs and return_legacy_cache

* fix reformer
2025-08-18 16:26:35 +02:00
57e230cdb2 Fix more pylint warnings (#40204)
Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 14:17:16 +00:00
47938f8f8d Add Ovis2 model and processor implementation (#37088)
* Add Ovis2 model and processor implementation

* Apply style fixes

* Add unit tests for Ovis2 image processing and processor

* Refactor image processing functions for clarity and efficiency

* Add Ovis2 ImageProcessorFast

* Refactor Ovis2 code

* Refactor Ovis2 model components and update processor functionality

* Fix repo consistency issues for Ovis2: docstring, config cleanup

* Update Ovis2 model integration tests

* Update Ovis2 configuration and processing classes for improved documentation

* Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES

* Fix conflict

* Fix import order

* Update image processor class names

* Update Ovis2 model structure

* Refactor Ovis2 configuration

* Fix typos

* Refactor Ovis2 model classes and remove unused code

* Fix typos

* Refactor Ovis2 model initialization

* Fiix typos

* Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py

* Add license and update type hints

* Refactor token function and update docstring handling

* Add license

* Add Ovis2 model support and update documentation

* Refactor Ovis2 model structure and enhance multimodal capabilities

* Update Ovis2 weight mapping for consistency and clarity in key patterns

* Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently.

* Refactor Ovis2 model test structure to include Ovis2Model

* Add optional disable_grouping param to Ovis2ImageProcessorFast

* Refactor type hints in Ovis2 modules

* Add licensing information in Ovis2 modules and tests

* Refactor Ovis2 model by removing unused methods

* Refactor Ovis2 model tests by renaming test classes and removing skipped tests

* Refactor Ovis2 model output classes

* Refactor Ovis2 weight conversion and Update model embedding classes

* Refactor Ovis2 model imports and remove unused functions

* Enhance vision configuration extraction in Ovis2 weight conversion

* Refactor Ovis2 model's forward method to remove interpolation option

* Update Ovis2 model documentation

* Refactor Ovis2 model input handling and tokenizer configuration

* Update return type hints in Ovis2 model

* Remove commented-out code

* fix config for tests and remove key mappings

* Update tokenizer configuration to use add_special_tokens method

* skip torchscript

* Fix image placeholder generation in Ovis2Processor

* Refactor Ovis2 model to rename visual_table to visual_embeddings_table

* Enhance Ovis2 model by adding vision_feature_select_strategy parameter

* Refactor Ovis2 model weights conversion and architecture

* Refactor Ovis2 model by removing vision_feature_select_strategy parameter

* Update Ovis2 model examples

* Refactor Ovis2 model

* Update Ovis2 model

* Update Ovis2 model configuration

* Refactor Ovis2 model test setup

* Refactor flash attention support

* Refactor

* Fix typo

* Refactor

* Refactor model classes

* Update expected output in Ovis2

* Refactor docstrings

* Fix

* Fix

* Fix

* Update input in tests

* Fix

* Fix get_decoder method

* Refactor

* Refactor Ovis2

* Fix

* Fix

* Fix test

* Add get_placeholder_mask

* Refactor Ovis2 model tests

* Fix

* Refactor

* Fix

* Fix

* Fix Ovis2 test

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 16:05:49 +02:00
2fe43376cd AMD scheduled CI ref env file (#40243)
* Reference env-file to be used in docker running the CI

* Disable MI300 CI for now
2025-08-18 15:23:27 +02:00
e4bd2c858d Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181)
* fix: Error after calling ESM model with input embeddings not input ids

* propagate changes to other models
2025-08-18 13:22:10 +00:00
6333eb986a Fix more typos (#40212)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 12:52:12 +00:00
e5886f9194 [SAM 2] Change checkpoints in docs and tests (#40213)
* change checkpoints in docs and tests

* add notebook
2025-08-18 11:21:34 +02:00
eb2f9da096 fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function (#40130)
* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* fix similar errer at qwen2_vl and do make fix-copies

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* Apply style fixes

---------

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-18 08:59:25 +00:00
6ce8f05375 Use correct model_input_names for PixtralImageProcessor (#40226)
add image_sizes to model_input_names
2025-08-18 08:06:52 +00:00
2914ceca20 Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for too long with no output (#40201)
* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"

This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-18 08:40:53 +02:00
cd22550692 docs: Update LayoutLM model card according to new standardized format (#40129)
* docs: Update LayoutLM model card with standardized format

* Apply suggestions from code review

This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Address remaining review comments

* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header

* Update layoutlm.md

* Update layoutlm.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-15 09:33:47 -07:00
05000aefe1 Fix GPT-OSS swiglu_limit not passed in for MXFP4 (#40197)
Add swiglu_limit = 7.0
2025-08-15 17:04:25 +02:00
3f4c85fef0 Add X-Codec model (#38248)
* add working x-codec

* nit

* fix styling + copies

* fix docstring

* fix docstring and config attribute

* Update args + config

* update convertion script

* update docs + cleanup

* Ruff fix

* fix doctrings
2025-08-15 16:24:12 +02:00
29e4e35927 Benchmarking improvements (#39768)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments
2025-08-15 15:59:11 +02:00
de437d0d7a Update: add type hints to check_tokenizers.py (#40094)
* Update check_tokenizers.py

chore(typing): add type hints to check_tokenizers script

- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)

* Update check_tokenizers.py

chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py

- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic
2025-08-15 12:41:28 +00:00
28a03fb78a Fix various Pylint warnings (#40107)
Tidy code

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:40:12 +00:00
ec85d2c44f Avoid CUDA stream sync (#40060)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:37:15 +00:00
c7afaa5b44 Remove _prepare_flash_attention_from_position_ids (#40069)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:35:03 +00:00
c167faa081 Fix typos (#40175)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:10:26 +00:00
5068fcd9a8 Add repr to EncoderDecoderCache (#40195)
* add repr

* oups
2025-08-15 12:57:49 +02:00
421175685d Fix fsdp for generic-task models (#40191)
* remove abc inheritance

* add fast test
2025-08-15 12:28:16 +02:00
4912d5b490 fix to avoid modifying a view in place (#40162)
* fix to avoid modifying a view in place

* add backward test in tensor parallel

* add test to test_modelig_gpt_oss.py

* linting
2025-08-15 10:30:49 +02:00
cc9997878a make model doc device agnostic (#40143)
* make model doc device agnostic

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update align.md

* Update aya_vision.md

* Update byt5.md

* refine

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update granitevision.md

* Update src/transformers/pytorch_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add doc

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* 3 more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 23:31:31 -07:00
85fce2e54c [MINOR:TYPO] Update base.py (#40169)
* [MINOR:TYPO] Update base.py

All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)

Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns:  `ValueError: The task does not provide any default models for options ('EN', 'FR')`

It might be a good idea to allow for uppercase, but that's for another issue.

* [MINOR:TYPO] Update __init__.py
2025-08-14 22:53:57 -07:00
52c6c1bb6e Update dynamic attnt setter for multimodals (#39908)
* update

* fix the test for DepthPro

* PR comments

* wait, I didn't delete this in prev commit?

* fix

* better way

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-14 21:46:13 +02:00
31b6e6e1da Pin torch to 2.7.1 on CircleCI for now (#40174)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-14 20:19:35 +02:00
b02f2d8b6a Add dates to the model docs (#39320)
* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor
2025-08-14 10:08:46 -07:00
8a658ac119 Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051)
* Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)Update bartpho.md

* Update bartpho.md

Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header).

Added code snippets which were suggested

* Update bartpho.md

Updated with necessary tags

* Update bartpho.md

* Update bartpho.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 09:55:27 -07:00
2b6cbedeb2 Add GptOssForSequenceClassification for GPT-OSS models (#40043)
* Add GptOssForSequenceClassification

* Tiny fix

* make fixup

* trigger CI rerun

* Check config type instead

---------

Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>
2025-08-14 18:32:14 +02:00
b834cb8138 build: Add fast image processor tvp (#39529)
* build: add TvpImageProcessorFast

- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.

* fix: TvpImageProcessorFast with new resize method and update processing logic

* build: add TvpImageProcessorFast

* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests

- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.

* fix: Enhance TvpFastImageProcessorKwargs and update documentation

- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.

* fix: tested now with python 3.9

* fix: remove tvp kwargs from docs

* simplify processing

* remove import and fix tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-14 15:48:18 +00:00
6f259bc83e Fix docs typo (#40167)
* DINOv3 model

* working version

* linter revert

* linter revert

* linter revert

* fix init

* remove flex and add convert to hf script

* DINOv3 convnext

* working version of convnext

* adding to auto

* Dinov3 -> DINOv3

* PR feedback

* complete convert checkpoint

* fix assertion

* bf16 -> fp32

* add fast image processor

* fixup

* change conversion script

* Use Pixtral attention

* minor renaming

* simplify intermediates capturing

* refactor DINOv3ViTPatchEmbeddings

* Refactor DINOv3ViTEmbeddings

* [WIP] rope: remove unused params

* [WIP] rope: rename period -> inv_freq for consistency

* [WIP] rope: move augs

* change inv_freq init (not persistent anymore)

* [WIP] rope: move coords to init

* rope - done!

* use default LayerScale

* conversion: truncate expected outputs

* remove commented code

* Refactor MLP layers

* nit

* clean up config params

* nit docs

* simplify embeddings

* simplify compile compat lru_cache

* fixup

* dynamic patch coords

* move augmentation

* Fix docs

* fixup and type hints

* fix output capturing

* fix tests

* fixup

* fix auto mappings

* Add draft docs

* fix dtype cast issue

* add push to hub

* add image processor tests

* fixup

* add modular

* update modular

* convert and test convnext

* update conversion script

* update prefix

* Update LayerNorm

* refactor DINOv3ConvNextLayer

* rename

* refactor convnext model

* fix doc check

* fix docs

* fix convnext config

* tmp fix for check docstring

* remove unused arg

* fix tests

* (nit) change init

* standardize gated MLP

* clear namings and sat493m

* fix tensors on different devices

* revert linter

* pr

* pr feedbak ruff format

* missing headers

* fix code snippet and collection link in docs

* DINOv3 description

* fix checkpoints in tests

* not doc fixes in configs

* output_hidden_states

* x -> features

* remove sequential

---------

Co-authored-by: Cijo Jose <cijose@meta.com>
2025-08-14 17:29:53 +02:00
41980ce93e [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151)
* [bugfix] fix flash-attention2 unavailable error for Ascend NPU

* remove redundant apply_rotary_emb usage

* fix ruff check error

* pad_input and unpad_input use same implementation as fa2

* rollback redundant codes

* fix ruff check error

* optimize fa2 judgement logic
2025-08-14 14:21:39 +02:00
eba1d62091 [FA2] Fix it finally - revert fa kwargs preparation (#40161)
revert
2025-08-14 13:39:11 +02:00
1c5d2f7fb6 Replace self.tokenizer by self.processing_class (#40119) 2025-08-14 13:24:55 +02:00
cfe52ff4db [Continous Batching] set head_dim when config.head_dim is None (#40159)
* set head_dim when config.head_dim is None

* use model's actual TP setting
2025-08-14 13:23:27 +02:00
c47544b16f Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160)
fix ci
2025-08-14 10:53:23 +00:00
22e89e5385 [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141)
* fix: changed is_causal to be False

* fix: Added original cross attention bug

* fix: fixed the way bordel removal is computed

* fix: added missing normalization on coarse features

* test: fixed integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-14 10:42:59 +01:00
252364fd8e [Cohere2Vision] remove unused arg (#40103)
* remove unused arg

* remove the arg from test as well
2025-08-14 09:10:25 +00:00
e446372f76 Create self-scheduled-amd-mi355-caller.yml (#40134) 2025-08-14 01:33:45 +02:00
be1ab5103f Update Dockerfiles to install packages inside a virtual environment (#39098)
* Removed un-necessary virtual environment creation in Dockerfiles.

* Updated Dockerfiles to install packages in a virtual environment.

* use venv's python

* update

* build and trigger

* trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:51:52 +02:00
591708d9ce Add pytest marker: torch_compile_test and torch_export_test (#39950)
* new marker

* trigger CI

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:47:15 +02:00
12e49cda32 Fix quantized cache with only cache_implementation in generate (#40144)
* fix args

* comment
2025-08-13 23:21:41 +02:00
e651ae0a32 🌐 [i18n-KO] Translated gemma3.md to Korean (#39865)
* docs: ko: gemma3.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2025-08-13 13:25:20 -07:00
0f9c2595cd updated visualBERT modelcard (#40057)
* updated visualBERT modelcard

* fix: Review for VisualBERT card
2025-08-13 12:47:32 -07:00
412c9c3030 Remove an old badly designed test (#40142)
remove it
2025-08-13 20:47:00 +02:00
eb5768a86e [docs] Fix ko toctree (#40138)
Update _toctree.yml
2025-08-13 11:24:58 -07:00
68a13cd4a6 Add Segment Anything 2 (SAM2) (#32317)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promprencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PostioinEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstringspy --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* nit fixes + optimization

* split sam2 and sam2_video in two models

* PR review part 1

* fix None for default slow processor of sam2

* remove unecessary code path in sam2_video

* refactor/simplify RoPE

* replace embedding module list with embedding matrix

* fix tests

* remove kernel

* nit

* use lru_cache for sine_pos_embeddings

* reorder sam2_video methods

* simplify sam2_video

* PR review part 1

* simplify sam2 video a lot

* more simplification

* update integration tests with updated conftest

* more explicit config for hieradet

* do post_processing outside of sam2 video model

* Improve Sam2VideoVisionRotaryEmbedding

* fix tests

* update docs and fix mask2former/oneformer

* avoid unnecessary reshapes/permute

* fix device concatenating points

* small dtype fix

* PR review

* nit

* fix style and finish up doc

* fix style

* fix docstrings

* fix modular

---------

Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-13 14:18:05 -04:00
25ad9c8c92 Fix Janus (#40140)
fix
2025-08-13 20:12:21 +02:00
bec6926696 gpt oss is important (#40139) 2025-08-13 19:49:54 +02:00
ab9108517a 🌐 [i18n-KO] Translated pipelines.md to Korean (#39577)
* docs: ko: pipelines.md

* feat: gpt draft

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

번역 문서 수정

* Update pipelines.md

ToC 수정

* Update pipelines.md

---------

Co-authored-by: xhaktm <tnwjd318@hs.ac.kr>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-13 10:26:17 -07:00
20c6b478cd 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007)
* use lru_cache for sine pos embeddings maskformer

* fix calls to pos embed

* change maxsize to 1
2025-08-13 17:05:22 +00:00
6b728f1830 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861)
* docs: ko: grounding-dino.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* docs: add AP explanation for better readability

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-13 10:01:05 -07:00
127e33f759 🌐 [i18n-KO] Translated optimizers.md to Korean (#40011)
* docs: ko: optimizers.md

* feat: optimizers draft

* fix: manual edits

* docs: ko: update optimizers.md

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* docs: ko: final updates to optimizers and toctree

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>
2025-08-13 10:00:47 -07:00
ac52c77a66 🌐 [i18n-KO] Translated gpt2.md to Korean (#39808)
* docs: ko: bamba.md

* feat: nmt draft

* fix: manual edits

* docs: ko: gpt2.md

* feat: nmt draft

* fix: manual edits

* Remove bamba.md from docs/source/ko/model_doc/

* Update _toctree.yml
2025-08-13 10:00:25 -07:00
5337f3052d 🚨🚨 [generate] ignore cache_implementation="hybrid" hub defaults (#40135)
* working?

* fix tests
2025-08-13 17:57:41 +02:00
e4223fa915 🌐 [i18n-KO] Translated main_classes/optimizer_schedules.md to Korean (#39713)
* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
2025-08-13 08:23:09 -07:00
9e21e50241 🌐 [i18n-KO] Translated jamba.md to Korean (#39890)
* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
2025-08-13 08:22:28 -07:00
486844579b 🌐 [i18n-KO] Translated main_classes/processors.md to Korean (#39519)
* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2025-08-13 08:21:38 -07:00
f445caeb0f Fix hidden torchvision>=0.15 dependency issue (#39928)
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit
2025-08-13 15:13:42 +00:00
11537c3e0c [trainer] handle case where EOS token is None in generation_config (#40127)
* handle case where EOS token is None in gen config

* update eli5 dataset
2025-08-13 15:57:17 +01:00
8ef5cd6579 DOCS: Add missing space in SECURITY.md (#40087) 2025-08-13 12:57:37 +00:00
ebceef343a Collated reports (#40080)
* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report
2025-08-13 14:48:15 +02:00
e78571f5ce decoding_method argument in generate (#40085)
* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-13 12:45:50 +00:00
8d19231bca [serve] allow array content inputs for LLMs (#39829)
fix bug; add tests
2025-08-13 11:26:19 +01:00
34a1fc6426 Fix QuantoQuantizedCache import issues (#40109)
* fix quantoquantized
2025-08-13 10:22:59 +00:00
060b86e21d changed xLSTMRMSNorm to RMSNorm (#40113)
* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>
2025-08-13 11:10:42 +02:00
849c3778c6 [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975)
* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda
2025-08-13 09:58:50 +02:00
85d536a93b 🌐 [i18n-KO] Translated tiny_agents.md to Korean (#39913)
* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2025-08-12 22:54:16 -07:00
31ab7168ff remove sequence parallel in llama4 (#40084) 2025-08-13 00:12:45 +02:00
a1a4fcd03e Add model card for MobileViT (#40033)
* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-12 11:36:59 -07:00
e5e73e4b95 [docs] Add reference to HF-maintained custom_generate collections (#39894)
decoding -> generation; add collections
2025-08-12 17:38:00 +01:00
0ce24f5a88 Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707)
Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-12 16:16:28 +00:00
83dbebc429 [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441)
* tmp commit

* add test

* make fixup

* reset warns/info in test
2025-08-12 16:32:07 +01:00
9977cf1739 [Flash Attention] Fix flash attention integration (#40002)
* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment
2025-08-12 15:24:10 +00:00
b6ba595543 Default to dequantize if cpu in device_map for mxfp4 (#39993)
* default to dq if cpu

* an other check

* style

* revert some changes
2025-08-12 16:48:52 +02:00
a5fac1c394 Fix error on importing unavailable torch.distributed (#40038)
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
2025-08-12 16:30:51 +02:00
085e02383c Fix Qwen3 MoE GGUF architecture mismatch (#39976)
* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>
2025-08-12 13:38:48 +00:00
2ce0dae390 Switch the order of args in StaticCache (for BC and future logic) (#40100)
* switch order for BC and future logic

* in generate as well
2025-08-12 15:30:44 +02:00
f7cbd5f3ef Fix regression in mllama vision encoder (#40083)
fix mllama vision encoder

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-12 15:29:45 +02:00
35dc88829c Replace logger.warning with logger.warning_once in GradientCheckpointingLayer (#40091) 2025-08-12 15:26:47 +02:00
b1b46555cd Re-apply make style (#40106)
make style
2025-08-12 15:02:16 +02:00
a07b5e90f2 feat: add is_fast to ImageProcessor (#39603)
* feat: add `is_fast` to ImageProcessor

* test_image_processing_common.py 업데이트

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* feat: add missing BaseImageProcessorFast import

* fix: `issubclass` for discriminating subclass of BaseImageProcessorFast

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-08-12 12:14:57 +00:00
952fac100d Enable SIM rules (#39806)
* Enable SIM rules

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-12 12:14:26 +00:00
41d1717882 New DynamicSlidingWindowLayer & associated Cache (#40039)
* start adding the layer

* style

* improve

* modular

* fix

* fix

* improve

* generate integration

* comment

* remove old one

* remove

* fix

* fix

* fix

* fix all recompiles

* fix

* doc

* fix

* add text config check

* fix encoderdecoder cache

* add it for all models with sliding/hybrid support

* revert

* start fixing

* prophetnet

* fsmt

* fix ddp_data

* add test for mistral

* improve mistral test and add gemma2 test

* docstrings
2025-08-12 14:09:52 +02:00
ab455e0d88 Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743)
audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock
2025-08-12 12:08:28 +00:00
4b3a1a62cc Causal loss for ForConditionalGeneration (#39973)
* feat: add ForConditionalGeneration loss to LOSS_MAPPING

* consistent spelling of "recognized"
2025-08-12 14:03:09 +02:00
f6b6e17719 Add glm4.5&&glm4.5V doc (#40095)
* Docs: GLM-4-MoE & GLM-4V-MoE pages

* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image

* Docs

---------

Co-authored-by: wujiahan <lambert@gmail.com>
2025-08-12 11:44:53 +00:00
1c5e17c025 Update Glm4V processor and add tests (#39988)
* update GLm4V and add tests

* Update tests/models/glm4v/test_processor_glm4v.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* remove min/max pixels for BC

* fix video tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-08-12 13:40:54 +02:00
913c0a8c33 [docs] Zero Shot Object Detection Task (#40096)
* refactor zsod task docs

* keeping the image guided od section

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/tasks/zero_shot_object_detection.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-08-12 11:43:38 +01:00
c6fbfab61b [fix] batch inference for llava_onevision (#40021)
* [fix] llava onevision batch inference

* style

* cannot pass inconsistent list & handle text-only case
2025-08-12 11:01:00 +02:00
86bb1fcd26 Revert FA2 kwargs construction (#40029)
* revert

* use imports

* went way too high in imports level

* style
2025-08-12 10:48:35 +02:00
3ff2e984d2 Fix PerceptionLM image preprocessing for non-tiled image input. (#40006)
* Fix PerceptionLM image preprocessing for non-tiled image input.

* Add test for single tile vanilla image processing.

* ruff format

* recover missing test skip

* Simplify test.

* minor test name fix
2025-08-12 08:40:22 +00:00
4668ef1459 Update notification service MI325 (#40078)
add mi325 to amd_daily_ci_workflows
2025-08-12 10:22:52 +02:00
1cea763ba4 feat: extract rev in attn_implementation kernels via @ (#40009)
* feat: extract rev in attn_implementation kernels via @

* fix: adjust for ruff

* fix: update regex and add explanatory comment

* fix: move attn_implementation kernel doc

* fix: remove extra line
2025-08-11 15:14:13 -04:00
e29919f993 [GPT Big Code] Fix attention scaling (#40041)
* fix

* update integration tests

* fmt

* add regression test
2025-08-11 19:01:31 +00:00
eca703026e chore: standardize DeBERTa model card (#37409)
* chore: standardize DeBERTa model card

* Apply suggestions from code review in docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Update deberta.md with code cleanup suggestions

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update deberta.md

* Update deberta.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-11 10:30:37 -07:00
43001fd3c6 Fix time_spent in notification_service.py. (#40081)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-11 18:30:58 +02:00
5521c62b89 added Textnet fast image processor (#39884)
* feat: add fast image processor implementation for TextNet model

* chore: override to_dict method to TextNetImageProcessorFast for slow processor compatibility tests

* chore: update init method

* chore: coding and style checks

* chore: fixed code quality issue

* chore: override resize to handle size_divisor, move all preprocessing logic to child class

* fix: autoImageProcessor issue for textnet

* chore: cleanup

* simplify resize

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-11 11:44:31 -04:00
6b70d79b61 Fix repo consistency (#40077)
fix
2025-08-11 15:26:22 +02:00
7dd82f307b guard on model.eval when using torch.compile + FSDP2 (#37413)
guard on model.eval

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-11 13:22:42 +02:00
68eb1a9a63 Remove deprecated cache-related objects (#40035)
remove them
2025-08-11 10:30:14 +02:00
480653d271 fix: move super().__init__ after vision_config init in Mistral3Config (#40063)
fix: move super().__init__ after vision_config init in Mistral3Config (#40062)
2025-08-11 09:21:54 +02:00
502f253e20 [gemma3] update conversion key mapping (#39778)
update conversion key mapping
2025-08-11 09:21:13 +02:00
3124d1b439 [qwen-vl] fix beam search with videos (#39726)
* fix

* fix copies
2025-08-11 09:21:04 +02:00
1372a5b8c4 fix: resolve triton version check compatibility on windows (#39986)
* fix: resolve triton version check compatibility on windows

* style: remove trailing space

* fix: fix typo

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-11 08:53:19 +02:00
99c747539e unpin torchcodec==0.5.0 and use torch 2.8 on daily CI (#40072)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-10 22:27:39 +02:00
b59140b696 Update HuBERT model card according to template (#39742)
* Update HuBERT model card according to template

Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section.

* Address review comments and changes requested to hubert.md

* Update hubert.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-10 11:32:45 -07:00
f4d57f2f0c Revert "fix notification_service.py about time_spent" (#40044)
Revert "fix `notification_service.py` about `time_spent` (#40037)"

This reverts commit d2ba153b29feb9cc0e9818c1ce63a07679b47250.
2025-08-08 22:32:24 +02:00
7b20915f4e GLM-4.5V Model Support (#39805)
* init

* update

* uupdate

* ruff

* t patch is 2 defalut not 1

* draft

* back

* back1

* update

* config update

* update using glm-41 format

* add self.rope_scaling = config.rope_scaling

* update config

* update

* remove the processor

* update

* fix tests

* update

* for test

* update

* update 2126

* self.rope_scaling is missing in GLM4MOE lets add it

* update

* update

* Update modular_glm4v_moe.py

* change config

* update apply_multimodal_rotary_pos_emb

* format

* update

* Delete 3-rollout_qas_thinking_answers.py

* use right name

* update with place holder

* update

* use right rotary

* Update image_processing_glm4v_fast.py

* rope_config_validation needs to rewrite the entire config file in modular

* update

* changed name

* update

* Update modeling_glm4v_moe.py

* _init_weights shoud be add in Glm4vMoePreTrainedModel

* remove use_qk_norm

* Update modular_glm4v_moe.py

* remove use_qk_norm as it is not use

* fix style

* deprecations are not needed on new models

* fix merge issues

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-08-08 17:39:52 +02:00
d2ba153b29 fix notification_service.py about time_spent (#40037)
temp

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-08 17:11:16 +02:00
f639c0c780 Bnb failling tests (#40026)
* initial commit

* style

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-08 16:28:00 +02:00
a96cccd0dd Tie weights recursively on all submodels (#39996)
* recursive call

* add missing keys

* remove bad keys
2025-08-08 16:03:16 +02:00
a78263dbb5 fix 2025-08-08 15:32:23 +02:00
dc11a3cbb2 [core] Refactor the Cache logic to make it simpler and more general (#39797)
* Simplify the logic quite a bit

* Update cache_utils.py

* continue work

* continue simplifying a lot

* style

* Update cache_utils.py

* offloading much simpler

* style

* Update cache_utils.py

* update inits

* Update cache_utils.py

* consistemncy

* Update cache_utils.py

* update generate

* style

* fix

* fix

* add early_initialization

* fix

* fix mamba caches

* update

* fix

* fix

* fix

* fix tests

* fix configs

* revert

* fix tests

* alright

* Update modeling_gptj.py

* fix the constructors

* cache tests

* Update test_cache_utils.py

* fix

* simplify

* back to before -> avoid compile bug

* doc

* mistral test

* llama4 test dtype

* Update test_modeling_llama4.py

* CIs

* Finally find a nice impl

* Update cache_utils.py

* Update cache_utils.py

* add lazy methods in autodoc

* typo

* better doc

* Add detailed docstring for lazy init

* CIs

* style

* fix
2025-08-08 14:47:21 +02:00
95510ab018 Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991) (#40024)
* Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)

* Switched definition of optional from| None to Optiona[] (Issue #39991)

---------

Co-authored-by: Laurenz Ruzicka <Laurenz.Ruzicka@ait.ac.at>
2025-08-08 10:43:42 +00:00
5c3fb7f731 Harmonize past_key_value to past_key_valueS everywhere (#39956)
* all modulars and llama

* apply modular

* bert and gpt2 copies

* fix imports

* do it everywhere

* fix import

* finalize it

* fix

* oups set it in modular

* style

* fix

* Add 1 version to deprecation cycle

* Update modeling_layers.py
2025-08-08 11:52:57 +02:00
2469cce621 Fix an annoying flaky test (#40000)
annoying flaky test
2025-08-08 10:32:51 +02:00
fe1bf82159 Higgs modules_to_not_convert standardization (#39989)
fix higgs
2025-08-08 10:22:59 +02:00
b374c3d12e Fix broken image inference for Fuyu model (#39915)
* fix fuyu

Signed-off-by: Isotr0py <2037008807@qq.com>

* oops

Signed-off-by: Isotr0py <2037008807@qq.com>

* run test on GPU

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* clean unused

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* revert

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add fuyu multimodal test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-08 07:21:49 +00:00
4d57c39007 pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI (#40013)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 23:05:39 +02:00
3e0333fa4a Update expected output values after #39885 (part 2) (#40015)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 22:52:53 +02:00
12f248bced Raising error when quantizing a quantized model (#39998)
* error when quantizing a quantized model

* style
2025-08-07 20:37:25 +00:00
efaf3714dc docs: fix duplication in 'en/optimizers.md' (#40014) 2025-08-07 13:28:43 -07:00
ca4cbb1e3f unpin torch<2.8 on circleci (#40012)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 21:31:17 +02:00
78922577e9 FA2 can continue generation from cache (#39843)
* add fa2 support to continue generation from cache

* update q-len
2025-08-07 19:26:23 +02:00
9bfbdd2945 Fix default values of getenv (#39867)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-07 17:25:40 +00:00
692d336908 Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips (#39965)
* fix hgnet docs and image-classification pipeline

* use positional argument

* fix dit close hfoptions tag

* fix alphabet order

* fix hgnnet modular docstring

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: hgnet reference

* change hgnet to en doc

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-07 09:33:29 -07:00
0659214196 fix: remove CHAT_TEMPLATE import in tests for deepseek-vl (#40003)
* remove CHAT_TEMPLATE import in tests

* update and use prepare_processor_dict
2025-08-07 16:19:36 +00:00
27997eeb8d Fix missing video inputs for PerceptionLM. (#39971)
* Fix missing video inputs for PerceptionLM.

* Minor fix for vanilla input image (only C,H,W, no tiles dim).

* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)."

This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.
2025-08-07 15:54:45 +00:00
bf1bd6ac1f Fix int4 quantized model cannot work with cpu (#39724)
* Fix int4 quantized model cannot work with cpu

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update the comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-07 15:24:00 +00:00
43d3b1931a Update expected output values after #39885 (part 1) (#39990)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 16:00:28 +02:00
d5a0809707 Fix consistency (#39995)
* modular

* fix
2025-08-07 15:52:40 +02:00
b347e93567 [typing] Fix return typehint for decoder and inv_freq annotation (#39610)
* fix return typehint for decoder and annotate inv_freq

* fix modular

* Fix consistency

* Move annotation on class level

* missing annotations

* add comment
2025-08-07 14:10:22 +01:00
7188e2e28c Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu (#39967)
Bump transformers in /examples/tensorflow/language-modeling-tpu

Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.0 to 4.53.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.48.0...v4.53.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.53.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-07 12:13:48 +01:00
2b19a06692 Fix gemma3n feature extractor's incorrect squeeze (#39919)
* fix gemma3n squeeze

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-07 18:34:28 +08:00
555cbf5917 [Idefics] fix device mismatch (#39981)
fix
2025-08-07 11:12:04 +02:00
597ed1a11d Various test fixes for AMD (#39978)
* Add amd expectation in internvl

* Add amd expectation to llama

* Added bnb decorator for a llava test that requires bnb

* Added amd expectation for mistral3

* Style
2025-08-07 10:57:04 +02:00
6121e9e46c Support input_embeds in torch exportable decoders (#39836)
* Support input_embeds in torch exportable decoders

* Hybrid cache update

* Manually change some callsites

* AI changes the rest of the call sites

* Make either input_ids/inputs_embeds mandatory

* Clean up

* Ruff check --fix

* Fix test

* pr review

* Revert config/generation_config changes

* Ruff check
2025-08-07 08:51:31 +00:00
cdeaad96b7 [superglue] Fixed the way batch mask was applied to the scores before match assignment computation (#39968)
fix: mask filling to score was wrong
2025-08-07 09:49:39 +01:00
2593932f10 Gemma3 fixes (#39960)
* Fix multiple devices issue

* Added expectations for rocm 9.4

* Ruff
2025-08-07 09:57:21 +02:00
513f76853b Modular fix: remove the model name in find_file_type (#39897)
* remove the model name in the class name

* add comment
2025-08-06 23:31:07 +00:00
743bb5f52e chore: update Deformable_Detr model card (#39902)
* chore: update Deformable_Detr model card

* fix: added pipeline, automodel examples and checkpoints link

* Update deformable_detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-06 12:45:14 -07:00
ac0b468465 [bugfix] fix flash_attention_2 unavailable error on Ascend NPU (#39844) 2025-08-06 17:48:52 +00:00
cf243a1bf8 Fix fix_and_overwrite mode of utils/check_docstring.py (#39369)
* bug in fix mode of check_docstring
2025-08-06 19:37:25 +02:00
6902ffa505 remove triton_kernels dep with kernels instead (#39926)
* remove dep

* style

* rm import

* fix

* style

* simplify

* style
2025-08-06 19:31:20 +02:00
cb2e0df2ec [image processor] fix glm4v (#39964)
* fix glm4v image process

* Update src/transformers/models/glm4v/image_processing_glm4v.py

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-06 17:46:58 +01:00
9ab75fc428 fix typo (#39936)
* fix typo

* fix modular instead

* fix

---------

Co-authored-by: y.korobko <y.korobko@tbank.ru>
2025-08-06 16:21:24 +00:00
43b3f58875 Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts (#39959)
* Fix grammatical error: expert_hitted -> expert_hit in MoE implementations

* Fix grammatical error: hitted_experts -> hit_experts in MoE implementation
2025-08-06 15:45:19 +00:00
dff6185d61 docs: fix typo in 'quantization-aware training' (#39904) 2025-08-06 14:52:43 +00:00
c7844c7a8e Enable gpt-oss mxfp4 on older hardware (sm75+) (#39940)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-06 13:39:21 +00:00
dd70a8cb9d Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)
* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff
2025-08-06 15:20:41 +02:00
82eb67e62a [docs] ko toc fix (#39927) 2025-08-06 10:12:34 +00:00
9e76a6bb54 circleci: pin torch 2.7.1 until torchcodec is updated (#39951)
circleci torch 2.7.1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-06 11:18:00 +02:00
910b319357 Fix CI: Tests failing on CPU due to torch.device('cpu').index being None (#39933)
replace routing_weights.device.index with a
2025-08-06 10:22:43 +02:00
369c99d0ce Avoid utils/check_bad_commit.py failing due to rate limit (requesting api.github.com) (#39918)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-05 21:52:20 +02:00
b771e476a8 [CI] post-GptOss fixes for green CI (#39929) 2025-08-05 20:04:59 +02:00
eb6e26acf3 Dev version 2025-08-05 18:09:30 +02:00
c54203a32e gpt_oss last chat template changes (#39925)
Last chat template changes
2025-08-05 18:08:08 +02:00
7c38d8fc23 Add GPT OSS model from OpenAI (#39923)
* fix

* nice

* where i am at

* Bro this works

* Update src/transformers/integrations/tensor_parallel.py

* cleanups

* yups that was breaking

* Update src/transformers/models/openai_moe/modeling_openai_moe.py

* gather on experts and not mlp

* add changes for latest convert branch

* adds options to get output_router_logits from config

* bring chat temlate + special tokens back into the script.

* initial commmit

* update

* working with shards

* add model.safetensors.index.json

* fix

* fix

* mxfp4 flag

* rm print

* Fix PAD/EOS/BOS (#18)

* fix pad/eos/bos

* base model maybe one day

* add some doc

* special tokens based on harmony.

* add in tokenizer config as well.

* prepare for rebase with main

* Fix for initialize_tensor_parallelism  now returning 4-tuple

```
[rank0]:   File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```

* mxfp4

* mxfp4 draft

* fix

* fix import

* draft

* draft impl

* finally working !

* simplify

* add import

* working version

* consider blocks and scales

* device mesh fix

* initial commit

* add working dequant + quant logic

* update

* non nan, gibberish output

* working EP + quantization finally !

* start cleaning

* remove reversing process

* style

* some cleaning

* initial commmit

* more cleaning

* more cleaning

* simplify

* more cleaning

* rm duplicated function

* changing tp_plan

* update tp plan check

* add loading attribute

* dequantizing logic

* use subfunctions

* import cleaning

* update_param_name

* adds clamped swiglu

* add clamping to training path

* simplify dequant logic

* update

* Bad merge

* more simplifications & tests

* fix !

* fix registering custom attention

* fix order

* fixes

* some test nits

* nits

* nit

* fix

* Clamp sink logits

* Clean

* Soft-max trick

* Clean up

* p

* fix deepspeed

* update both modeling and modular for cleanup

* contiguous

* update tests

* fix top_k router call

* revert renaming

* test nits

* small fixes for EP

* fix path for our local tests

* update as I should not have broken that!

* fix the loss of mixtral

* revert part of the changes related to router_scores, kernel probably no ready for that!

* deleting a small nit

* update arch

* fix post processing

* update

* running version but not expected output

* moving to cuda

* initial commit

* revert

* erroring when loading on cpu

* updates

* del blocks, scales

* fix

* style

* rm comm

* comment

* add comment

* style

* remove duplicated lines

* Fix minor issue with weight_map conversion script

* fix sampling params

* rename to final name

* upate pre-final version of template

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* fix batched inference

* serve fixes

* swizzle !

* update final chat template by Matt.

* fix responses; pin oai

* sinplify

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* fix

* Use ROCm kernels from HUB

* Make kernel modes explicit

* update final chat template by Matt. x2

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Fix installation

* Update setup.py

Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>

* allow no content

* fix: update message handling in write_tokenizer function

* Fix template logic for user message role

* last nits for CB and flash_paged!

* there was one bad merge

* fix CB (hardcode for now, its just using kv groups instead)

* fix

* better fix for device_map

* minor device fix

* Fix flash paged

* updates

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* fix merge

* fix

* Fix line break when custom model indentity

* nits testing

* to locals first and pass sliding window to flash paged

* register modes for MegaBlocksMoeMlp

* add integration test in fixtures -> now update the tests to use it!

* update integration tests

* initial fix

* style and update tests

* fix

* chore(gpt oss): remove mlp_bias from configuration

It was just a leftover.

* stats

* Integration tests

* whoops

* Shouldn't move model

* Ensure assistant messages without thinking always go to "final" channel

* More checks to ensure expected format

* Add pad_token_id to model configuration in write_model function (#51)

* Add oai fix fast tests (#59)

* Fix some fast tests

* Force some updates

* Remove unnecessary fixes

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* reasoning -> Reasoning

* Add additional integration tests

* fixup

* Slight fixes

* align chat template with harmony

* simplify

* Add comment

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* Revert fixup

* skip 2 test remove todo

* merge

* padding side should be left for integration tests

* fix modular wrt to changes made to modeling

* style

* isort

* fix opies for the loss

* mmmm

---------

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-05 18:02:18 +02:00
738c1a3899 🌐 [i18n-KO] Translated cache_explanation.md to Korean (#39535)
* update: _toctree.yml

* docs: ko: cache_explanation.md

* feat: nmt draft

* fix: apply yijun-lee's comments

* fix: apply 4N3MONE's comments

* docs: update cache_position

* docs: update cache-storage-implementation

* update: add h2 tag in cache-position

---------

Co-authored-by: taehyeonjeon <xogus294@gmail.com>
2025-08-05 08:20:13 -07:00
d2ae766836 Export SmolvLM (#39614)
Export SmolVLM for ExecuTorch
2025-08-05 16:20:23 +02:00
c430047602 [docs] update object detection guide (#39909)
* Update object_detection.md

* Update object_detection.md
2025-08-05 14:07:21 +00:00
dedcbd6e3d run model debugging with forward arg (#39905)
* run model debugging a lot simpler

* fixup

* Update src/transformers/utils/generic.py

* fixup

* mode syle?

* guard a bit
2025-08-05 15:46:19 +02:00
20ce210ab7 Revert "remove dtensors, not explicit (#39840)" (#39912)
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* style?
2025-08-05 15:12:14 +02:00
2589a52c5c Fix aria tests (#39879)
* fix aria tests

* awful bug

* fix copies

* fix tests

* fix style

* revert this
2025-08-05 13:48:47 +02:00
6e4a9a5b43 Fix eval thread fork bomb (#39717) 2025-08-05 10:50:32 +00:00
98a3c49135 Replace video_fps with fps in tests (#39898)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-05 10:39:55 +00:00
1af1071081 Fix misleading WandB error when WANDB_DISABLED is set (#39891)
When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment,
the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb."

This was confusing because wandb was actually installed, just disabled via the environment variable.

The fix detects this specific case and provides a clear, actionable error message explaining
the conflict and how to resolve it.
2025-08-05 10:18:18 +00:00
78ef84921b Avoid aliasing in cond's branches for torch 2.8 (#39488)
Avoid alaising in cond's branches

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 11:18:11 +02:00
9e676e6a0e [qwen] remove unnecessary CUDA sync in qwen2_5_vl (#39870)
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-05 08:54:16 +00:00
392be3b282 fix test_working_of_tp failure of accelerate ut (#39828)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 08:52:57 +00:00
cc5de36454 [Exaone4] Fixes the attn implementation! (#39906)
* fix

* fix config
2025-08-05 09:29:16 +02:00
00d47757bf Reorder serving docs (#39634)
* Slight reorg

* LLMs + draft VLMs

* Actual VLM examples

* Initial responses

* Reorder

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/tiny_agents.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/open_webui.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/cursor.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Responses API

* Address Pedro's comments

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-08-05 08:43:06 +02:00
8c4ea670dc chore: update DETR model card (#39822)
* Update model card for DETR

* fix: applied suggested changes

* fix: simplified pipeline and modified notes and resources

* Update detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 12:25:53 -07:00
0bd91cc822 Add support for ModernBertForMultipleChoice (#39232)
* implement ModernBertForMultipleChoice

* fixup, style, repo consistency

* generate modeling_modernbert

* add tests + docs

* fix test
2025-08-04 20:45:43 +02:00
801e869b67 send some feedback when manually building doc via comment (#39889)
* fix

* fix

* fix

* Update .github/workflows/pr_build_doc_with_comment.yml

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-04 18:20:48 +00:00
ee7eb2d0b1 Update cohere2 vision test (#39888)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 20:08:18 +02:00
3bafa128dc [DOCS] : Improved mimi model card (#39824)
* [DOCS] : Improved mimi model card

* Removed additional header

* Review: addressed feedback

* Update mimi.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 10:07:06 -07:00
192acc2d0f Fix link to models in README (#39880)
Update README.md
2025-08-04 09:34:41 -07:00
7dca2ff8cf [typing] better return type hint for AutoModelForCausalLM and AutoModelForImageTextToText (#39881)
* Better return type hint for  AutoModelForCausalLM and AutoModelForImageTextToText

* fix imports

* fix
2025-08-04 15:03:53 +00:00
3edd14610e Set torch.backends.cudnn.allow_tf32 = False for CI (#39885)
* fix

* fix

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 16:55:16 +02:00
e3505cd4dc Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor (#39858)
Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor
2025-08-04 16:39:19 +02:00
380b2a0317 Rework add-new-model-like with modular and make test filenames coherent (#39612)
* remove tf/flax

* fix

* style

* Update add_new_model_like.py

* work in progress

* continue

* more cleanup

* simplify and first final version

* fixes -> it works

* add linter checks

* Update add_new_model_like.py

* fix

* add modular conversion at the end

* Update add_new_model_like.py

* add video processor

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* fix

* Update image_processing_auto.py

* Update image_processing_auto.py

* fix post rebase

* start test filenames replacement

* rename all test_processor -> test_processing

* fix copied from

* add docstrings

* Update add_new_model_like.py

* fix regex

* improve wording

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* start adding test

* fix

* fix

* proper first test

* tests

* fix

* fix

* fix

* fix

* modular can be used from anywhere

* protect import

* fix

* Update add_new_model_like.py

* fix
2025-08-04 14:41:09 +02:00
5fb5b6cfaf Fix quant docker for fp-quant (#39641)
* fix quant docker

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-04 11:57:08 +00:00
16d6faef9a [core] Fix attn_implementation setter with missing sub_configs (#39855)
* fix

* add sub_configs

* remove case for attention setter

* fix None

* Add test

* Fix sub-configs

* fix tests_config

* fix consistency

* fix fsmt

* fix
2025-08-04 11:35:09 +01:00
2a9febd632 Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494)
* added code for handling video object ,as dictionary of frames and metadata, in chat template

* added new test where videos are passed as objects (dict of frames, metadata) in the chat template

* modified hardcoded video_len check that does not match with increased number of tests cases.

* Modify hardcoded video_len check that fails with increased number of tests

* update documentation of multi-modal chat templating with extra information about including video object in chat template.

* add array handling in load_video()

* temporary test video inlcuded

* skip testing smolvlm with videos that are list of frames

* update documentation & make fixup

* Address review comments
2025-08-04 11:49:42 +02:00
0d511f7a77 Use comment to build doc on PRs (#39846)
* try

* try

* try

* try

* try

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 10:24:45 +02:00
4819adbbaa Refactor label name handling for PEFT models in Trainer class (#39265)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-04 06:29:57 +00:00
166fcad3f8 Improve is_wandb_available function to verify WandB installation (#39875)
Improve `is_wandb_available` function to verify WandB installation by checking for a key attribute
2025-08-04 08:22:52 +02:00
6dfd561d9c remove dtensors, not explicit (#39840)
* remove dtensors, not explicit

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>

* style

* fix test

* update

* as we broke saving try to fix

* output layouts should exit

* nit

* devicemesh exists if it was distributed

* use _device_mesh of self

* update

* lol

* fix

* nit

* update

* fix!

* this???

* grumble grumble

* ?

* fuck me

---------

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
2025-08-01 22:02:47 +02:00
b727c2b20e Allow TrackioCallback to work when pynvml is not installed (#39851)
Allow TrackioCallback to work when pynvml is not installed
2025-08-01 18:57:25 +02:00
1ec0feccdd [image-processing] deprecate plot_keypoint_matching, make visualize_keypoint_matching as a standard (#39830)
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models

* refactor: added copied from

* fix: make style

* fix: repo consistency

* fix: make style

* docs: added missing method in SuperGlue docs
2025-08-01 16:29:57 +00:00
7b4d9843ba Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739)
* add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid

* fix after review
2025-08-01 12:20:08 -04:00
88ead3f518 Fix responses add tests (#39848)
* Quick responses fix

* [serve] Fix responses API and add tests

* Remove typo

* Remove typo

* Tests
2025-08-01 18:06:08 +02:00
6ea646a03a Update ux cb (#39845)
* clenaup

* nits

* updates

* fix logging

* push updates?

* just passexception

* update

* nits

* fix

* add tokencount

* style
2025-08-01 16:50:28 +02:00
3951d4ad5d Add MM Grounding DINO (#37925)
* first commit

Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.

TODO: Some tests are failing so that needs to be fixed.

* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model

* cleaned up a hack in the conversion script

* Fixed the expected values in integration tests

Cross att masking and cpu-gpu consistency tests are still failing however.

* changes for make style and quality

* add documentation

* clean up contrastive embedding

* add mm grounding dino to loss mapping

* add model link to config docstring

* hack fix for mm grounding dino consistency tests

* add special cases for unused config attr check

* add all models and update docs

* update model doc to the new style

* Use super_kwargs for modular config

* Move init to the _init_weights function

* Add copied from for tests

* fixup

* update typehints

* Fix-copies for tests

* fix-copies

* Fix init test

* fix snippets in docs

* fix consistency

* fix consistency

* update conversion script

* fix nits in readme and remove old comments from conversion script

* add license

* remove unused config args

* remove unnecessary if/else in model init

* fix quality

* Update references

* fix test

* fixup

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-01 15:43:23 +01:00
50145474b7 [typecheck] proper export of private symbols (#39729)
* Export private symbols

Signed-off-by: cyy <cyyever@outlook.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix format

Signed-off-by: cyy <cyyever@outlook.com>

* Add a comment for exported symbols

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-01 13:36:47 +01:00
c962f1515e [attn_implementation] remove recursive, allows custom kernels with wrappers (#39823)
* fix?

* fixme and style

* Update src/transformers/modeling_utils.py

* update

* update

* fix

* small fixees

* nit

* nits

* fix init check?

* fix

* fix default

* or fucks me

* nits

* include a small nit

* does this make it hapy?

* fixup

* fix the remaining ones
2025-08-01 12:18:28 +02:00
d3b8627b56 [VLMs] split out "get placeholder mask" to helper (#39777)
* batch upidate all models

* update

* forgot about llava onevision

* update

* fix tests

* delete file

* typo

* fix emu3 once and forever

* update cohere2 vision as well
2025-08-01 08:01:06 +00:00
a115b67392 Fix tp cb (#39838)
* fixes

* one more
2025-08-01 09:59:04 +02:00
2c0af41ce5 Fix bad markdown links (#39819)
Fix bad markdown links.
2025-07-31 09:14:14 -07:00
4fcf455517 Fix broken links (#39809)
Replace links in the form of `[text]((url))` to `[text](url)`. This is
the correct format of a url in the markdown.
2025-07-31 13:23:04 +00:00
b937d47455 [cohere2 vision] move doc to multimodal section (#39820)
move doc to multimodal section
2025-07-31 15:13:02 +02:00
6ba8a1ff45 Update documentation for Cohere2Vision models (#39817)
* Update docs with pipeline example

* Add Cohere2Vision to list of vision models

* Sort models
2025-07-31 11:58:45 +00:00
e1688d28d3 [Model] Cohere2 Vision (#39810)
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025

* update and add modualr file

* update processors and check with orig impl later

* delete unused files

* image processor reduce LOC and re-use GotOCR2

* update the config to use modular

* model tests pass

* processor fixes

* check model outputs decorator

* address one more comment

* Update tokens. Temp - need to read from tokenizer'

* fix for multi-gpu

* Fix image token handling

* upadte image token expansion logic

* fix a few issues with remote code loading

* not related but modular forces us to change all files now

* Add overview and code sample to cohere vision docs

* add scripts. TMP.

* Update inference script

* Create script

* set dtype in export script

* TO revert: modular export fix

* Fix scripts

* Revert "TO revert: modular export fix"

This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.

* Use modular weights

* Upload to hub

Removed OOD weights ad script

* Updated docs

* fix import error

Update docs

Added pipeline test

* Updated docs

* Run modular script

remove modular for config

Added patch_size

Added docstrings in modular

Fix OOM

Add docs, fixup integration tests. 8-gpu passing

* tiny updates

* address comments + fixup

* add test for chat template

* check model outputs workaround

* aya vision fix check model inputs

* Revert "add test for chat template"

This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.

* reveert more changes

* last revert

* skip and merge

* faulty copy from

---------

Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
2025-07-31 10:57:34 +00:00
6c3f27ba61 [docs] fix korean docs yet again (#39813)
fix korean docs yet again
2025-07-31 09:13:25 +00:00
cb289ad243 feat(tokenization): add encode_message to tokenize messages one by one (#39507)
* feat(tokenization): add encode_message to tokenize messages one by one

* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.

* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.

* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.

* Docs fix

* Revert changes in docstring of pad()

* Revert changes in docstring

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.

* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.

---------

Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-31 10:55:45 +02:00
4f93cc9174 fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test (#39300)
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous

* test cache_position

* move test

* propagate changes

---------

Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
2025-07-30 17:30:28 +00:00
9b3203f47b Add callback to monitor progress in whisper transcription (#37483)
* Add callback to monitor progress in whisper transcription

* Added `` around variables, rewording

* Add example of `monitor_progress`.

---------

Co-authored-by: Eric B <ebezzam@gmail.com>
2025-07-30 17:40:53 +02:00
7abb5d3992 Update mT5 model card (#39702)
* Update mt5 model card

* Fix casing of model title

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:35:04 -07:00
1019b00028 Update model card for Cohere2 (Command R7B) (#39604)
* Update model card for Cohere2 (Command R7B)

* fix: applied suggested changes
2025-07-30 08:34:26 -07:00
ecbb5ee194 standardized BARThez model card (#39701)
* standardized barthez model card according to template

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* suggested changes to barthez model card

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:33:13 -07:00
8e077a3e45 Fix re-compilations for cross attention cache (#39788)
fix recompilations for cross attn cache
2025-07-30 14:52:03 +02:00
1e0665a191 Simplify conditional code (#39781)
* Use !=

Signed-off-by: cyy <cyyever@outlook.com>

* Use get

Signed-off-by: cyy <cyyever@outlook.com>

* Format

* Simplify bool operations

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:32:10 +00:00
b94929eb49 Fix an invalid condition (#39762)
Fix an invalid judgement

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:19:17 +00:00
bb2ac66453 fix chameleonvision UT failure (#39646)
* fix chameleonvision UT failure

Signed-off-by: matrix.yao@intel.com <Yao Matrix>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: matrix.yao@intel.com <Yao Matrix>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: root <Yao Matrix>
2025-07-30 12:09:26 +00:00
5348445dfa Super tiny update (#39727)
super tiny update
2025-07-30 12:21:41 +02:00
54cbea5615 more info in model_results.json (#39783)
more info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-30 11:43:10 +02:00
01d5f94695 [ASR pipline] fix with datasets 4.0 (#39504)
* fix

* handle edge case

* make
2025-07-30 08:13:40 +00:00
8ab21be570 enable static cache on vision encoder decoder (#39773)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-30 08:10:46 +00:00
67cfe11528 Fix Evolla and xLSTM tests (#39769)
* fix all evolla

* xlstm
2025-07-30 09:51:55 +02:00
ec4033457e Don't set run_name when none (#39695)
* Don't set run_name when none

* revert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-30 01:39:29 +00:00
551a89a4a3 Standardize CLAP model card format (#39738)
* Standardize CLAP model card format

* Apply review feedback

* Remove Resources section
2025-07-29 14:13:04 -07:00
da70b1389a docs: Update EfficientLoFTR documentation (#39620)
* docs: Update EfficientLoFTR documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-29 13:54:44 -07:00
ddd2100767 Fix OmDet test after arg deprecation (#39766)
fix arg name
2025-07-29 22:10:36 +02:00
4abb053b6c Remove python3.7 reference from doc link (#39706) 2025-07-29 09:17:13 -07:00
33aa49df9d [docs] Ko doc fixes after toc update (#39660)
* update docs

* doc builder working

* make fixup
2025-07-29 17:05:26 +01:00
c4e2069898 Fix Cache.max_cache_len max value for Hybrid models (#39737)
* fix gemma

* fix min

* fix quant init issue

* fix gemma 3n

* skip quant cache test

* fix modular

* new test for Gemma

* include cyril change

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-29 17:12:50 +02:00
075dbbceaa fix(trainer): Correct loss scaling for incomplete gradient accumulation steps (#39659)
* Fix issue[#38837]: wrong loss scaled in last step of epoch

* chore: trigger CI

* Update src/transformers/trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/modeling_flash_attention_utils.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: taihang <taihang@U-2RHYVWX7-2207.local>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-07-29 17:12:31 +02:00
1d061536cf 🌐 [i18n-KO] Translated how_to_hack_models.md to Korean (#39536)
* docs: ko: how_to_hack_models.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:09:16 -07:00
43fe41c0a8 🌐 [i18n-KO] Translated perf_train_gpu_one.md to Korean (#39552)
* docs: ko: perf_train_gpu_one.md

* feat: nmt draft

* fix: manual edits

* fix: Manually added missing backticks

* Update docs/source/ko/perf_train_gpu_one.md

fix: remove space between heading and GPU anchor

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: clarify table headers to indicate training speed boost and memory savings

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix : rephrase explanation of data preloading to improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-29 08:08:57 -07:00
9f38763731 🌐 [i18n-KO] Translated pipeline_gradio.md to Korean (#39520)
* docs: ko: pipeline_gradio.md

* feat: nmt draft

* fix: manual edits

* docs: ko: pipeline_gradio.md
2025-07-29 08:04:30 -07:00
f72311796b 🌐 [i18n-KO] Translated tokenizer.md to Korean (#39532)
* docs: ko: tokenizer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com>

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-29 08:04:14 -07:00
d346d46752 🌐 [i18n-KO] Translated tvp.md to Korean (#39578)
* docs: ko: tvp.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
2025-07-29 08:04:00 -07:00
2f59c15b33 🌐 [i18n-KO] Translated albert.md to Korean (#39524)
* docs: ko: albert.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:03:40 -07:00
98386dcee9 🌐 [i18n-KO] Translated main_classes/peft.md (#39515)
* docs: ko: main_classes/peft.md

* feat: nmt draft

* docs: add missing TOC to documentation for `PeftAdapterMixin` section

Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).

* fix: Improve naturalness of purpose expression in Korean

Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.

* fix: Simplify plural form and make expression more concise

Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.

* fix: Replace technical term '주입' with more natural '적용'

Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:

'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise

'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.

* fix: update toctree path for PEFT to lowercase

Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.

* docs: update as per reviewer feedback after rebase
2025-07-29 08:03:17 -07:00
1ad216bd7d [modenbert] fix regression (#39750)
* fix regression

* add FA2 test
2025-07-29 16:58:59 +02:00
379209b603 add libcst to extras["testing"] in setup.py (#39761)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 16:58:51 +02:00
abf101af1f Fix version issue in modeling_utils.py (#39759)
fix version issue
2025-07-29 16:15:30 +02:00
8db4d79161 Enable xpu allocator on caching_allocator_warmup (#39654)
* add xpu allocator

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix variable name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm useless default value

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-29 16:06:52 +02:00
fb141e2c90 Support loading Qwen3 MoE GGUF (#39638)
* support loading qwen3 gguf

* qwen3moe test cases

* fix whitespaces

* fix ggml tests
2025-07-29 13:44:44 +00:00
ccb2e0e03b Fix GPT2 with cross attention (#39754)
* fix

* use new mask API

* style

* fix copies and attention tests

* fix head pruning tests
2025-07-29 15:40:31 +02:00
dfd616e658 Avoid OOM when other tests are failing (#39758)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 15:35:44 +02:00
65df73aa88 AMD disable torchcodec (#39757)
Temporarily disable torchcodec installation because of bizarre segfault
2025-07-29 13:07:25 +00:00
63b3200779 Use --gpus all in workflow files (#39752)
gpu all

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 14:53:33 +02:00
95faabf0a6 Apply several ruff SIM rules (#37283)
* Apply ruff SIM118 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM910 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM101 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Format code

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-29 11:40:34 +00:00
cf97f6cfd1 Fix mamba regression (#39728)
* fix mamba regression

* fix compile test
2025-07-29 12:44:28 +02:00
66984ed4f6 Update IMPORTANT_MODELS list (#39734) 2025-07-29 12:34:57 +02:00
de8d0cec30 update GemmaIntegrationTest::test_model_2b_bf16_dola again (#39731)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 11:42:55 +02:00
85d5aeb324 Fix: add back base model plan (#39733)
* Fix: add back base model plan

* Fix: typo

* fixup

* remove unused import

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-29 11:37:33 +02:00
2a90193dd8 [Fix] import two missing typos in models/__init__.py for typo checking (#39745)
* [Fix] import lost gemma3n for type checking in vscode

* [Fix] import missing qwen2_5_omni typo

* [Refactor] sort by ascii order
2025-07-29 11:35:22 +02:00
f2aca3eccc fix cache inheritance (#39748)
* fix cache inheritance

* styule
2025-07-29 11:24:44 +02:00
f3598a95c7 extend more trainer test cases to XPU, all pass (#39652)
extend more trainer test cases to XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-07-29 10:51:00 +02:00
75794792ad BLIPs clean-up (#35560)
* blips clean up

* update processor

* readability

* fix processor length

* fix copies

* tmp

* update and fix copies

* why keep these, delete?

* fix test fetcher

* irrelevant comment

* fix tests

* fix tests

* fix copies
2025-07-29 10:03:06 +02:00
4f8f51be4e Add Fast Segformer Processor (#37024)
* Add Fast Segformer Processor

* Modified the params according to segformer model

* modified test_image_processing_Segformer_fast args

- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class

* added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast

* fixed code_quality

* added recommended fixes and tests to make sure everything processess smoothly

* Fixed SegmentationMapsLogic

- modified the preprocessing of segmentation maps to use tensors
- added batch support

* fixed some mismatched files

* modified the tolerance for tests

* use modular

* fix ci

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 19:22:32 +00:00
c353f2bb5e Superpoint fast image processor (#37804)
* feat: superpoint fast image processor

* fix: reran fast cli command to generate fast config

* feat: updated test cases

* fix: removed old model add

* fix: format fix

* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix: ported to torch and made requested changes

* fix: removed changes to init

* fix: init fix

* fix: init format fix

* fixed testcases and ported to torch

* fix: format fixes

* failed
test case fix

* fix superpoint fast

* fix docstring

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 18:15:06 +00:00
14adcbd937 Fix AMD dockerfile for audio models (#39669) 2025-07-28 19:05:41 +02:00
1c6b47451d Fix cache-related tests (#39676)
* fix

* fix kyutai at last

* fix unrelated tests and copies

* update musicgen as well

* revert tensor

* fix old test failures

* why it wasn't added?
2025-07-28 17:30:11 +02:00
fc2bd1eac0 Fix Layer device placement in Caches (#39732)
* fix device placement

* style

* typo in comment
2025-07-28 16:37:11 +02:00
7623aa3e5f Fix Qwen2AudioForConditionalGeneration.forward() and test_flash_attn_kernels_inference_equivalence (#39503)
* Add missing cache_position argument.

* Pass cache_position to language model.

* Overwrite prepare_inputs_for_generation.

* Set model to half precision for Flash Attention test.

* Cast model to bfloat16.
2025-07-28 16:35:08 +02:00
28f2619868 skip Glm4MoeModelTest::test_torch_compile_for_training (#39670)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:30:40 +02:00
88aed92b59 Update QAPipelineTests::test_large_model_course after #39193 (#39666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:26:49 +02:00
da823fc04e mllama outputs refactor (#39643)
* mllama outputs refactor

* forgot kwargs

* fix output

* add can_record_outputs

* correct @check_model_inputs placement

* ruff and copies

* rebase

* feedback

* only return hidden_states

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
2025-07-28 15:59:20 +02:00
686bb3b098 Remove all expired deprecation cycles (#39725)
* remove all deprecation cycles

* style

* fix

* remove

* remove

* fix

* Update modular_dpt.py

* back

* typo

* typo

* final fix

* remove all args
2025-07-28 15:43:41 +02:00
a0fa500a3d [CI] Add Eric to comment slow ci (#39601)
add to ci
2025-07-28 13:24:00 +00:00
4c7da9fedf PATCH: add back n-dim device-mesh + fix tp trainer saving (#39693)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

* Fix: saving of distributed model in trainer

* Fix: distributed saving with trainer

* Feat: add pure tp saving

* Only require tp dim if ndim > 1

* Fix: default to None

* Fix: better comments/errors

* Fix: properly check tp_size attribute

* Fix: properly check for None in tp_size

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-28 12:29:58 +00:00
cbede2969b Add self-hosted runner scale set workflow for mi325 CI (#39651) 2025-07-28 13:32:25 +02:00
b56d721397 [configuration] remove redundant classmethod (#38812)
* remove redundant classmethod

* warning message, add space between words

* fix tests

* fix copies
2025-07-28 10:38:48 +00:00
02ea23cbde update ernie model card (#39657)
* update ernie model doc

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address ruff format error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address check_repository_consistency error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

---------

Signed-off-by: Zhang Jun <jzhang533@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-28 10:21:18 +00:00
8b237b8639 [processors] add tests for helper fn (#39629)
* add tests for helpers

* duplicate test for each model

* why llava next video has no helper

* oops must have been in the commit

* fix test after rebase

* add copy from
2025-07-28 09:41:58 +00:00
6638b3642d xpu optimization for generation case (#39573)
* xpu optimization for generation case

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-28 11:34:58 +02:00
5c15eb55d2 fix(tokenization): check token.content for trie (#39587)
fix: check token.content for trie
2025-07-28 11:28:56 +02:00
6a61e16626 Fix missing initialization of FastSpeech2Conformer (#39689)
* fix missing initialization of FastSpeech2Conformer

* switch order and reactivate tests

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-28 10:47:39 +02:00
a6393e7d28 fix missing model._tp_size from ep refactor (#39688)
* fix missing model._tp_size from ep refactor

* restore setting device_mesh too
2025-07-26 12:26:36 +02:00
18a7c29ff8 More robust tied weight test (#39681)
* Update test_modeling_common.py

* remove old ones

* Update test_modeling_common.py

* Update test_modeling_common.py

* add

* Update test_modeling_musicgen_melody.py
2025-07-25 22:03:21 +02:00
c3401d6fad dev version 4.55 2025-07-25 21:11:20 +02:00
97f8c71f52 Add padding-free to Granite hybrid moe models (#39677)
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util
2025-07-25 20:10:50 +02:00
d6e9f71a6e Fix tied weight test (#39680)
Update test_modeling_common.py
2025-07-25 20:09:33 +02:00
5da6ad2731 fix break for ckpt without _tp_plan (#39658)
* fix break for ckpt without _tp_plan

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

---------

Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 20:03:48 +02:00
c06d4cd6ce Add EXAONE 4.0 model (#39129)
* Add EXAONE 4.0 model

* Refactor EXAONE 4.0 modeling code

* Fix cache slicing on SWA + FA2

* Fix cache slicing on FA2 + HybridCache

* Update EXAONE 4.0 modeling code for main branch

* Update o_proj for asymmetric projection

* Address PR feedback

* Add EXAONE 4.0 docs

* Update EXAONE 4.0 modeling code for main branch

* update

* fix updates

* updates

* fix

* fix

* fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 19:58:28 +02:00
3e4d584a5b Support typing.Literal as type of tool parameters or return value (#39633)
* support `typing.Literal` as type of tool parameters

* validate the `args` of `typing.Literal` roughly

* add test to get json schema for `typing.Literal` type hint

* fix: add `"type"` attribute to the parsed result of `typing.Literal`

* test: add argument `booleanish` to test multi-type literal

* style: auto fixup
2025-07-25 17:51:28 +00:00
300d42a43e Add ep (#39501)
* EP + updates

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>

* remove unrelated change

* not working yet but let's see where it goes!

* update the api a bit

* udpate

* where I am at for now

* fix ep

* refactor the API

* yups

* fix

* fixup

* clean modeling

* just support llama4 for now!

* properly avoid

* fix

* nits

* Update src/transformers/models/llama4/modeling_llama4.py

* Update src/transformers/integrations/tensor_parallel.py

* style

* ,,,,

* update

---------

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
2025-07-25 19:46:17 +02:00
abaa043d60 bad_words_ids no longer slow on mps (#39556)
* fix: bad_words_ids no longer slow on mps

* fix: SequenceBiasLogitsProcessor slow `_prepare_bias_variables` method

* fix: re-adding a deleted comment

* fix: bug in no_bad_words_logits

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 19:45:41 +02:00
6630c5b714 Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.

* Fix style.

* Fix modeling test.

* Make xLSTM package optional.

* Fix: Update torch version check.

* Fix: Bad variable naming in test.

* Fix: Import structure cleaning with Ruff.

* Fix: Update docstrings.

* Fix: Mitigate unused config attr tests by explicit usage.

* Fix: Skip tests, if xlstm library is not installed.

* Feat: Enable longer context window for inference by chunking.

* Fix: Make training test pass by lowering target accuracy.

* Chore: Increase test verbosity for failing generation test.

* Update docs/source/en/model_doc/xlstm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix: Make xlstm available even without CUDA.

* Chore: Remove unnecessary import.

* Fix: Remove BOS insertion.

* Chore: Improve xLSTMCache documentation.

* Integrate basic xLSTM fallback code.

* Chore: Remove unnecessary import.

* Chore: Remove duplicate LayerNorm.

* chore: update copyright, minor reformatting

* fix: refactor mLSTMStateType due to missing torch import

* fix: add missing import

* Chore: Replace einops.

* fix: apply ruff formatting

* fix: run `make fix-copies` to re-generate dummy_pt_objects.py

* fix: make type hints Python 3.9 compatible

* fix: remove obsolete import

* fix: remove obsolete method from docs

* chore: remove obsolete `force_bos_token_insert` from config

* Chore: Remove duplicated xLSTMCache class.

* Fix: Formatting of modeling_xlstm.py

* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.

* Fix: Update xLSTMCache docstring.

* Feat: Add proper initialization of xLSTM.

* Chore: Re-format files.

* Chore: Adapt format.

* Fix: xLSTMCache import restructuring.

* Fix: Add __all__ lists to modeling and configuration files.

* Chore: Reformat.

* Fix: Remove unnecessary update_rnn_state function.

* Fix: Undo test accuracy quickfix.

* Fix: Update copyright year, remvoe config copy.

* Chore: Flatten all internal configs to xLSTMConfig.

* Fix: Unused config variables check.

* Chore: Remove unnecessary imports.

* Fix: Unify xlstm cache argument from batch_size to max_batch_size.

* Chore: Remove bad default arg value for xLSTMCache.

* Chore: Rename core configuration arguments to HF default in xLSTM.

* Chore: Fix formatting.

* Fix: xLSTM Cache config access.

* Fix: Update xlstm tests for config update.

* Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B.

* Fix: Configuration xLSTM python3.9 syntax.

* Fix: Difference to main in test_utils.py assertion.

* Fix: Bad syntax in xlstm config for python3.9.

* Fix: xLSTMConfig docstring.

* Fix: xLSTMConfig docstring.

* Fix typing issues in xLSTM and BeiT, Paligemma.

* Fix: Exclude xLSTM from test cache utils.

* Chore: Fix style.

* Chore: Fix format.

* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.

* Chore: Remove asserts and replace with ValueErrors.

* Chore: Update __init__.py structure of xLSTM.

* Chore: Clean xLSTM initialization of weights.

* Fix index names in modeling_xlstm.py

* Update xlstm model test typing annotations.

* Fix: Remove all asserts.

* Revert changes to the main __init__.py

* Fix: Move xLSTMCache to modeling_xlstm.py

* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py

* Remove xLSTMCache from dummy_pt_objects.py

* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.

* Revert test_cache_utils.py xLSTM change.

* Fix: Move xLSTM init functions before init call.

* Remove xLSTMCache from generation utils.

* Fix: Clean xLSTM init functionality for recursive calls.

* Fix: Move xLSTMCache before its first call.

* Fix formatting.

* Add partial docstring for xLSTMModel forward.

* Fix xLSTMCache docstring in xLSTMModel.

* Remove xLSTMCache from public documentation. Update auto_docstring.

* Remove all agressive shape comments

* style

* Fix names

* simplify

* remove output_hidden_states

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* Update test_modeling_xlstm.py

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* fix

* fix

* style

* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00
ed9a96bc6d Use auto_docstring for perception_lm fast image processor (#39679) 2025-07-25 17:32:48 +00:00
d913b39ef3 fix: HWIO to OIHW (#39200)
* fix: HWIO to OIHW

* Bug in attention type

* Conversion script docstring

* style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-25 19:23:15 +02:00
a26f0fabb8 Fix auto_docstring crashing when dependencies are missing (#39564)
* add try except to not crash auto_docstring when some dependency are missing

* safeguard None value in placeholder dict
2025-07-25 19:19:23 +02:00
69cff312f5 Add support for DeepseekAI's DeepseekVL (#36248)
* upload initial code

* update deepseek-vl adaptor

* update hierarchy of vision model classes

* udpate aligner model

* add text model

* Added Image Processor

* Added Image Processor

* Added Image Processor

* apply masks

* remove projection; add aligner

* remove interpolate_pos_encoding

* remove unused params in config

* cleaning

* Add the __init__ file

* added processing deepseek_vl class

* modified the deepseek-vl processor

* modified the deepseek-vl processor

* update __init__

* Update the image processor class name

* Added Deepseek to src/transformers/__init__.py file

* Added Deepseek to image_processing_auto.py

* update the __init__ file

* update deepseek_vl image processor

* Update Deepseek Processor

* upload fast image processor

* Revert "upload fast image processor"

This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.

* update image processor

* flatten heirarchy

* remove DeepseekVLModel

* major update (complete modeling)

* auto modeling and other files

* formatting

* fix quality

* replace torchvision in modeling

* set default do_normalize to False

* add fast image processor template using tool

* update image processors

* add fast image processor to other files

* update liscense

* Added deepseek image testcases

* update image test

* update processor

* write CHAT_TEMPLATE

* update model for processor

* fix processor

* minor fixes and formatting

* fix image processing and tests

* fix interpolation in sam

* fix output_attentions in DeepseekVLModel

* upload test_modeling

* fix tests because of vocab size

* set use_high_res_vision=False in tests

* fix all modeling tests

* fix styling

* remove explicit background_color from image processors

* added test_processor

* added test_processor

* fix processor tests

* update docs

* update docs

* update docs

* update conversion script

* Fixed typos

* minor fixes from review

- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update liscense 2021->2025

* fix type in config docstring

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* update get_image_features

* fix config

* improve DeepseekVLImageProcessor.preprocess

* return image_hidden_states

* use AutoTokenizer and AutoImageProcessor in Processor

* fix model outputs

* make num_image_tokens configurable

* fix docstring of processor

* move system prompt to chat template

* fix repo consistency

* fix return_dict

* replace SamVisionEncoder with SamVisionModel

* update to remove deepcopy

* 🛠️  Major Architectural Changes (Adds DeepseekVLHybrid)

* fix quality checks

* add missing hybrid in auto modeling

* run make style

* update sam_hq

* update high_res_size in test

* update docs following #36979

* update code with auto_docstring

* update conversion scripts

* fix style

* fix failing test because of tuple

* set weights_only=True in conversion script

* use safetensors.torch.load_file instead of torch.load in conversion script

* make output_dir optional in conversion script

* fix code snippets in docs (now the examples work fine)

* integration tests for DeepseekVL

* update expected texts

* make style

* integration tests for DeepseekVLHybrid

* fix class name

* update expected texts for hybrid

* run "make style"

* update since changes in main

* run make-style

* nits since changes in main

* undo changes in sam

* fix tests

* fix tests; update with main

* update with main: output_attention/output_hidden_states

* fix copied part in deepseek_vl

* run fix-copies

* fix output_hidden_states

* sam: fix _init_weigths

* use modular for DeepseekVL

* make image processor more modular

* modular: use JanusPreTrainedModel

* janus: provide kwargs in loss

* update processors in conversion script

* Revert "sam: fix _init_weigths"

This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.

* run fix-copies

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-07-25 19:18:50 +02:00
a98bbc294c Add missing flag for CacheLayer (#39678)
* fix

* Update cache_utils.py
2025-07-25 19:12:13 +02:00
45c7bfb157 Add evolla rebase main (#36232)
* add evolla

* adding protein encoder part

* add initial processing test

* save processor

* add docstring

* add evolla processor

* add two test

* change vision to protein

* change resampler to sequence_compressor

* change vision to protein

* initial update for llama

* add initial update for llamaForCausalLM

* add `test_processor`, `test_saprot_output`, `test_protein_encoder_output`

* change evolla, but still working on it

* add test_single_forward

* pass test_attention_outputs

* pass test_hidden_states_output

* pass test_save_load and test_from_pretrained_no_checkpoint

* pass test_cpu_offload

* skip some tests

* update new progress

* skip test_model_is_small

* pass test_model_weights_reload_no_missing_tied_weights

* pass test_model_get_set_embeddings

* pass test_cpu_offload

* skip test_resize_embeddings

* add pipeline_model_mapping

* remote old setUp

* pass processor save_pretrained and load_pretrained

* remove pooling layer

* pass test_inputs_embeds_matches_input_ids

* pass test_model_is_small

* pass test_attention_outputs

* pass test_initialization

* pass test_model_get_set_embeddings

* pass test_single_forward

* skip test_disk_offload_bin and test_disk_offload_safetensors

* fix most tests

* pass test_protein_encoder_output

* remove useless code

* add EvollaForProteinText2Text

* pass test_saprot_output

* pass all EvollaModelTest test and remove processor test

* add processor test to its own file

* skip is_training since esm skipped it and the saprot code causes error when setting is_training True

* pass processor tests

* solve all except config

* pass most cases

* change init

* add doc to `configuration_evolla.py`

* remove image_processing test

* remove extra processor test

* remove extra modules

* remove extra modules

* change all configs into one config

* pass all evolla test

* pass `make fixup`

* update short summary

* update Evolla-10B-hf

* pass check_dummies.py and check_code_quality

* fix  `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings`

* remove dummy codes

* change format

* fix llava issue

* update format

* update to solve llama3 access issue

* update to make forward right

* solve processor save load problem from instructblip solution

* remove unexpected file

* skip `test_generation_tester_mixin_inheritance`

* add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning`

* add `modular_evolla.py`

* solved issue #36362

* run `make fixup`

* update modular

* solve float32 training

* add fix

* solve `utils/check_docstrings.py`

* update

* update

* update

* remove other files and replace sequential and einsum

* add use case in document

* update the models

* update model

* change some wrong code

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix issues mentioned in PR

* update style and rearrange the placement

* fix return_dict argument issue

* solve SaProtConfig issue

* Solve EvollaSaProtRotaryEmbedding issue

* solve attention_mask issue

* solve almosst all issues

* make style

* update config

* remove unrelated pickle file

* delete pickle files

* fix config

* simplify a lot

* remove past k-v from encoder

* continue work

* style

* skip it from init

* fix init

* fix init

* simplify more

* fill in docstrings

* change test for generation

* skip test

* fix style

---------

Co-authored-by: Chenchen Han <13980209828@163.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-25 19:11:57 +02:00
2670da66ce update expected outputs for whisper after #38778 (#39304)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-25 16:48:10 +00:00
4b125e2993 fix kyutai tests (#39416)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-07-25 18:42:04 +02:00
4f17bf0572 Fixes the BC (#39636)
* fix

* update

* Update src/transformers/utils/generic.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixup

* fixes

* fix more models

* fix fix fix

* add embedding to more models

* update

* update

* fix

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 18:41:21 +02:00
ddb0546d14 Delete bad rebasing functions (#39672)
* remove outdated stuff

* remove comment

* use register

* remove finally clause (to allow further check if fallback to sdpa)

* general exception

* add wrapper

* revert check

* typo
2025-07-25 18:28:09 +02:00
a91653561e [Ernie 4.5] Post merge adaptations (#39664)
* ernie 4.5 fixes

* Apply style fixes

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 17:36:18 +02:00
5d0ba3e479 [CI] revert device in test_export_static_cache (#39662)
* revert device

* add todo
2025-07-25 15:36:12 +00:00
850bdeaa95 Fix ModernBERT Decoder model (#39671)
fix
2025-07-25 16:20:12 +01:00
17f02102c5 🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor (#39591)
* init

* Force qwen2VL image proc to fast

* refactor qwen2 vl fast

* fix copies

* Update after PR review and update tests to use return_tensors="pt"

* fix processor tests

* add BC for min pixels/max pixels
2025-07-25 11:11:28 -04:00
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
3b3f9c0c46 fix(voxtral): correct typo in apply_transcription_request (#39572)
* fix(voxtral): correct typo in apply_transcription_request

* temporary wrapper: apply_transcrition_request

* Update processing_voxtral.py

* style: sort imports in processing_voxtral.py

* docs(voxtral): fix typo in voxtral.md

* make style

* doc update

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-07-25 12:09:44 +00:00
2a82cf06ad make fixup (#39661) 2025-07-25 11:27:45 +00:00
e3760501b0 [docs] fix ko cache docs (#39644)
fix ko docs
2025-07-25 10:06:03 +01:00
91f591f7bc Make pytorch examples UV-compatible (#39635)
* update release.py

* add uv headers in some pytorch examples

* rest of pytorch examples

* style
2025-07-25 10:46:22 +02:00
c46c17db57 revert change to cu_seqlen_k and max_k when preparing from position_ids (#39653) 2025-07-25 10:28:22 +02:00
4600c27c4f Fix: explicit not none check for tensors in flash attention (#39639)
fix: explicit not none check for tensors
2025-07-25 10:09:14 +02:00
c392d47c9b [attention] fix test for packed padfree masking (#39582)
* fix most tests

* skip a few more tests

* address comments

* fix chameleon tests

* forgot to uncomment

* qwen has its own tests with images, rename it as well
2025-07-25 07:44:52 +00:00
565c035a2e Add owlv2 fast processor (#39041)
* add owlv2 fast image processor

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* change references to owlVit to owlv2 in docstrings for post process methods

* change type hints from List, Dict, Tuple to list, dict, tuple

* remove unused typing imports

* add disable grouping argument to group images by shape

* run make quality and repo-consistency

* use modular

* fix auto_docstring

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-25 02:40:11 +00:00
5a81d7e0b3 revert behavior of _prepare_from_posids (#39622)
* revert behavior of _prepare_from_posids

* add back cu_seqlens_k and max_k for inference
2025-07-24 20:31:00 +02:00
ad6fd2da0e [Voxtral] values for A10 runners (#39605)
* values for A10 runners

* make

* as for Llava

* does not apply to Voxtral
2025-07-24 18:52:35 +02:00
4741e1f1b7 [timm] new timm pin (#39640) 2025-07-24 16:01:59 +00:00
12b612830d [efficientloftr] fix model_id in tests (#39621)
fix: wrong EfficientLoFTR model id in tests
2025-07-24 10:41:06 +01:00
947a37e8f5 Update recent processors for vLLM backend (#39583)
* update recent models and make sure it runs withh vLLM

* delete!
2025-07-24 10:29:27 +02:00
7b897fe583 [Docs] Translate audio_classification.md from English to Spanish (#39513)
* Docs: translate audio_classification to Spanish

* Update audio_classification.md

* Remove space
* Normalize backticks

* Update audio_classification.md

* Apply corrections recommended by aaronjimv

* Update _toctree.yml

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 15:55:13 -07:00
9b7244f189 standardized YOLOS model card according to template in #36979 (#39528)
* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections

* removed cli section

* Update yolos.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 11:00:25 -07:00
ec8a09a5fe Feature/standardize opt model card (#39568)
* docs: Standardize OPT model card with enhanced details

* Remove incorrect link from OPT model card

* Address review feedback on OPT model card

* Update opt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 10:57:48 -07:00
c5a80dd6c4 🔴 Fix EnCodec internals and integration tests (#39431)
* EnCodec fixes and update integration tests.

* Apply padding mask when normalize is False.

* Update comment of copied function.

* Fix padding mask within modeling.

* Revert padding function.

* Simplify handling of padding_mask.

* Address variable codebook size.

* Add output for padding for consistency with original model, fix docstrings.

* last_frame_pad_length as int

* Update example code.

* Improve docstring/comments.

* Shorten expected output.

* Consistent docstring.

* Parameterize tests.

* Properties for derived variables.

* Update expected outputs from GitHub runner.

* Consistent outputs with runner GPUs.
2025-07-23 19:39:27 +02:00
7a4e2e7868 Fix DAC integration tests and checkpoint conversion. (#39313)
* Fix DAC (slow) integration tests.

* Fix DAC conversion.

* Address comments

* Sync with main, uncomment nn.utils.parametrizations.weight_norm.

* Update DAC integration tests with expected outputs.

* Added info about encoder/decoder error and longer decoder outputs.

* Parameterize tests.

* Set expected values to GitHub runners.
2025-07-23 19:21:26 +02:00
596a75f6e9 Move openai import (#39613) 2025-07-23 19:05:39 +02:00
a0e5a7d34b Transformers serve VLM (#39454)
* Add support for VLMs in Transformers Serve

* Raushan comments

* Update src/transformers/commands/serving.py

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Quick fix

* CPU -> Auto

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixup

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 17:03:18 +02:00
ea56eb6bed Fix important models CI (#39576)
* relax test boundaries and fix from config

* eager is always supported.
2025-07-23 16:24:29 +02:00
0fe03afeb8 Fix typos and grammar issues in documentation and code (#39598)
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-23 12:43:11 +00:00
82603b6cc2 Allow device_mesh have multiple dim (#38949)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:27:36 +00:00
10c990f7e2 enable triton backend on awq xpu (#39443)
* enable triton backend on awq xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_awq.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* fix dtype check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:10:38 +00:00
e7e6efcbbd [idefics3] fix for vLLM (#39470)
* fix idefics3 for vllm tests

* fix copies
2025-07-23 14:00:43 +02:00
a62f65a989 fix moe routing_weights (#39581)
* fix moe routing_weights

* fix ernie4_5_moe routing_weights

* fix integration test

---------

Co-authored-by: llbdyiu66 <llbdyiu66@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-23 11:20:23 +00:00
623ab01039 FP-Quant support (#38696)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-23 11:41:10 +02:00
eb1a007f7f Rename supports_static_cache to can_compile_fullgraph (#39505)
* update all

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* apply suggestions

* fix copies

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 09:35:18 +00:00
b357cbb19d [Trackio] Allow single-gpu training and monitor power (#39595)
Allow not distributed and monitor power
2025-07-23 11:22:50 +02:00
019b74977d Generic task-specific base classes (#39584)
* first shot

* Update modeling_layers.py

* fix mro order

* finalize llama

* all modular and copied from from llama

* fix
2025-07-23 10:49:47 +02:00
5dba4bc7b2 Fix DynamicCache and simplify Cache classes a bit (#39590)
* fix

* use kwargs

* simplify

* Update cache_utils.py

* Update cache_utils.py

* Update test_cache_utils.py

* fix

* style
2025-07-23 10:13:45 +02:00
d9b35c635e Mask2former & Maskformer Fast Image Processor (#35685)
* add maskformerfast

* test

* revert do_reduce_labels and add testing

* make style & fix-copies

* add mask2former and make fix-copies
TO DO:
	add test for mask2former

* make fix-copies

* fill docstring

* enable mask2former fast processor

* python utils/custom_init_isort.py

* make fix-copies

* fix PR's comments

* modular file update

* add license

* make style

* modular file

* make fix-copies

* merge

* temp commit

* finish up maskformer mask2former

* remove zero shot examples

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-23 02:47:47 +00:00
6e9972962f 🎯 Trackio integration (#38814)
* First attempt

* fix

* fix

* Enhance TrackioCallback to log GPU memory usage and allocation

* Enhance Trackio integration in callbacks and training arguments documentation

* re order

* remove unused lines

* fix torch optional
2025-07-22 14:50:20 -07:00
c6d0500d15 [WIP] Add OneformerFastImageProcessor (#38343)
* [WIP] OneformerFastImageProcessor

* update init

* Fully working oneformer image processor fast

* change Nearest to Neares exact interpolation where needed

* fix doc

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-22 20:41:39 +00:00
4884b6bf41 Fix link in "Inference server backends" doc (#39589)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-22 16:44:08 +00:00
075a65657a Torchdec RuntimeError catch (#39580)
* fix

* fix

* maybe better

* style
2025-07-22 18:35:03 +02:00
2936902a76 [Paged-Attention] Handle continuous batching for repetition penalty (#39457)
* Handle continuous batching for repetition penalty

* fix last scores and with token mask creation

* add test

* Update src/transformers/generation/continuous_batching.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* remove unneeded cast

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-22 18:13:40 +02:00
cbcb8e6c1f updated mistral3 model card (#39531)
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 09:01:55 -07:00
601260fd96 Update docs/source/ko/_toctree.yml (#39516)
docs: update `docs/source/ko/_toctree.yml`
2025-07-22 09:00:42 -07:00
c338fd43b0 [cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2
2025-07-22 16:10:25 +02:00
b16688e96a General weight initialization scheme (#39579)
* general + modulars from llama

* all modular models

* style and fix musicgen

* fix

* Update configuration_musicgen.py

* Update modeling_utils.py
2025-07-22 16:04:20 +02:00
015b62bf3e Add AMD GPU expectations for LLaVA tests (#39486)
* Add AMD GPU expectation to llava tests

* FMT

* Remove debug print

* Address review  comments
2025-07-22 14:01:54 +00:00
efceeaf267 Kernels flash attn (#39474)
* use partial to wrap around `transformers` utils!

* try to refactor?

* revert one wrong change

* just a nit

* push

* reverter watever was wrong!

* some nits

* fixes when there is no attention mask

* bring the licence back

* some fixes

* nit

* style

* remove prints

* correct dtype

* fa flags for testing

* update

* use paged attention if requested!

* updates

* a clone was needed, not sure why

* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute

* simplify and improve?

* flash attention is kinda broken on recent cuda version so allow the opportunity to use something else

* fix!

* protect kernels import

* update

* properly parse generation config being passed

* revert and update

* add two tests

* some fixes

* fix test FA2

* takes comment into account

* fixup

* revert changes

* revert the clone, it is only needed because the metal kernel is not doing it?

* [docs] update attention implementation and cache docs (#39547)

* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applu suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix mps on our side for now

* Update src/transformers/integrations/flash_paged.py

* no qa

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:41:06 +02:00
b62557e712 Add AMD expectations to Mistral3 tests (#39481)
Add AMD expectations to mistral3 tests
2025-07-22 15:40:16 +02:00
1806583390 [docs] Create page on inference servers with transformers backend (#39550)
* draft docs on inference servers

* Update docs/source/en/_toctree.yml

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* update

* dic build failed

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply last suggestions

---------

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:31:10 +02:00
cd98c1fee3 [docs] update attention implementation and cache docs (#39547)
* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applu suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:06:43 +02:00
ef99537f37 Add AMD test expectations to DETR model (#39539)
* Add AMD test expectations to DETR model

* Fix baseline expectation

* Address review comments

* Make formatting a bit more consistent
2025-07-22 12:07:10 +00:00
30567c28e8 [timm_wrapper] add support for gradient checkpointing (#39287)
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification

* ruff fix

* refactor + add test for not supported model

* ruff

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 11:07:52 +00:00
a44dcbe513 Fixes needed for n-d parallelism and TP (#39562)
Handle non-DTensors cases in TP Layers

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-22 10:24:59 +00:00
0cae633ce1 Bump AMD container for 2.7.1 PyTorch (#39458)
* Bump AMD container for 2.7.1 PyTorch

* Forgot to update pinned packages
2025-07-22 12:11:38 +02:00
a88ea9cbc8 Add EfficientLoFTR model (#36355)
* initial commit

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: various typos, typehints, refactors from suggestions

* fix: fine_matching method

* Added EfficientLoFTRModel and AutoModelForKeypointMatching class

* fix: got rid of compilation breaking instructions

* docs: added todo for plot

* fix: used correct hub repo

* docs: added comments

* fix: run modular

* doc: added PyTorch badge

* fix: model repo typo in config

* fix: make modular

* fix: removed mask values from outputs

* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor

* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list

* fix: reformat

* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride

* doc: added q, kv aggregation kernel size and stride doc to config

* refactor: converted efficientloftr implementation from modular to copied from mechanism

* tests: overwrote batching_equivalence for "keypoints" specific tests

* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils

* fix: make fix-copies

* fix: make style

* fix: update rope function to make meta tests pass

* fix: rename plot_keypoint_matching to visualize_output for clarity

* refactor: optimize image pair processing by removing redundant target size calculations

* feat: add EfficientLoFTRImageProcessor to image processor mapping

* refactor: removed logger and updated attention forward

* refactor: added auto_docstring and can_return_tuple decorators

* refactor: update type imports

* refactor: update type hints from List/Dict to list/dict for consistency

* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching

* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]

* fix typing

* fix some typing issues

* nit

* a few more typehint fixes

* Remove output_attentions and output_hidden_states from modeling code

* else -> elif to support efficientloftr

* nit

* tests: added EfficientLoFTR image processor tests

* refactor: reorder functions

* chore: update copyright year in EfficientLoFTR test file

* Use default rope

* Add docs

* Update visualization method

* fix doc order

* remove 2d rope test

* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py

* fix docs

* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py

* update gradient

* refactor: removed unused codepath

* Add motivation to keep postprocessing in modeling code

* refactor: removed unnecessary variable declarations

* docs: use load_image from image_utils

* refactor: moved stage in and out channels computation to configuration

* refactor: set an intermediate_size parameter to be more explicit

* refactor: removed all mentions of attention masks as they are not used

* refactor: moved position_embeddings to be computed once in the model instead of every layer

* refactor: removed unnecessary hidden expansion parameter from config

* refactor: removed completely hidden expansions

* refactor: removed position embeddings slice function

* tests: fixed broken tests because of previous commit

* fix is_grayscale typehint

* not refactoring

* not renaming

* move h/w to embeddings class

* Precompute embeddings in init

* fix: replaced cuda device in convert script to accelerate device

* fix: replaced stevenbucaille repo to zju-community

* Remove accelerator.device from conversion script

* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module

* fix: removed unused attributes in configuration

* fix: missing self

* fix: refactoring and tests

* fix: make style

---------

Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 10:53:16 +01:00
3bc726b381 [gemma3] fix bidirectional image mask (#39396)
* fix gemma3 mask

* make compile happy, and use only torch ops

* no full attention between images

* update tests

* fix tests

* add a fast test
2025-07-22 10:04:56 +02:00
fbeaf96f9e Update OLMoE model card (#39344)
* Update OLMoE model card

* Checks Test

* Add license and code

* Update docs/source/en/model_doc/olmoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update olmoe.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:41:01 -07:00
641aaed7c0 Update modernbertdecoder docs (#39453)
* update docs with paper and real model

* nit

* Apply suggestions from code review

Thanks to @stevhlui!

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove usage examples, add quantization

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:40:22 -07:00
049a674e68 [CI] Fix post merge ernie 4.5 (#39561)
fix repo consistency
2025-07-21 20:56:24 +02:00
b3ebc761e2 [Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps) (#39489)
* improve handlike of other image-like inputs in fast image processors

* fix issues with _prepare_images_structure

* update sam image processor fast

* use dict update
2025-07-21 14:12:14 -04:00
b4115a426e [Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also in remote done what a timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flasky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
69b158260f Refactor embedding input/output getter/setter (#39339)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* revert again

* 🧠

* aaah ESM has two modelings aaah

* add informative but short comment

* add `input_embed_layer` mixin attribute

* style

* walrus has low precedence

* modular fix

* this was breaking parser
2025-07-21 18:18:14 +02:00
2da97f0943 🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean (#39441)
* docs: ko: perf_infer_gpu_many.md

* feat: nmt draft

* docs: refine KO translation and enhance naturalness

* docs: add missing TOC to documentation

* Align toctree and filename with original: perf_infer_gpu_multi

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Refine Korean translation

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-21 09:14:15 -07:00
82807e56b1 [Fast image processor] refactor fast image processor glm4v (#39490)
refactor fast image processor glm4v
2025-07-21 11:18:46 -04:00
4b4f04fcca fix ndim check of device_mesh for TP (#39538) 2025-07-21 13:09:33 +00:00
1aa7256f01 Refactor MambaCache to modeling_mamba.py (#38086)
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba
2025-07-21 14:59:36 +02:00
a419a40234 Fix Docstring of BarkProcessor (#39546)
* Fix Docstring of BarkProcessor

* Fix typo

* Add type hint of return value for BarkProcessor.__call__
2025-07-21 12:56:44 +00:00
9323d0873c use the enable_gqa param in torch.nn.functional.scaled_dot_product_at… (#39412)
* use the enable_gqa param in torch.nn.functional.scaled_dot_product_attention

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ci failure fix

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add check

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code, extend to cuda

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix review comments

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine the PR

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:46:43 +02:00
6b3a1f2f51 Fix missing initializations for models created in 2023 (#39239)
* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:43:52 +02:00
970d9a75ce Raise TypeError instead of ValueError for invalid types (#38660)
* Raise TypeError instead of ValueError for invalid types.

* Removed un-necessary changes.

* Resolved conflicts

* Code quality

* Fix failing tests.

* Fix failing tests.
2025-07-21 12:42:00 +00:00
822c5e45b2 Fix pylint warnings (#39477)
* Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* Fix variable names

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-21 12:38:05 +00:00
dc017cd763 Fix Qwen Omni integration test (#39553)
fix
2025-07-21 14:11:46 +02:00
fdc0566e15 🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments (#39395)
Enable average_tokens_across_devices by default in TrainingArguments

Fixes #39392

This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default.

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-21 12:11:20 +00:00
8c102e2eb1 Rename _supports_flash_attn_2 in examples and tests (#39471)
* delete `_supports_flash_attn_2` from examples and tests

* simplify docs
2025-07-21 14:02:57 +02:00
3a152e3a5c Fix the check in flex test (#39548)
* fix the check

* fix flags

* flags
2025-07-21 13:29:44 +02:00
78fb2d2760 Fix bad tensor shape in failing Hubert test. (#39502)
Fix bad tensor shape in Hubert test.
2025-07-21 12:25:52 +01:00
39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* upodate new modular

* use LlamaAttention and avoid use  CohereAttention cause repeat norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer is need to ignore

* fix gradient error using with dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00
344012b3a6 [qwen2 vl] fix packing with all attentions (#39447)
* fix qwen2 vl packing in FA2

* why? delete!

* qwen2-5-vl seems to work now

* update

* fix tests

* start by adapting FA2 tests

* add similar tests for sdpa/eager

* address comments

* why is this even in conditional model and not base model?
2025-07-21 12:19:15 +02:00
e42681b48b [gemma3] support sequence classification task (#39465)
* add seq clf class

* fix docs and add in auto-map

* skip tests

* optional pixels
2025-07-21 11:03:20 +02:00
34133d0a79 Fix placeholders replacement logic in auto_docstring (#39433)
Fix and simplify placeholders replacement logic
2025-07-18 22:56:23 +00:00
433d2a23d7 Update SAM/SAM HQ attention implementation + fix Cuda sync issues (#39386)
* update attention implementation and improve inference speed

* modular sam_hq + fix integration tests on A10

* fixup

* fix after review

* softmax in correct place

* return attn_weights in sam/sam_hq
2025-07-18 18:46:27 -04:00
541bed22d6 Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py (#39439)
* rename `args_doc.py` to `auto_docstring.py` and improve doc

* modifs after review
2025-07-18 18:00:34 +00:00
de0dd3139d Add fast image processor SAM (#39385)
* add fast image processor sam

* nits
2025-07-18 17:27:16 +00:00
561a79a2f4 Fix BatchEncoding.to() for nested elements (#38985) 2025-07-18 14:14:45 +01:00
f4d076561f [gemma3] Fix do_convert_rgb in image processors. (#39438)
* [gemma3] Fix do_convert_rgb in image processors.

* [gemma3] Fix do_convert_rgb in image processors.
2025-07-18 12:33:00 +00:00
bcc0091937 [chat template] return assistant mask in processors (#38545)
* messed up the git history, squash commits

* raise error if slow and refine tests

* index was off by one

* fix the test
2025-07-18 12:23:20 +00:00
328ca9cf1d [dependencies] Update datasets pin (#39500)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?

* datasets pin

* comment
2025-07-18 12:05:28 +00:00
fb58377700 Slack CI bot: set default result for non-existing artifacts (#39499)
* Set default result for non-existing artifacts

* FMT

* Address review comments
2025-07-18 11:45:47 +00:00
4ded9a4113 🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try

* Update modeling_utils.py

* Update modeling_utils.py

* big refactor

* Update modeling_utils.py

* style

* docstrings and simplify inner workings of configs

* remove all trace of _internal

* Update modeling_utils.py

* fix logic error

* Update modeling_utils.py

* recursive on config

* Update configuration_utils.py

* fix

* Update configuration_dpt.py

* Update configuration_utils.py

* Update configuration_utils.py

* Update modeling_idefics.py

* Update modeling_utils.py

* fix for old models

* more old models fixup

* Update modeling_utils.py

* Update configuration_utils.py

* Remove outdated test

* remove the deepcopy!! 🥵🥵

* Update test_modeling_gpt_bigcode.py

* fix qwen dispatch

* restrict to only models supporting it

* style

* switch name

* Update modeling_utils.py

* Update modeling_utils.py

* add tests!

* fix

* rypo

* remove bad copies

* fix

* Update modeling_utils.py

* additional check

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* skip
2025-07-18 13:41:54 +02:00
2b819ba4e3 [dependencies] temporary pyarrow pin (#39496)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?
2025-07-18 10:05:40 +00:00
967045082f Add voxtral (#39429)
* draft

* draft update (conversion working)

* mend

* draft update

* draft update: working generate

* refactor

* VoxtralProcessor draft

* processor update

* update convert_tekken_tokenizer

* refactor processor

* update convert

* make style

* better handle prefil

* make style

* add tests

* add mistral_common audio loading

* processor update

* revert changes

* audio utils update

* add audio to apply chat template mistral update

* voxtral processor update

* fix

* udpate converstion script

* make mistral tokenier from pretrain work from local dir

* fix udpates

* add integration tests

* add batched version

* processor docstring

* make style

* revert convert_tekken_tokenizer changes

* revert processing_qwen2.5 changes

* add multi-turn test

* processor improvements

* address review changes

* Update src/transformers/tokenization_mistral_common.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* update audio utils

* nits

* integration test update

* correct _support

* update tests

* test update

* update integration tests

* fix

* fix

* fix

* add test_apply_chat_template_with_audio

* add model doc

* model doc

* nit

* doc uptade

* nit

* processor improvement

* ensure default is 3B

* nits

* make

* make

* convert modular

* update checkpoint

* fix test

* make

* make

* autos

* make

* make

* nit

* nit

* nit

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-18 00:02:04 +00:00
73869f2e81 Fix typing order (#39467)
* fix type order

* change all Union[str, dict] to Union[dict, str]

* add hf_parser test && fix test order

* add deepspeed dependency

* replace deepspeed with accelerator
2025-07-17 15:47:31 +00:00
bda75b4011 Add unified logits_to_keep support to LLMClass (#39472)
* add supports for logits_to_keep for qwen25vl and glm4v

* Update relevant modular files
2025-07-17 17:07:12 +02:00
bf6c997685 [serve] Add speech to text (/v1/audio/transcriptions) (#39434)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

* experiment: base model from typed dict

* audio working

* fix bad rebase

* load audio with librosa

* implement timed models

* almost working

* make fixup

* fix tests

* transcription request type

* tokenizer -> processor

* add example in docs

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-07-17 14:29:57 +00:00
8b3de61a65 Update integration_utils.py (#39469)
* Update integration_utils.py

sanitize mlflow upload metric

* Update integration_utils.py

change import order to pass CI

* Update integration_utils.py

add comments

* Update integration_utils.py

Remove whitespace from blank line
2025-07-17 13:57:49 +00:00
7fd60047c8 fix: ImageTextToTextPipeline handles user-defined generation_config (#39374)
fix: ImageTextToTextPipeline handles user-defined generation_config passed to the pipeline

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-07-17 13:23:29 +00:00
60b5471da3 Enable some ruff checks for performance and readability (#39383)
* Fix inefficient sequence tests

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PERF102

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC1802

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC0208

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:21:59 +00:00
fc700c2a26 Fix convert_and_export_with_cache failures for GPU models (#38976)
* Add the `device` option for `generate()`

* Add device for default tensors to avoid tensor mismatch

* [test] Enable test_static_cache_exportability for torch_device

* infer device from the prompt_token_ids

* Add device for generated tensor

* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU

* fix format

* infer device from the model
2025-07-17 13:12:32 +00:00
54680d75c9 Update GemmaIntegrationTest::test_model_2b_bf16_dola (#39362)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-17 14:06:23 +01:00
322400af58 fix a comment typo in utils.py (#39459) 2025-07-17 13:06:04 +00:00
43f07018cf Use newer typing notation (#38934)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:05:21 +00:00
565dd0bad7 Fix tests due to breaking change in accelerate (#39451)
* update values

* fix
2025-07-17 13:51:50 +01:00
26fed50460 fix max_length calculating using cu_seq_lens (#39341) 2025-07-17 10:54:23 +02:00
cdfe6164b3 fix(pipelines): QA pipeline returns fewer than top_k results in batch mode (#39193)
* fixing the bug

* Try a simpler approach

* make fixup

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-07-17 10:24:30 +02:00
b85ed49e0a Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings (#38822)
* Correcting PR #38642.  The PR removed references to the deprecated method "as_target_processor()" in the
__call__ and pad method docstrings, which is correct, but also removed all references to PreTrainedTokenizer,
which is incorrect.  This commit adds back the reference to PreTrainedTokenizer and also takes the
opportunity to enhance the docstrings with the invocation procedure post removal of "as_target_processor()"
and adds information on return values.

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: René Tio <tor@Jammer.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-16 14:13:07 -07:00
787a0128a9 create ijepa modelcard (ref : PR #36979 ). (#39354)
* wip: adding first version of the IJEPA model card.

* refactor based on the @stevhliu feedbacks

* refactor:
- revert the accidental removal of the autodoc api description and the image reerece architecture

- general context updation.

* - changes of model for example quantization.
- merging the  quantization content.
2025-07-16 12:40:22 -07:00
48f2233cdf Improve grammar and clarity in perf_hardware.md (#39428) 2025-07-16 12:15:15 -07:00
e68ebb695f fix cached file error when repo type is dataset (#36909)
* fix cached file

* Update hub.py
2025-07-16 18:02:26 +02:00
35a416c400 Fix indentation bug in SmolVLM image processor causing KeyError (#39452)
Fix indentation bug in Idefics3 image processor

- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors

cc @yonigozlan

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
2025-07-16 11:59:28 -04:00
2c58705dc2 Updated Megatron conversion script for gpt2 checkpoints (#38969)
* update script to support new megatron gpt format

* fixed quality failures

---------

Co-authored-by: Luke Friedrichs <LckyLke>
2025-07-16 15:54:29 +00:00
26be7f717e [CI] Fix partially red CI (#39448)
fix
2025-07-16 15:53:43 +02:00
0a88751940 Fixes #39204: add fallback if get_base_model missing (#39226)
* Fixes #39204: add fallback if get_base_model missing

* Inline try_get_base_model logic as suggested in PR review

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-16 15:51:30 +02:00
ba506f87db make the loss context manager easier to extend (#39321) 2025-07-16 15:47:24 +02:00
9f1ac6f185 Remove something that should have never been there (#38254)
* what the hell

* update

* style

* style

* typing

* fix init issue

* fix granite moe hybrid as well
2025-07-16 15:22:44 +02:00
a7ca5b5d67 Fix processor tests (#39450)
fix
2025-07-16 15:01:35 +02:00
71818f570b [Bugfix] [Quantization] Remove unused init arg (#39324)
remove unused arg from ct config init

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 14:57:42 +02:00
cc24b0378e Better typing for model.config (#39132)
* Apply to all models config annotation

* Update modular to preserve order

* Apply modular

* fix define docstring

* fix dinov2 consistency (docs<->modular)

* fix InstructBlipVideoForConditionalGeneration docs<->modular consistency

* fixup

* remove duplicate code

* Delete config_class attribute from the modeling code

* Add config_class attribute in base model

* Update init sub class

* Deprecated models update

* Update new models

* Fix remote code BC issue

* fixup

* fixing more corner cases

* fix new models

* add test

* modular docs update

* fix comment a bit

* fix for py3.9
2025-07-16 14:50:35 +02:00
4b258454a7 Fix typo in generation configuration for Janus model weight conversion (#39432)
* Fix typo in generation configuration for Janus model weight conversion

* Fix typo

* Update Janus model generation configuration

* Update Janus model to use generation_kwargs
2025-07-16 14:28:02 +02:00
de5ca373ac Responses API in transformers serve (#39155)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Responses API (to be merged into #39155) (#39338)

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

---------

Co-authored-by: Lysandre <hi@lysand.re>

* Slight bugfixes

* PR comments from #39338

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
c8524aeb07 [cache] make all classes cache compatible finally (#38635)
* dump

* push other models

* fix simple greedy generation

* xmod

* add fmst and clean up some mentions of old cache format

* gpt-bigcode now follows standards

* delete tuple cache reference in generation

* fix some models

* fix some models

* fix mambas and support cache in tapas

* fix some more tests

* fix copies

* delete `_reorder_cache`

* another fix copies

* fix typos and delete unnecessary test

* fix rag generate, needs special cache reordering

* fix tapas and superglue

* reformer create special cache

* recurrent gemma `reorder_cache` was a no-op, delete

* fix-copies

* fix blio and musicgen pipeline tests

* fix reformer

* fix reformer, again...

* delete `_supports_cache_class`

* delete `supports_quantized_cache`

* fix failing tests

* fix copies

* some minor clean up

* style

* style

* fix copies

* fix tests

* fix copies

* create causal mask now needs positions?

* fixc copies

* style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* clean-up of non-generative model after merging main

* check `is_decoder` for cache

* delete transpose for scores

* remove tuple cache from docs everywhere

* fix tests

* fix copies

* fix copies once more

* properly deprecate `encoder_attention_mask` in Bert-like models

* import `deprecate_kwarg` where needed

* fix copies again

* fix copies

* delete `nex_decoder_cache`

* fix copies asks to update for PLM

* fix copies

* rebasing had a few new models, fix them and merge asap!

* fix copies once more

* fix slow tests

* fix tests and updare PLM checkpoint

* add read token and revert accidentally removed line

* oh com -on, style

* just skip it, read token has no access to PLM yet

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-16 14:00:17 +02:00
6cb43defd0 docs: add missing numpy import to minimal example (#39444)
docs: add numpy import to minimal example
2025-07-16 11:57:13 +00:00
61163099f1 Remove runtime conditions for type checking (#37340)
Remove dynamic conditions for type checking

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 13:36:48 +02:00
bfc9ddf5c6 Add StableAdamW Optimizer (#39446)
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.

* Fixed issue with

* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers

---------

Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
2025-07-16 13:35:53 +02:00
b9ee528246 add test scanner (#39419)
* add test scanner

* add doc + license

* refactor for only 1 tree traversal

* add back test of only one method

* document single method scan

* format

* fixup generate tests

* minor fix

* fixup

* fixup doc
2025-07-16 12:45:46 +02:00
79941c61ce Fix missing definition of diff_file_url in notification service (#39445)
Fix missing definition of diff_file_url
2025-07-16 12:09:18 +02:00
e048d48bd0 Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870)
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer

* Update src/transformers/optimization.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update optimization.py

fix the error of the unclosed "("

* Update optimization.py

remove whitespace in line 402 in order to pass the quality test

* Update src/transformers/optimization.py

* Update src/transformers/optimization.py

* Apply style fixes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-16 12:01:08 +02:00
0cf08e90dd Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor (#39372)
Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor
2025-07-16 11:54:20 +02:00
ae4e306a40 Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358)
* Defaults to adamw_torch_fused for latest Pytorch

Signed-off-by: cyy <cyyever@outlook.com>

* Fix test

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-16 09:52:33 +00:00
4524a68c66 Fix L270 - hasattr("moe_args") returning False error (#38715)
* Fix L270 - hasattr("moe_args") returning False error

* Update src/transformers/models/llama4/convert_llama4_weights_to_hf.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 09:45:58 +00:00
d33a1c389f [chat template] add a testcase for kwargs (#39415)
add a testcase
2025-07-16 11:31:35 +02:00
99c9763398 Fixed a bug calculating cross entropy loss in JetMoeForCausalLM (#37830)
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 11:22:00 +02:00
667ad02374 Remove double soft-max in load-balancing loss. Fixes #39055 . (#39056)
Remove double soft-max in load-balancing loss. Fixes #39055
2025-07-16 09:20:23 +00:00
31d81943c9 [Core] [Offloading] Fix saving offloaded submodules (#39280)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add clarifying comment

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test_save_offloaded_model_with_direct_params

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix merge conflict, add decorators

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 08:44:40 +00:00
add43c4d09 [autodocstring] add video and audio inputs (#39420)
* add  video and audio inputs in auto docstring

* fix copies
2025-07-16 09:41:50 +02:00
0dc2df5dda CI workflow for performed test regressions (#39198)
* WIP script to compare test runs for models

* Update line normalitzation logic

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-16 04:20:02 +02:00
1bc9ac5107 docs: update LightGlue docs (#39407)
* docs: update LightGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:50 -07:00
d9574f2fe3 docs: update SuperGlue docs (#39406)
* docs: update SuperGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:26 -07:00
9f41f67135 [vlm] fix loading of retrieval VLMs (#39242)
* fix vlm with retrieval

* we can't use AutoModel because new ColQwen was released after refactor

* no need for colqwen

* tied weight keys are necessary, if using IMageTextToText

* need to apply renaming in tied weights, only for ColPali

* overwrite tied keys in ColPali

* fix copies, modular can't handle if-statements
2025-07-15 17:23:54 +02:00
b1d14086e4 handle training summary when creating modelcard but offline mode is set (#37095)
* handle training summary when creating modelcard but offline mode is set

* chore: lint
2025-07-15 17:21:15 +02:00
67f42928f0 Remove residual quantization attribute from dequantized models (#39373)
* fix: removing quantization trace attribute from dequantized model

Fixes #39295

* add: test `to(dtype=torch.float16)` after dequantization
2025-07-15 17:16:10 +02:00
30c508dbcb Remove deprecated audio utils functions (#39330)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-15 14:02:25 +00:00
d8e05951b8 Fix bugs in pytorch example run_clm when streaming is enabled (#39286) 2025-07-15 15:37:28 +02:00
a989bf8d84 Fix bugs from pipeline preprocessor overhaul (#39425)
* Correct load classes for VideoClassificationPipeline

* Correct load classes for the ASR pipeline
2025-07-15 14:28:59 +01:00
53c9dcd6fd refactor: remove set_tracer_provider and set_meter_provider calls (#39422) 2025-07-15 14:22:12 +02:00
f03b384149 Fix invalid property (#39384)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-15 12:11:37 +00:00
c4d41567fa set document_question_answering pipeline _load_tokenizer to True (#39411)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-15 12:05:49 +00:00
f56b49f48f Ignore extra position embeddings weights for ESM (#39063)
* Ignore extra position embeddings weights

* Slight name fix
2025-07-15 11:57:32 +00:00
2b79f14375 support loading qwen3 gguf (#38645)
* support loading qwen3 gguf

* Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion

* Add testcase

* Fix formatting
2025-07-15 09:53:41 +00:00
0e4b7938d0 Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967)
* working locally; need to style and test

* added docs and initial tests; need to debug and flesh out

* fixed tests

* working long context; batches

* working fa2 and eager

* update tests

* add missing confnigs

* remove default autoset

* fix spacing

* fix most tests

* fixed tests

* fix to init

* refactor to match new transformers updates

* remove static cache option

* fa2 fix

* fix docs

* in progress

* working on tests

* fixed issue with attn outputs

* remove debug

* fix local config attr

* update doc string

* fix docstring

* add docs to toc

* correct typo in toc

* add new updates from main w.r.t. ModernBERT RoPE

* fix local param

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
2025-07-15 10:40:41 +02:00
0b724114cf Fix typo in /v1/models output payload (#39414) 2025-07-15 08:59:25 +01:00
8d6259b0b8 [refactor] set attention implementation (#38974)
* update

* fix some tests

* init from config, changes it in-place, add deepcopy in tests

* fix modernbert

* don't delete thsi config attr

* update

* style and copies

* skip tests in generation

* fix style

* accidentally removed flash-attn-3, revert

* docs

* forgot about flags set to False

* fix copies

* address a few comments

* fix copies

* custom code BC
2025-07-15 09:34:06 +02:00
6017f5e8ed [siglip] fix pooling comment (#39378)
* feat(siglip2): add forward pass with pooled output logic in Siglip2TextModel

* test(siglip2): add test_text_model.py to verify pooled output behavior

* style(siglip2): fix formatting in test_text_model.py using Ruff

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* chore(siglip2): regenerate classic model after modular change

* Update
2025-07-14 17:47:19 +00:00
8d40ca5749 Update phi4_multimodal.md (#38830)
* Update phi4_multimodal.md

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-14 10:35:17 -07:00
3635415af2 [Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic (#39391)
Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
2025-07-14 09:25:06 -07:00
3a48e9534c Use np.pad instead of np.lib.pad. (#39346)
* Use np.pad instead of np.lib.pad.

* Update audio_utils.py

Formatting
2025-07-14 16:05:28 +00:00
3d8be20cd2 Totally rewrite how pipelines load preprocessors (#38947)
* Totally rewrite how pipelines load preprocessors

* Delete more mappings

* Fix conditionals, thanks Cyril!
2025-07-14 16:40:04 +01:00
903944a411 [examples] fix do_reduce_labels argument for run_semantic_segmentation_no_trainer (#39322)
* no use do_reduce_labels argument in model

* use do_reducer_labels in AutoImageProcessor
2025-07-14 10:16:49 +00:00
8165c703ab Fix Lfm2 and common tests (#39398)
* fix

* better fix

* typo
2025-07-14 12:02:59 +02:00
878d60a3cb Deprecate AutoModelForVision2Seq (#38900)
deprecate vision2seq
2025-07-14 11:42:06 +02:00
ad333d4852 [Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor (#39333)
* Update modeling_qwen2_5_vl.py

### 🐛 Bug Description

When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42cef21662c304583368ff6645116a45), the model crashes due to a type mismatch in the attention mask handling.

---

### 🔥 Error Traceback

* Fix dtype compatibility in attention mask processing

Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:

Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* # Fix: Use appropriate function based on dtype

* Update modular_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* Fix: Use appropriate function based on dtype

* Fix: Use appropriate function based on dtype

* Updatet modeling_glm4v.py

* Only apply conversion for floating point tensors (inverted masks)

* corrected the format issue

reformatted modeling_glm4v.py

All done!  🍰 
1 file reformatted

* Fix: Cast to float before applying torch.finfo

Corrected the format issue

* Fix torch.finfo() for integer attention mask

#39333

* Run make fix-copies and make style for CI compliance

- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC

* Fix torch.finfo() TypeError for

Fix torch.finfo() TypeError for integer attention_mask_tensor #39333

* Fix torch.finfo() TypeError for integer
2025-07-14 07:47:39 +00:00
c30af65521 [BLIP] remove cache from Qformer (#39335)
* remove cache from Qformer

* fix

* this was never correct...
2025-07-14 09:20:01 +02:00
66cd995618 [shieldgemma] fix checkpoint loading (#39348)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-14 08:34:58 +02:00
a1ad9197c5 Fix overriding Fast Image/Video Processors instance attributes affect other instances (#39363)
* fix and add tests

* nit
2025-07-12 23:39:06 +00:00
dc98fb3e5e update docker file to use latest timm (for perception_lm) (#39380)
update docker file for timm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-12 23:19:37 +02:00
5c30f7e390 Update Model Card for Encoder Decoder Model (#39272)
* update model card.

* add back the model contributors for mamba and mamba2.

* update the model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update batches with correct alignment.

* update examples and remove quantization example.

* update the examples.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example.

* correct the example.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-11 11:23:08 -07:00
0d7efe3e4b fix gpt2 usage doc (#39351)
fix typo of gpt2 doc usage
2025-07-11 10:59:41 -07:00
a646fd55fd Updated CamemBERT model card to new standardized format (#39227)
* Updated CamemBERT model card to new standardized format

* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution

* Updated CamemBERT usage examples, quantization, badges, and format

* Updated CamemBERT badges

* Fixed CLI Section
2025-07-11 10:59:09 -07:00
af74ec65a7 Update Readme to Run Multiple Choice Script from Example Directory (#39323)
* Update Readme to run in current place

* Update Readme files to execute PyTorch examples from their respective folders
2025-07-11 10:58:26 -07:00
70e57e4710 Add mistral common support (#38906)
* wip: correct docstrings

* Add mistral-common support.

* quality

* wip: add requested methods

* wip: fix tests

* wip: add internally some methods not being supported in mistral-common

* wip

* wip: add opencv dependency and update test list

* wip: add mistral-common to testing dependencies

* wip: revert some test changes

* wip: ci

* wip: ci

* clean

* check

* check

* check

* wip: add hf image format to apply_chat_template and return pixel_values

* wip: make mistral-common non-installed safe

* wip: clean zip

* fix: from_pretrained

* fix: path and base64

* fix: path and import root

* wip: add docs

* clean

* clean

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-11 16:26:58 +00:00
665418dacc Remove device check in HQQ quantizer (#39299)
* Remove device check in HQQ quantizer

Fix https://github.com/huggingface/transformers/issues/38439

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-11 14:59:51 +00:00
601bea2c4e Verbose error in fix mode for utils/check_docstrings.py (#38915)
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.
2025-07-11 14:36:10 +00:00
24f771a043 fix failing test_sdpa_can_dispatch_on_flash (#39259)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-11 16:30:56 +02:00
ee74397d20 update cb TP (#39361)
* update cb TP

* safety
2025-07-11 15:54:25 +02:00
9bc675b3b6 Fix link for testpypi (#39360)
fix link
2025-07-11 15:34:01 +02:00
bf607f6d3b PerceptionLM (#37878)
* plm template

* A working plm with fixed image features

* hacked processor

* First version that reproduced PLM output using PE from timm.

* Simplify and fix tie_word_embeddings

* Use PIL resize. Simplify converstion.

* First version that works with video input.

* simplifed image preprocessing (not batched)

* Minor fixes after rebasing on main.

* Video processor based on new API.

* Revert to use _preprocess for image processor.

* refactor with modular

* fix tie_word_embedding

* Testing with timm PE

* check in missed converstion from modular to model.py

* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.

* address review comments

* Fixed batching if video and image examples mixed.

* Simplify PE configuration.

* Enable AutoModel for PerceptionEncoder.

* Update PE config style.

* update all headers

* Minor fixes.

* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.

* Fix for testing_modeling_perception_lm.py

* Image processing refactoring to use more common parts.

* Fix processor test.

* update tests to use model from hub

* More test fixes.

* integration test GT update after rebasing; probably due to video preprocessing

* update test media path to hub

* Stop tracking local scripts

* address some review comments

* refactor image processing.

* small fixes

* update documentation and minor fixes

* remove scripts

* Minor fix for CI

* Fix image processing

* CI and doc fix

* CI formatting fix

* ruff fix

* ruff formatting

* ran utils/sort_auto_mappings.py

* update docstring

* more docstring udpates

* add vision_input_type default fallback for image processing

* more verbose variable naming

* test update

* Remove PE and PEConfig use AutoModel(TimmWrapper) instead

* Minor cleanup.

* Minor Fix: remove any ref to PE. Ruff format and check.

* fix docstring

* Fix modular/model consistency.Improvex docstringfor  .

* Fix PerceptionLMForConditionalGenerationModelTest

* ruff fix

* fix for check_repo

* minor formatting

* dummy size arg to fix for processor test.

* Update docstring for PerceptionLMConfig

* Minor fixes from review feedback.

* Revert some minor changes per reviewer feedback.

* update base_model_prefix

* address reviewer feedback

* fix comment in modeling file

* address reviewer feedback

* ruff format

* Pre-merge test update.

* reapply modular and fix checkpoint name

* processor test path

* use modular a bit more

* remove dead code

* add token decorator

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-11 11:07:32 +02:00
4b47b2b8ea Updated Switch Transformers model card with standardized format (Issue #36979) (#39305)
* Updated Switch Transformers model card with standardized format (Issue #36979)

* Apply reviewer suggestions to the new standardised Switch Transformer's model card

* Update switch_transformers.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-10 15:34:10 -07:00
fe1a5b73e6 [modular] speedup check_modular_conversion with multiprocessing (#37456)
* Change topological sort to return level-based output (lists of lists)

* Update main for modular converter

* Update test

* update check_modular_conversion

* Update gitignore

* Fix missing conversion for glm4

* Update

* Fix error msg

* Fixup

* fix docstring

* update docs

* Add comment

* delete qwen3_moe
2025-07-10 19:07:59 +01:00
571a8c2131 Add a default value for position_ids in masking_utils (#39310)
* set default

* Update masking_utils.py

* add small test
2025-07-10 18:53:40 +02:00
bdc8028cb3 [Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-10 18:33:30 +02:00
df49b399dc [tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?

* add todo
2025-07-10 15:40:08 +00:00
36e80a18da [modeling][lfm2] LFM2: Remove deprecated seen_tokens (#39342)
* [modeling][lfm2] remove deprecated seen_tokens

* [modular][lfm2] remove deprecated seen_tokens from modular file
2025-07-10 17:27:55 +02:00
9682d07f92 LFM2 (#39340)
* [modeling][lfm2] LFM2 model on 4.53.0 interface

* [configuration] hook in LFM2 keys

* [modeling][lfm2] update modeling interface for 4.53.1

* [modeling][lfm2] apply mask to hidden conv states

* [misc] ruff format/lint

* [modeling][lfm2] minor: NotImplemented legacy cache conversion

* Create lfm2.md

* create nice modular

* style

* Update modeling_auto.py

* clean and start adding tests

* style

* Update test_modeling_lfm2.py

* Update __init__.py

* small test model size

* config

* small fix

* fix

* remove useless config attrs -> block_dim and conv_dim are hiden_size

* fix prepare inputs

* fix config

* test

* typo

* skip tests accordingly

* config docstrings

* add doc to .md

* skip config docstring check

---------

Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-10 16:07:33 +02:00
38c3931362 [server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
6b09c8eab0 Handle DAC conversion when using weight_norm with newer PyTorch versions (#36393)
* Update convert_dac_checkpoint.py

* Update convert_dac_checkpoint.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-10 10:36:58 +00:00
92043bde29 fix phi3 tests (#39312)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-10 11:51:55 +02:00
520b9dcb42 fix Glm4v batch videos forward (#39172)
* changes for video

* update modular

* change get_video_features

* update video token replacement

* update modular

* add test and fix typo

* lint

* fix order

* lint

* fix

* remove dependency

* lint

* lint

* remove todo

* resize video for test

* lint..

* fix test

* new a processor for video_test

* fix test
2025-07-10 10:44:28 +02:00
bc161d5d06 Delete deprecated stuff (#38838)
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* bye bye `seen_tokens`

* address comments

* update typings

* ecnoder decoder models follow same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docsrting

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
c6ee0b1da8 Fix broken SAM after #39120 (#39289)
fix
2025-07-09 17:46:22 -04:00
aff7df8436 enable static cache on TP model (#39164)
* enable static cache on TP model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check tp size before init kv cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix docstring

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tp tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix other cache head size

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
2ef59646b8 Fix max_length_q and max_length_k types to flash_attn_varlen_func (#37206)
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-09 23:12:39 +02:00
2d600a4363 Granite speech speedups (#39197)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum

* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.

* support audio.shape=(1,L)
2025-07-09 23:09:50 +02:00
5111c8ea2f Fix typo: langauge -> language (#39317) 2025-07-09 12:06:46 -07:00
2781ad092d docs: update LLaVA-NeXT model card (#38894)
* docs: update LLaVA-NeXT model card

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [docs] Updated llava_next model card

* Update docs/source/en/model_doc/llava_next.md remove image sources

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [fix] Change Flash Attention to SDPA badge

* [doc] fixed quantization example

* docs: updated contribution details and badges

* Update llava_next.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 11:32:40 -07:00
16dd7f48d0 skip files in src/ for doctest (for now) (#39316)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:36:48 +02:00
d61c0d087c Updated the Model docs - for the MARIAN model (#39138)
* Update marian.md

This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:

- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.

This update improves the readability, usability, and consistency of the Marian model documentation for users.

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 10:23:03 -07:00
161cf3415e add stevhliu to the list in self-comment-ci.yml (#39315)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:07:44 +02:00
3be10c6d19 Fix consistency and a few docstrings warnings (#39314)
* Update modeling_deepseek_v2.py

* fix docstrings

* fix

* fix
2025-07-09 18:40:37 +02:00
4652677c89 🌐 [i18n-KO] Translated quark.md to Korean (#39268)
* initial translation

* removed english parts

* maintain consistency

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* add toctree

* fixed indentation

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-07-09 09:29:51 -07:00
c980904204 Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequce doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration testzs

* cleanup BC stuff

* rope

* fix integrations tests based on a10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
accbd8e0fe [sliding window] revert and deprecate (#39301)
* bring back and deprecate

* oops

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-09 16:10:38 +02:00
1cefb5d788 [modular] Allow method with the same name in case of @property decorator (#39308)
* fix

* add example

* fix

* Update modular_model_converter.py
2025-07-09 15:46:53 +02:00
4798c05c64 skip test_torchscript_* for now until the majority of the community ask for it (#39307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 15:35:48 +02:00
fe5f3c85d2 fix aria tests (#39277)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 13:49:33 +02:00
0687d481e2 [flash attn 3] bring back flags (#39294)
* flash attn 3 flag

* fix copies
2025-07-09 09:45:01 +02:00
25343aafee Fix SDPA attention precision issue in Qwen2.5-VL (#37363)
* solve conflicts and remove  redundant attention_mask in qwenvit

* update decoded text check

* remove trailing whitespace
2025-07-09 07:03:44 +02:00
0e1c281745 [Tests] Update model_id in AIMv2 Tests (#39281)
* Update model_id in tests

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 21:46:32 +02:00
7ef592c96c Update T5gemma (#39210)
* bug fix: add vocab_size to t5gemmaconfig for pipeline.

* Update checkpoint placeholder

* minor change

* minor change

* minor change: update example.

* fix: add vocab_size as an explict arg.

* buf fix:

remove vocab_size verification; instead, re-set encoder/decoder vocab size.

Note, in t5gemma, vocab size of encoder/decoder shoud be always the same.

* add `add_generation_prompt` for message preprocessing.
2025-07-08 19:08:48 +02:00
1ecd52e50a Add torchcodec in docstrings/tests for datasets 4.0 (#39156)
* fix dataset run_object_detection

* bump version

* keep same dataset actually

* torchcodec in docstrings and testing utils

* torchcodec in dockerfiles and requirements

* remove duplicate

* add torchocodec to all the remaining docker files

* fix tests

* support torchcodec in audio classification and ASR

* [commit to revert] build ci-dev images

* [commit to revert] trigger circleci

* [commit to revert] build ci-dev images

* fix

* fix modeling_hubert

* backward compatible run_object_detection

* revert ci trigger commits

* fix mono conversion and support torch tensor as input

* revert map_to_array docs + fix it

* revert mono

* nit in docstring

* style

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 17:06:12 +02:00
1255480fd2 [lightglue] add support for remote code DISK keypoint detector (#39253)
* feat: add trust_remote_code in LightGlueConfig

* fix: made sure trust_remote_code is provided only when necessary

* fix: make style

* docs: added missing trust_remote_code docstring

* refactor: refactored LightGlue config init

* fix: removed unnecessary argument
2025-07-08 15:03:04 +00:00
838a0268b8 fix flaky test_generate_compile_model_forward (#39276)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 15:36:05 +02:00
29d0030e23 Refactor PretrainedConfig.__init__ method to make it more explicit (#39158)
* cleanup

* fix no `__init__` test

* fix missing inits
2025-07-08 14:24:39 +01:00
1580f64653 [smollm3] add tokenizer mapping for smollm3 (#39271)
add tok mapping to smollm3
2025-07-08 10:44:01 +00:00
db05e4ff33 [pagged-attention] fix off-by-1 error in pagged attention generation (#39258)
* fix off-by-1 error in pagged attention generation

* formatting

* use update_with_token
2025-07-08 12:34:22 +02:00
6f1a43896c [CI] fix docs (#39273)
* fix docs

* add ko gloassary file to toctree
2025-07-08 11:31:03 +01:00
fbdaa7b099 Add Aimv2 model (#36625)
* Model skelton

* changes

* temp push

* changes

* Added support for aimv2-native

* More changes

* More changes

* Stupid mistake correction

* Added config and refactor

* Added vison model

* update

* Refactor for lit variant

* Added Text Model

* Minor fixes

* nits

* update

* Preliminary tests

* More fixes

* Updated tests 🤗

* Refactor

* Updated testcase

* Updated config

* make fixup

* more fixes

* Bug fix and updates

* deadcode

* Fixes

* nit

* up

* Happy CI 

* Reduce LOC

* nit

* nit

* make style

* return_dict refactor

* bug fix

* fix

* doc update

* nit

* make fixup

* Minor update

* _init_weigths modifcation

* update tests

* Minor fixes post review

* Update w.r.t GradientCheckpointingLayer

* docs update

* update

* nit

* Use more Modular 😉

* Change name from AIMv2 to Aimv2

* Nit

* make style

* Add model doc pointer

* make style

* Update model doc section

* updates

* Modify attn mask and interface

* update test

* Final change

* Utilize flash and flex attn

* keep attn mask

* camelcase model name in test file

* Fix docstring

* Fix config warning finally and create_causal_mask

* disable torchscript

* remove unused arg

* remove from tests

* balance model size for tests

* fix device

* tests

* tests

* flaky test

* fix import

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:53:21 +02:00
d8590b4b0c Add Doge model (#35891)
* Add Doge Model

* Fix code quality

* Rollback an error commit

* Fix config for open-source weights

* Revert "Fix config for open-source weights"

This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76.

* Add modular_doge

* Update Doge inherits from Llama

* Fix import bug

* [docs] Add usage of doge model

* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils

* [docs] remove trust remote code from doge

* Fix dynamo bug in doge model

* Update docstrings

* Import apply_rotary_pos_emb and repeat_kv from Llama

* Fix all nits

* Fix code quality

* Fix some bugs

* Fix code quality

* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.

* Fix the wrong tensor orderings in DogeCDMoE

* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation

* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM

* Modify CDMoE for batch efficient implementation

* Uniform MoE configuration names, just like QwenMoE

* Fix code quality

* Fix code quality

* Fix code quality

* Add tp plan of CDMoE Module

* Hybird DMA with sliding window

* Update valid tokens greater than window size

* Fix code quality

* Add `convert_doge_weights_to_hf`

* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py

* Fix nits in modular_doge

* Fix code quality

* Fix all nits

* Fix all nits

* Make sure the attention function is updated inside the class

* Fix code quality issues in the Doge model and add a test for it

* Fix `test_generate`

* Fix code quality

* Fix nits fllowing suggestions

* Fix code quality

* Fix code quality issues

* Fix nits

* Fix code quality nits

* Fix the missing parameters in the configuration.

* Fix the missing parameters in the configuration.

* Fix nits

* Add initialization of attention

* Fix last nits

* Simplify dynamic mask generation logic

* Rename router_logits to gate_logits for matching latest changes of MixtralModel

* Rename typings for matching latest changes of MixtralModel

* Fixes typo in comment

* Update src/transformers/models/doge/modular_doge.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Fix code quality issues to match other modular

* Fix code quality issues to match other modular

* Fix the static compilation errors

* Update model weights link

* Fix code quality issues to match other modular

* reapply modular and support for new outputs

* style

* simplify a lot

* fix import location

* reapply modular

* fix

* fix integration test

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-08 11:44:29 +02:00
d370bc64c6 Fix errors when use verl to train GLM4.1v model (#39199)
* Fix errors when use verl to train GLM4.1v model

* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}

* Update modeling_auto.py
2025-07-08 09:39:31 +00:00
5fb8bb3e1a fix recompiles due to instance key, and deepcopy issues (#39270)
* fix recompiles due to instance key, and deepcopy issues

* dict
2025-07-08 11:38:11 +02:00
356fd68109 fix(generation): stop beam search per-instance when heuristic satisfied (#38778)
* fix(decoding): stop beam search per-instance when heuristic satisfied

Previously, when early_stopping is set to `False`, the early-stopping heuristic only halted generation when **all** batch instances reached the criterion. This caused instances that are impossible (suggested by the heuristic) to improve keep generating, leading to inconsistent and overlong outputs across the batch.

Now we apply the heuristic **per-instance**: once a certain instance of batch has its all beams impossibe to improve, we mark that instance finished while letting others continue. This restores expected behavior and ensures consistency in batched generation.

* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic

* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied

* Add comments for early stop heuristic

* Update src/transformers/generation/utils.py

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-08 08:59:37 +00:00
0b0ede8b2b remove broken block (#39255)
* remove broken block

* fixup
2025-07-08 10:41:44 +02:00
a21557fa3e Skip test_eager_matches sdpa generate and update an integration test for blip-like models (#39248)
* skip

* skip

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-08 10:38:25 +02:00
ea3c2c0277 Fix license text, duplicate assignment, and typo in constant names (#39250)
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
2025-07-08 10:20:52 +02:00
b2816da802 fix xpu failures on PT 2.7 and 2.8 w/o IPEX and enable hqq cases on XPU (#39187)
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* enable hqq uts on XPU, all passed

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix comment

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-07-08 10:18:26 +02:00
17b3c96c00 Glm 4 doc (#39247)
* update the glm4 model readme

* update test

* update GLM-4.1V model

* update as format

* update

* fix some tests

* fix the rest

* fix on a10, not t4

* nit: dummy import

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-07-08 08:22:04 +02:00
bbca9782ca Update LED model card (#39233)
* Update LED model card

* Remove extra arguments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-07 15:56:57 -07:00
41e865bb8d fix some flaky tests in tests/generation/test_utils.py (#39254)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 19:49:41 +02:00
93747d89ea Simplify Mixtral and its modular children (#39252)
* simplify mixtral a lot

* fix

* other moes

* mixtral

* qwen3

* back

* Update modular_qwen3_moe.py
2025-07-07 19:40:41 +02:00
3993ee1e98 Add segmentation_maps support to MobileNetV2ImageProcessor (#37312)
* Add `segmentation_maps` support to mobilenet_v2 image processor and `reduce_labels` to mobilevit

* Changed mobilenetv2 tests to support fastimageprocessor

* added `segmentation_maps` support to fast image processor

* reverted to upstream/main

* Add optional

* Use autodocstring

* Changed docs

* Docs fix

* Changed fp to match beit fp

* Change typing imports

* Fixed repo inconsistency

* Added fast-slow equivalence tests

* Removed unnecessary call

* Add `reduce_labels` to Mobilevit fast processor

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-07 13:34:59 -04:00
b96f213fcf Clarify per_device_train_batch_size scaling in TrainingArguments (#38… (#38857)
Clarify global batch size calculation in TrainingArguments (#38484)
2025-07-07 16:57:42 +00:00
9698052560 Add Korean translation for glossary.md (#38804)
* Add Korean translation for glossary.md

* Update docs/source/ko/glossary.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/glossary.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Joosun40 <77312900+Joosun40@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-07 09:12:55 -07:00
bf203aa9da Update tiny-agents example (#39245) 2025-07-07 15:58:36 +02:00
c4e39ee59c adjust input and output texts for test_modeling_recurrent_gemma.py (#39190)
* adjust input and output texts for test_modeling_recurrent_gemma.py

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update Expectation match

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:13:25 +02:00
14cba7ad33 enable xpu on kv-cache and hqq doc (#39246)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-07 13:12:02 +00:00
32db48db73 Fix patch helper (#39216)
remove -1
2025-07-07 15:11:48 +02:00
a3618d485a RotaryEmbeddings change is not None -> isinstance(..., dict) (#39145)
is None -> isinstance dict
2025-07-07 14:05:28 +01:00
9b09fe479f fix fastspeech2_conformer tests (#39229)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 15:04:26 +02:00
00e9efceab [bugfix] fix flash attention 2 unavailable error on Ascend NPU (#39166)
[bugfix] fix flash attention 2 error on Ascend NPU
2025-07-07 13:03:39 +00:00
056fa73fae [modular] Simplify logic and docstring handling (#39185)
* simplify a lot

* Update modular_model_converter.py

* finalize

* remove outdated functions

* apply it

* and examples
2025-07-07 14:52:57 +02:00
f16fbfb89a Make _compute_dynamic_ntk_parameters exportable (#39171)
* Make _compute_dynamic_ntk_parameters exportable

* add unit test
2025-07-07 14:48:31 +02:00
4243bb844d fix bug using FSDP V1 will lead to model device not properly set (#39177)
* fix bug using FSDP V1 will lead to model device not properly set

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update the code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-07-07 14:47:04 +02:00
34c16167eb Don't send new comment if the previous one is less than 30 minutes (unless the content is changed) (#39170)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-07 14:43:50 +02:00
b8f397e456 fix typo in Gemma3n notes (#39196) 2025-07-07 14:41:33 +02:00
5348fbc005 [modular] Follow global indexing and attribute setting, and their dependencies (#39180)
* export global indexing statements

* add example

* style

* examples
2025-07-07 14:36:43 +02:00
8570bc29f3 Fix missing fast tokenizer/image_processor in whisper/qwen2.5-omni processor (#39244)
* fix missing fast tokenizer in whisper processor

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix processor test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix qwen2.5 omni processor

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-07 13:54:18 +02:00
b283d52f7f [vjepa2] replace einsum with unsqueeze (#39234) 2025-07-07 11:14:08 +01:00
a325409a50 Expectations re-order and corrected FA3 skip (#39195)
* Fix Expectations and a FA3 skip

* Fixed docstring

* Added context for Default expectation
2025-07-07 11:42:33 +02:00
b0a8e0b8d7 [video processors] Support float fps for precise frame sampling (#39134)
* [video processors] Support float fps for precise frame sampling

Enable fractional fps values (e.g., 1.5, 29.97) in video processors
for more precise frame sampling control.

- Change fps type from int to float across all video processors
- Maintain backward compatibility with integer values

Extends: #38105

* [video processors] Refine fps typing to Union[int, float]

Change fps type from Optional[float] to Optional[Union[int, float]]
for more explicit type information about supporting both integer
and floating-point frame rates.

- Update type hints and docstrings across 8 files
- Maintain backward compatibility
- Clarify support for both int and float values

Extends: #38105

* Revert "[video processors] Support float fps for precise frame sampling"

This reverts commit 7360d6e661b413ca0239e5ef61f9b1abbeab8e65.
2025-07-07 03:43:43 +00:00
ca7e1a3756 Refactor the way we handle outputs for new llamas and new models (#39120)
* just update 2 files

* update other models as well just making fix-copies

* also add the changes needed to modeling utils

* put this on the pretrained model instead

* nits and fixes

* update generic, fix to use config value

* update other modelings

* use transformers kwargs instead

* update

* update

* update other models

* update

* updates

* update

* update

* update

* fix

* finally

* very small nits

* this fixes more tests

* fix other models as well!

* update modularqwen2

* update models based on qwen2

* update

* update

* remove the **flash stuff in favor of noraml kwargs

* update

* propagate gemma?

* remove output attentions

* propagate

* support cross attention edge case

* same

* test this

* fixes

* more fix

* update

* update

* fix conflicts

* update

* fix emu3

* fix emu3

* move the fix a bit

* quel enfer

* some fixes, loss_kwargs should never had been

* finish fixing gemma3n

* fix small lm3

* fix another one

* fix csm now

* fux csm and mistral

* fix mistral now

* small fixes

* fix janusss

* only for some models

* fixup

* phix phi3

* more fixes?

* dose this fix it?

* update

* holy shit it was just graph breaks

* protect torch

* updates

* fix samhq?

* fix moonshine

* more moonshine fixes, 3 failures left!

* nits

* generic needs to support more

* more fixes to moonshine!

* fix cross attention outputs!

* fix csm!

* nits

* fix stupid kosmos2

* current updates

* fixes

* use output recorder?

* nicer!

* a little bit of magic

* update

* fix protect

* fix

* small fixes

* protect import

* fix a bunch of more models

* fix fixups

* fix some of the last ones

* nit

* partly fix phi

* update

* fix import path

* make something that is fullgraph compatible just to be sure

* typing was wrong on llama so the rest was wrong as well

* fucking ugly but at least it is still exportable

* syle

* supposed to fix moonshine, it still breaks

* fix some default

* fix the last bits of sam

* update samhq

* more fixes to am hq

* nit

* fix all output+hidden states and output_attentions!

* fix?

* fix diffllama

* updates to fix initialization on the sam pips

* ups there was a bug

* fix the last sam hq test

* fix gotocr

* fix gotocr2!

* fixes

* skip stupid tests

* there was one left :)

* fixup

* fix fix copies issues with this test file

* fix copies for sam_hq

* rm some comments

* skip 2 more failing tests

* fix

* fix everything

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* add more doc!

* fix public init

* fix modular qwen3

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-07-05 11:34:28 +02:00
e6a8063ef1 Update expected values (after switching to A10) - part 8 - Final (#39220)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 13:35:53 +02:00
cd8a041a4f Update expected values (after switching to A10) - part 7 (#39218)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-04 12:48:10 +02:00
0cf27916f0 Add packed tensor format support for flex/sdpa/eager through the mask! (#39194)
* Add the necesary logic to mask_utils

* add it everywhere

* Update masking_utils.py

* style

* Update masking_utils.py

* Update modeling_mimi.py

* Update masking_utils.py

* add support for more than batch size 1

* Update masking_utils.py

* add test

* style

* Update test_masking_utils.py

* Update masking_utils.py

* add require_token

* fix tests

* fix
2025-07-04 09:01:56 +02:00
037755ed54 Update expected values (after switching to A10) - part 6 (#39207)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 22:45:30 +02:00
1168f57abf Update expected values (after switching to A10) - part 5 (#39205)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 19:56:02 +02:00
7d9e52f376 Fix continuous batching in transformers serve (#39149)
* Fix CB

* Nit

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add todos

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-03 18:15:31 +02:00
85d93cc6e3 [serve] Cursor support, move docs into separate page, add more examples (#39133)
* jan docs

* rm

* [cursor] tmp commit

* Cursor working :D

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/commands/serving.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* cursor docs

* try to fix agents/tools docs?

* try to fix agents/tools docs?

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* add transformers chat example with transformers serve

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-07-03 17:04:16 +01:00
e15b06d8dc [typing] better return typehints for from_pretrained (#39184)
* config

* processor

* feature-extractor

* jukebox

* fixup

* update other methods in config

* remove "PretrainedConfig" annotations
2025-07-03 14:22:47 +00:00
a25fc3592e Update expected values (after switching to A10) - part 4 (#39189)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-03 15:13:06 +02:00
b31e9d19a6 [Dia] Change ckpt path in docs (#39181)
fix ckpt path
2025-07-03 10:02:58 +00:00
18e0cae207 Fix many HPU failures in the CI (#39066)
* more torch.hpu patches

* increase top_k because it results in flaky behavior when Tempreture, TopP and TopK are used together, which ends up killing beams early.

* remove temporal fix

* fix scatter operation when input and src are the same

* trigger

* fix and reduce

* skip finding batch size as it makes the hpu go loco

* fix fsdp (yay all are passing)

* fix checking equal nan values

* style

* remove models list

* order

* rename to cuda_extensions

* Update src/transformers/trainer.py
2025-07-03 11:17:27 +02:00
bff964c429 Decouple device_map='auto' and tp_plan='auto' (#38942)
* dissociate

* better place

* fix
2025-07-03 11:07:11 +02:00
8178c43112 when delaying optimizer creation only prepare the model (#39152) 2025-07-03 09:04:16 +02:00
91221da2f1 [glm4v] fix video inference (#39174)
fix video inference
2025-07-03 05:20:41 +00:00
ebfbcd42da Test fixes for Aria (and some Expectation for llava_next_video) (#39131)
* Expectations for llava_next_video

* Updated image src in aria

* Fix test_small_model_integration_test

* Fix small model integration llama

* Fix a bunch of tests

* Style

* Shortened generation in test from 900 to 90
2025-07-02 23:41:14 +02:00
37a239ca50 Update expected values (after switching to A10) - part 3 (#39179)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:48:30 +02:00
9326fc332d Update expected values (after switching to A10) - part 2 (#39165)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* [skip ci]

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-02 22:47:55 +02:00
25cd65ac43 Random serve fixes (#39176)
* Fix index out of bounds exception on wrong kv reuse

* Prevent loading same model twice

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-07-02 22:09:58 +02:00
548794b886 [serve] Model name or path should be required (#39178)
* Model name or path should be required

* Fix + add tests

* Change print to log so it doesn't display in transformers chat
2025-07-02 22:06:47 +02:00
2d561713f8 [generate] document non-canonical beam search default behavior (#39000) 2025-07-02 18:29:16 +01:00
df12d87d18 [docs] ViTPose (#38630)
* vitpose

* fix?

* fix?

* feedback

* fix

* feedback

* feedback

* update sample image
2025-07-02 07:56:29 -07:00
2b4a12b5bf Reduce Glm4v model test size significantly (#39173)
* fix test size

* Update test_modeling_glm4v.py
2025-07-02 15:55:05 +02:00
e355c0a11c Fix missing initializations for models created in 2024 (#38987)
* fix GroundingDino

* fix SuperGlue

* fix GroundingDino

* fix MambaModel

* fix OmDetTurbo

* fix SegGpt

* fix Qwen2Audio

* fix Mamba2

* fix DabDetr

* fix Dac

* fix FalconMamba

* skip timm initialization

* fix Encodec and MusicgenMelody

* fix Musicgen

* skip timm initialization test

* fix OmDetTurbo

* clean the code

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* add reviewed changes

* add back timm

* style

* better check for parametrizations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-02 15:03:57 +02:00
1125513a8d Blip2 fixes (#39080)
* Fixed some devices errors

* Fixed other device issues and more expectations

* Reverted support flags

* style

* More granular support

* Fixed some rebase stuff

* add a not None check before .to
2025-07-02 14:39:39 +02:00
28df7f854a Fix multimodal processor get duplicate arguments when receive kwargs for initialization (#39125)
* fix processor tokenizer override

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* check image processor same

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-02 19:57:15 +08:00
b61023a1b7 🚨🚨🚨 [eomt] make EoMT compatible with pipeline (#39122)
* Make EoMT compatible with pipeline

* Implicit patch offsets

* remove patch offsets from arg

* Modify tests

* Update example

* fix proc testcase

* Add few more args

* add pipeline test suite

* fix

* docstring fixes

* add pipeline test

* changes w.r.t review

* 🙈 MB

* should fix device mismatch

* debug

* Fixes device mismatch

* use decorator

* we can split mlp

* expected values update

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2025-07-02 12:25:26 +01:00
4d5822e65d [smolvlm] fix video inference (#39147)
* fix smolvlm

* better do as before, set sampling params in overwritten `apply_chat_template`

* style

* update with `setdefault`
2025-07-02 12:05:10 +02:00
9b2f5b66d8 fix default value of config to match checkpionts in LLaVa-OV models (#39163) 2025-07-02 09:45:50 +00:00
e8e0c76162 Add activation sparsity reference in gemma3n doc (#39160)
Add activation sparsity reference in the description of gemma3n
2025-07-02 04:11:03 +02:00
8e87adc45f fix llama tests (#39161)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 23:27:22 +02:00
4c1715b610 Update expected values (after switching to A10) (#39157)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* empty

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:54:31 +02:00
ab59cc27fe Suggest jobs to use in run-slow (#39100)
* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

* pr

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 20:19:06 +02:00
db2f535443 update bnb ground truth (#39117)
* update bnb resulte

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* set seed to avoid sampling different results

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix int8 tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 20:06:37 +02:00
260846efad fix: remove undefined variable (#39146) 2025-07-01 19:10:29 +02:00
cdfe49a4d0 Change @lru_cache() to @lru_cache to match styles from #38883. (#39093)
Match styles in #38883
2025-07-01 18:29:16 +02:00
f46798193e Fix: Ensure wandb logs config in offline mode (#38992)
* Fix: Ensure wandb logs config in offline mode

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-01 16:17:58 +00:00
fe838d6631 Fix missing fsdp & trainer jobs in daily CI (#39153)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-01 18:10:30 +02:00
1283877571 [superglue] fix wrong concatenation which made batching results wrong (#38850) 2025-07-01 12:14:44 +00:00
f8b88866f5 [VLMs] support passing embeds along with pixels (#38467)
* VLMs can work with embeds now

* update more models

* fix tests

* fix copies

* fixup

* fix

* style

* unskip tests

* fix copies

* fix tests

* style

* omni modality models

* qwen models had extra indentation

* fix some other tests

* fix copies

* fix test last time

* unrelated changes revert

* we can't rely only on embeds

* delete file

* de-flake mistral3

* fix qwen models

* fix style

* fix tests

* fix copies

* deflake the test

* modular reverted by fixes, fix again

* flaky test, overwritten

* fix copies

* style
2025-07-01 11:33:20 +00:00
20901f1d68 [typing] LlamaAttention return typehint (#38998)
* helo llama

* helo llama

* helo llama

* apply modular

* fix dia

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-07-01 11:29:52 +01:00
7a25f8dfdb [qwen2-vl] fix FA2 inference (#39121)
* fix FA2

* update is causal flag and remove mask for FA2

* update for FA2 with varlen path

* how the tests were passing with different devices?

* add comment and ref to the PR

* move mask preparation to base pretrained model

* seq len is the first dim, not second

* fix copies to fix GLM4V
2025-07-01 10:18:37 +00:00
def9663239 feat: support indivisible shards for TP model loading and TPlizing. (#37220)
* feat: support uneven loading and sharding
resolve merge conflicts
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: allow for empty tensor computations

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: add llama1b test case

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* due to q_proj colwise it has to be multi of 2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* refactor: use slice API

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-07-01 10:03:22 +00:00
06c4a4d499 fix caching_allocator_warmup with tie weights (#39070)
* fix caching_allocator_warmup with tie weights

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-01 11:32:20 +02:00
e435574721 🚨 Don't use cache in non-generative models (#38751)
* deprecate for 1 version

* style

* fix some tests

* fix esm

* skip for now, GC requires positional args but we have keyword args

* remove transpose for scores in modified models only

* skip fx trace tests
2025-07-01 09:08:21 +00:00
dbc98328da Several fixes for Gemma3n (#39135)
* remove the skips

* fix the epsilon to a small value (does not make sense otherwise)

* safeguard

* overload test_eager_matches_sdpa

* Update test_modeling_common.py

* skip appropriate tests

* correct no_split_layer

* fix all devices issue

* fix backward

* fix
2025-07-01 10:34:53 +02:00
d53518c5f2 Fix key mapping for VLMs (#39029)
* fix key mapping for VLMs

* use __mro__ instead

* update key mapping in save_pretrained
2025-07-01 09:47:53 +02:00
3457e8e73e [Whisper] update token timestamps tests (#39126)
* fixes

* update comment

* update for A10

* all a10

* all a10

* all a10

* all a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 21:55:36 +02:00
fe35eca7bd Update BigBirdPegasus model card (#39104)
* Update igbird_pegasus.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 10:42:56 -07:00
29a3f5ed8c switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8 (#39024)
* switch default xpu tp backend to pytorch built-in XCCL from pytorch 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update docs/source/en/perf_infer_gpu_multi.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

* Update perf_infer_gpu_multi.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-30 08:54:05 -07:00
9e0c865b8b docs: correct two typos in awesome-transformers.md (#39102)
* docs(awesome-projects): fix typo “Itt leverages” → “It leverages” (#39101)

closes #39101

* docs(awesome-projects): fix grammar “We provides” → “We provide” (#39101)

closes #39101
2025-06-30 08:53:43 -07:00
03db2700ab Enable XPU doc (#38929)
* fix example with dataset

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix device type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert torchao change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu torchao doc

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update chat_templating_multimodal.md

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* use full name for int8

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert int8 title

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-30 07:56:55 -07:00
ea0ea392e5 Fix chat (#39128) 2025-06-30 13:47:48 +00:00
ed36f8490e Licenses (#39127)
* Licenses

* Licenses
2025-06-30 15:25:36 +02:00
e8f90b5397 Split transformers chat and transformers serve (#38443)
* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* [chat] Split chat/serve (built on top of lysandre's PR) (#39031)

* Next token

* Split chat and serve

* Support both generation methods

* Style

* Generation Config

* temp

* temp

* Finalize serving.py

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>

* Finalize chat.py

* Update src/transformers/commands/serving.py

Co-authored-by: célina <hanouticelina@gmail.com>

* Lucain's comments

Co-authored-by: Lucain <lucain@huggingface.co>

* Update

* Last comments on PR

* Better error handling

* Better error handling

* CI errors

* CI errors

* Add tests

* Fix tests

* Fix tests

* streaming tool call

* abstract tool state; set tool start as eos

* todos

* server working on models without tools

* rm chat's deprecated flags

* chat defaults

* kv cache persists across calls

* add server docs

* link

* Update src/transformers/commands/serving.py

* Apply suggestions from code review

* i love merge conflicts

* solve multi turn with tiny-agents

* On the fly switching of the models

* Remove required positional arg

---------

Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>

* Protect names

* Fix tests

---------

Co-authored-by: =?UTF-8?q?c=C3=A9lina?= <hanouticelina@gmail.com>
Co-authored-by: Lucain <lucain@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-06-30 15:10:53 +02:00
539c6c2fa8 All CI jobs with A10 (#39119)
all a10

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-30 14:23:27 +02:00
ed9f252608 docs: Gemma 3n audio encoder (#39087)
Updating Gemma 3n docs and docstrings to clarify the relationship
between the newly trained audio encoder used in Gemma 3n and the USM
model from the original paper.
2025-06-30 14:10:51 +02:00
4a79bf947d Fix some bug for finetune and batch infer For GLM-4.1V (#39090)
* update

* 1
2025-06-30 12:16:22 +02:00
2100ee6545 fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8 (#39116)
* fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* zamba2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* internvl

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tp cases

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-30 11:49:03 +02:00
ccf2ca162e skip some test_sdpa_can_dispatch_on_flash (#39092)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 23:08:14 +02:00
a11f692895 Fixes the failing test test_is_split_into_words in test_pipelines_token_classification.py (#39079)
* Fix test pipelines token classification for is_split_into_words

* Fix incorrect import format
2025-06-27 19:25:32 +01:00
18143c76bf Sandeepyadav1478/2025 06 19 deberta v2 model card update (#38895)
* [docs]: update deberta-v2.md model card

* chore: req updates

* chore: address code review feedback and update docs

* chore: review feedback and updates

* chore: model selection updates

* chores: quantizations review updates
2025-06-27 10:35:30 -07:00
02a769b058 [fix] Add FastSpeech2ConformerWithHifiGan (#38207)
* add to mapping

* oops

* oops

* add to config_mapping_names

* revert

* fix?

* config-mapping-names

* fix?

* fix?
2025-06-27 09:38:21 -07:00
c2dc72bb5f TST Fix PEFT integration test bitsandbytes config (#39082)
TST Fix PEFT integration test bitsandbytes config

The PEFT integration tests still used load_in_{4,8}_bit, which is
deprecated, moving to properly setting BitsAndBytesConfig. For 4bit,
also ensure that nf4 is being used to prevent

> RuntimeError: quant_type must be nf4 on CPU, got fp4
2025-06-27 18:33:11 +02:00
c8064bea9a Fix: unprotected import of tp plugin (#39083) 2025-06-27 17:28:05 +02:00
dd7dc4a4a2 Add Fast Image Processor for Chameleon (#37140)
* Add Fast Image Processor for Chameleon

* add warning to resize and move blend_rgba to convert_to_rgb

* Remove unrelated files

* Update image_processing_chameleon_fast to use auto_docstring

* fix equivalence test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 15:26:57 +00:00
6d773fc3bc fix dots1 tests (#39088)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 16:54:11 +02:00
c8764ab935 guard torch distributed check (#39057)
* guard torch distributed check

* Update src/transformers/pipelines/base.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-06-27 14:49:47 +00:00
49d9fd49bd Add Fast Image Processor for mobileViT (#37143)
* Add image_processing_mobilevit_fast.py

* Fix copies

* update _preprocess for channel_flip

* Update for batched image processing

* Resolve merge conflicts with main

* Fix import order and remove trailing whitespace (ruff clean-up)

* Fix copy inconsistencies

* Add NotImplementedError for post_process_semantic_segmentation to satisfy repo checks

* Add auto_docstring

* Adjust style

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/mobilevit/image_processing_mobilevit_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Delete not used function

* test: add missing tests for  and

* Add post_process_semantic_segmentation to mobilevit_fast.py

* Add preprocess function to image_processing_mobilebit_fast.py

* ruff check for formatting

* fix: modify preprocess method to handle BatchFeature correctly

* Remove logic for default value assignment

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove normalization adn RGB conversion logic not used in slow processor

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Simplify return_tensors logic using one-liner conditional expression

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove unused normalization and format parameters

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add **kwargs and remove default values in _preprocess

* add slow_fast equivalence tests for segmentation

* style: autoformat code with ruff

* Fix slow_fast equivalence test

* merge + remove skipped test

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-27 14:40:24 +00:00
4336ecd1ea add fast image processor nougat (#37661)
* add fast image processor nougat

* test fixes

* docstring white space

* last fixes

* docstring_type

* tolerance unit test

* fix tolerance

* fix rtol

* remove traling white space

* remove white space

* note for tolerance unit test

* fix tests

* remove print

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-06-27 14:39:43 +00:00
0c35280e58 TST PEFT integration tests with pipeline generate (#39086)
Some PEFT integration tests involving text generation pipelines were
failing since #38129 because the base model is too small to generate
longer sequences. Setting max_new_tokens fixes this.
2025-06-27 15:58:10 +02:00
993665a5ff fixed typo for docstring in prepare_inputs method (#39071) 2025-06-27 13:57:56 +00:00
839893c86b fix mistral3 tests (#38989)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-27 15:44:10 +02:00
2b85b6ce19 [Whisper] 🚨 Fix pipeline word timestamp: timestamp token is end of token time !!! (#36632)
* timestamp token is end of token time !!!

* ensure correct alignment between tokens and timestamp tokens

* ignore input tokens for DTW computation

* use num_frames to avoid token timestamp hallucinations

* token timestamps test updates !

* num_frames: deprecate and use attention_mask instead

* avoid breaking change

* fix the pipeline usage for chunk approach

* make style

* better logging

* better logging

* make style

* update tests with correct values
2025-06-27 12:51:43 +00:00
9c8d3a70b8 Pipeline: fix unnecessary warnings (#35753)
* return attention mask

* use correct model input name

* fix

* make
2025-06-27 14:32:03 +02:00
1750c518dd Add EoMT Model || 🚨 Fix Mask2Former loss calculation (#37610)
* Initial Commit

* up

* More changes

* up

* Only mask_logits mismatch

* close enough logits debug later

* fixes

* format

* Add dummy loss

* Close enough processing for semantic seg

* nit

* Added panoptic postprocessor

* refactor

* refactor

* finally fixed panoptic postprocessor

* temp update

* Refactor ForUniversalSegmentation class

* nits and config update

* Few fixes and inference matches

* change mapping

* Added training support but loss slightly off 🥲

* Loss is matching 😀

* update

* Initial tests skelton

* changes

* tests update

* more modular

* initial tests

* updates

* better docstrings

* changes

* proc tests passing :)

* Image processor update

* tiny change

* QOL changes

* Update test w.r.t latest attn refactor

* repo-consistency fixes

* up

* Image proc fix and integration tests :)

* docs update

* integration tests

* fix

* docs update 🥰

* minor fix

* Happy CI

* fix

* obvious refactoring

* refactoring w.r.t review

* Add fask image proc skelton

* Fast Image proc and cleanups

* Use more modular

* tests update

* Add more tests

* Nit

* QOL updates

* change init_weights to torch default

* add eager func coz of make style

* up

* changes

* typo fix

* Updates

* More deterministic tests

* More modular

* go more modular 🚀

* up

* dump

* add supprot for giant ckpts

* overhaul

* modular

* refactor

* instace seg is ready

* cleanup

* forgot this

* docs cleanup

* minor changes

* EoMT - > Eomt

* Happy CI

* remove redundant comment

* Change model references

* final change

* check annealing per block

* My other PR changes 😂

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-27 14:18:18 +02:00
0106a50a6b fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8 (#39069)
* fix a bunch of XPU UT failures on stock PyTorch 2.7 and 2.8

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* qwen3

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* quanto

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* models

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* idefics2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-27 14:01:53 +02:00
cb17103bd5 Uninstallling Flash attention from quantization docker (#39078)
* update

* revert
2025-06-27 13:51:46 +02:00
371c471113 Fix initialization of OneFormer (#38901)
* fix initialization of OneFormer

* remove redundant initializations

* remove redundant initializations

* remove redundant initializations

* keep BC
2025-06-27 12:39:37 +02:00
540a10848c fix Gemma3nProcessorTest (#39068)
* fix

* fix

* oups forgot style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-27 12:28:10 +02:00
0d66ef7792 Cleanup Attention class for Siglip and dependent models (#39040)
* cleanup attention class

* More models

* more models

* Changes

* make style

* Should fix CI

* This should work 🙏
2025-06-27 12:14:09 +02:00
1ccc73dee9 [Whisper] fix shape mismatch in tests (#39074)
fix shape mismatch
2025-06-27 09:27:42 +00:00
a52478253b [docs] Tensor parallelism (#38241)
* updates

* feedback

* badges

* fix?

* fix?

* fix?

* fix?
2025-06-26 14:40:45 -07:00
84e8696cae [docs] @auto_docstring (#39011)
* refactor

* feedback
2025-06-26 14:21:54 -07:00
018855de63 Update PEGASUS-X model card (#38971)
* Update PEGASUS-X model card

* Add cache_implementation argument in quantization code example

* Update CLI example

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove TensorFlow and Flax badges

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 13:54:48 -07:00
757c26fb40 [docs] Model contribution (#38995)
improve
2025-06-26 12:25:14 -07:00
b372bb5ed1 fix layoutlmv3 tests (#39050)
* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 20:07:17 +02:00
f171e7e884 Update SuperPoint model card (#38896)
* docs: first draft to more standard SuperPoint documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs: reverted changes on Auto classes

* docs: addressed the rest of the comments

* docs: remove outdated reference to keypoint detection task guide in SuperPoint documentation

* Update superpoint.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-26 10:13:06 -07:00
2f50230c59 fix t5gemma tests (#39052)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:48:14 +02:00
23b7e73f05 fix test_compare_unprocessed_logit_scores (#39053)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 18:36:56 +02:00
58c7689226 [Flex Attn] Fix torch 2.5.1 incompatibilities (#37406)
* remove compile on mask creation, ensure kv blocks do not explode on indices

* trigger ci

* switch dynamic compilation to false

* patch new masking functions as well

* add len check

* i was wrong

* last comment
2025-06-26 18:23:55 +02:00
5154497607 Dev version 2025-06-26 18:04:36 +02:00
0a8081b03d [Modeling] Fix encoder CPU offloading for whisper (#38994)
* fix cpu offloading for whisper

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* unskip offloading tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert small change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-26 15:56:33 +00:00
c63cfd6a83 Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemm3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Alinging SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* remvoing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass through on Gemm3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implemnenting sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* numpy api fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp  to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing douplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* fixing a bug for logging

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemm3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implemnenting sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentaiton for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuraiton TODOs

* Start of feature extractor refatcor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Caual from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
3e5cc12855 [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051)
* rm tf/flax tests

* more flax deletions

* revert fixture change

* reverted test that should not be deleted; rm tf/flax test

* revert

* fix a few add-model-like tests

* fix add-model-like checkpoint source

* a few more

* test_get_model_files_only_pt fix

* fix test_retrieve_info_for_model_with_xxx

* fix test_retrieve_model_classes

* relative paths are the devil

* add todo
2025-06-26 16:25:00 +01:00
cfff7ca9a2 [Whisper] Pipeline: handle long form generation (#35750)
* handle long form generation

* add warning

* correct incorrect in place token change

* update test to catch edge case

* make style

* update warning

* add doc
2025-06-26 14:33:31 +00:00
02ecdcfc0f add _keep_in_fp32_modules_strict (#39058)
* add _keep_in_fp32_modules_strict

* complete test
2025-06-26 13:55:28 +00:00
vb
d973e62fdd fix condition where torch_dtype auto collides with model_kwargs. (#39054)
* fix condition where torch_dtype auto collides with model_kwargs.

* update tests

* update comment

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 14:52:57 +02:00
44b231671d [qwen2-vl] fix vision attention scaling (#39043)
scale lost its `-` when refactoring
2025-06-26 14:06:52 +02:00
ae15715df1 polishing docs: error fixes for clarity (#39042)
* fix duplicate deprecate_models.py

* fix duplicate modular_model_converter.py
2025-06-26 11:56:31 +00:00
3abeaba7e5 Create test for #38916 (custom generate from local dir with imports) (#39015)
* create test for #38916 (custom generate from local dir with imports)
2025-06-26 13:54:36 +02:00
25c44d4b68 Internvl fix (#38946)
* Image processor compile fix (#38540)

* Added a compile-friendly versiom of resize to BaseImgProcessorFast

* Changed qwen2 processor to use its parent class .resize

* Style

* underlined issue only happens on AMD w/ comment and bool check

* Fixed some utils functions

* Fixed the same issue for bridgetower

* Fixed the same issue for llava_next

* Repo consistency for llava onevision

* Update src/transformers/image_processing_utils_fast.py

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

* Added an Expectation to an internvl test

* Made qwen2_vl use the resize method of its parent clas

* Changed to torch.where

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
2025-06-26 13:44:59 +02:00
f85b47d1b8 [Generate] Fix no grad on some models (#39008)
fixes on torch no grad for generate
2025-06-26 13:06:09 +02:00
583db52bc6 Add Dia model (#38405)
* add dia model

* add tokenizer files

* cleanup some stuff

* brut copy paste code

* rough cleanup of the modeling code

* nuke some stuff

* more nuking

* more cleanups

* updates

* add mulitLayerEmbedding vectorization

* nits

* more modeling simplifications

* updates

* update rope

* update rope

* just fixup

* update configuration files

* more cleanup!

* default config values

* update

* forgotten comma

* another comma!

* update, more cleanups

* just more nits

* more config cleanups

* time for the encoder

* fix

* sa=mall nit

* nits

* n

* refacto a bit

* cleanup

* update cv scipt

* fix last issues

* fix last nits

* styling

* small fixes

* just run 1 generation

* fixes

* nits

* fix conversion

* fix

* more fixes

* full generate

* ouf!

* fixes!

* updates

* fix

* fix cvrt

* fixup

* nits

* delete wrong test

* update

* update

* test tokenization

* let's start changing things bit by bit - fix encoder step

* removing custom generation, moving to GenerationMixin

* add encoder decoder attention masks for generation

* mask changes, correctness checked against ad29837 in dia repo

* refactor a bit already --> next cache

* too important not to push :)

* minimal cleanup + more todos

* make main overwrite modeling utils

* add cfg filter & eos filter

* add eos countdown & delay pattern

* update eos countdown

* add max step eos countdown

* fix tests

* fix some things

* fix generation with testing

* move cfg & eos stuff to logits processor

* make RepetitionPenaltyLogitsProcessor flexible

- can accept 3D scores like (batch_size, channel, vocab)

* fix input_ids concatenation dimension in GenerationMixin for flexibility

* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.

* Add stopping criteria

* refactor

* move delay pattern from processor to modeling like musicgen.

- add docs
- change eos countdown to eos delay pattern

* fix processor & fix tests

* refactor types

* refactor imports

* format code

* fix docstring to pass ci

* add docstring to DiaConfig & add DiaModel to test

* fix docstring

* add docstring

* fix some bugs

* check

* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first

* experimental testing of left padding for first channel

* whoops

* Fix merge to make generation work

* fix cfg filter

* add position ids

* add todos, break things

* revert changes to generation --> we will force 2d but go 3d on custom stuff

* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos

* some first fixes to get to 10. in generation

* some more generation fixes / adjustment

* style + rope fixes

* move cfg out, simplify a few things, more todos

* nit

* start working on custom logit processors

* nit

* quick fixes

* cfg top k

* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar

* lets keep changes to core code minimal, only eos scaling is questionable atm

* simpler eos delay logits processor

* that was for debugging :D

* proof of concept rope

* small fix on device mismatch

* cfg fixes + delay logits max len

* transformers rope

* modular dia

* more cleanup

* keep modeling consistently 3D, generate handles 2D internally

* decoder starts with bos if nothing

* post processing prototype

* style

* lol

* force sample / greedy + fixes on padding

* style

* fixup tokenization

* nits

* revert

* start working on dia tests

* fix a lot of tests

* more test fixes

* nit

* more test fixes + some features to simplify code more

* more cleanup

* forgot that one

* autodocs

* small consistency fixes

* fix regression

* small fixes

* dia feature extraction

* docs

* wip processor

* fix processor order

* processing goes brrr

* transpose before

* small fix

* fix major bug but needs now a closer look into the custom processors esp cfg

* small thing on logits

* nits

* simplify indices and shifts

* add simpler version of padding tests back (temporarily)

* add logit processor tests

* starting tests on processor

* fix mask application during generation

* some fixes on the weights conversion

* style + fixup logits order

* simplify conversion

* nit

* remove padding tests

* nits on modeling

* hmm

* fix tests

* trigger

* probably gonna be reverted, just a quick design around audio tokenizer

* fixup typing

* post merge + more typing

* initial design for audio tokenizer

* more design changes

* nit

* more processor tests and style related things

* add to init

* protect import

* not sure why tbh

* add another protect

* more fixes

* wow

* it aint stopping :D

* another missed type issue

* ...

* change design around audio tokenizer to prioritize init and go for auto - in regards to the review

* change to new causal mask function + docstrings

* change ternary

* docs

* remove todo, i dont think its essential tbh

* remove pipeline as current pipelines do not fit in the current scheme, same as csm

* closer to wrapping up the processor

* text to audio, just for demo purposes (will likely be reverted)

* check if it's this

* save audio function

* ensure no grad

* fixes on prefixed audio, hop length is used via preprocess dac, device fixes

* integration tests (tested locally on a100) + some processor utils / fixes

* style

* nits

* another round of smaller things

* docs + some fixes (generate one might be big)

* msytery solved

* small fix on conversion

* add abstract audio tokenizer, change init check to abstract class

* nits

* update docs + fix some processing :D

* change inheritance scheme for audio tokenizer

* delete dead / unnecessary code in copied generate loop

* last nits on new pipeline behavior (+ todo on tests) + style

* trigger

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
2025-06-26 11:04:23 +00:00
5995cfa0a0 Fix Bad Outputs in Fast Path for GraniteMoeHybrid (#39033)
Fix bug in previous state setting
2025-06-26 09:45:57 +02:00
22b0a89878 Granite speech speedup + model saving bugfix (#39028)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum
2025-06-26 09:44:17 +02:00
1d45d90e5d [tests] remove TF tests (uses of require_tf) (#38944)
* remove uses of require_tf

* remove redundant import guards

* this class has no tests

* nits

* del tf rng comment
2025-06-25 17:29:10 +00:00
d37f751797 Two ReDOS fixes (#39013)
* two_redos_fixes

* Fix two redos issues

* Just don't use RE at all
2025-06-25 17:31:26 +01:00
551e48f182 [Kyutai-STT] correct model type + model id (#39035)
* correct model type + model id

* udpate doc

* init fix

* style !!!
2025-06-25 16:09:00 +00:00
dad0e87c79 Add SmolLM3 (#38755)
* init smollm3

* integration tests

* config quirks

* docs stub

* rests round 2

* tests round 3

* tests round 4

* bring SWA back

* config checker pls

* final checkpoint

* style and copies

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-25 15:12:15 +00:00
3233e9b7c3 refactor: remove custom BarkLayerNorm (#39003)
`nn.LayerNorm` supports `bias=False` since Pytorch 2.1
2025-06-25 16:07:52 +01:00
3c1d4dfbac Fix grammatical error in models documentation (#39019) 2025-06-25 14:55:22 +00:00
858f9b71a8 Remove script datasets in tests (#38940)
* remove trust_remote_code

* again

* Revert "Skip some tests for now (#38931)"

This reverts commit 31d30b72245aacfdf70249165964b53790d9c4d8.

* again

* style

* again

* again

* style

* fix integration test

* fix tests

* style

* fix

* fix

* fix the last ones

* style

* last one

* fix last

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-25 14:31:20 +00:00
3c322c9cdf fix gemma3 grad acc (#37208)
* fix gemma3 grad acc

* fix

* fix

* fix

* fix

* rmv print

* rm

* Update setup.py

* Apply style fixes

* propagate the changes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:28:44 +02:00
860b898d03 fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982) (#38983)
* fix: astronomical loss with ModernBERT when using gradient checkpointing

* update the modling fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:11:18 +02:00
a2eb75c891 Support for Flash Attention 3 (#38972)
* Support `flash_attn_3`
Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper

- Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...`

An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated

Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged

* Add tests for Flash Attention 2 and 3 parity

* ci fix

* FA2 compatibiity
- `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids`
- Remove bettertransformer check in Flash Attention 3
- Merge tests
- Add licensing

* ci fix

* Test naming consistency

* ci fix

* Deprecation warning for `prepare_fa2_from_position_ids`

* ci fix
2025-06-25 14:39:27 +02:00
de98fb25a3 Fix the seamless_m4t cannot work on Gaudi (#38363)
* Fix the seamless_m4t cannot work on Gaudi

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Refine the patch

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix seamless_m4t_v2 crash

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Use the patched_gather

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove debug logs

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove useless modifications

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add hpu check

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-06-25 12:40:01 +02:00
7503cb9113 [Model] add dots1 (#38143)
* add dots1

* address comments

* fix

* add link to dots1 doc

* format

---------

Co-authored-by: taishan <rgtjf1@163.com>
2025-06-25 11:38:25 +02:00
3ef8896906 Encoder-Decoder Gemma (#38332)
* Initial submit

* Fix bugs:
1. add __init__ file
2. tied word embedding
3. support flash/flex attention
4. model saving and loading

* Code refactor:
* Rename encdecgemma to t5gemma.
* Split attention into self- and cross-attention
* Split stack into encoder and decoder
* Add test cases
* Add auto configuration

* Update configurations.

* Fix bugs related to copy and attribute checks

* Fix type union

* Fix merge errors

* run ruff format

* Run make style and update tests.

* Add t5gemma model doc.

* ruff and style formatting.

* Add missed module config.

* Add dummy checkpoint link to pass tests (need updated when real checkpoints are uplioaded.).

* Update model doc.

* Minor updates following Arthur's comments:
* replace docstrings with auto_docstrings
* remove checkpoint layers
* remove deprecate_kwargs

* fix rebase errors

* Fix docstring issues.

* fix t5gemma doc issue.

* run ruff format

* Updates:
* split encoder-only model out
* make t5gemmamodel encoder-decoder only
* update token and sequence classification
* update tests
2025-06-25 09:05:10 +00:00
af9870265e GLM-4.1V Model support (#38431)
* 20250508 Model Architecture

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* update 1447

* 0526

* update

* format

* problem

* update

* update with only image embed diff

* Final

* upload

* update

* 1

* upload with ruff

* update

* update

* work

* 1

* 1

* update with new note

* 2

* Update convert_glm4v_mgt_weights_to_hf.py

* Update tokenization_auto.py

* update with new format

* remove rmsnrom

* draft with videos

* draft

* update

* update

* fix for review problem

* try to remove min_pixel

* update

* for test

* remove timestamps

* remove item

* update with remove

* change

* update 2200

* update

* Delete app.py

* format

* update

* Update test_video_processing_glm4v.py

* 1

* 2

* use new name

* Update test_video_processing_glm4v.py

* remove docs

* change

* update for image processors update

* 2108

* 2128

* Update modular_glm4v.py

* 1

* update some

* update

* rename

* 1

* remove tests output

* 2

* add configuration

* update

* Update test_video_processing_glm4v.py

* fix simple forward tests

* update with modular

* 1

* fix more tests

* fix generation test

* fix beam search and init

* modular changed

* fix beam search in case of single-image/video. Fails if multiple visuals per text

* update processor

* update test

* pass

* fix beam search

* update

* param correct

* Update convert_glm4v_mgt_weights_to_hf.py

* 1

* Update test_modeling_glm4v.py

* 4

* 2

* 2123 video process

* 2

* revert

* 1

* 2

* revert processing

* update preprocesor

* changed

* 1

* update

* update

* 6

* update

* update

* update

* Delete tmp.txt

* config

* Update video_processing_glm4v.py

* apply modular correctly

* move functions

* fix order

* update the longest_edge

* style

* simplify a lot

* fix random order of classes

* skip integration tests

* correctly fix the tests

* fix TP plan

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-25 10:43:05 +02:00
7b3807387b Drop unnecessary tokens in GPT2Model generation (#39016)
Drop unnecessary tokens in GPT2Model generation.

Co-authored-by: Yi Pan <conlesspan@outlook.com>
2025-06-25 08:29:00 +00:00
e212ff9e6a [video processor] support torchcodec and decrease cuda memory usage (#38880)
* don't move the whole video to GPU

* add torchcodec

* add tests

* make style

* instrucblip as well

* consistency

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/video_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-25 08:23:37 +00:00
11d0feacce [AutoModelForMaskGeneration] Remove duplicate code (#38622)
Remove duplicate code
2025-06-25 10:00:13 +02:00
3ee72af6b6 Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332)
* Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1

* fix code format

* add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1

* add assert loss is not nan
2025-06-25 07:58:34 +00:00
ae32f1ad11 Add zero dim tensor check when using flash_attention (#38280)
* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

---------

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>
2025-06-25 09:48:50 +02:00
ca402e2116 [LightGlue] Fixed attribute usage from descriptor_dim to keypoint_detector_descriptor_dim (#39021)
fix: fix descriptor dimension handling in LightGlue model
2025-06-24 23:32:07 +01:00
48b6ef0238 Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code,… (#38954)
* Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.)

* Update quicktour.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-24 11:48:15 -07:00
ea9a30923e [HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing (#39002)
* ThreadPool instead of Pool for parallel pre-processing

* ThreadPool only if hpu available
2025-06-24 20:24:50 +02:00
995666edb5 Skip sdpa dispatch on flash test due to unsupported head dims (#39010) 2025-06-24 20:16:56 +02:00
f367c6337d Update self-comment-ci.yml user list (#39014)
add ivarflakstad to self-comment-ci.yml
2025-06-24 20:13:36 +02:00
67d36dc1d7 Fix bugs in DynamicCache (#37880)
* Fix bugs in DynamicCache

* Updarte

* Update

* Lint

* lint

* Rename test

* update

* update
2025-06-24 19:43:40 +02:00
6bdd4ec952 Add kyutai stt (#38909)
* first draft

* cleaner version

* udpate tests + modeling

* add tests

* init

* udpate test_modeling_common

* fix tests

* csm Processor draft

* convertion update

* mimi cache padding convolutions draft

* mimi streaming udpates

* update mimi padding cache test

* udpate cache padding mimi test

* make style mimi

* updates generate moshi asr

* moshi asr integration tests (single + batched)

* update tests

* update conversion script

* good default sliding window value

* udpdate generate

* update test checkpoint

* nit

* fix mimi

* fix codec prefix

* revert

* revert

* update config

* update config

* unnecessary mimi input restriction

* remove delay in tokens

* remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask

* test update

* modular update

* make style

* nit

* rename

* create codec model generation config at init

* remove delay

* max_new_tokens/length warning

* correct conv1 padding cache import for modular

* nit

* fix on encoder_past_key_values

* convert modular

* move frame_size to config

* move frame_size to config

* update test name

* handle first token is bos

* better handling of max_new_tokens

* fix

* fix batch size in test input prep

* update docstring

* convert modular

* make style

* make style

* add feature extractor

* correct modular convention name for feature_extraction file

* update convertion script

* doc processor

* update doc

* udpate init

* update model type

* fixes

* update tests

* fix

* make

* add doc

* nit

* fix

* doc

* auto mappings

* doc

* nit

* convert modular

* doc

* nit

* extend _keep_in_fp32_modules to enforce fp32

* renaming to stt

* doc update + test update

* doc fixes

* doc fix

* doc fix

* fix musicgen tests

* fix musicgen tests

* make style

* fix musicgen tests

* correct frame_rate config param for mimi

* update mimi test

* revert update mimi test

* enforce cpu test

* move cache init in cache class

* convert modular

* docstring update

* update model id

* feature_extractor -> feature_extraction (SEW)

* convert modular

* update model id
2025-06-24 18:01:15 +02:00
08bf7f1afe Add kernelize to transformers (#38205)
* fix

* fix

* fix flow

* remove non compiling path

* change

* style

* fix

* update

* update pin

* revert
2025-06-24 17:38:54 +02:00
be10d4df60 Granite speech - minor fixes to support training with the HF trainer (#38833)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property
2025-06-24 17:06:52 +02:00
e1e11b0299 Fix undeterministic order in modular dependencies (#39005)
* sort correctly

* Update modeling_minimax.py

* Update modular_model_converter.py
2025-06-24 17:04:33 +02:00
bdf5fb70aa Skip non-selected experts for qwen3_moe (#38133)
* fix(qwen3moe): skip experts with no workload

* avoid tolist and also update other moe models

* fix: should squeeze 0-dim only
2025-06-24 16:33:48 +02:00
719058c625 Update attention_visualizer.py (#37860) 2025-06-24 16:21:36 +02:00
9f42c1f192 Added scikit-learn to the example image-classification requirements.txt (#37506)
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-24 15:24:02 +02:00
1636a7bcb9 Fixes for Arcee model (#39001)
* fix modular

* Update modular_arcee.py

* fix
2025-06-24 15:23:52 +02:00
71de20b818 Add Arcee model support (#38621)
* Add Arcee model support to transformers

- Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer
- Use LlamaTokenizer for tokenization
- Add FX graph support for Arcee models
- Create lazy loading module structure for Arcee

* feat: update YARN scaling and RoPE validation for Arcee model

* feat: add auto_docstring checkpoint config to Arcee model classes

* docs: add pre-trained model weights reference to Arcee configuration files

* refactor: move RoPE utilities to dedicated modeling_rope_utils module

* Add comprehensive test suite for Arcee model

- Add test_modeling_arcee.py following standard transformers test patterns
- Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add specific test for ReLU² activation in ArceeMLP
- Add RoPE scaling tests including YARN support
- Follow CausalLMModelTest pattern used by similar models

* Add documentation for Arcee model

- Add comprehensive model documentation with usage examples
- Include all model variants in autodoc
- Add to table of contents in proper alphabetical order
- Fixes documentation coverage for Arcee model classes

* Make style/fixup

* fix copyright year

* Sync modular conversion

* revert in legacy supported models in src/transformers/utils/fx

* cleaned redundant code in modular_arcee.py

* cleaned testing

* removed pretraining tp

* fix styles

* integration testing

---------

Co-authored-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
2025-06-24 15:05:29 +02:00
23c89a6732 [Attention] Small fix on output attentions (#38948)
small fix
2025-06-24 14:42:10 +02:00
4f650040a6 Removing extra space in large command for speech-pretraining example (#38705)
Removing extra space in Large command
2025-06-24 12:24:56 +00:00
d3d835d4fc [qwen] refactor attentions for vision/audio (#38930)
* refactor attentions in vision/audio

* remove fa2 import

* make config the only args

* pass along kwargs from modality encoders

* style
2025-06-24 10:53:52 +02:00
vb
2e4c045540 🔴 Update default dtype for pipelines to auto (#38882)
* check typing

* Fallback to fp32 if auto not supported.

* up.

* feedback from review.

* make style.
2025-06-24 10:39:18 +02:00
21cb353b7b [docs] Typos - Single GPU efficient training features (#38964)
* Typos

- corrected bf16 training argument
- corrected header for SDPA

* improved readability for SDPA suggested by @stevhliu

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-23 12:33:10 -07:00
f9be71b34d Fix rag (#38585)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:42:46 +02:00
9eac19eb59 [Feature] Support is_split_into_words in the TokenClassificationPipeline. (#38818)
* some fixes

* some fixes

* now the pipeline can take list of tokens as input and is_split_into_words argument

* now the pipeline can take list of tokens as input and is_split_into_words argument

* now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input

* now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input

* solving test problems

* some fixes

* some fixes

* modify tests

* aligning start and end correctly

* adding tests

* some formatting

* some formatting

* some fixes

* some fixes

* some fixes

* resolve conflicts

* removing unimportant lines

* removing unimportant lines

* generalize to other languages

* generalize to other languages

* generalize to other languages

* generalize to other languages
2025-06-23 15:31:32 +00:00
2ce02b98bf fix mistral and mistral3 tests (#38978)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:07:18 +02:00
b6b4d43d6d Add support for auto_docstring with model outputs (#38242)
* experiment auto_docstring model outputs

* Fix PatchTSMixer

* Add check model output docstring to check_auto_docstring and fix all model outputs docstring

* add reordering of docstring in check_docstrings

* add check for redundant docstring in check_docstrings, remove redundant docstrings

* refactor check_auto_docstring

* make style

* fix copies

* remove commented code

* change List-> list Tuple-> tuple in docstrings

* fix modular

* make style

* Fix modular vipllava

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-23 10:39:41 -04:00
0c98f24889 fix: add __bool__ operator to tokenizer to avoid bloated asserts (#38899)
* fix: add __bool__ operator to tokenizer to avoid bloated asserts

When a user does 'assert tokenizer' to ensure that the tokenizer is not None, they inadvertently set off a rather expensive process in the '__len__()' operator. This fix adds a trivial '__bool__()' that returns True, so that a None tokenizer asserts and an actual tokenizer returns True when asserted, without calling length op.

* typo
2025-06-23 14:32:16 +00:00
d29482cc91 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157)
* add working idefics2 fast and improvements for fast nested images processing

* add fast image processors idefics 3 and smolvlm

* cleanup tests

* fic doc idefics2

* PR review and fix issues after merge

* Force providing disable_grouping to group_images_by_shape

* simplify group_images_by_shape

* fix modular

* Fix nits after review
2025-06-23 14:17:25 +00:00
1a96127e46 Break tie in Expectations and gemma3 fixes (#38943)
* Added major / minor version to Expectations ordering

* Added fixes to gemma3

* Style
2025-06-23 15:13:27 +02:00
84d19be41e Apply GradientCheckpointingLayer to the whole repo (#38913)
* first batch (4)

* align

* altclip

* beit

* bert

* yolos

* dino, pvt_v2

* bark, bart, bert_generation

* big_bird, biogpt

* blnderbot, bloom

* bridgetower

* camambert, canine, chameleon

* chinese clip, clap, clip

* codegen, conditional detr, convbert

* dab_detr, data2vec

* dbrx, deberta

* deberta, decicion_tranformer, deformable_detr

* deit, deta, mctct

* detr, dinov2, distilbert

* donut, dpt, electra

* ernie, esm, falcon

* flava, fnet, falcon_mamba

* focalnet, git, gpt2

* gpt - bigcode, neo, neox

* gptj, groupvit

* idefics2, idefics3

* ijepa, imagegpt, internvl

* jetmoe, kosmos2, layoutlm

* layoutlm2-3, led

* lilt, longformer, longt5, luke

* m2m, mamba1-2

* marian, markuplm, mask2former

* maskformer

* mbart, megatron_bert, mimi

* mixtral, mlcd

* mobilevit1-2, modernbert

* moshi, mpt, mra

* mt5, musicgen

* mvp, nemotron

* nllb_moe

* nystromformer, omdet_turbo

* opt, owlvit, owlv2

* pegasus, pegasus_x, presimmon

* phimoe, pix2struct, pixtral

* plbart, pop2piano, prophetnet

* qwen2*

* qwen2, qwen3 moe,  rec gemma

* rembert

* roberta

* roberta prelayernorm

* roc_bert, roformer, rwkv

* sam, sam_hq

* seggpt, smolvlm, speech_to_text

* splinter, stablelm, swin

* swin2sr, switch_transformer, t5, table_transformer

* tapas, time_series_tranformer, timesformer

* trocr, tvp, umt5

* videomae, vilt, visual_bert

* vit, vit_mae, vit_msn

* vitpose_backbone, vits, vivit

* whisper. x_clip, xglm

* xlm_roberta, xmod

* yoso

* zamba

* vitdet, wav2vec2, wav2vec2_bert

* unispeech, wav2vec_conformer

* wavlm

* speecht5

* swinv2

* sew / _d

* seamless_mt4 / _v2

* deprecated models update

* bros

* gemma2, gemma3

* got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer

* fixup

* Add use_cache=False and past_key_value=None to  GradientCheckpointingLayer

* fixup

* fix prophetnet

* fix bigbird_pegasus

* fix blenderbot

* fix mbart

* fix mvp

* fix zamba2

* fix bart

* fix blenderbot_small

* fix codegen

* Update gradient checkpointing layer to support more past_key_values arg names

* fix data2vec vision

* fix deformable_detr

* fix gptj

* fix led

* fix m2m_100

* add comment

* fix nnlb_moe

* Fix pegasus_x

* fix plbart

* udop

* fix-copies: beit, wav2vec2

* fix gpt_bigcode

* fixup

* fix t5

* fix switch_transformers

* fix longt5

* fix mt5

* update tapas

* fix blip2

* update blip

* fix musicgen

* fix gpt2, trocr

* fix copies

* !!! Revert zamba, mllama

* update autoformer

* update bros

* update args / kwargs for BERT and copies

* 2nd round of updates

* update conditional detr

* Pass encoder_hidden_states as positional arg

* Update to pass encoder_decoder_position_bias as positional arg

* fixup

* biogpt modular

* modular gemma2

* modular gemma3

* modular gpt_neox

* modular informer

* modular internvl

* modular mixtral

* modular mlcd

* modular modernbert

* modular phi

* modular qwen2_5_omni

* modular qwen2_5_vl

* modular sam_hq

* modular sew

* wav2vec2_bert

* modular wav2vec2_conformer

* modular wavlm

* fixup

* Update by modular instructblipvideo

* modular data2vec_audio

* nit modular mistral

* apply modular minimax

* fix modular moonshine

* revert zamba2

* fix mask2former

* refactor idefics
2025-06-23 14:24:48 +02:00
07aab1af1e Remove dead protected imports (#38980)
* remove them

* more
2025-06-23 13:44:50 +02:00
74f5e4a1fa [modular] CLI allows positional arguments, and more defaults names for the optional arg (#38979)
* More defaults

* Update modular_model_converter.py
2025-06-23 12:40:01 +02:00
334bf913dc Fix(informer): Correct tensor shape for input_size=1 (#38856)
* Fix(time_series): Correct scaler tensor shape in base model

The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.

This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.

Fixes #38745

* Fix(time_series): Correct scaler tensor shape in base model

The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.

This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.

Fixes #38745

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-06-23 11:50:51 +02:00
c184550daf Fix DTensor import compatibility for PyTorch < 2.5 (#38836) 2025-06-23 11:25:56 +02:00
984ff89e73 Gaudi3 CI (#38790) 2025-06-23 10:56:51 +02:00
2166b6b4ff Update blip model card (#38513)
* Update docs/source/en/model_doc/blip.md

* fix(docs/source/en/model_doc/blip.md): fix redundent typo error

* fix (docs/source/en/model_doc/blip.md): modify of review contents

* fix(docs/source/en/model_doc/blip.md): modify code block

* Update blip.md

---------

Co-authored-by: devkade <mouseku@moana-master>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-20 13:46:19 -07:00
166e823f77 Fix custom generate from local directory (#38916)
Fix custom generate from local directory:
1. Create parent dirs before copying files (custom_generate dir)
2. Correctly copy relative imports to the submodule file.
3. Update docs.
2025-06-20 17:36:57 +01:00
3d34b92116 Switch to use A10 progressively (#38936)
* try

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 16:10:35 +00:00
b8059e1f8f Fix more flaky test_initialization (#38932)
* try

* try

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 17:28:32 +02:00
5ee60f970a Correctly raise error for awq quantization (#38945)
fix warning
2025-06-20 17:18:06 +02:00
8ac2d75353 Pin PyTorch extras for AMD containers (#38941)
* Pin additional Torch packages

* Remove unused def

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-20 12:17:21 +00:00
9120567b02 Add kwargs for timm.create_model in TimmWrapper (#38860)
* Add init kwargs for timm wrapper

* model_init_kwargs -> model_args

* add save-load test

* fixup
2025-06-20 12:00:09 +00:00
ff95974bc6 [static cache] fix device map per layer in VLMs (#38488)
return lm as decoder
2025-06-20 13:49:29 +02:00
aa42987c1e Remove ALL_LAYERNORM_LAYERS (#38922)
* remove it everywhere

* Update trainer_pt_utils.py

* Update trainer_pt_utils.py

* style

* sort list in test

* CIs

* use recursion same way as before (for intermediate layer names)
2025-06-20 12:06:48 +02:00
38a9b70786 add pytorch-xpu Dockerfile (#38875)
* first commit

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* use rls pytorch

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-20 11:42:44 +02:00
9bcdd5cde9 Modernbert fixes (#38912)
* Removed deprecated argument in modernbert RotaryEmbedding

* Skip test_sdpa_can_dispatch_on_flash for modernbert

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-20 11:22:32 +02:00
31d30b7224 Skip some tests for now (#38931)
* try

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 11:05:49 +02:00
0725cd6953 Remove deprecated classes in modeling_utils.py (#38919)
* remove deprecated classes

* style
2025-06-19 19:25:20 +02:00
797860c68c feat: add flexible Liger Kernel configuration to TrainingArguments (#38911)
* feat: add flexible Liger Kernel configuration to TrainingArguments

Add support for granular Liger Kernel configuration through a new
`liger_kernel_config` parameter in TrainingArguments. This allows users
to selectively enable/disable specific kernels (rope, swiglu, cross_entropy,
etc.) instead of the current approach that rely on default configuration.

Features:
- Add `liger_kernel_config` dict parameter to TrainingArguments
- Support selective kernel application for all supported models
- Maintain full backward compatibility with existing `use_liger_kernel` flag

Example usage:
```python
TrainingArguments(
    use_liger_kernel=True,
    liger_kernel_config={
        "rope": True,
        "swiglu": True,
        "cross_entropy": False,
        "fused_linear_cross_entropy": True
    }
)
Closes #38905

* Address comments and update Liger section in Trainer docs
2025-06-19 15:54:08 +00:00
89b35be618 Allow make-fixup on main branch, albeit slowly (#38892)
* Allow make-fixup on main branch, albeit slowly

* Make the other style checks work correctly on main too

* More update

* More makefile update
2025-06-19 15:22:59 +01:00
9a02e7602d feat: Add granite architectures to auto tokenizer name mappings (#38802)
Branch: GraniteTokenizerMapping

Signed-off-by: Gabe Goodhart <ghart@us.ibm.com>
2025-06-19 15:20:42 +01:00
54a02160eb Fix ReDOS in tokenizer digit substitution (#38844)
* Fix regexes vulnerable to ReDOS

* Let's just use regex

* Import regex/re correctly
2025-06-19 14:53:52 +01:00
af6120b3eb Skip sdpa tests if submodule does not support sdpa (#38907) 2025-06-19 13:11:01 +00:00
5d26a38735 Fix FalconMambaIntegrationTests (#38566)
* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-19 13:50:33 +02:00
a9ce8c69c9 align xpu's autocast behavior w/ cuda by using device agnostic torch APIs (#38284)
* siwtch to device agnostic autocast in nemotron to align xpu behavior w/
cuda

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix issue

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* use torch.cast as other modeling code for decision_transformer&gpt2&imagegpt

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* refine

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update get_autocast_gpu_dtype to device agnostic one

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-19 11:48:23 +00:00
0a53df1a77 Fix unnecessary super calls (#38897)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-19 11:45:51 +00:00
b949747b54 Fix fsmt tests (#38904)
* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-19 10:56:34 +02:00
11738f8537 [phi-4] use mel filters from audio utils (#36966)
* use mel_filter_bank from audio utils

* Apply style fixes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-19 12:35:32 +09:00
f7b21822e3 Use raise from e in hub.py utility (#37241)
Use raise from e

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-19 03:06:25 +00:00
3756bf192c Add support for specifying revisions when pushing to Hub via internal Trainer call (#36852)
* Update training_args.py

* Update trainer.py

* fixes

* fix

* remove extraneous comments

* explicit revision arg

* add msg

* fixup

* fix field name

* rename field revision to hub_revision

* restore gradient_checkpointing doc

* fix ws

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-19 02:35:33 +00:00
458e0b376c Update bamba model card (#38853)
* Update bamba model card

* Update the doc for bamba

* Update docs/source/en/model_doc/bamba.md

Bamba paragraph

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Bamba collection url

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Update Padding-Free Training to Notes heading

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

update examples

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

Update additional info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

consistent casing

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

simplify sentences

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include pipeline and cli examples + fix formatting

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bamba.md

update cli id

* Update quantization example

* Fix auto code formatter changes

* Update cli command + include BambaModel

* Update docs/source/en/model_doc/bamba.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 16:01:25 -07:00
ea01334873 [video processor] fix slow tests (#38881)
* we need to check against mapping to be safe

* need to check only when inferring from image type, otherwise messes custom code

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-18 22:39:56 +02:00
b922b22ec2 36978 | Fast image processor for DPT model (#37481)
* chore: ran codegen script

* test: test_image_processor_properties

* test: test_image_processor_from_dict_with_kwargs

* test: wip - test_padding

* test: test_padding

* test: test_keep_aspect_ratio

* wip

* test

* test: wip

* test: wip

* test: test_call_segmentation_maps, wip

* chore: tidy up

* test: test_call_segmentation_maps

* fix: test_save_load_fast_slow

* test: reduce labels

* chore: make fixup

* chore: rm comment

* chore: tidy

* chore remove comment

* refactor: no need to infer channel dimesnion

* refactor: encapsulate logic for preparing segmentation maps

* refactor: improve readability of segmentation_map preparation

* improvement: batched version of pad_image

* chore: fixup

* docs

* chore: make quality

* chore: remove unecessary comment

* fix: add SemanticSegmentationMixin

* feat: add post_process_depth_estimation to fast dpt image processor

* chore: fix formatting

* remove max_height, max_width

* fix: better way of processin segmentation maps
- copied from Beit Fast processor

* chore: formatting + remove TODO

* chore: fixup styles

* chore: remove unecessary line break

* chore: core review suggestion to remove autodocstring

* fix: add do_reduce_labels logic + refactor
- refactor preprocess logic to make it consistent with other processors
- add missing reduce labels logic

* refactor: remove deprecated mixin

* chore: fixup

* use modular for dpt + final nit changes

* fix style

---------

Co-authored-by: Samuel Rae <samuelrae@Samuels-Air.fritz.box>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-18 17:33:29 +00:00
c27f628e98 Docs: Add custom fine-tuning tutorial to TrOCR model page (#38847)
* Update trocr.md

Docs: add community fine‑tuning notebook link to TrOCR page

* apply suggested changes from PR review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/trocr.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-18 09:38:58 -07:00
0a289d1630 log: Add logging when using split_batches and per_device_train_batch_size (#38633)
* log: Add logging when user uses split_batches and per_device_train_batch_size

* refactor: remove whitespace from blank line

* Update src/transformers/training_args.py

Change logging level to info

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-18 16:26:46 +00:00
c55d806355 [bugfix] fix ATTN_MASK_NPU device mismatch error on multi-device NPU … (#38876)
[bugfix] fix ATTN_MASK_NPU device mismatch error on multi-device NPU setups
2025-06-18 16:26:22 +00:00
9cd7570f34 Fix loop var naming (#38885) 2025-06-18 13:45:01 +00:00
1fc67a25c6 More PYUP fixes (#38883)
More pyup fixes

Signed-off-by: cyy <cyyever@outlook.com>
2025-06-18 14:38:08 +01:00
12d4c5b66f null deepspeed_plugin in args for wandb callback fake trainer (#38867) 2025-06-18 13:10:22 +00:00
3620b32cc8 Fixed markdown for BertTokenizer's '[CLS]' token. (#38506) 2025-06-18 13:09:58 +00:00
cb0f604192 Fix HQQ model param device transfer issue (#38466)
* Fix HQQ model param device transfer issue

* modify a comment

* clear the code and add test for hqq device/dtype

* fix test hqq code quality of imports

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-18 15:09:00 +02:00
c77bcd889f Fix qwen3_moe tests (#38865)
* try 1

* try 2

* try 3

* try 4

* try 5

* try 6

* try 7

* try 8

* try 9

* try 10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-18 14:36:03 +02:00
5a95ed5ca0 🚨🚨 Fix initialization of Mask2Former (#38864)
* Correctly fix init

Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>

* add back the block, breaking BC but this is correct author's code

* override the test for params needing it

---------

Co-authored-by: BUI Van Tuan <buivantuan07@gmail.com>
2025-06-18 09:46:22 +02:00
309e8c96f2 Fix phi4_multimodal tests (#38816)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-18 09:39:17 +02:00
3526e25d3d enable misc test cases on XPU (#38852)
* enable misc test cases on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* tweak bamba ground truth on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* remove print

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* one more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-18 09:20:49 +02:00
d058f81e5b Post-PR fixes! (#38868)
* Post-PR fixes!

* make fix-copies
2025-06-17 19:58:47 +01:00
508a704055 No more Tuple, List, Dict (#38797)
* No more Tuple, List, Dict

* make fixup

* More style fixes

* Docstring fixes with regex replacement

* Trigger tests

* Redo fixes after rebase

* Fix copies

* [test all]

* update

* [test all]

* update

* [test all]

* make style after rebase

* Patch the hf_argparser test

* Patch the hf_argparser test

* style fixes

* style fixes

* style fixes

* Fix docstrings in Cohere test

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 19:37:18 +01:00
a396f4324b Update roc bert docs (#38835)
* Moved the sources to the right

* small Changes

* Some Changes to moonshine

* Added the install to pipline

* updated the monshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated Documentation According to changes

* Fixed the model with the commits

* Changes to the roc_bert

* Final Update to the branch

* Adds Quantizaiton to the model

* Finsihed Fixing the Roc_bert docs

* Fixed Moshi

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Fixed Problems

* Added the install to pipline

* updated the monshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated Documentation According to changes

* Fixed the model with the commits

* Fixed the problems

* Final Fix

* Final Fix

* Final Fix

* Update roc_bert.md

---------

Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-17 11:02:18 -07:00
3ae52cc312 Update CvT documentation with improved usage examples and additional … (#38731)
* Update CvT documentation with improved usage examples and additional notes

* initial update

* cvt

* Update docs/source/en/model_doc/cvt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update cvt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-17 10:30:03 -07:00
e5a9ce48f7 Add LightGlue model (#31718)
* init

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* chore: various changes to LightGlue

* Fixed dynamo bug and image padding tests

* refactor: applied refactoring changes from SuperGlue's concat, batch and stack functions to LightGlue file

* tests: removed sdpa support and changed expected values

* chore: added some docs and refactoring

* chore: fixed copy to superpoint.image_processing_superpoint.convert_to_grayscale

* feat: adding batch implementation

* feat: added validation for preprocess and post process method to LightGlueImageProcessor

* chore: changed convert_lightglue_to_hf script to comply with new standard

* chore: changed lightglue test values to match new lightglue config pushed to hub

* chore: simplified convert_lightglue_to_hf conversion map

* feat: adding batching implementation

* chore: make style

* feat: added threshold to post_process_keypoint_matching method

* fix: added missing instructions that turns keypoints back to absolute coordinate before matching forward

* fix: added typehint and docs

* chore: make style

* [run-slow] lightglue

* fix: add matches different from -1 to compute valid matches in post_process_keypoint_matching

* tests: added CUDA proof tests similar to SuperGlue

* chore: various changes to modeling_lightglue.py

- Added "Copies from" statements for copied functions from modeling_superglue.py
- Added missing docstrings
- Removed unused functions or classes
- Removed unnecessary statements
- Added missing typehints
- Added comments to the main forward method

* chore: various changes to convert_lightglue_to_hf.py

- Added model saving
- Added model reloading

* chore: fixed imports in lightglue files

* [run-slow] lightglue

* chore: make style

* [run-slow] lightglue

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* [run-slow] lightglue

* chore: Applied some suggestions from review

- Added missing typehints
- Refactor "cuda" to device variable
- Variable renaming
- LightGlue output order changed
- Make style

* fix: added missing grayscale argument in image processor in case use of SuperPoint keypoint detector

* fix: changed lightglue HF repo to lightglue_superpoint with grayscale default to True

* refactor: make keypoints `(batch_size, num_keypoints, keypoint_dim)` through forward and unsqueeze only before attention layer

* refactor: refactor do_layer_keypoint_pruning

* tests: added tests with no early stop and keypoint pruning

* refactor: various refactoring to modeling_lightglue.py

- Removed unused functions
- Renamed variables for consistency
- Added comments for clarity
- Set methods to private in LightGlueForKeypointMatching
- Replaced tensor initialization to list then concatenation
- Used more pythonic list comprehension for repetitive instructions

* refactor: added comments and renamed filter_matches to get_matches_from_scores

* tests: added copied from statement with superglue tests

* docs: added comment to prepare_keypoint_matching_output function in tests

* [run-slow] lightglue

* refactor: reordered _concat_early_stopped_outputs in LightGlue class

* [run-slow] lightglue

* docs: added lightglue.md model doc

* docs: added Optional typehint to LightGlueKeypointMatchingOutput

* chore: removed pad_images function

* chore: set do_grayscale default value to True in LightGlueImageProcessor

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* docs: added missing LightGlueConfig typehint in nn.Module __init__ methods

* docs: removed unnecessary code in docs

* docs: import SuperPointConfig only from a TYPE_CHECKING context

* chore: use PretrainedConfig arguments `num_hidden_layers` and `num_attention_heads` instead of `num_layers` and `num_heads`

* chore: added organization as arg in convert_lightglue_to_hf.py script

* refactor: set device variable

* chore: added "gelu" in LightGlueConfig as hidden_act parameter

* docs: added comments to reshape.flip.reshape instruction to perform cross attention

* refactor: used batched inference for keypoint detector forward pass

* fix: added fix for SDPA tests

* docs: fixed docstring for LightGlueImageProcessor

* [run-slow] lightglue

* refactor: removed unused line

* refactor: added missing arguments in LightGlueConfig init method

* docs: added missing LightGlueConfig typehint in init methods

* refactor: added checkpoint url as default variable to verify models output only if it is the default url

* fix: moved print message inside if statement

* fix: added log assignment r removal in convert script

* fix: got rid of confidence_thresholds as registered buffers

* refactor: applied suggestions from SuperGlue PR

* docs: changed copyright to 2025

* refactor: modular LightGlue

* fix: removed unnecessary import

* feat: added plot_keypoint_matching method to LightGlueImageProcessor with matplotlib soft dependency

* fix: added missing import error for matplotlib

* Updated convert script to push on ETH org

* fix: added missing licence

* fix: make fix-copies

* refactor: use cohere apply_rotary_pos_emb function

* fix: update model references to use ETH-CVG/lightglue_superpoint

* refactor: add and use intermediate_size attribute in config to inherit CLIPMLP for LightGlueMLP

* refactor: explicit variables instead of slicing

* refactor: use can_return_tuple decorator in LightGlue model

* fix: make fix-copies

* docs: Update model references in `lightglue.md` to use the correct pretrained model from ETH-CVG

* Refactor LightGlue configuration and processing classes

- Updated type hints for `keypoint_detector_config` in `LightGlueConfig` to use `SuperPointConfig` directly.
- Changed `size` parameter in `LightGlueImageProcessor` to be optional.
- Modified `position_embeddings` in `LightGlueAttention` and `LightGlueAttentionBlock` to be optional tuples.
- Cleaned up import statements across multiple files for better readability and consistency.

* refactor: Update LightGlue configuration to enforce eager attention implementation

- Added `attn_implementation="eager"` to `keypoint_detector_config` in `LightGlueConfig` and `LightGlueAttention` classes.
- Removed unnecessary logging related to attention implementation fallback.
- Cleaned up import statements for better readability.

* refactor: renamed message into attention_output

* fix: ensure device compatibility in LightGlueMatchAssignmentLayer descriptor normalization

- Updated the normalization of `m_descriptors` to use the correct device for the tensor, ensuring compatibility across different hardware setups.

* refactor: removed Conv layers from init_weights since LightGlue doesn't have any

* refactor: replace add_start_docstrings with auto_docstring in LightGlue models

- Updated LightGlue model classes to utilize the new auto_docstring utility for automatic documentation generation.
- Removed legacy docstring handling to streamline the code and improve maintainability.

* refactor: simplify LightGlue image processing tests by inheriting from SuperGlue

- Refactored `LightGlueImageProcessingTester` and `LightGlueImageProcessingTest` to inherit from their SuperGlue counterparts, reducing code duplication.
- Removed redundant methods and properties, streamlining the test setup and improving maintainability.

* test: forced eager attention implementation to LightGlue model tests

- Updated `LightGlueModelTester` to include `attn_implementation="eager"` in the model configuration.
- This change aligns the test setup with the recent updates in LightGlue configuration for eager attention.

* refactor: update LightGlue model references

* fix: import error

* test: enhance LightGlue image processing tests with setup method

- Added a setup method in `LightGlueImageProcessingTest` to initialize `LightGlueImageProcessingTester`.
- Included a docstring for `LightGlueImageProcessingTester` to clarify its purpose.

* refactor: added LightGlue image processing implementation to modular file

* refactor: moved attention blocks into the transformer layer

* fix: added missing import

* fix: added missing import in __all__ variable

* doc: added comment about enforcing eager attention because of SuperPoint

* refactor: added SuperPoint eager attention comment and moved functions to the closest they are used

---------

Co-authored-by: Steven Bucaille <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-17 18:10:23 +02:00
2507169bf6 Fix qwen3 tests (#38862)
* fix

* update

* update

* update

* update

* update

* update

* format

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 15:21:36 +02:00
41e0c921cb Improve auxiliary_in_channels default behavior in UperNet (#37540)
Improve auxiliary_in_channels behavior in UperNet

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-17 12:56:46 +00:00
c61ca64aaa Fix qwen2_5_vl tests (#38845)
* fix

* breakpoint()

* breakpoint()

* update

* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-17 10:55:24 +02:00
37367c7d9f Allow customization of sdpa in executorch.py (#38827)
Earlier PR put executorch specific sdpa and mask function in the export function. This prevent any customization that can be done to sdpa, prior to export. By moving this to __init__, we still keep the original behavior but allow users like optimum-executorch to override sdpa by setting model.config._attn_implementation.
2025-06-17 10:38:20 +02:00
9c878d2f64 Fix incorrect width ratio calculation in Llama4 image processor (#38842) 2025-06-17 07:33:36 +00:00
bf370e446b [video processor] fix BC when no video config if found (#38840)
fix auto video processor
2025-06-17 09:20:16 +02:00
e61160c5db Remove merge conflict artifacts in Albert model doc (#38849) 2025-06-16 14:21:18 -07:00
64e9b049d9 Updated aya_vision.md (#38749)
* Update aya_vision.md

* Suggested changes made to aya_vision.md

* Quantization Example added - aya_vision.md

* Polished - aya_vision.md

* Update aya_vision.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-16 10:46:30 -07:00
5ab0f447ab GraniteMoeHybrid: Allow for only shared expert case. (#38801)
* Allow for only shared expert case.

* Style
2025-06-16 16:15:42 +01:00
a7593a1d1f [BugFix] QA pipeline edge case: align_to_words=True in QuestionAnsweringPipeline can lead to duplicate answers (#38761)
* fixing the problem align_to_words=True leading to duplicate solutions

* adding tests

* some fixes

* some fixes

* changing the handle_duplicate_answers=False by default

* some fixese

* some fixes

* make the duplicate handling the default behaviour and merge duplicates

* make the duplicate handling the default behaviour
2025-06-16 15:01:22 +00:00
18c7f32daa Fix broken tag in Longformer model card (#38828) 2025-06-16 07:44:40 -07:00
b44b04ee9a Fix broken notebooks link in Italian training docs (#38834) 2025-06-16 07:38:51 -07:00
9300728665 Fix peft integration (#38841)
Update peft.py
2025-06-16 10:39:25 +02:00
608884960e add default mapping to peft integration 2025-06-16 10:23:51 +02:00
ce6ac53ac1 bugfix: propage weight key_mapping to peft to fix 3.52 VLM renaming (#38627)
* propage key mapping to peft

* propage key mapping to peft

* make requested changes

* revert
2025-06-16 10:10:23 +02:00
925da8ac56 Fix redundant code in Janus (#38826)
* minor mistake

* modify return statements
2025-06-16 06:53:59 +00:00
d2fd3868bb [internvl] fix video inference (#38811)
fix
2025-06-16 08:37:30 +02:00
d5d007a1a0 Updated Albert model Card (#37753)
* Updated Albert model Card

* Update docs/source/en/model_doc/albert.md

added the quotes in <hfoption id="Pipeline">

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated checkpoints

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

changed !Tips description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

updated transformer-cli implementation

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

changed text

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

removed repeated description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update albert.md

removed lines

* Update albert.md

updated pipeline code

* Update albert.md

updated auto model code, removed quantization as model size is not large, removed the attention visualizer part

* Update docs/source/en/model_doc/albert.md

updated notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update albert.md

reduced a  repeating point in notes

* Update docs/source/en/model_doc/albert.md

updated transformer-CLI

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/albert.md

removed extra notes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 14:58:06 -07:00
443aafd3d6 [docs] updated roberta model card (#38777)
* updated roberta model card

* fixes suggested after reviewing

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 12:02:44 -07:00
fdb5da59dd [docs] Update docs moved to the course (#38800)
* update

* update

* update not_doctested.txt

* slow_documentation_tests.txt
2025-06-13 12:02:27 -07:00
8b73799500 fixed docstring in modular_qwen2_5_vl.py (#38798)
* fixed docstring in modular_qwen2_5_vl.py

* Regenerate file to match docstring update
2025-06-13 11:09:51 -07:00
9bec2654ed Add V-JEPA for video classification model (#38788)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* Add ForVideoClassification model

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* attention head

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

* Initial cls head setup

* remove rop attention from head (not needed)

* remove swigluffn - not needed

* Add siglip layer

* Replace with siglip layer

* Rename Siglip - VJEPA2

* remove unused modules

* remove siglip mlp

* nit

* remove MLP

* Refactor head cross attention

* refactor VJEPA2HeadCrossAttentionLayer

* nit renaming

* fixup

* remove commented code

* Add cls head params to config

* depth from config

* move pooler + classifier  to the model

* Update for cls model signature

* move layers, rename a bit

* fix docs

* update weights init

* remove typehint for init

* add to auto-mapping

* enable tests

* Add conversion script

* fixup

* add to docs

* fix docs

* nit

* refactor for mapping

* clean

* Add integration test

* Fixing multi gpu test

* update not-split-modules

* update video cls test tolerance

* Increase test_inference_image tolerance

* Update no-split modules for multi gpu

* Apply suggestions from code review

* fixing multi-gpu

* fix docstring

* Add cls snippet to docs

* Update checkpoint
2025-06-13 17:56:15 +01:00
2ff964bcb4 Fix trainer.py not showing signature columns (#38465)
Fix trainer.py not showing signature columns
2025-06-13 15:39:29 +00:00
4c3c177ecf Fix a minor security issue (#38815)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-13 17:37:46 +02:00
93445aed06 change fsdp_strategy to fsdp in TrainingArguments in accelerate doc (#38807) 2025-06-13 15:32:40 +00:00
b82a45b3b4 Refactor DBRX tests to use CausalLMModelTest base classes (#38475)
* Refactor DBRX tests to use CausalLMModelTest base classes

- Changed DbrxModelTester to inherit from CausalLMModelTester
- Changed DbrxModelTest to inherit from CausalLMModelTest
- Removed duplicate methods that are already in base classes
- Added required class attributes for model classes
- Updated pipeline_model_mapping to include feature-extraction
- Kept DBRX-specific configuration and test methods
- Disabled RoPE tests as DBRX's rotary embedding doesn't accept config parameter

This refactoring reduces code duplication and follows the pattern established
in other causal LM model tests like Gemma.

* Apply style fixes

* Trigger tests

* Refactor DBRX test

* Make sure the DBRX-specific settings are handled

* Use the attribute_map

* Fix attribute map

---------

Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-13 16:22:12 +01:00
64041694a8 Use wandb.run.url instead of wandb.run.get_url() (deprecated) (#38817) 2025-06-13 15:20:04 +00:00
9ff246db00 Expectation fixes and added AMD expectations (#38729) 2025-06-13 16:14:58 +02:00
e39172ecab Fix llava_next tests (#38813)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-13 15:19:41 +02:00
b3b7789cbc Better pipeline type hints (#38049)
* image-classification

* depth-estimation

* zero-shot-image-classification

* image-feature-extraction

* image-segmentation

* mask-generation

* object-detection

* zero-shot-object-detection

* image-to-image

* image-text-to-text

* image-to-text

* text-classification

* text-generation

* text-to-audio

* text2text_generation

* fixup

* token-classification

* document-qa

* video-classification

* audio-classification

* automatic-speech-recognition

* feature-extraction

* fill-mask

* zero-shot-audio-classification

* Add pipeline function typing

* Add code generator and checker for pipeline types

* Add to makefile

* style

* Add to CI

* Style
2025-06-13 13:44:07 +01:00
c989ddd294 Simplify and update trl examples (#38772)
* Simplify and update trl examples

* Remove optim_args from SFTConfig in Trainer documentation

* Update docs/source/en/trainer.md

* Apply suggestions from code review

* Update docs/source/en/trainer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Quentin Gallouédec <qgallouedec@Quentins-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 12:03:49 +00:00
de24fb63ed Use HF papers (#38184)
* Use hf papers

* Hugging Face papers

* doi to hf papers

* style
2025-06-13 11:07:09 +00:00
1031ed5166 Disable custom MRA kernels for ROCm (#38738)
* Disable custom MRA kernels for ROCm

* Move platform check code to utils

* Ruff

* Ruff again

* Fix querying HIP version

* Revert some changes

* Add missing return statement

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-13 12:25:28 +02:00
7f00b325f8 Unbreak optimum-executorch (#38646)
* Unbreak optimum-executorch

* use static cache if has layer_types but no sliding_window

* revert view on kv_arange

---------

Co-authored-by: Guang Yang <guangyang@fb.com>
2025-06-13 11:13:32 +02:00
5f59a9b439 Fix configs and doc for the Qwens (#38808)
fix doc and configs
2025-06-13 11:10:55 +02:00
8222a9325d Fix erroneous docstring for the ordering of SWA layers (#38794) 2025-06-13 10:46:44 +02:00
e26ae89281 [docs] update cache docs with new info (#38775)
* update docs with new info

* Update docs/source/en/kv_cache.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-13 07:10:56 +00:00
324cc77dc3 refactor create_token_type_ids_from_sequences (#37681)
* rm build_input.. from old file

* refactor create_token_type_ids_from_sequences

* handle when cls_token_id is None

* updated fix

* markuplm

* refactoring rest of models

* copies

* revert funnel

* rm incorrect file

* ruff

* ruff
2025-06-12 23:24:43 +02:00
85f060e9b0 Updated moonshine modelcard (#38711)
* Moved the sources to the right

* small Changes

* Some Changes to moonshine

* Added the install to pipline

* updated the monshine model card

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/moonshine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated Documentation According to changes

* Fixed the model with the commits

* Update moonshine.md

* Update moshi.md

---------

Co-authored-by: Your Name <sohamprabhu@Mac.fios-router.home>
Co-authored-by: Your Name <sohamprabhu@Sohams-MacBook-Air.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:27:17 -07:00
645cf297cc Add missing div in Pegasus model card (#38773)
Add missing div
2025-06-12 10:27:07 -07:00
346f341630 [Docs] New DiT model card (#38721)
* documenation finished

* Update dit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-12 10:26:50 -07:00
4b8ec667e9 Remove all traces of low_cpu_mem_usage (#38792)
* remove it from all py files

* remove it from the doc

* remove it from examples

* style

* remove traces of _fast_init

* Update test_peft_integration.py

* CIs
2025-06-12 16:39:33 +02:00
3542e0b844 build: 📌 Remove upper bound on PyTorch (#38789)
build: 📌 remove upper bound on torch dependency as issue which originally resulted in the pin has been released in torch 2.7.1
2025-06-12 16:34:13 +02:00
eea35a15b0 Fix mllama (#38704)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 16:15:35 +02:00
038a59e2cd Initialize flash attn flag (#38768)
_flash_supports_window_size is used further down in this file and relied on by e.g. [ring-flash-attention](https://github.com/zhuzilin/ring-flash-attention/blob/123f924/ring_flash_attn/adapters/hf_adapter.py#L9-L11). Even though it is an unexported name, it still makes sense to keep the state of `globals()` in this file consistent.
2025-06-12 14:06:13 +00:00
910355a010 Fix Typos in Comments: "quantitation" → "quantization", "averege" → "average" (#38766)
* Update convert_llama4_weights_to_hf.py

* Update modeling_visual_bert.py
2025-06-12 14:04:39 +00:00
6a5fd0c6d2 Reword README in light of model definitions (#38762)
* Slight readme reword

* reword

* reword

* reword

* Slight readme reword
2025-06-12 14:43:31 +01:00
c87058beb8 Fix llava_onevision tests (#38791)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 15:06:49 +02:00
d4e7aa5526 Fix qwen_2_5 omni (#38658)
* fix

* fix

* break style

* break style

* Apply style fixes

* break style

* Apply style fixes

* fix modular

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-12 14:43:54 +02:00
e1812864ab [docs] Add int4wo + 2:4 sparsity example to TorchAO README (#38592)
* update quantization readme

* update

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-12 12:17:07 +00:00
bc68defcac Update PULL_REQUEST_TEMPLATE.md (#38770) 2025-06-12 14:03:33 +02:00
960fda25d1 Reduce verbosity for average_tokens_across_devices=True and world size = 1 (#38785)
* Warning to info for average_tokens_across_devices and world size = 1

* Update src/transformers/training_args.py
2025-06-12 14:02:53 +02:00
89c46b648d Skip some export tests on torch 2.7 (#38677)
* skip

* fix

* better check

* Update import_utils.py

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-12 12:47:15 +02:00
27459025b8 [video processors] support frame sampling within processors (#38105)
* apply updates smolVLM (still needs workaround for chat template)

* add other models

* dump qwen omni for now, come back later

* port qwen omni from their impl

* wait, all qwens sample videos in same way!

* clean up

* make smolvlm backwards compatible and fix padding

* dix some tests

* fox smolvlm tests

* more clean up and test fixing

* delete unused arg

* fix

* address comments

* style

* fix test
2025-06-12 09:34:30 +00:00
887054c714 Fix masking utils (#38783)
* fix

* Update masking_utils.py

* Update masking_utils.py
2025-06-12 11:00:46 +02:00
7c58336949 [Hotfix] Fix style bot (#38779)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-12 10:20:36 +02:00
7c6b1707c3 [masking utils] check None instead of try/except (#38561)
* fix vllm's compile backend

* fix the test

* apply the same changes in other masking strategies
2025-06-12 06:50:28 +00:00
9487765f07 Add Qwen2 MoE model card (#38649)
* Add Qwen2 MoE model card

* Revisions to qwen2 moe model card

* Add Qwen2 MoE model card
2025-06-11 15:14:01 -07:00
32dbf4bddb Update altCLIP model card (#38306)
* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

* Rename altclip.md to altclip.mdx

* Rename altclip.mdx to altclip.md

* Update altclip.md

* Update altclip.md

* Update altclip.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 14:48:34 -07:00
1dcb022e8f chore(pixtral): emit block attention mask when using flash attention (#38741)
* chore(pixtral): emit block attention mask when using flash attention

Since flash_attention_2 relies solely on position_ids, emitting the block attention mask avoids unnecessary memory usage and prevents OOM on large inputs.

* remove unnecessary attention_mask assignment
2025-06-11 18:55:23 +00:00
60d4b35b20 Make style bot trigger CI after push (#38754)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-11 20:40:04 +02:00
bb44d2a0f6 Update pegasus model card (#38675)
* Update Pegasus model card

* Fix transformers-cli command

* Update code examples to use bfloat16

* Reverted code examples to use float16

* Fix typo, update checkpoints link

* Update str formatting in code examples

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

* Remove inaccurate badges

* Revert badge removal

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Include cache_implementation argument in quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-11 10:56:25 -07:00
L
b84ebb7f3c fix(qwen3_moe): pass kwargs to self_attn (#38691)
This is needed to avoid `.item()` calls in `_flash_attention_forward`.
2025-06-11 19:26:08 +02:00
9f563ada70 Deprecate TF + JAX (#38758)
* Scatter deprecation warnings around

* Delete the tests

* Make logging work properly!
2025-06-11 17:28:06 +01:00
337757cbd5 Update repo consistency check (#38763) 2025-06-11 17:02:03 +01:00
e2bdc13375 Remove IPEX requirement for bitsandbytes on CPU (#38594)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 17:46:34 +02:00
063bef0865 Prepare for TF+Jax deprecation (#38760)
* Prepare for TF+Jax deprecation

* Remove .circleci jobs
2025-06-11 16:03:31 +01:00
11ad9be153 Better typing for num_items_in_batch (#38728)
* fix

* style

* type checking ?

* maybe this ?

* fix

* can't be an int anymore

* fix
2025-06-11 16:26:41 +02:00
84710a4291 Add V-JEPA 2 (#38746)
* adding model and conversion scripts

* add imports to test vjepa conversion

* fix imports and make conversion work

* fix computation for short side

* replace attention with library attention function

* cleanup more attention classes

* remove config overrides

* add test cases, fix some of the failing ones

* fix the model outputs

* fix outputs of the model per review

* fix too big model test case

* fix styling __init__.py

* fix initialization test

* remove all asserts per review

* update sorting unsorting logic as per feedback

* remove is_video per review

* remove another is_video segment

* remove unwanted stuff

* small fixes

* add docstrings for the model

* revert adding vjepa2 config here

* update styling

* add config docstrings (wip)

* fix dpr issue

* removed test failing issues

* update styles

* merge predictor configs into main config

* remove processing code, add video processor

* remove permute which is not necessary now

* fix styles

* updated vjepa2 to be in video_processing_auto

* update comment for preprocessing

* test integration test and fix the outputs

* update test values, change test to look at repeated frames for a given image

* add a simple video processing test

* refactoring pixel_values_videos and upload ckpts to original

* fix torch_fx test cases

* remove unused config

* add all config docstrings

* add more integration tests

* add basic doc

* revert unwanted styling changes

* working make fixup

* Fix model_type in config

* update attention implementation to fit new hf standards

* fix the preprocessing logic, ensure it matches the original model

* remove use_rope logic, cleanup

* fix docstrings

* Further cleanup, update doc

* Fix model prefix

* fix get_vision_features

* VJEPA2Embeddings style refactor

* nit, style comment

* change modules default values

* Only `str` activation in config

* GradientCheckpointingLayer

* fixup

* fix conversion script

* Remove return_dict

* remove None return typehint

* Refactor VJEPA2Layer, remove use_SiLU

* Fix fx tests

* dpr -> drop_path_rates

* move *ModelOutput on top

* format docs bit

* update docs

* update docs

* update doc example

* remove prune_heads from model

* remove unused config params

* refactor embed signature

* Add vjepa to docs

* Fix config docstring

* update defaults

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/model_doc/vjepa2.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Fix import

* Min refactoring

* Update HUB_SOURCE and HUB_REPO in conversion script

* Add missing headers

* VJEPA -> V-JEPA in docs

* Add image to doc

* fix style

* fix init weights

* change checkpoint name in modeling tests

---------

Co-authored-by: Koustuv Sinha <koustuv.sinha@mail.mcgill.ca>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Koustuv Sinha <koustuvsinha@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-06-11 15:00:08 +01:00
a6f0e2b64a Add z-loss to Bamba for v2 (#37842)
* Remove const

* Fix arg ref

* Sharded save

* Add z_loss flag

* Add modeling zloss

* Demodularize clm forward for zloss

* Also demodularize init for z_loss flag

* PR comments (mostly modularizing right)

* Demodularize forward

* Better name zloss and explain typematch

* Fully propagate coeff name

* style fixes

* zloss default float

* Remove conflicting annotations

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-11 15:29:17 +02:00
6b610d89f1 Revert "Trigger doc-builder job after style bot" (#38735)
Revert "Trigger doc-builder job after style bot (#38398)"

This reverts commit 51e0fac29fc3994d49dfbfd1c8d085d29360d393.
2025-06-11 14:56:39 +02:00
0bf53e69e2 [DeepSeek-V3] implement when q_lora_rank is None (#38743)
* implement when q_lora_rank is None

* make style and quality
2025-06-11 13:35:10 +01:00
ye
b426c2b313 fix: bf16 with TPU is allowed in configuration (#38670)
* fix: tpu bf16

* fix: style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:35:01 +00:00
c8c1e525ed from 1.11.0, torchao.prototype.low_bit_optim is promoted to torchao.optim (#38689)
* since 1.11.0, torchao.prototype.low_bit_optim is promoted to
torchao.optim

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-11 12:16:25 +00:00
56a7cf5546 fix: Add method to get image features in PaliGemmaForConditionalGeneration (#38730)
* fix: Add method to retrieve image features in PaliGemmaForConditionalGeneration

* feat: Add get_image_features method to multiple models for image feature extraction

* fix: reformat the files with ruff.

* feat: Add methods for packing and retrieving image and video features across multiple models

modified:
- modeling_chameleon.py
- modeling_llava_next.py
- modular_llava_next_video.py
- modeling_qwen2_vl.py

and generate the:
- modeling_llava_next_video.py
- modeling_llava_onevision.py
- modeling_qwen2_5_vl.py

* feat: Implement get_image_features method in Aria, Mistral3, and VipLlava models with updated parameters

* fix: reformatted the code with fix-style
2025-06-11 10:26:31 +00:00
380e6ea406 [llava] fix integration tests with Siglip (#38732)
fix llava siglip test
2025-06-11 08:09:16 +00:00
f1849eab22 Fixed a multiple-devices issue in SmolVLM model (#38736)
Fixed a multiple-devices issue in SmolVLMModel (#38557)

* Fixed a multiple-devices issue in SmolVLMModel

* Changed the modular to reflect changes
2025-06-11 10:08:01 +02:00
aa798b7ac9 New canine model card (#38631)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* commit for new canine model card.

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/canine.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* implemented suggestion by @stevhliu.

* Update canine.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-10 09:30:05 -07:00
e28fb26e7d Add AGENTS.md (#38734)
* More name sync

* repeatedly underlining "WRITE LESS, ROBOT"

* fewer, commas, please

* Clarify "copied from"

* Clarify "copied from"

* Mention test dependencies

* Added a line on preferring `modular` style
2025-06-10 16:27:37 +00:00
cb4c56ce0d Fix typo in Language Modeling example scripts and update TPU type (#38652)
* Fix typo that prevents the examples to be run correctly

* return .TPU in accelerator.distributedtype comparison
2025-06-10 13:43:35 +00:00
8ff22e9d3b [add-new-model-like] Robust search & proper outer '),' in tokenizer mapping (#38703)
* [add-new-model-like] Robust search & proper outer '),' in tokenizer mapping

* code-style: arrange the importation in add_new_model_like.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-10 12:25:12 +00:00
8340e8746e Use OSError (#38712)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-10 12:13:49 +00:00
8257734b5f Fix llava tests (#38722)
* update

* fix 1

* fix 2

* fix 3

* fix 4

* fix 5

* fix 6

* fix 7

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 13:53:17 +02:00
71f7385942 Logging message for `` is_bitsandbytes_available() `` (#38528)
* bnb import log

* bnb import log

* log mesage change

* moved error issue in qunatizer_bnb_4_bit.py

* ruff

* arg added for bnb check

* required changes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-10 10:15:01 +00:00
04cdf83244 Update some tests for torch 2.7.1 (#38701)
* fix 1

* fix 2

* fix 3

* fix 4

* fp16

* break

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-10 11:46:52 +02:00
afdb821318 Fix smart resize (#38706)
* Fix smart_resize bug

* Add smart_resize test

* Remove unnecessary error checking

* Fix smart_resize tests

---------

Co-authored-by: Richard Dong <rdong@rdong.c.groq-143208.internal>
2025-06-10 08:59:22 +00:00
81799d8b55 Standardize ByT5 model card format (#38699)
* Standardize ByT5 model card format

* Apply review feedback from @stevhliu

* Fix Notes formatting and wording

* Fix `aya_vision` test (#38674)

* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix autodoc formatting for ByT5Tokenizer

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 15:02:50 -07:00
e55983e2b9 Fix aya_vision test (#38674)
* fix 1: load_in_4bit=True,

* fix 2: decorateor

* fixfix 2: breakpoint

* fixfix 3: update

* fixfix 4: fast

* fixfix 5: cond

* fixfix 5: cond

* fixfix 6: cuda 8

* ruff

* breakpoint

* dtype

* a10

* a10

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-09 22:18:52 +02:00
b61c47f5a5 Created model card for xlm-roberta-xl (#38597)
* Created model card for xlm-roberta-xl

* Update XLM-RoBERTa-XL model card with improved descriptions and usage examples

* Minor option labeling fix

* Added MaskedLM version of XLM RoBERTa XL to model card

* Added quantization example for XLM RoBERTa XL model card

* minor fixes to xlm roberta xl model card

* Minor fixes to mask format in xlm roberta xl model card
2025-06-09 13:00:38 -07:00
e594e75f1b Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout (#38596)
* Update XLM-RoBERTa model documentation with enhanced usage examples and improved layout

* Added CLI command example and quantization example for XLM RoBERTa model card.

* Minor change to transformers CLI and quantization example for XLM roberta model card
2025-06-09 12:26:31 -07:00
29ca043856 Created model card for XLM model (#38595)
* Created model card for XLM model

* Revised model card structure and content of XLM model

* Update XLM model documentation with improved examples and code snippets for predicting <mask> tokens using Pipeline and AutoModel.
2025-06-09 12:26:23 -07:00
25f711aa89 Drop as_target_processor from the _call_ and pad methods (#38642)
Drop as_target_processor from _call_ and pad methods; reformat docstrings for readability
2025-06-09 12:26:09 -07:00
837ddac1ec Docs: update bitsandbytes torch.compile compatibility (#38651) 2025-06-09 14:51:57 -04:00
b9faf2f930 Fix TypeError: 'NoneType' object is not iterable for esm (#38667) (#38668)
Add post_init() calls to EsmForMaskedLM, EsmForTokenClassification and EsmForSequenceClassification.
2025-06-09 15:23:20 +00:00
11dca07a10 Fix retrieve function signature and remove faiss requirement (#38624)
Signed-off-by: Fiona Waters <fiwaters6@gmail.com>
2025-06-09 15:17:33 +00:00
b31d462c61 Fix some models import (#38694)
Fix models import
2025-06-09 16:09:24 +01:00
282d6684dc Fix attention mask expansion when converting to executorch (#38637) 2025-06-09 15:00:55 +00:00
19224c3642 fix: "check out" as verb (#38678)
"check out" as verb
2025-06-09 14:07:31 +00:00
237ff80387 Fixed modeling_auto.py MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variable (#38664)
fix: grouped the two MODEL_FOR_MASK_GENERATION_MAPPING_NAMES variables
2025-06-09 13:40:46 +00:00
d7b87b415a Fix qwen2-audio chat template audio placeholder insertion (#38640)
* fix qwen2-audio template

Signed-off-by: Isotr0py <2037008807@qq.com>

* add message['type'] back

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-09 09:56:42 +00:00
10627c1a0f Use torch 2.7.1 on daily CI (#38620)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-08 14:37:45 +02:00
ebeec13609 Fix InternVL integration test (#38612)
* fix

* fix

* fix OOM

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-07 08:30:47 +02:00
3fb7e7bc01 Skip torchscript tests for 2 models (#38643)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 20:17:37 +02:00
dc76eff12b remove ipex_optimize_model usage (#38632)
* remove ipex_optimize_model usage

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* update Dockerfile

Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Signed-off-by: root <root@a4bf01945cfe.jf.intel.com>
Co-authored-by: root <root@a4bf01945cfe.jf.intel.com>
2025-06-06 20:04:44 +02:00
5009252a05 Better CI (#38552)
better CI

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 17:59:14 +02:00
2e889c18e1 fix torch_dtype on awq (#38463)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 17:14:00 +02:00
871901cb3d fix total batch size calculation in trainer (#38286)
* fix total batch size calculation

* update

Signed-off-by: inkcherry <mingzhi.liu@intel.com>

* Update src/transformers/trainer.py

---------

Signed-off-by: inkcherry <mingzhi.liu@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 14:54:00 +00:00
02f946a038 Don't run AriaForConditionalGenerationModelTest on CircleCI (#38615)
git rid of this model

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 11:30:31 +02:00
3d15606e64 fix: support grad clipping for TP through replicating non-sharded modules (#36132)
* feat: fix tp grad norm:

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: use implicit replication

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-06 11:07:22 +02:00
fca6748246 Improve test_initialization for SwiftFormer (#38636)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:47:10 +02:00
92a87134ea update ColQwen2ModelIntegrationTest (#38583)
* update

* update

* update

* update

* 4 bit

* 8 bit

* final

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:41:17 +02:00
dbfc79c17c [generation] bring back tests on vision models (#38603)
* bring back geenration tests on VLMs

* remove head mask tests overwritten
2025-06-06 08:23:15 +00:00
90c4b90a10 Use torch 2.7.1 on CircleCI jobs (#37856)
2.7.1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:16:57 +02:00
3e35ea1782 Improve test_initialization (#38607)
* fix flaky init tests

* fix flaky init tests

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-06 10:08:05 +02:00
89542fb81c enable more test cases on xpu (#38572)
* enable glm4 integration cases on XPU, set xpu expectation for blip2

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* refine wording

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* refine test case names

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* run

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* add gemma2 and chameleon

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix review comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-06 09:29:51 +02:00
31023b6909 Fix MiniMax (docs and integration tests checkpoint) (#38575)
* update checkpoints for integration tests

* minor fixes in docs
2025-06-06 08:43:11 +02:00
593e29c5e2 Updated Aria model card (#38472)
* Update aria.md

* Update aria.md

* Suggested Updates - aria.md
2025-06-05 14:36:54 -07:00
77cf4936fe [Nit] Add Note on SigOpt being in Public Archive Mode (#38610)
* add note on sigopt

* update

* Update docs/source/en/hpo_train.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-05 14:07:23 -07:00
c75bf2c36e Fix typo in LLaVa documentation (#38618)
* Fix typo in LLaVa documentation

In exactly one section, LlavaImageProcessor was spelt wrongly as LLavaImageProcessor, which throws off copy-pasting the section.

* Fix LlavaImageProcessor url to make it valid (and copypaste-able)

Earlier, the URL contained the entire HF prefix. This commit removes that to ensure that the code block can be copied and run as is.
2025-06-05 13:25:07 -07:00
5399c1d670 docs: fix dark mode logo display. (#38586) 2025-06-05 13:06:59 -07:00
481b953170 Fix return_dict=False giving errors in a few VLM models (#38519)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 21:19:07 +02:00
88912b8e95 Remove isort from dependencies (#38616)
Removed isort as a dependency
2025-06-05 16:42:49 +00:00
fa921ad854 fix spelling errors (#38608)
* fix errors test_modeling_mllama.py

* fix error test_modeling_video_llava.py

* fix errors test_processing_common.py
2025-06-05 13:57:23 +01:00
0f833528c9 Avoid overwrite existing local implementation when loading remote custom model (#38474)
* avoid overwrite existing local implementation when loading custom remote model

Signed-off-by: Isotr0py <2037008807@qq.com>

* update comments

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-05 13:54:40 +01:00
8f630651b0 Allow mlm_probability to be set to None when mlm=False in DataCollatorForLanguageModeling (#38522) (#38537)
* mlm_probability in DataCollatorForLanguageModeling should be validated only when mlm is True (#38522)

* Change mlm_probability to Optional in DataCollatorForLanguageModeling (#38537)

---------

Co-authored-by: eak <eak@ivalua.com>
2025-06-05 13:54:12 +01:00
65f5fa71cd Bump torch from 2.6.0 to 2.7.1 in /examples/flax/vision (#38606)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.6.0 to 2.7.1.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.6.0...v2.7.1)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.7.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-06-05 13:38:02 +01:00
8c59cdb3f8 pin pandas (#38605)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 11:33:06 +02:00
8cfcfe58c0 Remove custom pytest and pluggy (#38589)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-05 10:23:40 +02:00
0d69fa6dcd [qwen-omni] fix sliding window (#38525)
fix
2025-06-05 10:11:58 +02:00
1fed6166c0 added fast image processor for ZoeDepth and expanded tests accordingly (#38515)
* added fast image processor for ZoeDepth and expanded tests accordingly

* added fast image processor for ZoeDepth and expanded tests accordingly, hopefully fixed repo consistency issue too now

* final edits for zoedept fast image processor

* final minor edit for zoedepth fast imate procesor
2025-06-04 22:59:17 +00:00
a510be20f3 Updated deprecated typing imports with equivalents for Python 3.9+ (#38546)
* Replace deprecated typing imports with collections.abc equivalents for Python 3.9+

* Fixed code quality

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-06-04 16:57:23 +00:00
8e1266de2b New gpt neo model card (#38505)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Commit for new_gpt_model_card.

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/gpt_neo.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-04 09:56:47 -07:00
8046aff520 tests/roformer: fix couple roformer tests on gpus (#38570)
Fix "RuntimeError: Expected all tensors to be on the same device,
but found at least two devices, cuda:0 and cpu" error running the
following roformer tests on GPUs (CUDA or XPU):

```
tests/models/roformer/test_modeling_roformer.py::RoFormerSinusoidalPositionalEmbeddingTest::test_basic
tests/models/roformer/test_modeling_roformer.py::RoFormerSelfAttentionRotaryPositionEmbeddingTest::test_apply_rotary_position_embeddings
```

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2025-06-04 18:45:56 +02:00
b9c17c5dc0 [Dinov2] Enable device_map="auto" support (#38487)
* Fix: resolve import order and duplicate import (ruff I001, F811)

* Format: clean up Dinov2 test file with ruff formatter

* Add _no_split_modules = ['Dinov2Layer'] to enable device_map='auto'

* Revert dinov2_with_registers _no_split_modules to original state

* Remove redundant device_map test as suggested

* Remove unused import after deleting test

* removed import  torch and the redundant test function

* Update tests/models/dinov2/test_modeling_dinov2.py

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-04 15:42:40 +00:00
ae3733f06e feat: add repository field to benchmarks table (#38582)
* feat: add `repository` field to benchmarks table

* fix: remove unwanted `,`
2025-06-04 15:40:52 +02:00
1285aec4cc Docs: fix code formatting in torchao docs (#38504) 2025-06-04 12:35:21 +00:00
6c5d4b1dd2 allow custom head_dim for qwen2_moe (#37188)
allow custom head_dim

Co-authored-by: ryan.agile <ryan.agile@kakaobrain.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-04 12:27:30 +00:00
82fa68ca14 fix(attention_visualizer): add default value for image_seq_length (#38577) 2025-06-04 12:20:31 +00:00
1dc619e59f [FlexAttn] Fix models with unique characteristics (#38433)
* fix

* style

* check

* check 2

* add deepseek workaround
2025-06-04 13:37:28 +02:00
ff3fad61e3 Fix deepseekv3 (#38562)
* fix 1

* fix 2

* fix 3

* fix 4

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 11:40:14 +02:00
6085cded38 update utils/notification_service.py for AMD vs Nvidia (#38563)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 11:38:25 +02:00
3c995c1fdc Fix chameleon tests (#38565)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-04 10:13:35 +02:00
55736eea99 Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d89e335c98a8cd36f47304a76eac7493.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests, since changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* undo files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove stuff that is not really needed

* update __init__

* update qkv torch.split

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update graident checkpointing and minor fixes

* fix mutable attn_type_list

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00
037acf1d10 [janus] Fix failing tests on mi3XX (#38426)
* Fix multiple devices error on Janus

* Fix AttributeError on Janus BOI token

* Initialize lm first in Janus to get correct device map

* Added expectations for Janus test_model_generate_images

* Fixed JanusVisionEncoderLayer being split across devices

* Code formatting

* Adding modeling file

* Reverted changes out of scope for this PR
2025-06-04 09:38:10 +02:00
78d771c3c2 [docs] Format fix (#38414)
fix table
2025-06-03 09:53:23 -07:00
0f41c41a46 Fix hqq issue (#38551)
* bc

* style
2025-06-03 17:58:31 +02:00
279000bb70 Name change AOPermod -> ModuleFqn (#38456)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-06-03 15:43:31 +00:00
e8b292e35f Fix utils/notification_service.py (#38556)
* fix

* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 13:59:31 +00:00
8cb96787a6 Explicitly setting encoding in tokenization_utils_base.py (#38553)
Update tokenization_utils_base.py

Add encoding explicitly
2025-06-03 12:08:35 +00:00
caf708da1b [TP] Change command in tests to python3 (#38555)
* Fix: change to `python3`

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 11:03:33 +00:00
fdf86fb440 [bugfix] [WIP] fix apply_rotary_emb error on Ascend NPU (#38491)
[bugfix] fix apply_rotary_emb error on Ascend NPU
2025-06-03 09:31:49 +00:00
ca0a682796 Update docker image to use av (#38548)
* Update

* Update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 11:04:41 +02:00
814432423c update emu3 test (#38543)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-03 11:02:01 +02:00
55ec319de6 Don't use default attn if pre-set in sub-config (#38526)
* don't use default attn if pre-set in sib-config

* style

* add a test maybe
2025-06-03 07:53:07 +00:00
bf68dd9e6e [tests] expand flex-attn test for vision models (#38434)
* expand the test for VLMs

* typo

* mark models `supports_flex` + expand test for additional kwargs

* flex attn for refactored vision models

* fix copies

* fix

* unskip

* style

* address comments
2025-06-03 07:40:44 +00:00
de4cf5a38e Fix blip2 tests (#38510)
* fix 1: not sure

* fix 2: _supports_flex_attn = False

* fix 3: embedding_output = self.layernorm(query_embeds.to(self.layernorm.weight.dtype))

* fix 4: query_embeds = query_embeds.to(self.layernorm.weight.dtype)

* fix 5: text_embeds = text_embeds.to(dtype=torch.float16)

* fix 5: question_embeds.to(dtype=torch.float16)

* fix 6: text_embeds = text_embeds.to(dtype=self.itm_head.weight.dtype)

* fix 7: image_embeds and question_embeds

* fix 8: fix other 2 fp16 tests

* fix 9: fix T5 OOM

* fix 10: fix T5 OOM

* fix 11: fix T5

* fix 11: fix T5 beam

* fix 12: _supports_sdpa=False

* fix 12: style and expect

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-02 22:46:35 +02:00
ccc859620a Fix Gemma2IntegrationTest (#38492)
* fix

* fix

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* skip-ci

* update

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-02 22:45:09 +02:00
1094dd34f7 Remove type annotation in Siglip Attention Module (#38503)
* Remove type annotation

* remove print statement
2025-06-02 17:51:07 +02:00
afb35a10ed Num parameters in model.safetensors.index.json (#38531)
Num parameters in index.json
2025-06-02 17:16:31 +02:00
cceab972ba [flax/mistral] support sliding_window: null in config (#37402)
flax/mistral: Allow sliding_window to be set to none
2025-06-02 16:45:02 +02:00
1a25fd2f6d Fix amp deprecation issue (#38100)
apex amp is deprecated
2025-06-02 16:15:41 +02:00
05ad826002 remove unhandled parameter (#38145) 2025-06-02 15:57:32 +02:00
c72ba69441 Add ColQwen2 to 🤗 transformers (#35778)
* feat: add colqwen2 (wip)

* tests: fix test_attention_outputs

* tests: reduce hidden size to accelerate tests

* tests: fix `test_attention_outputs` 🥳

* fix: fix wrong parent class for `ColQwen2ForRetrievalOutput`

* fix: minor typing and style changes

* chore: run `make style`

* feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor`

* tests: tweak comments

* style: apply ruff formatter

* feat: move default values for `visual_prompt_prefix` and `query_prefix`

* docs: update ColQwen2 model card

* docs: tweak model cards

* docs: add required example config checkpoint

* tests: update expected scores in integration test

* docs: tweak quickstart snippets

* fix: address PR comments

* tests: fix colqwen2 tests + tweak comment in colpali test

* tests: unskip useful tests

* fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string

* fix: fix ColPali outputs when `return_dict == False`

* fix: fix issue with PaliGemma output not being a dict

* docs: set default dtype to bfloat16 in quickstart snippets

* fix: fix error when `return_dict=False` in ColPali and ColQwen2

* tests: fix special tokens not being replaced in input_ids

* style: fix lint

* fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json`

* fix: remove unused `padding_side` in ColQwen2 model

* docs: update ColQwen2's model doc

* fix: fix harcoded vlm backbone class in ColQwen2Config

* fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs

* docs: fix typo in model docstring

* docs: add illuin mention in model docs

* fix: let `padding_size` be handled by `tokenizer_config.json`

* docs: add colpali reference url in colqwen2's model doc

* docs: add Hf mention in model docs

* docs: add late interaction mention in model docs

* docs: tweak colqwen2 model doc

* docs: update reference checkpoint for ColPali to v1.3

* docs: simplify quickstart snippets

* docs: remove redundant `.eval()`

* refactor:  use `can_return_tuple` decorator for ColPali and ColQwen2

* docs: fix copyright date

* docs: add missing copyright in tests

* fix: raise error when `initializer_range` is not in config

* docs: remove redundant `.eval()` in colpali doc

* fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute

See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config.

* fix: add missing `initializer_range` attribute in `ColQwen2Config`

* fix: use `get_text_config` in `resize_token_embeddings`

* update colwen2 with auto_docstring

* docs: fix wrong copyright year

* chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config`

* refactor: merge `inner_forward` into `forward`

* Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code

* protect torch import in modular to protect in processing

* protect torch import in modular to protect in processing

* tests: fix hf model path in ColQwen2 integration test

* docs: clarify `attn_implementation` and add comments

* docs: add fallback snippet for using offline PIL dummy images

* docs: temporarily revert attn_implementation to `None` while sdpa is not fixed

* docs: tweaks in colpali/colqwen2 quick start snippets

* fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model

* fix: add missing changes in modular file

* fix modeling tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-02 12:58:01 +00:00
beaed8ce01 [generate] move SinkCache to a custom_generate repo (#38399)
remove sink cache
2025-06-02 12:13:30 +02:00
fe5bfaa4b5 [generate] add soft deprecations on custom generation methods (#38406)
soft deprecations
2025-06-02 12:11:46 +02:00
a75b9ffb5c Update Loss Functions to Accept Tensor num_items_in_batch (#38029)
* Update Loss Functions to Accept Tensor num_items_in_batch

* Fix device mismatch by moving num_items_in_batch to loss device in fixed_cross_entropy

* fix the ruff check

* delete the unused if stat

* fix the type problem
2025-06-02 11:31:44 +02:00
493cf1554b [seamless_m4t] Skip some tests when speech is not available (#38430)
* Added the require_speech decorator

* Added require_speecj to some seamless_m4t tests

* Changed skip message
2025-06-02 09:17:28 +00:00
64d14ef28d Fix setting FLASH_ATTENTION_DETERMINISTIC after importing (#37185)
transformers.enable_full_determinism enables deterministic
flash attention using `FLASH_ATTENTION_DETERMINISTIC`
800510c67b/src/transformers/trainer_utils.py (L79)

However, current checks use a global variable `deterministic_g`,
which will do the environment variable check as soon as importing,
this will cause issues as users can call
`transformers.enable_full_determinism` after
`transformers.modeling_flash_attention_utils` is imported. This
behavior is introduced in
https://github.com/huggingface/transformers/pull/33932/files#r1806668579
to fix the graph break.

As a result, this PR implement fixes by delaying the environment variable
check to the first time when `_flash_attention_forward` is executed, so
that we can fix this issue and we won't introduce a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-06-02 11:08:20 +02:00
fde1120b6c Remove deprecated use_flash_attention_2 parameter (#37131)
Signed-off-by: cyy <cyyever@outlook.com>
2025-06-02 11:06:25 +02:00
51d732709e [docs] add xpu environment variable for gpu selection (#38194)
* squash commits

* rename gpu

* rename accelerator

* change _toctree.yml

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: sdp <sdp@a4bf01943ff7.jf.intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-30 16:05:07 +00:00
c7f2b79dd8 protect dtensor import (#38496)
protect
2025-05-30 17:36:00 +02:00
051a8acc9a Align TP check (#38328)
align tp check
2025-05-30 17:15:39 +02:00
e0545ef0b8 [Tests] Reduced model size for albert-test model (#38480)
* Reduced model size for albert-test model

* Run checks

* Removed test_save_load

* Removed test skipping functions
2025-05-30 14:22:32 +00:00
f962c862ff Bump torch from 2.2.0 to 2.6.0 in /examples/flax/vision (#37618)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.2.0 to 2.6.0.
- [Release notes](https://github.com/pytorch/pytorch/releases)
- [Changelog](https://github.com/pytorch/pytorch/blob/main/RELEASE.md)
- [Commits](https://github.com/pytorch/pytorch/compare/v2.2.0...v2.6.0)

---
updated-dependencies:
- dependency-name: torch
  dependency-version: 2.6.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-05-30 14:04:52 +01:00
98568d1e25 Fix incorrect bbox_embed initialization when decoder_bbox_embed_share=False in GroundingDINO (#38238)
* A shallow copy in groundingdino
Fixes #37333

* Supprimer une ligne vide dans la classe GroundingDinoForObjectDetection

* Translate comments in the GroundingDinoForObjectDetection class from French to English
2025-05-30 15:02:18 +02:00
d0fccbf7ef Fix convert_internvl_weights_to_hf.py to support local paths (#38264)
fix(internvl): add local path support to convert_internvl_weights_to_hf.py
2025-05-30 14:56:32 +02:00
858ce6879a make it go brrrr (#38409)
* make it go brrrr

* date time

* update

* fix

* up

* uppp

* up

* no number i

* udpate

* fix

* [paligemma] fix processor with suffix (#38365)

fix pg processor

* [video utils] group and reorder by number of frames (#38374)

fix

* Fix convert to original state dict for VLMs (#38385)

* fix convert to original state dict

* fix

* lint

* Update modeling_utils.py

* update

* warn

* no verbose

* fginal

* ouft

* style

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: hoshi-hiyouga <hiyouga@buaa.edu.cn>
2025-05-30 11:19:42 +02:00
ab5067e7fd fix: handle no scheduler passed by user (#38407) 2025-05-30 11:00:44 +02:00
42ef218b58 [Qwen2.5-Omni] Fix dtype of cos,sin when used with flash attention (#38453)
* Fix dtype of cos,sin when used with flash attention

* Fix dtype of cos,sin when used with flash attention
2025-05-29 18:24:40 +00:00
81cff7ad34 Fix Gemma3IntegrationTest (#38471)
* check

* check

* check

* check

* check

* check

* check

* test style bot

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-29 16:51:12 +02:00
e508965df7 Cleanup BatchFeature and BatchEncoding (#38459)
* Use dict comprehension to create dict

* Fix type annotation

Union[Any] doesn't really make any sense

* Remove methods that are already implemented in the `UserDict` parent
class
2025-05-29 14:13:43 +00:00
8e5cefcb1e Fix TypeError in save_pretrained error handling (fixes #38422) (#38449) 2025-05-29 13:58:16 +00:00
ad9dd3d17b 🔴 [VLM] modeling updates (#38317)
* updates

* fixup

* fix tests

* fix test

* fix

* let it be here for now, till monday

* two more fixes

* persimmon

* fixup

* fix

* fixup

* make sure fuyu runs now that LM has new attn API

* fixup + tests

* qwen vl uses new mask interface as well

* qwen image features format

* update

* remove image_sizes

* address comments

* i am dumb...
2025-05-29 11:08:23 +00:00
a6f7acb603 [Tests] Clean up test cases for few models (#38315)
* Update tests

* revert aria change

* too slow hence revert
2025-05-29 08:21:28 +00:00
8010f3cf61 feat: add cache retention for requests (#38446)
* feat: add cache retention for requests

* fix: propagate `manual_eviction` param & refactor `finish_request`

`finish_request` now only takes `request_id: str` as an input rather
than the full `RequestState`, which was not needed and simplifies
calling from `ContinuousBatchingManager::evict_request_from_cache`

* refactor: pop req from `active_requests`

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 18:15:10 +00:00
66da700145 Fix GLM4 checkpoints (#38412)
* fix

* fix

* fix

* fix

* fix

* fix

* test style bot

* Apply style fixes

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-28 16:40:08 +00:00
2872e8bac5 Merge type hints from microsoft/python-type-stubs (post dropping support for Python 3.8) (#38335)
* Merge type hints from microsoft/python-type-stubs (post Python 3.8)

* Remove mention of pylance

* Resolved conflict

* Merge type hints from microsoft/python-type-stubs (post Python 3.8)

* Remove mention of pylance

* Resolved conflict

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Avasam <samuel.06@hotmail.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-05-28 16:21:40 +00:00
942c60956f Model card for mobilenet v1 and v2 (#37948)
* doc: #36979

* doc: update hfoptions

* add model checkpoints links

* add model checkpoints links

* update example output

* update style #36979

* add pipeline tags

* improve comments

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply suggested changes

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:20:19 -07:00
9a8510572b Updated the model card for ViTMAE (#38302)
* Update vit_mae.md

* badge float:right

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/vit_mae.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update model_doc/vit_mae.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:43 -07:00
c9fcbd5bf9 Updated the Model docs - for the ALIGN model (#38072)
* Updated the Model docs - for the ALIGN model

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Updated align.md

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/align.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update align.md

* fix

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-28 09:19:09 -07:00
cba94e9272 Fix handling of slow/fast image processors in image_processing_auto.py (#38161)
Fix wrong error when torchvision is not installed
2025-05-28 16:00:23 +00:00
21b10d9aa4 Fix from_args_and_dict ProcessorMixin (#38296)
* fix-from-args-and-dict-processormixin

* change used_kwargs to valid_kwargs

* remove manual valid_kwargs

* fix copies

* fix modular aria
2025-05-28 11:46:33 -04:00
f844733568 Fix MoE gradient test (#38438) 2025-05-28 16:44:20 +01:00
0ed6f7e6b4 Remove redundant test_sdpa_equivalence test (#38436)
* Remove redundant test

* make fixup
2025-05-28 17:22:25 +02:00
51e0fac29f Trigger doc-builder job after style bot (#38398)
* update

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 17:15:34 +02:00
c24d18bbae Fix convert weights for InternVL (#38233)
Fix internvl convert weights
2025-05-28 11:14:56 -04:00
8850427242 Fix typo in tokenization_utils_base.py docstring (#38418)
Fix typo in tokenization_utils_base.py
2025-05-28 14:52:10 +00:00
bab40c6838 [core] support tensor-valued _extra_state values in from_pretrained (#38155)
Support tensor-valued _extra_state values

TransformerEngine uses the pytorch get/set_extra_state API to store FP8
layer config information as bytes Tensor in the _extra_state entry in
the state dict. With recent changes to from_pretrained, this
functionality has broken and loading a model that uses this API doesn't
appear to work. This PR fixes the save/load pretrained functions for
extra state entries that use a pytorch tensor, and adds a (currently
x-failing) test for a dictionary extra state.

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-05-28 15:38:42 +02:00
badc71b9f6 🔴[Attention] Attention refactor for Whisper-based models (#38235)
* start refactoring whisper

* revert for now

* first step

* carry over attn fixes

* check if this works

* whisper has an off by one somewhere - cutting mask in any interface

* make it based on interface

* remove some tests that were skipped but now work

* some fixes for whisper tests

* interface changes

* change the order of fix

* some attention adjustments for eager + TP

* fix scaling

* mask changes

* why does whisper contain those extra seq lens?

* fix from config for fa2 as input_ids is invalid

* fix another test

* another fix

* disable flex attn due to compile issues

* copies and refactor for qwen audio since it somewhat relies on whisper

* fix scaling and smaller things

* retrigger

* new new interface version + more fixups

* adjust qwen

* add comment

* forgot this one

* change copies as whisper cuts on the mask

* add guard

* add flex attention

* switch to new mask function + add skips for torchscript

* remove old api with cache position

* last changes?

* trigger ci
2025-05-28 13:32:38 +02:00
565a0052ed make Llama4TextMoe forward more readable (#37529)
* update forward of Llama4TextMoe

* remove redudant transpose

* fix formatting

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-28 11:54:45 +02:00
defeb04299 Fix CircleCI not triggered when PR is opened from a branch of huggingface/transformers (#38413)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 11:25:43 +02:00
593276fe1e Update error when using additional and/or masks (#38429)
update error
2025-05-28 11:08:49 +02:00
3aab6e95cb Disable mi210 scheduled CI (#38411) 2025-05-28 10:35:41 +02:00
fb82a98717 enable large_gpu and torchao cases on XPU (#38355)
* cohere2 done

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* enable torchao cases on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* rename

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix comments

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-28 10:30:16 +02:00
cea254c909 Update CsmForConditionalGenerationIntegrationTest (#38424)
* require_read_token

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-28 10:20:43 +02:00
baddbdd24b [qwen-vl] Look for vocab size in text config (#38372)
fix qwen
2025-05-28 09:32:26 +02:00
a974e3b4e1 Fix an error in verify_tp_plan for keys without '.' (#38420) 2025-05-28 09:30:43 +02:00
b1eae943a2 Change slack channel for mi250 CI (#38410) 2025-05-28 09:20:34 +02:00
5f49e180a6 Add mi300 to amd daily ci workflows definition (#38415) 2025-05-28 09:17:41 +02:00
3b3ebcec40 Updated model card for OLMo2 (#38394)
* Updated OLMo2 model card

* added command line

* Add suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Added suggestions

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Indented code block as per suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 16:24:36 -07:00
f5307272f5 Falcon-H1 - Fix auto_docstring and add can_return_tuple decorator (#38260)
Fix auto_docstring and add can_return_tuple
2025-05-27 16:18:05 -04:00
a092f6babf Update granite.md (#37791)
* Update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update granite.md

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/granite.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* minor fixes

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 12:55:15 -07:00
be7aa3210b New bart model card (#37858)
* Modified BART documentation wrt to issue #36979.

* Modified BART documentation wrt to issue #36979.

* fixed a typo.

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bart.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* blank commit.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:41 -07:00
587c1b0ed1 Updated BERTweet model card. (#37981)
* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

* Updated BERTweet model card.

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bertweet.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* updated toctree (EN).

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:51:22 -07:00
b73faef52f Updated BigBird Model card as per #36979. (#37959)
* Updated BigBird Model card as per #36979.

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/big_bird.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 11:24:28 -07:00
538e847c06 Updated Zoedepth model card (#37898)
* Edited zoedepth model card according to specifications.

* Edited Zoedepth model file

* made suggested changes.
2025-05-27 10:06:53 -07:00
4f7b0ff8d1 Update Model Card for Mamba-2 (#37951)
* update model page.

* update model page.

* Update docs/source/en/model_doc/mamba2.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* update the model page.

* update.

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Apply the suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add an quantization example and update the toctree.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* remove the additional comma

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-27 10:06:39 -07:00
9c50576860 [mllama] Allow pixel_values with inputs_embeds (#38334)
* Allow pixel_values and inputs_embeds at the same time

* remove unnecessary overwritten tests
2025-05-27 16:33:56 +00:00
0f5a8243c4 [tests] remove overload for deleted test (test_offloaded_cache_implementation) (#37896)
* remove overload for deleted tests

* make fixup
2025-05-27 16:45:15 +01:00
f85fd90407 [cleanup] delete deprecated kwargs in qwen2_audio 🧹 (#38404)
delete deprecated
2025-05-27 16:08:53 +01:00
b9f8f863d9 [CSM] update model id (#38211)
* update model id

* codec_model eval

* add processor img

* use ungated repo for processor tests
2025-05-27 17:03:55 +02:00
07dd6b2495 Add report_repo_id to mi300 workflow (#38401) 2025-05-27 16:35:07 +02:00
3142bd8592 [CSM] infer codec model with no_grad + audio eos label (#38215)
* infer codec model with no_grad

* codec_model eval

* training labels: add audio eos token
2025-05-27 14:10:17 +00:00
10ae443ec0 Fix Qwen2.5-VL Video Processor (#38366)
* Update processing_qwen2_5_vl.py

* Update processing_qwen2_5_vl.py

* Update modular_qwen2_5_vl.py

* Fix CI

* Update modular_qwen2_5_vl.py

* Update processing_qwen2_5_vl.py

* Update video_processing_utils.py
2025-05-27 13:46:37 +02:00
80902ae9b1 [chat] use the checkpoint's generation_config.json as base parameterization (#38330)
* use model gen config

* unwanted diff
2025-05-27 10:35:33 +00:00
008e0d87c5 Fix convert to original state dict for VLMs (#38385)
* fix convert to original state dict

* fix

* lint

* Update modeling_utils.py
2025-05-27 10:27:59 +00:00
c769483188 [chat] improvements for thinking models and reduce default verbosity (#38322)
misc improvements
2025-05-27 10:20:58 +00:00
55f2333366 guard size mismatch check to only quantized models (#38397)
fix
2025-05-27 11:45:03 +02:00
1a5be2f5c0 [aya vision] fix processor for vLLM (#38371)
accidentally merged two PRs in one (;-_-)
2025-05-27 09:43:53 +00:00
19fdb75cf0 [video utils] group and reorder by number of frames (#38374)
fix
2025-05-27 11:32:33 +02:00
b0735dc0c1 [paligemma] fix processor with suffix (#38365)
fix pg processor
2025-05-27 11:31:56 +02:00
9e1017b479 [transformers x vLLM] standardize processors (#37915)
* standardize

* fix tests

* batch update some processors, not final yet

* oke, now I tested that everything indeed runs. Still needs prettification

* emu3

* fixup

* gemma3 but it doesn't generate anything

* fuyu

* update

* why?

* Update src/transformers/models/aya_vision/processing_aya_vision.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address comments

* bc

* why do we need to guard import this every time?

* i hate guarded imports

* i am blind

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-27 11:30:30 +02:00
b5ececb900 Fix image token mask in Gemma3 (#38295)
fix mask
2025-05-27 11:15:52 +02:00
c4e71e8fff Add AMD MI300 CI caller leveraging self-hosted runner scale set workflow in hf-workflows (#38132) 2025-05-26 23:13:02 +02:00
706b00928f Stop autoconverting custom code checkpoints (#37751)
* Stop autoconverting custom code checkpoints

* make fixup

* Better auto class detection

* Match the kwarg ordering
2025-05-26 19:15:28 +01:00
07848a8405 update gemma tests (#38384)
* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 19:54:04 +02:00
cd0f3ce73b [cli] cli usable without torch (#38386)
cli without torch
2025-05-26 16:54:18 +00:00
ba6d72226d 🚨 🚨 Fix custom code saving (#37716)
* Firstly: Better detection of when we're a custom class

* Trigger tests

* Let's break everything

* make fixup

* fix mistaken line doubling

* Let's try to get rid of it from config classes at least

* Let's try to get rid of it from config classes at least

* Fixup image processor

* no more circular import

* Let's go back to setting `_auto_class` again

* Let's go back to setting `_auto_class` again

* stash commit

* Revert the irrelevant changes until we figure out AutoConfig

* Change tests since we're breaking expectations

* make fixup

* do the same for all custom classes

* Cleanup for feature extractor tests

* Cleanup tokenization tests too

* typo

* Fix tokenizer tests

* make fixup

* fix image processor test

* make fixup

* Remove warning from register_for_auto_class

* Stop adding model info to auto map entirely

* Remove todo

* Remove the other todo

* Let's start slapping _auto_class on models why not

* Let's start slapping _auto_class on models why not

* Make sure the tests know what's up

* Make sure the tests know what's up

* Completely remove add_model_info_to_*

* Start adding _auto_class to models

* Start adding _auto_class to models

* Add a flaky decorator

* Add a flaky decorator and import

* stash commit

* More message cleanup

* make fixup

* fix indent

* Fix trust_remote_code prompts

* make fixup

* correct indentation

* Reincorporate changes into dynamic_module_utils

* Update call to trust_remote_code

* make fixup

* Fix video processors too

* Fix video processors too

* Remove is_flaky additions

* make fixup
2025-05-26 17:37:30 +01:00
701caef704 Stop TF weight rename reDOS (#38325)
* let's try a non-regex solution

* make fixup

* Slight adjustment

* Let's just use the original code with a check

* slight tweak to conditional

* slight tweak to conditional
2025-05-26 16:58:51 +01:00
0a4e8e2855 fix typo: tokenizer -> tokenize (#38357) 2025-05-26 15:29:16 +00:00
63964b7c67 fix typos (#38336)
* Update video_processor.md

* Update deepseek_v3.md
2025-05-26 14:42:37 +00:00
8b03c8eaf2 Better check in initialize_weights (#38382)
* Update modeling_utils.py

* CIs

* CIs
2025-05-26 16:20:23 +02:00
eb74cf977b Use one utils/notification_service.py (#38379)
* step 1

* step 2

* step 3

* step 4

* step 5

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 16:15:29 +02:00
98328fd9a1 for now disable compile (#38383) 2025-05-26 15:57:11 +02:00
78079abeff Improved cache docs (#38060)
* improved cache docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-26 13:53:41 +00:00
7a9b071bfd [Falcon H1] Fix slow path forward pass (#38320)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigget the cis

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

* fix typo

* make style

* fix slow path generations

* clean debug traces

* debug

* remove debug traces final confirmation

* clean debug traces final

* fix format and lineup

* make style

* debug

* Update src/transformers/models/falcon_h1/modular_falcon_h1.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* adress comments

* fix fix-copies

* fix integration test

* Merge pull request #7 from ydshieh/fix-slow-path

update

* another update (#8)

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 15:30:35 +02:00
b5b76b5561 Protect get_default_device for torch<2.3 (#38376)
* Update modeling_utils.py

* CIs
2025-05-26 15:00:09 +02:00
bff32678cc Fix incorrect batching audio index calculation for Phi-4-Multimodal (#38103)
* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* add tests

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* Update src/transformers/models/phi4_multimodal/feature_extraction_phi4_multimodal.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-26 12:41:31 +00:00
9f0402bc4d Fix all import errors based on older torch versions (#38370)
* Update masking_utils.py

* fix

* fix

* fix

* Update masking_utils.py

* Update executorch.py

* fix
2025-05-26 12:11:54 +02:00
d03a3ca692 [OPT] Fix attention scaling (#38290)
* fix opt attention scaling

* add comment to why we do this
2025-05-26 11:02:16 +02:00
a5a0c7b888 switch to device agnostic device calling for test cases (#38247)
* use device agnostic APIs in test cases

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* add one more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xpu now supports integer device id, aligning to CUDA behaviors

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update to use device_properties

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* update comment

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix comments

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-26 10:18:53 +02:00
cba279f46c [VLMs] add helpers for get/set embedding (#38144)
* add helpers in VLMs

* fix tied weight key test
2025-05-26 09:50:32 +02:00
6e3063422c Uninstall kernels for AMD docker images (#38354)
Uninstall kernels for AMD docker images

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-25 19:42:25 +02:00
4a03044ddb Hot fix for AMD CI workflow (#38349)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-25 11:15:31 +02:00
d0c9c66d1c new failure CI reports for all jobs (#38298)
* new failures

* report_repo_id

* report_repo_id

* report_repo_id

* More fixes

* More fixes

* More fixes

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-24 19:15:02 +02:00
31f8a0fe8a [docs]: update roformer.md model card (#37946)
* Update roformer model card

* fix example purpose description

* fix model description according to the comments

* revert changes for autodoc

* remove unneeded tags

* fix review issues

* fix hfoption

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 16:27:56 -07:00
36f97ae15b docs(swinv2): Update SwinV2 model card to new standard format (#37942)
* docs(swinv2): Update SwinV2 model card to new standard format

* docs(swinv2): Apply review suggestions

Incorporates feedback from @stevhliu to:
- Enhance the introductory paragraph with more details about scaling and SimMIM.
- Generalize the tip from "image classification tasks" to "vision tasks".

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:04:13 -07:00
33d23c39ed Update BioGPT model card (#38214)
* Update BioGPT model card

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/biogpt.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* correction for CPU fallback

* added quantization code and method

* fixed transformers-cli call

---------

Co-authored-by: Aguedo <aguedo@fakeemail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-23 13:03:47 -07:00
dffb118013 Remove duplicate docstring: resample (#38305)
Duplicate of the line above.
2025-05-23 13:02:58 -07:00
e0aad278fe Never fallback to eager implicitly (#38327)
* remove arg everywhere

* Update warnings

* add more models

* Update sdpa_attention.py

* fix style

* fix

* readd warnings but not for flex

* Update test_modeling_common.py

* skip

* fix

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-23 19:48:01 +02:00
e64ed0304c Use Gradient Checkpointing Layer in Jamba & Blip Related Models (#38310)
* Use gradient checkpointing class in blip classes

* Use gradient checkpointing class in jamba/bamba
2025-05-23 19:35:25 +02:00
53fb245eb6 🚨 🚨 Inherited CausalLM Tests (#37590)
* stash commit

* Experiment 1: Try just Gemma

* Experiment 1: Just try Gemma

* make fixup

* Trigger tests

* stash commit

* Try adding Gemma3 as well

* make fixup

* Correct attrib names

* Correct pipeline model mapping

* Add in all_model_classes for Gemma1 again

* Move the pipeline model mapping around again

* make fixup

* Revert Gemma3 changes since it's a VLM

* Let's try Falcon

* Correct attributes

* Correct attributes

* Let's try just overriding get_config() for now

* Do Nemotron too

* And Llama!

* Do llama/persimmon

* Correctly skip tests

* Fix Persimmon

* Include Phimoe

* Fix Gemma2

* Set model_tester_class correctly

* Add GLM

* More models!

* models models models

* make fixup

* Add Qwen3 + Qwen3MoE

* Correct import

* make fixup

* Add the QuestionAnswering classes

* Add the QuestionAnswering classes

* Move pipeline mapping to the right place

* Jetmoe too

* Stop RoPE testing models with no RoPE

* Fix up JetMOE a bit

* Fix up JetMOE a bit

* Can we just force pad_token_id all the time?

* make fixup

* fix starcoder2

* Move pipeline mapping

* Fix RoPE skipping

* Fix RecurrentGemma tests

* Fix Falcon tests

* Add MoE attributes

* Fix values for RoPE testing

* Make sure we set bos_token_id and eos_token_id in an appropriate range

* make fixup

* Fix GLM4

* Add mamba attributes

* Revert bits of JetMOE

* Re-add the JetMOE skips

* Update tests/causal_lm_tester.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add licence

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-05-23 18:29:31 +01:00
d5f992f5e6 Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag (#36835)
* Get parallel loader working. Include tests.

* Update the tests for parallel loading

* Rename env variables.

* Add docs for parallel model weight loading.

* Touch up parallel model loading docs.

* Touch up parallel model loading docs again.

* Edit comment in test_modeling_utils_parallel_loading.py

* Make sure HF_PARALLEL_LOADING_WORKERS is spelled correctly in modeling_utils.py

* Correct times for parallelized loading, previous times were for a "hot" filesystem

* Update parallel model loading so the spawn method is encapsulated. DRY up the code by leveraging get_submodule.

* Update docs on model loading parallelism so that details on setting the multiprocessing start method are removed, now that the package handles this step internally.

* Fix style on model loading parallelism changes.

* Merge latest version of master's modeling_utils.

* Removed unused variable.

* Fix argument packing for the parallel loader.

* Fix state dict being undefined in the parallel model loader.

* Rename variables used in parallel model loading for clarity. Use get_module_from_name().

* Switch to the use of threads for parallel model loading.

* Update docs for parallel loading.

* Remove the use of json.loads when evaluating HF_ENABLE_PARALLEL_LOADING. Prefer simple casting.

* Move parallelized shard loading into its own function.

* Remove use of is_true(). Favor checking env var true values for HF_ENABLE_PARALLEL_LOADING.

* Update copyright to 2025 in readme for paralell model loading.

* Remove garbage collection line in load_shard_file, implicit garbage collection already occurs.

* Run formatter on modeling_utils.py

* Apply style fixes

* Delete tests/utils/test_modeling_utils_parallel_loading.py

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-05-23 16:39:47 +00:00
1ed19360b1 [FlexAttention] Reenable flex for encoder-decoder and make the test more robust (#38321)
* reenable most flex attention test cases

* style

* trigger

* trigger
2025-05-23 18:16:43 +02:00
bb567d85a4 refactor can_save_slow_tokenizer (#37722)
* refactor to rm property can_save_slow_tokenizer, it can be done within the if of save_vocab

* move property to fast

* revert if

* check if vocab_file is attr

* fix check for sp

* fix if condition

* fix if condition

* fix if condition
2025-05-23 17:29:38 +02:00
3c289e2104 [performance_optim] reduce frequency of declaring attention_mask in Ascend NPU flash attention (#38278)
[performance_optim] reduce frequency of declaring attention_mask in ASCEND NPU flash attention
2025-05-23 17:24:51 +02:00
f5d45d89c4 🚨Early-error🚨 config will error out if output_attentions=True and the attn implementation is wrong (#38288)
* Protect ParallelInterface

* early error out on output attention setting for no wraning in modeling

* modular update

* fixup

* update model tests

* update

* oups

* set model's config

* more cases

* ??

* properly fix

* fixup

* update

* last onces

* update

* fix?

* fix wrong merge commit

* fix hub test

* nits

* wow I am tired

* updates

* fix pipeline!

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-05-23 17:17:38 +02:00
896833c183 Fix some tests (especially compile with fullgraph=True on Python<3.11) (#38319)
* fix tests

* better fix for python<3.11

* fixes

* style
2025-05-23 17:11:40 +02:00
a63bc17416 add vasqu to self-comment-ci.yml (#38324)
add vasqu

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-23 17:09:44 +02:00
54cd86708d [custom_generate] don't forward custom_generate and trust_remote_code (#38304)
* prevent infinite loops

* docs

* more links to custom generation methods
2025-05-23 14:49:39 +00:00
135163e9c5 Expose AutoModelForTimeSeriesPrediction for import (#38307)
* expose AutoModelForTimeSeriesPrediction for import

* add in docs
2025-05-23 13:09:29 +00:00
a6b51e7341 [Whisper + beam search] fix usage of beam_indices (#38259)
* tmp

* fix test_tiny_token_timestamp_batch_generation

* better comments

* test

* comments

* Apply suggestions from code review

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-05-23 10:05:44 +00:00
3e960e032d [tf/flax] handle forced_decoder_ids deletion (#38316)
fix tf/flax, attr checks
2025-05-23 09:44:58 +00:00
9eb0a37c9e Adds use_repr to model_addition_debugger_context (#37984)
* Adds use_repr to model_addition_debugger_context

* Updating docs for use_repr option
2025-05-23 09:35:13 +00:00
38f9c5b15b Fix typo: change 'env' to 'environment' in .circleci/config.yml (#38273)
* Fix typo: change 'env' to 'environment' in .circleci/config.yml

* Remove CIRCLE_TOKEN environment variable from artifact retrieval step

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-23 10:45:27 +02:00
11b670a282 Fix run_slow (#38314)
Signed-off-by: cyy <cyyever@outlook.com>
2025-05-23 10:18:30 +02:00
b01984a51d [emu3] fix conversion script (#38297)
* fix conversion script and update weights

* fixup

* remove commented line
2025-05-23 09:49:56 +02:00
2b585419b4 [Tests] Cleanup Janus Testcase (#38311)
* Cleanup janus testcase

* shift code to setup
2025-05-23 09:29:16 +02:00
b59386dc0a Oups typo for HybridChunkedCache (#38303)
typo
2025-05-22 17:52:37 +02:00
211f2b0875 Add CB (#38085)
* stash for now

* initial commit

* small updated

* up

* up

* works!

* nits and fixes

* don't loop too much

* finish working example

* update

* fix the small freeblocks issue

* feat: stream inputs to continuous batch

* fix: update attn from `eager` to `sdpa`

* refactor: fmt

* refactor: cleanup unnecessary code

* feat: add `update` fn to `PagedAttentionCache`

* feat: broken optimal block size computation

* fix: debugging invalid cache logic

* fix: attention mask

* refactor: use custom prompts for example

* feat: add streaming output

* fix: prefill split

refactor: add doc strings and unsound/redundant logic
fix: compute optimal blocks logic

* fix: send decoded tokens when `prefilling_split` -> `decoding`

* refactor: move logic to appropriate parent class

* fix: remove truncation as we split prefilling anyways

refactor: early return when we have enough selected requests

* feat: add paged attention forward

* push Ggraoh>

* add paged sdpa

* update

* btter mps defaults

* feat: add progress bar for `generate_batch`

* feat: add opentelemetry metrics (ttft + batch fill %age)

* feat: add tracing

* Add cuda graphs (#38059)

* draft cudagraphs addition

* nits

* styling

* update

* fix

* kinda draft of what it should look like

* fixes

* lol

* not sure why inf everywhere

* can generate but output is shit

* some fixes

* we should have a single device synch

* broken outputs but it does run

* refactor

* updates

* updates with some fixes

* fix mask causality

* another commit that casts after

* add error

* simplify example

* update

* updates

* revert llama changes

* fix merge conflicts

* fix: tracing and metrics

* my updates

* update script default values

* fix block allocation issue

* fix prefill split attnetion mask

* no bugs

* add paged eager

* fix

* update

* style

* feat: add pytorch traces

* fix

* fix

* refactor: remove pytorch profiler data

* style

* nits

* cleanup

* draft test file

* fix

* fix

* fix paged and graphs

* small renamings

* cleanups and push

* refactor: move tracing and metrics logic to utils

* refactor: trace more blocks of code

* nits

* nits

* update

* to profile or not to profile

* refactor: create new output object

* causal by default

* cleanup but generations are still off for IDK what reason

* simplifications but not running still

* this does work.

* small quality of life updates

* nits

* updaet

* fix the scheduler

* fix warning

* ol

* fully fixed

* nits

* different generation parameters

* nice

* just style

* feat: add cache memory usage

* feat: add kv cache free memory

* feat: add active/waiting count & req latency

* do the sampling

* fix: synchronize CUDA only if available and improve error handling in ContinuousBatchingManager

* fix on mps

* feat: add dashboard & histogram buckets

* perf: improve waiting reqs data structures

* attempt to compile, but we should only do it on mps AFAIK

* feat: decouple scheduling logic

* just a draft

* c;eanup and fixup

* optional

* style

* update

* update

* remove the draft documentation

* fix import as well

* update

* fix the test

* style doomed

---------

Co-authored-by: Luc Georges <luc.sydney.georges@gmail.com>
2025-05-22 17:43:48 +02:00
73286d8e29 Fix HybridChunedCache & Llama4 (#38299)
* Update cache_utils.py

* Update cache_utils.py
2025-05-22 17:25:51 +02:00
d95c864a25 🔴🔴🔴 [Attention] Refactor Attention Interface for Bart-based Models (#38108)
* starting attn refactor for encoder decoder models via bart (eager + sdpa)

* flash attention works, remove unnecessary code

* flex attention support for bart!, gotta check if the renaming is not too aggressive

* some comments

* skip flex grad test for standalone as done with the other test

* revert flex attn rename (for now), sdpa simplify, and todos

* more todos

* refactor mask creation for reuse

* modular attempt at biogpt

* first batch of other models

* fix attn dropout

* fix autoformer copies

* hubert

* another batch of models

* copies/style + last round of bart models --> whisper next?

* remove unnecessary _reshape function and remove copy to whisper

* add skip for decoder-only models out of enc-dec (same as in bart)

* bring back licences

* remove comment, added to pr read instead

* mostly docs

* disable sew flex attn as it's unclear attn mask for now

* oops

* test fixes for enc-dec

* torch fx fixes + try at flex attn

* skip on mbart

* some more fixes

* musicgen skip / delete old attn class logic + sdpa compose compile skip

* disable flex attn for musicgen, not worth the effort

* more fixes and style

* flex attention test for dropout and encoder decoder that dont have main input names

* informer fixes

* the weirdest thing I've encountered yet...

* style

* remove empty tensor attempt, found core root in previous commits

* disable time series due to tests being very text centric on inputs

* add speech to text to be ignoring the other attns, also due to tests

* update docs

* remaining issues resolved ?

* update docs for current state --> nllb moe and pegasus x sdpa is questionable :D

* some models have not set the is_causal flag...

* change dtype in softmax tol old behaviour + some modular fixes

* I hate it but it is what it is

* fixes from main for bart

* forgot this one

* some model fixes

* style

* current status

* marian works now

* fixing some copies

* some copy fixes + time series x informer

* last models possibly and fixes on style/copies

* some post merge fixes

* more fixes

* make attention interface callable and move warnings there

* style lol

* add comment to "unsupported"

* remove callable interface and change interface warnings + some copies

* fix

* ternary is ugly af, make it simpler

* how did that happen

* fix flex attn test

* failing the test

* no more fallback! fixing copies next

* style + attn fixed

* fixing copies and mask creation

* wrong copy

* fixup tests and disable flex attn for now

* fixup last tests?
2025-05-22 17:12:58 +02:00
9895819514 Update CI Docker base image for AMD tests (#38261)
use newer Pytorch base image for AMD CI tests
2025-05-22 16:38:40 +02:00
dfbee79ca3 refine transformers env output (#38274)
* refine `transformers env` output

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-22 15:22:18 +02:00
1234683309 More typing in src/transformers/training_args.py (#38106)
* Annotate `framework` in src/transformers/training_args.py

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Revert framework change

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-05-22 13:14:33 +02:00
03a4c024dc Fix tp error when torch distributed is already initialized (#38294)
fix tp error
2025-05-22 12:34:05 +02:00
dcaf47dde5 add liger-kernel to docker file (#38292)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-22 11:58:17 +02:00
163138a911 🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)
* start

* start having a clean 4d mask primitive

* Update mask_utils.py

* Update mask_utils.py

* switch name

* Update masking_utils.py

* add a new AttentionMask tensor class

* fix import

* nits

* fixes

* use full and quandrants

* general sdpa mask for all caches

* style

* start some tests

* tests with sliding, chunked

* add styling

* test hybrid

* Update masking_utils.py

* small temp fixes

* Update modeling_gemma2.py

* compile compatible

* Update masking_utils.py

* improve

* start making it more general

* Update masking_utils.py

* generate

* make it work with flex style primitives!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* improve

* Update cache_utils.py

* Update masking_utils.py

* simplify - starting to look good!

* Update masking_utils.py

* name

* Update masking_utils.py

* style

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* small fix for flex

* flex compile

* FA2

* Update masking_utils.py

* Escape for TGI/vLLM!

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* General case without cache

* rename

* full test on llama4

* small fix for FA2 guard with chunk

* Update modeling_gemma2.py

* post rebase cleanup

* FA2 supports static cache!

* Update modeling_flash_attention_utils.py

* Update flex_attention.py

* Update masking_utils.py

* Update masking_utils.py

* Update utils.py

* override for export

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update executorch.py

* Update masking_utils.py

* Update masking_utils.py

* output attentions

* style

* Update masking_utils.py

* Update executorch.py

* Add doicstring

* Add license and put mask visualizer at the end

* Update test_modeling_common.py

* fix broken test

* Update test_modeling_gemma.py

* Update test_modeling_gemma2.py

* Use fullgraph=False with FA2

* Update utils.py

* change name

* Update masking_utils.py

* improve doc

* change name

* Update modeling_attn_mask_utils.py

* more explicit logic based on model's property

* pattern in config

* extend

* fixes

* make it better

* generalize to other test models

* fix

* Update masking_utils.py

* fix

* do not check mask equivalence if layer types are different

* executorch

* Update modeling_gemma2.py

* Update masking_utils.py

* use layer_idx instead

* adjust

* Update masking_utils.py

* test

* fix imports

* Update modeling_gemma2.py

* other test models

* Update modeling_llama4.py

* Update masking_utils.py

* improve

* simplify

* Update masking_utils.py

* typos

* typo

* fix

* Update masking_utils.py

* default DynamicCache

* remove default cache

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* simplify

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* export

* Update executorch.py

* Update executorch.py

* Update flex_attention.py

* Update executorch.py

* upstream to modular gemma 1 & 2

* Update modular_mistral.py

* switch names

* use dict

* put it in the Layer directly

* update copy model source for mask functions

* apply so many modular (hopefully 1 shot)

* use explicite dicts for make style happy

* protect import

* check docstring

* better default in hybrid caches

* qwens

* Update modular_qwen2.py

* simplify core logic!

* Update executorch.py

* qwen3 moe

* Update masking_utils.py

* Update masking_utils.py

* simplify a lot sdpa causal skip

* Update masking_utils.py

* post-rebase

* gemma3 finally

* style

* check it before

* gemma3

* More general with newer torch

* align gemma3

* Update utils.py

* Update utils.py

* Update masking_utils.py

* Update test_modeling_common.py

* Update flex_attention.py

* Update flex_attention.py

* Update flex_attention.py

* test

* executorch

* Update test_modeling_common.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update masking_utils.py

* Update executorch.py

* Update test_modeling_common.py

* fix copies

* device

* sdpa can be used without mask -> pass the torchscript tests in this case

* Use enum for check

* revert enum and add check instead

* remove broken test

* cohere2

* some doc & reorganize the Interface

* Update tensor_parallel.py

* Update tensor_parallel.py

* doc and dummy

* Update test_modeling_paligemma2.py

* Update modeling_falcon_h1.py

* Update masking_utils.py

* executorch patch

* style

* CIs

* use register in executorch

* final comments!

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-22 11:38:26 +02:00
f8630c778c [Whisper] handle deprecation of forced_decoder_ids (#38232)
* fix

* working saved forced_decoder_ids

* docstring

* add deprecation message

* exception message ordering

* circular import comment
2025-05-22 09:16:38 +00:00
aa02a5d902 [whisper] move processor test into processor test file 🧹 (#38266)
move processor tests
2025-05-22 10:07:11 +01:00
b26157d64c add XPU info print in print_env (#38282)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-22 11:03:56 +02:00
b369a65480 docs(swin): Update Swin model card to standard format (#37628)
* docs(swin): Update Swin model card to standard format

* docs(swin): Refine link to Microsoft organization for Swin models

Apply suggestion from @stevhliu in PR #37628.

This change updates the link pointing to the official Microsoft Swin Transformer checkpoints on the Hugging Face Hub.

The link now directs users specifically to the Microsoft organization page, filtered for Swin models, providing a clearer and more canonical reference compared to the previous general search link.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Clarify padding description and link to backbone docs

Apply suggestion from @stevhliu in PR #37628.

This change introduces two improvements to the Swin model card:

1.  Refines the wording describing how Swin handles input padding for better clarity.
2.  Adds an internal documentation link to the general "backbones" page when discussing Swin's capability as a backbone model.

These updates enhance readability and improve navigation within the Transformers documentation.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(swin): Change Swin paper link to huggingface.co/papers as suggested

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 16:16:43 -07:00
28d3148b07 Update Model Card for Mamba (#37863)
* update model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update quantization example.

* update example.

* update

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-21 10:58:23 -07:00
7b7bb8df97 Protect ParallelInterface (#38262)
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-05-21 17:45:38 +02:00
5c13cc0f94 Remove Japanese sequence_classification doc and update references (#38246) 2025-05-21 08:33:41 -07:00
71009e4b68 assign the correct torchao data layout for xpu (#37781)
* assign the correct data layout for xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check torch version before using torchao xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix the log

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix zero point type

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check torch version

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-21 17:21:55 +02:00
d6c34cdcd0 Fix: missing else branch to handle "--load_best_model_at_end" in training_args.py (#38217)
Update training_args.py
2025-05-21 14:28:56 +00:00
ae3e4e2d97 Improve typing in TrainingArgument (#36944)
* Better error message in TrainingArgument typing checks

* Better typing

* Small fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-05-21 13:54:38 +00:00
174684a9b6 Simplify DTensor Check for modeling_utils.py (#38245)
Update modeling_utils.py
2025-05-21 13:35:44 +00:00
e4decee9c0 [whisper] small changes for faster tests (#38236) 2025-05-21 14:11:08 +01:00
ddf67d2d73 Clearer error on import failure (#38257)
Clearer error
2025-05-21 14:32:29 +02:00
9a962dd9ed Add tearDown method to Quark to solve OOM issues (#38234)
fix
2025-05-21 14:26:44 +02:00
101b3fa4ea fix multi-image case for llava-onevision (#38084)
* _get_padding_size module

* do not patchify images when processing multi image

* modify llava onevision image processor fast

* tensor to list of tensors

* backward compat

* reuse pad_to_square in llave & some clarification

* add to doc

* fix: consider no image cases (text only or video)

* add integration test

* style & repo_consistency
2025-05-21 11:50:46 +02:00
a21f11fca2 [compile] re-enable for Qwen-VL models (#38127)
* compile qwen models

* delete TODO comment

* fix embeds test

* fix assisted decoding

* add comments
2025-05-21 09:50:39 +00:00
4542086db7 [Falcon H1] Fix Typo in Integration Test (#38256)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigget the cis

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

* fix typo

* make style

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: younesbelkada <younes.belkada@tii.ae>
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-21 11:25:26 +02:00
6829936ee0 [MODEL] Add Falcon H1 (#38249)
* Create push-important-models.yml

* feat: add falcon-h1

* fixup

* address comment

* fix

* fix copies

* fix copies

* fix

* fix

* fix

* fix

* fix copies

* fix

* fix copies

* fix test import to at least trigget the cis

* yups

* update

* fix make fix copies

* fix inits?

* fix style

* skip annoying test

* add integration test for Falcon H1

* fix copies

* fix

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: dhia.rhaiem <dhia.rhaiem@tii.ae>
2025-05-21 10:43:11 +02:00
e288ee00d8 tp plan should not be NONE (#38255)
* accept custom device_mesh

* fix device_map

* assert that num_heads % tp_size == 0

* todo.

* ReplicateParallel

* handle tied weights

* handle dtensor in save_pretrained with safe_serialization

* tp test works

* doesnt work

* fix shard_and_distribute_module's rank should be local_rank

* tp=4 is correct

* dp+tp is broken

* todo allreduce with dtensors on another dim is annoying

* workaround to sync dp grads when using dtensors

* loading a checkpoint works

* wandb and compare losses with different tp/dp

* cleaning

* cleaning

* .

* .

* logs

* CP2 DP2 no mask works after commenting attn_mask and is_causal from scaled_dot_product_attention

* DP=2 TP=2 now works even with tied embeddings

* model.parameters() and model.module.parameters() are empty..

* reformat sanity_check_tensor_sync

* set atol=1e-4 for CP to pass

* try populate _parameters from named_modules

* refactors
TP2 DP2 works
CP2 DP2 works

* is_causal=True and pack sequences, no attn mask, and preshuffle dataset

* fix packing

* CP=4 doesn't work

* fix labels and position_ids for CP

* DP CP works with transformers 🥳🥳🥳

* refactor

* add example cp

* fixup

* revert sdpa changes

* example cleared

* add CP, DP to the mesh init

* nit

* clean

* use `ALL_PARALLEL_STYLES`

* style

* FSDP works

* log on 1 rank

* .

* fix?

* FSDP1 also has .parameters() bug

* reported gradnorm when using FSDP1 is wrong, but loss is correct so it's okay

* .

* style and fixup

* move stuff around

* fix tests

* style

* let's make it a check

* add missing licences

* warning should be an info

* tp plan should not be NONE

* test all

* god damn it

* test all

---------

Co-authored-by: nouamanetazi <nouamane98@gmail.com>
2025-05-21 10:22:38 +02:00
711d78d104 Revert parallelism temporarily (#38240)
* Revert "Protect ParallelInterface"

This reverts commit cb513e35f9c096d60558bd43110837cbb66611ce.

* Revert "parallelism goes brrr (#37877)"

This reverts commit 1c2f36b480e02c9027d2523746d34e27b39e01a4.

* Empty commit
2025-05-20 22:43:04 +02:00
feec294dea CI reporting improvements (#38230)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 19:34:58 +02:00
cb513e35f9 Protect ParallelInterface 2025-05-20 18:27:50 +02:00
f4ef41c45e v4.53.0.dev0 2025-05-20 18:12:56 +02:00
4255 changed files with 363300 additions and 454657 deletions

View File

@ -43,16 +43,6 @@ jobs:
parallelism: 1
steps:
- checkout
- run: python3 utils/extract_pr_number_from_circleci.py > pr_number.txt
- run: echo $(cat pr_number.txt)
- run: if [[ "$(cat pr_number.txt)" == "" && "$CIRCLE_BRANCH" != "main" && "$CIRCLE_BRANCH" != *-release ]]; then echo "Not a PR, not the main branch and not a release branch, skip test!"; circleci-agent step halt; fi
- run: 'curl -L -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" https://api.github.com/repos/$CIRCLE_PROJECT_USERNAME/$CIRCLE_PROJECT_REPONAME/pulls/$(cat pr_number.txt) >> github.txt'
- run: cat github.txt
- run: (python3 -c 'import json; from datetime import datetime; fp = open("github.txt"); data = json.load(fp); fp.close(); f = "%Y-%m-%dT%H:%M:%SZ"; created = datetime.strptime(data["created_at"], f); updated = datetime.strptime(data["updated_at"], f); s = (updated - created).total_seconds(); print(int(s))' || true) > elapsed.txt
- run: if [ "$(cat elapsed.txt)" == "" ]; then echo 60 > elapsed.txt; fi
- run: cat elapsed.txt
- run: if [ "$(cat elapsed.txt)" -lt "30" ]; then echo "PR is just opened, wait some actions from GitHub"; sleep 30; fi
- run: 'if grep -q "\"draft\": true," github.txt; then echo "draft mode, skip test!"; circleci-agent step halt; fi'
- run: uv pip install -U -e .
- run: echo 'export "GIT_COMMIT_MESSAGE=$(git show -s --format=%s)"' >> "$BASH_ENV" && source "$BASH_ENV"
- run: mkdir -p test_preparation
@ -122,8 +112,6 @@ jobs:
- run:
name: "Retrieve Artifact Paths"
env:
CIRCLE_TOKEN: ${{ secrets.CI_ARTIFACT_TOKEN }}
command: |
project_slug="gh/${CIRCLE_PROJECT_USERNAME}/${CIRCLE_PROJECT_REPONAME}"
job_number=${CIRCLE_BUILD_NUM}
@ -196,6 +184,7 @@ jobs:
- run: python utils/check_dummies.py
- run: python utils/check_repo.py
- run: python utils/check_inits.py
- run: python utils/check_pipeline_typing.py
- run: python utils/check_config_docstrings.py
- run: python utils/check_config_attributes.py
- run: python utils/check_doctest_list.py

View File

@ -16,10 +16,9 @@
import argparse
import copy
import os
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import glob
from typing import Any, Optional
import yaml
@ -82,15 +81,15 @@ class EmptyJob:
@dataclass
class CircleCIJob:
name: str
additional_env: Dict[str, Any] = None
docker_image: List[Dict[str, str]] = None
install_steps: List[str] = None
additional_env: dict[str, Any] = None
docker_image: list[dict[str, str]] = None
install_steps: list[str] = None
marker: Optional[str] = None
parallelism: Optional[int] = 0
pytest_num_workers: int = 8
pytest_options: Dict[str, Any] = None
pytest_options: dict[str, Any] = None
resource_class: Optional[str] = "xlarge"
tests_to_run: Optional[List[str]] = None
tests_to_run: Optional[list[str]] = None
num_test_files_per_worker: Optional[int] = 10
# This should be only used for doctest job!
command_timeout: Optional[int] = None
@ -109,8 +108,9 @@ class CircleCIJob:
self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev"
print(f"Using {self.docker_image} docker image")
if self.install_steps is None:
self.install_steps = ["uv venv && uv pip install ."]
self.install_steps.append("uv venv && uv pip install git+https://github.com/ydshieh/pytest.git@8.3.5-ydshieh git+https://github.com/ydshieh/pluggy.git@1.5.0-ydshieh")
self.install_steps = ["uv pip install ."]
# Use a custom patched pytest to force exit the process at the end, to avoid `Too long with no output (exceeded 10m0s): context deadline exceeded`
self.install_steps.append("uv pip install git+https://github.com/ydshieh/pytest.git@8.4.1-ydshieh")
if self.pytest_options is None:
self.pytest_options = {}
if isinstance(self.tests_to_run, str):
@ -129,6 +129,12 @@ class CircleCIJob:
def to_dict(self):
env = COMMON_ENV_VARIABLES.copy()
if self.job_name != "tests_hub":
# fmt: off
# not critical
env.update({"HF_TOKEN": "".join(["h", "f", "_", "H", "o", "d", "V", "u", "M", "q", "b", "R", "m", "t", "b", "z", "F", "Q", "O", "Q", "A", "J", "G", "D", "l", "V", "Q", "r", "R", "N", "w", "D", "M", "V", "C", "s", "d"])})
# fmt: on
# Do not run tests decorated by @is_flaky on pull requests
env['RUN_FLAKY'] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
env.update(self.additional_env)
@ -148,7 +154,7 @@ class CircleCIJob:
# Examples special case: we need to download NLTK files in advance to avoid cuncurrency issues
timeout_cmd = f"timeout {self.command_timeout} " if self.command_timeout else ""
marker_cmd = f"-m '{self.marker}'" if self.marker is not None else ""
junit_flags = f" -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
junit_flags = " -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
parallel = f' << pipeline.parameters.{self.job_name}_parallelism >> '
@ -176,14 +182,32 @@ class CircleCIJob:
"command": f"TESTS=$(circleci tests split --split-by=timings {self.job_name}_test_list.txt) && echo $TESTS > splitted_tests.txt && echo $TESTS | tr ' ' '\n'" if self.parallelism else f"awk '{{printf \"%s \", $0}}' {self.job_name}_test_list.txt > splitted_tests.txt"
}
},
{"run": {"name": "fetch hub objects before pytest", "command": "python3 utils/fetch_hub_objects_for_ci.py"}},
# During the CircleCI docker images build time, we might already (or not) download the data.
# If it's done already, the files are inside the directory `/test_data/`.
{"run": {"name": "fetch hub objects before pytest", "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py"}},
{"run": {
"name": "Run tests",
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"run":
{
"name": "Check for test crashes",
"when": "always",
"command": """if [ ! -f tests_output.txt ]; then
echo "ERROR: tests_output.txt does not exist - tests may not have run properly"
exit 1
elif grep -q "crashed and worker restarting disabled" tests_output.txt; then
echo "ERROR: Worker crash detected in test output"
echo "Found: crashed and worker restarting disabled"
exit 1
else
echo "Tests output file exists and no worker crashes detected"
fi"""
},
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"store_test_results": {"path": "test-results"}},
{"store_artifacts": {"path": "test-results/junit.xml"}},
{"store_artifacts": {"path": "reports"}},
@ -214,7 +238,7 @@ generate_job = CircleCIJob(
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) cause some issues
# TODO: remove this once it works directly
install_steps=["uv venv && uv pip install . && uv pip install networkx==3.2.1"],
install_steps=["uv pip install ."],
marker="generate",
parallelism=6,
)
@ -231,22 +255,6 @@ processor_job = CircleCIJob(
parallelism=8,
)
tf_job = CircleCIJob(
"tf",
docker_image=[{"image":"huggingface/transformers-tf-light"}],
parallelism=6,
)
flax_job = CircleCIJob(
"flax",
docker_image=[{"image":"huggingface/transformers-jax-light"}],
parallelism=6,
pytest_num_workers=16,
resource_class="2xlarge",
)
pipelines_torch_job = CircleCIJob(
"pipelines_torch",
additional_env={"RUN_PIPELINE_TESTS": True},
@ -255,47 +263,27 @@ pipelines_torch_job = CircleCIJob(
parallelism=4,
)
pipelines_tf_job = CircleCIJob(
"pipelines_tf",
additional_env={"RUN_PIPELINE_TESTS": True},
docker_image=[{"image":"huggingface/transformers-tf-light"}],
marker="is_pipeline_test",
parallelism=4,
)
custom_tokenizers_job = CircleCIJob(
"custom_tokenizers",
additional_env={"RUN_CUSTOM_TOKENIZERS": True},
docker_image=[{"image": "huggingface/transformers-custom-tokenizers"}],
)
examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
docker_image=[{"image":"huggingface/transformers-examples-torch"}],
# TODO @ArthurZucker remove this once docker is easier to build
install_steps=["uv venv && uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
install_steps=["uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
pytest_num_workers=4,
)
examples_tensorflow_job = CircleCIJob(
"examples_tensorflow",
additional_env={"OMP_NUM_THREADS": 8},
docker_image=[{"image":"huggingface/transformers-examples-tf"}],
pytest_num_workers=2,
)
hub_job = CircleCIJob(
"hub",
additional_env={"HUGGINGFACE_CO_STAGING": True},
docker_image=[{"image":"huggingface/transformers-torch-light"}],
install_steps=[
'uv venv && uv pip install .',
'uv pip install .',
'git config --global user.email "ci@dummy.com"',
'git config --global user.name "ci"',
],
@ -304,20 +292,6 @@ hub_job = CircleCIJob(
resource_class="medium",
)
onnx_job = CircleCIJob(
"onnx",
docker_image=[{"image":"huggingface/transformers-torch-tf-light"}],
install_steps=[
"uv venv",
"uv pip install .[torch,tf,testing,sentencepiece,onnxruntime,vision,rjieba]",
],
pytest_options={"k onnx": None},
pytest_num_workers=1,
resource_class="small",
)
exotic_models_job = CircleCIJob(
"exotic_models",
docker_image=[{"image":"huggingface/transformers-exotic-models"}],
@ -325,7 +299,6 @@ exotic_models_job = CircleCIJob(
pytest_options={"durations": 100},
)
repo_utils_job = CircleCIJob(
"repo_utils",
docker_image=[{"image":"huggingface/transformers-consistency"}],
@ -333,13 +306,12 @@ repo_utils_job = CircleCIJob(
resource_class="large",
)
non_model_job = CircleCIJob(
"non_model",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) cause some issues
# TODO: remove this once it works directly
install_steps=["uv venv && uv pip install . && uv pip install networkx==3.2.1"],
install_steps=["uv pip install .[serving]"],
marker="not generate",
parallelism=6,
)
@ -357,7 +329,7 @@ doc_test_job = CircleCIJob(
additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"},
install_steps=[
# Add an empty file to keep the test step running correctly even no file is selected to be tested.
"uv venv && pip install .",
"uv pip install .",
"touch dummy.py",
command,
"cat pr_documentation_tests_temp.txt",
@ -369,7 +341,7 @@ doc_test_job = CircleCIJob(
pytest_num_workers=1,
)
REGULAR_TESTS = [torch_job, flax_job, hub_job, onnx_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
REGULAR_TESTS = [torch_job, hub_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
EXAMPLES_TESTS = [examples_torch_job]
PIPELINE_TESTS = [pipelines_torch_job]
REPO_UTIL_TESTS = [repo_utils_job]

View File

@ -1,5 +1,6 @@
import re
import argparse
import re
def parse_pytest_output(file_path):
skipped_tests = {}

View File

@ -36,19 +36,23 @@ body:
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @gante and @Rocketknight1
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
@ -56,6 +60,7 @@ body:
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
Devices/Backends:
@ -69,19 +74,6 @@ body:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @Rocketknight1
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
placeholder: "@Username ..."

View File

@ -51,7 +51,7 @@ Library:
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @zach-huggingface and @SunMarc
- trainer: @zach-huggingface, @SunMarc and @qgallouedec
- chat templates: @Rocketknight1
Integrations:

39
.github/copilot-instructions.md vendored Normal file
View File

@ -0,0 +1,39 @@
# copilot-instructions.md Guide for Hugging Face Transformers
This copilot-instructions.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
- `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
- `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

View File

@ -13,14 +13,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import github
import json
from github import Github
import os
import re
from collections import Counter
from pathlib import Path
import github
from github import Github
def pattern_to_regex(pattern):
if pattern.startswith("/"):
start_anchor = True

View File

@ -48,7 +48,7 @@ jobs:
- name: Run database init script
run: |
psql -f benchmark/init_db.sql
psql -f benchmark/utils/init_db.sql
env:
PGDATABASE: metrics
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
@ -64,7 +64,7 @@ jobs:
commit_id=$GITHUB_SHA
fi
commit_msg=$(git show -s --format=%s | cut -c1-70)
python3 benchmark/benchmarks_entrypoint.py "$BRANCH_NAME" "$commit_id" "$commit_msg"
python3 benchmark/benchmarks_entrypoint.py "huggingface/transformers" "$BRANCH_NAME" "$commit_id" "$commit_msg"
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
# Enable this to see debug logs

77
.github/workflows/benchmark_v2.yml vendored Normal file
View File

@ -0,0 +1,77 @@
name: Benchmark v2 Framework
on:
workflow_call:
inputs:
runner:
description: 'GH Actions runner group to use'
required: true
type: string
commit_sha:
description: 'Commit SHA to benchmark'
required: false
type: string
default: ''
run_id:
description: 'Custom run ID for organizing results (auto-generated if not provided)'
required: false
type: string
default: ''
benchmark_repo_id:
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
benchmark-v2:
name: Benchmark v2
runs-on: ${{ inputs.runner }}
if: |
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark')) ||
(github.event_name == 'schedule')
container:
image: huggingface/transformers-pytorch-gpu
options: --gpus all --privileged --ipc host --shm-size "16gb"
steps:
- name: Get repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install benchmark dependencies
run: |
python3 -m pip install -r benchmark_v2/requirements.txt
- name: Reinstall transformers in edit mode
run: |
python3 -m pip uninstall -y transformers
python3 -m pip install -e ".[torch]"
- name: Show installed libraries and their versions
run: |
python3 -m pip list
python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
nvidia-smi || true
- name: Run benchmark v2
working-directory: benchmark_v2
run: |
echo "Running benchmarks"
python3 run_benchmarks.py \
--commit-id '${{ inputs.commit_sha || github.sha }}' \
--run-id '${{ inputs.run_id }}' \
--push-to-hub '${{ inputs.benchmark_repo_id}}' \
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}

View File

@ -0,0 +1,19 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

View File

@ -0,0 +1,19 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: amd-mi325-ci-1gpu
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

View File

@ -26,7 +26,7 @@ jobs:
strategy:
matrix:
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "tf-light", "exotic-models", "torch-tf-light", "jax-light", "examples-torch", "examples-tf"]
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "exotic-models", "examples-torch"]
continue-on-error: true
steps:

View File

@ -19,7 +19,7 @@ concurrency:
jobs:
latest-docker:
name: "Latest PyTorch + TensorFlow [dev]"
name: "Latest PyTorch [dev]"
runs-on:
group: aws-general-8-plus
steps:
@ -267,44 +267,6 @@ jobs:
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-tensorflow:
name: "Latest TensorFlow [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-tensorflow-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-tensorflow-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-tensorflow-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-deepspeed-amd:
name: "PyTorch + DeepSpeed (AMD) [dev]"
runs-on:

View File

@ -2,6 +2,10 @@ name: Build docker images (Nightly CI)
on:
workflow_call:
inputs:
job:
required: true
type: string
push:
branches:
- build_nightly_ci_docker_image*
@ -12,7 +16,8 @@ concurrency:
jobs:
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
name: "Nightly PyTorch"
if: inputs.job == 'latest-with-torch-nightly-docker' || inputs.job == ''
runs-on:
group: aws-general-8-plus
steps:
@ -41,6 +46,7 @@ jobs:
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
if: inputs.job == 'nightly-torch-deepspeed-docker' || inputs.job == ''
runs-on:
group: aws-g4dn-2xlarge-cache
steps:

View File

@ -16,7 +16,7 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de en es fr hi it ko pt tr zh ja te
languages: ar de en es fr hi it ja ko pt zh
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}

View File

@ -1,128 +0,0 @@
name: Process failed tests
on:
workflow_call:
inputs:
docker:
required: true
type: string
start_sha:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
run_models_gpu:
name: " "
runs-on:
group: aws-g4dn-4xlarge-cache
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_run_models_gpu
path: /transformers/ci_results_run_models_gpu
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Get target commit
working-directory: /transformers/utils
run: |
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"]); print(commit)')" >> $GITHUB_ENV
- name: Checkout to `start_sha`
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.start_sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Check failed tests
working-directory: /transformers
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_run_models_gpu/new_model_failures.json --output_file new_model_failures_with_bad_commit.json
- name: Show results
working-directory: /transformers
run: |
ls -l new_model_failures_with_bad_commit.json
cat new_model_failures_with_bad_commit.json
- name: Checkout back
working-directory: /transformers
run: |
git checkout ${{ inputs.start_sha }}
- name: Process report
shell: bash
working-directory: /transformers
env:
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
run: |
python3 utils/process_bad_commit_report.py
- name: Process report
shell: bash
working-directory: /transformers
env:
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
run: |
{
echo 'REPORT_TEXT<<EOF'
python3 utils/process_bad_commit_report.py
echo EOF
} >> "$GITHUB_ENV"
- name: Send processed report
if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: '#transformers-ci-feedback-tests'
# For posting a rich message using Block Kit
payload: |
{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "${{ env.REPORT_TEXT }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

208
.github/workflows/check_failed_tests.yml vendored Normal file
View File

@ -0,0 +1,208 @@
name: Process failed tests
on:
workflow_call:
inputs:
docker:
required: true
type: string
start_sha:
required: true
type: string
job:
required: true
type: string
slack_report_channel:
required: true
type: string
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
check_new_failures:
name: " "
runs-on:
group: aws-g5-4xlarge-cache
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: /transformers/ci_results_${{ inputs.job }}
- name: Check file
working-directory: /transformers
run: |
if [ -f ci_results_${{ inputs.job }}/new_failures.json ]; then
echo "`ci_results_${{ inputs.job }}/new_failures.json` exists, continue ..."
echo "process=true" >> $GITHUB_ENV
else
echo "`ci_results_${{ inputs.job }}/new_failures.json` doesn't exist, abort."
echo "process=false" >> $GITHUB_ENV
fi
- uses: actions/download-artifact@v4
if: ${{ env.process == 'true' }}
with:
pattern: setup_values*
path: setup_values
merge-multiple: true
- name: Prepare some setup values
if: ${{ env.process == 'true' }}
run: |
if [ -f setup_values/prev_workflow_run_id.txt ]; then
echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Update clone
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Get target commit
working-directory: /transformers/utils
if: ${{ env.process == 'true' }}
run: |
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
- name: Checkout to `start_sha`
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.start_sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
if: ${{ env.process == 'true' }}
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: pip freeze
- name: Check failed tests
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_${{ inputs.job }}/new_failures.json --output_file new_failures_with_bad_commit.json
- name: Show results
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
ls -l new_failures_with_bad_commit.json
cat new_failures_with_bad_commit.json
- name: Checkout back
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
git checkout ${{ inputs.start_sha }}
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
run: |
python3 utils/process_bad_commit_report.py
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
run: |
{
echo 'REPORT_TEXT<<EOF'
python3 utils/process_bad_commit_report.py
echo EOF
} >> "$GITHUB_ENV"
- name: Prepare Slack report title
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
pip install slack_sdk
echo "title=$(python3 -c 'import sys; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = "${{ inputs.ci_event }}"; job = "${{ inputs.job }}"; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
- name: Send processed report
if: ${{ env.process == 'true' && !endsWith(env.REPORT_TEXT, '{}') }}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: '#${{ inputs.slack_report_channel }}'
# For posting a rich message using Block Kit
payload: |
{
"blocks": [
{
"type": "header",
"text": {
"type": "plain_text",
"text": "${{ env.title }}"
}
},
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "${{ env.REPORT_TEXT }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

43
.github/workflows/collated-reports.yml vendored Normal file
View File

@ -0,0 +1,43 @@
name: CI collated reports
on:
workflow_call:
inputs:
job:
required: true
type: string
report_repo_id:
required: true
type: string
machine_type:
required: true
type: string
gpu_name:
description: Name of the GPU used for the job. Its enough that the value contains the name of the GPU, e.g. "noise-h100-more-noise". Case insensitive.
required: true
type: string
jobs:
collated_reports:
name: Collated reports
runs-on: ubuntu-22.04
if: always()
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
- name: Collated reports
shell: bash
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_SHA: ${{ github.sha }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
run: |
pip install huggingface_hub
python3 utils/collated_reports.py \
--path . \
--machine-type ${{ inputs.machine_type }} \
--commit-hash ${{ env.CI_SHA }} \
--job ${{ inputs.job }} \
--report-repo-id ${{ inputs.report_repo_id }} \
--gpu-name ${{ inputs.gpu_name }}

View File

@ -28,10 +28,10 @@ jobs:
matrix:
split_keys: ${{ fromJson(inputs.split_keys) }}
runs-on:
group: aws-g4dn-4xlarge-cache
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers

View File

@ -15,10 +15,10 @@ jobs:
setup:
name: Setup
runs-on:
group: aws-g4dn-4xlarge-cache
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
job_splits: ${{ steps.set-matrix.outputs.job_splits }}
split_keys: ${{ steps.set-matrix.outputs.split_keys }}

157
.github/workflows/get-pr-info.yml vendored Normal file
View File

@ -0,0 +1,157 @@
name: Get PR commit SHA
on:
workflow_call:
inputs:
pr_number:
required: true
type: string
outputs:
PR_HEAD_REPO_FULL_NAME:
description: "The full name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_FULL_NAME }}
PR_BASE_REPO_FULL_NAME:
description: "The full name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_FULL_NAME }}
PR_HEAD_REPO_OWNER:
description: "The owner of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}
PR_BASE_REPO_OWNER:
description: "The owner of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_OWNER }}
PR_HEAD_REPO_NAME:
description: "The name of the repository from which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}
PR_BASE_REPO_NAME:
description: "The name of the repository to which the pull request is created"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REPO_NAME }}
PR_HEAD_REF:
description: "The branch name of the pull request in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_REF }}
PR_BASE_REF:
description: "The branch name in the base repository (to merge into)"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_REF }}
PR_HEAD_SHA:
description: "The head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_SHA }}
PR_BASE_SHA:
description: "The head sha of the target branch in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_BASE_SHA }}
PR_MERGE_COMMIT_SHA:
description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_HEAD_COMMIT_DATE:
description: "The date of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
PR_MERGE_COMMIT_DATE:
description: "The date of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_HEAD_COMMIT_TIMESTAMP:
description: "The timestamp of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_TIMESTAMP }}
PR_MERGE_COMMIT_TIMESTAMP:
description: "The timestamp of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
PR:
description: "The PR"
value: ${{ jobs.get-pr-info.outputs.PR }}
PR_FILES:
description: "The files touched in the PR"
value: ${{ jobs.get-pr-info.outputs.PR_FILES }}
jobs:
get-pr-info:
runs-on: ubuntu-22.04
name: Get PR commit SHA better
outputs:
PR_HEAD_REPO_FULL_NAME: ${{ steps.pr_info.outputs.head_repo_full_name }}
PR_BASE_REPO_FULL_NAME: ${{ steps.pr_info.outputs.base_repo_full_name }}
PR_HEAD_REPO_OWNER: ${{ steps.pr_info.outputs.head_repo_owner }}
PR_BASE_REPO_OWNER: ${{ steps.pr_info.outputs.base_repo_owner }}
PR_HEAD_REPO_NAME: ${{ steps.pr_info.outputs.head_repo_name }}
PR_BASE_REPO_NAME: ${{ steps.pr_info.outputs.base_repo_name }}
PR_HEAD_REF: ${{ steps.pr_info.outputs.head_ref }}
PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
PR_HEAD_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.head_commit_timestamp }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ steps.get_timestamps.outputs.merge_commit_timestamp }}
PR: ${{ steps.pr_info.outputs.pr }}
PR_FILES: ${{ steps.pr_info.outputs.files }}
if: ${{ inputs.pr_number != '' }}
steps:
- name: Extract PR details
id: pr_info
uses: actions/github-script@v6
with:
script: |
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
const { data: head_commit } = await github.rest.repos.getCommit({
owner: pr.head.repo.owner.login,
repo: pr.head.repo.name,
ref: pr.head.ref
});
const { data: merge_commit } = await github.rest.repos.getCommit({
owner: pr.base.repo.owner.login,
repo: pr.base.repo.name,
ref: pr.merge_commit_sha,
});
const { data: files } = await github.rest.pulls.listFiles({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
core.setOutput('head_repo_full_name', pr.head.repo.full_name);
core.setOutput('base_repo_full_name', pr.base.repo.full_name);
core.setOutput('head_repo_owner', pr.head.repo.owner.login);
core.setOutput('base_repo_owner', pr.base.repo.owner.login);
core.setOutput('head_repo_name', pr.head.repo.name);
core.setOutput('base_repo_name', pr.base.repo.name);
core.setOutput('head_ref', pr.head.ref);
core.setOutput('base_ref', pr.base.ref);
core.setOutput('head_sha', pr.head.sha);
core.setOutput('base_sha', pr.base.sha);
core.setOutput('merge_commit_sha', pr.merge_commit_sha);
core.setOutput('pr', pr);
core.setOutput('head_commit_date', head_commit.commit.committer.date);
core.setOutput('merge_commit_date', merge_commit.commit.committer.date);
core.setOutput('files', files);
console.log('PR head commit:', {
head_commit: head_commit,
commit: head_commit.commit,
date: head_commit.commit.committer.date
});
console.log('PR merge commit:', {
merge_commit: merge_commit,
commit: merge_commit.commit,
date: merge_commit.commit.committer.date
});
- name: Convert dates to timestamps
id: get_timestamps
run: |
head_commit_date=${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date=${{ steps.pr_info.outputs.merge_commit_date }}
echo $head_commit_date
echo $merge_commit_date
head_commit_timestamp=$(date -d "$head_commit_date" +%s)
merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
echo $head_commit_timestamp
echo $merge_commit_timestamp
echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT

36
.github/workflows/get-pr-number.yml vendored Normal file
View File

@ -0,0 +1,36 @@
name: Get PR number
on:
workflow_call:
outputs:
PR_NUMBER:
description: "The extracted PR number"
value: ${{ jobs.get-pr-number.outputs.PR_NUMBER }}
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request.number }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"

View File

@ -12,16 +12,22 @@ on:
slice_id:
required: true
type: number
runner:
required: true
type: string
docker:
required: true
type: string
commit_sha:
required: false
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
runner_type:
required: false
type: string
report_repo_id:
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -49,6 +55,8 @@ jobs:
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
machine_type: ${{ steps.set_machine_type.outputs.machine_type }}
steps:
- name: Echo input and matrix info
shell: bash
@ -70,7 +78,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -102,14 +110,15 @@ jobs:
run: pip freeze
- name: Set `machine_type` for report and artifact names
id: set_machine_type
working-directory: /transformers
shell: bash
run: |
echo "${{ inputs.machine_type }}"
if [ "${{ inputs.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ inputs.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ inputs.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
@ -117,26 +126,58 @@ jobs:
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
echo "machine_type=$machine_type" >> $GITHUB_OUTPUT
- name: Create report directory if it doesn't exist
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
exit ${EXIT_CODE:-1}
- name: Failure short reports
if: ${{ failure() }}
# This step is only to show information on Github Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
run: |
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
collated_reports:
name: Collated Reports
if: ${{ always() }}
needs: run_models_gpu
uses: huggingface/transformers/.github/workflows/collated-reports.yml@main
with:
job: run_models_gpu
report_repo_id: ${{ inputs.report_repo_id }}
gpu_name: ${{ inputs.runner_type }}
machine_type: ${{ needs.run_models_gpu.outputs.machine_type }}
secrets: inherit

View File

@ -1,128 +0,0 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
machine_type:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
docker:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 1 # For now, not to parallelize. Can change later if it works well.
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on: ['${{ inputs.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') }}
working-directory: /transformers
run: |
python3 -m pip install -U datasets
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
working-directory: /transformers
run: |
python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
run: |
mkdir -p /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports

View File

@ -0,0 +1,121 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
machine_type:
required: true
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
env:
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 8
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: ${{ inputs.runner }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ inputs.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all tests on Gaudi
run: python3 -m pytest -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
run: |
mkdir -p reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports

View File

@ -6,7 +6,6 @@ on:
types: [created]
permissions:
contents: write
pull-requests: write
jobs:
@ -16,4 +15,4 @@ jobs:
python_quality_dependencies: "[quality]"
style_command_type: "default"
secrets:
bot_token: ${{ secrets.GITHUB_TOKEN }}
bot_token: ${{ secrets.HF_STYLE_BOT_ACTION }}

View File

@ -0,0 +1,134 @@
name: PR - build doc via comment
on:
issue_comment:
types:
- created
branches-ignore:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'build-doc') }}
cancel-in-progress: true
permissions: {}
jobs:
get-pr-number:
name: Get PR number
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
verity_pr_commit:
name: Verity PR commit corresponds to a specific event by comparing timestamps
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
runs-on: ubuntu-22.04
needs: get-pr-info
env:
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
steps:
- run: |
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
create_run:
name: Create run
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
permissions:
statuses: write
runs-on: ubuntu-22.04
steps:
- name: Create Run
id: create_run
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Custom doc building job" -f "context=custom-doc-build"
reply_to_comment:
name: Reply to the comment
if: ${{ needs.create_run.result == 'success' }}
needs: [get-pr-number, create_run]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=[Building docs for all languages...](${{ env.GITHUB_RUN_URL }})"
build-doc:
name: Build doc
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
package: transformers
languages: ar de en es fr hi it ko pt tr zh ja te
update_run_status:
name: Update Check Run Status
needs: [ get-pr-info, create_run, build-doc ]
permissions:
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.create_run.result) }}
steps:
- name: Get `build-doc` job status
run: |
echo "${{ needs.build-doc.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
- name: Update PR commit statuses
run: |
echo "${{ needs.build-doc.result }}"
echo "${{ env.STATUS }}"
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Custom doc building job" -f "context=custom-doc-build"

177
.github/workflows/pr_run_slow_ci.yml vendored Normal file
View File

@ -0,0 +1,177 @@
name: PR slow CI
on:
pull_request_target:
types: [opened, synchronize, reopened]
jobs:
get-pr-number:
name: Get PR number
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
get-jobs:
name: Get test files to run
runs-on: ubuntu-22.04
needs: [get-pr-number, get-pr-info]
outputs:
jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
steps:
- name: Get repository content
id: repo_content
uses: actions/github-script@v6
with:
script: |
const { data: tests_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_models_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/models',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
const { data: tests_quantization_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
path: 'tests/quantization',
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
core.setOutput('tests_dir', tests_dir);
core.setOutput('tests_models_dir', tests_models_dir);
core.setOutput('tests_quantization_dir', tests_quantization_dir);
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
run: |
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Write tests_dir file
run: |
cat > tests_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_dir }}
EOF
- name: Write tests_models_dir file
run: |
cat > tests_models_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_models_dir }}
EOF
- name: Write tests_quantization_dir file
run: |
cat > tests_quantization_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_quantization_dir }}
EOF
- name: Run script to get jobs to run
id: get_jobs
run: |
python utils/get_pr_run_slow_jobs.py | tee output.txt
echo "jobs_to_run: $(tail -n 1 output.txt)"
echo "jobs_to_run=$(tail -n 1 output.txt)" >> $GITHUB_OUTPUT
send_comment:
# Will delete the previous comment and send a new one if:
# - either the content is changed
# - or the previous comment is 30 minutes or more old
name: Send a comment to suggest jobs to run
if: ${{ needs.get-jobs.outputs.jobs != '' }}
needs: [get-pr-number, get-jobs]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Check and update comment if needed
uses: actions/github-script@v7
env:
BODY: "\n\nrun-slow: ${{ needs.get-jobs.outputs.jobs }}"
with:
script: |
const prNumber = ${{ needs.get-pr-number.outputs.PR_NUMBER }};
const commentPrefix = "**[For maintainers]** Suggested jobs to run (before merge)";
const thirtyMinutesAgo = new Date(Date.now() - 30 * 60 * 1000); // 30 minutes ago
const newBody = `${commentPrefix}${process.env.BODY}`;
// Get all comments on the PR
const { data: comments } = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber
});
// Find existing comments that start with our prefix
const existingComments = comments.filter(comment =>
comment.user.login === 'github-actions[bot]' &&
comment.body.startsWith(commentPrefix)
);
let shouldCreateNewComment = true;
let commentsToDelete = [];
if (existingComments.length > 0) {
// Get the most recent comment
const mostRecentComment = existingComments
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
const commentDate = new Date(mostRecentComment.created_at);
const isOld = commentDate < thirtyMinutesAgo;
const isDifferentContent = mostRecentComment.body !== newBody;
console.log(`Most recent comment created: ${mostRecentComment.created_at}`);
console.log(`Is older than 30 minutes: ${isOld}`);
console.log(`Has different content: ${isDifferentContent}`);
if (isOld || isDifferentContent) {
// Delete all existing comments and create new one
commentsToDelete = existingComments;
console.log(`Will delete ${commentsToDelete.length} existing comment(s) and create new one`);
} else {
// Content is same and comment is recent, skip
shouldCreateNewComment = false;
console.log('Comment is recent and content unchanged, skipping update');
}
} else {
console.log('No existing comments found, will create new one');
}
// Delete old comments if needed
for (const comment of commentsToDelete) {
console.log(`Deleting comment #${comment.id} (created: ${comment.created_at})`);
await github.rest.issues.deleteComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: comment.id
});
}
// Create new comment if needed
if (shouldCreateNewComment) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: prNumber,
body: newBody
});
console.log('✅ New comment created');
} else {
console.log(' No comment update needed');
}

View File

@ -4,17 +4,6 @@ on:
push:
branches: [ main ]
env:
OUTPUT_SLACK_CHANNEL_ID: "C06L2SGMEEA"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:
get_modified_models:
name: "Get all modified files"
@ -25,111 +14,144 @@ jobs:
- name: Check out code
uses: actions/checkout@v4
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c
- name: Get changed files using `actions/github-script`
id: get-changed-files
uses: actions/github-script@v7
with:
files: src/transformers/models/**
script: |
let files = [];
// Only handle push events
if (context.eventName === 'push') {
const afterSha = context.payload.after;
const branchName = context.payload.ref.replace('refs/heads/', '');
let baseSha;
if (branchName === 'main') {
console.log('Push to main branch, comparing to parent commit');
// Get the parent commit of the pushed commit
const { data: commit } = await github.rest.repos.getCommit({
owner: context.repo.owner,
repo: context.repo.repo,
ref: afterSha
});
baseSha = commit.parents[0]?.sha;
if (!baseSha) {
throw new Error('No parent commit found for the pushed commit');
}
} else {
console.log(`Push to branch ${branchName}, comparing to main`);
baseSha = 'main';
}
const { data: comparison } = await github.rest.repos.compareCommits({
owner: context.repo.owner,
repo: context.repo.repo,
base: baseSha,
head: afterSha
});
// Include added, modified, and renamed files
files = comparison.files
.filter(file => file.status === 'added' || file.status === 'modified' || file.status === 'renamed')
.map(file => file.filename);
}
// Include all files under src/transformers/ (not just models subdirectory)
const filteredFiles = files.filter(file =>
file.startsWith('src/transformers/')
);
core.setOutput('changed_files', filteredFiles.join(' '));
core.setOutput('any_changed', filteredFiles.length > 0 ? 'true' : 'false');
- name: Run step if only the files listed above change
if: steps.changed-files.outputs.any_changed == 'true'
id: set-matrix
- name: Parse changed files with Python
if: steps.get-changed-files.outputs.any_changed == 'true'
env:
ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
CHANGED_FILES: ${{ steps.get-changed-files.outputs.changed_files }}
id: set-matrix
run: |
model_arrays=()
for file in $ALL_CHANGED_FILES; do
model_path="${file#*models/}"
model_path="models/${model_path%%/*}"
if grep -qFx "$model_path" utils/important_models.txt; then
# Append the file to the matrix string
model_arrays+=("$model_path")
fi
done
matrix_string=$(printf '"%s", ' "${model_arrays[@]}" | sed 's/, $//')
echo "matrix=[$matrix_string]" >> $GITHUB_OUTPUT
test_modified_files:
python3 - << 'EOF'
import os
import sys
import json
# Add the utils directory to Python path
sys.path.insert(0, 'utils')
# Import the important models list
from important_files import IMPORTANT_MODELS
print(f"Important models: {IMPORTANT_MODELS}")
# Get the changed files from the previous step
changed_files_str = os.environ.get('CHANGED_FILES', '')
changed_files = changed_files_str.split() if changed_files_str else []
# Filter to only Python files
python_files = [f for f in changed_files if f.endswith('.py')]
print(f"Python files changed: {python_files}")
result_models = set()
# Specific files that trigger all models
transformers_utils_files = [
'modeling_utils.py',
'modeling_rope_utils.py',
'modeling_flash_attention_utils.py',
'modeling_attn_mask_utils.py',
'cache_utils.py',
'masking_utils.py',
'pytorch_utils.py'
]
# Single loop through all Python files
for file in python_files:
# Check for files under src/transformers/models/
if file.startswith('src/transformers/models/'):
remaining_path = file[len('src/transformers/models/'):]
if '/' in remaining_path:
model_dir = remaining_path.split('/')[0]
if model_dir in IMPORTANT_MODELS:
result_models.add(model_dir)
print(f"Added model directory: {model_dir}")
# Check for specific files under src/transformers/ or src/transformers/generation/ files
elif file.startswith('src/transformers/generation/') or \
(file.startswith('src/transformers/') and os.path.basename(file) in transformers_utils_files):
print(f"Found core file: {file} - including all important models")
result_models.update(IMPORTANT_MODELS)
break # No need to continue once we include all models
# Convert to sorted list and create matrix
result_list = sorted(list(result_models))
print(f"Final model list: {result_list}")
if result_list:
matrix_json = json.dumps(result_list)
print(f"matrix={matrix_json}")
# Write to GITHUB_OUTPUT
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write(f"matrix={matrix_json}\n")
else:
print("matrix=[]")
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write("matrix=[]\n")
EOF
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
needs: get_modified_models
name: Slow & FA2 tests
runs-on:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }}
strategy:
fail-fast: false
matrix:
model-name: ${{ fromJson(needs.get_modified_models.outputs.matrix) }}
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Install locally transformers & other libs
run: |
apt install sudo
sudo -H pip install --upgrade pip
sudo -H pip uninstall -y transformers
sudo -H pip install -U -e ".[testing]"
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install bitsandbytes
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Show installed libraries and their versions
run: pip freeze
- name: Run FA2 tests
id: run_fa2_tests
run:
pytest -rsfE -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: ${{ matrix.model-name }}_fa2_tests"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.model-name }}_fa2_tests
path: /transformers/reports/${{ matrix.model-name }}_fa2_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the FA2 tests - ${{ matrix.model-name }}
status: ${{ steps.run_fa2_tests.conclusion}}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Run integration tests
id: run_integration_tests
if: always()
run:
pytest -rsfE -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: tests_integration_${{ matrix.model-name }}"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_integration_${{ matrix.model-name }}
path: /transformers/reports/tests_integration_${{ matrix.model-name }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the Integration tests - ${{ matrix.model-name }}
status: ${{ steps.run_integration_tests.conclusion}}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Tailscale # In order to be able to SSH when a test fails
if: ${{ runner.debug == '1'}}
uses: huggingface/tailscale-action@v1
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true
if: needs.get_modified_models.outputs.matrix != '' && needs.get_modified_models.outputs.matrix != '[]'
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-push"
docker: huggingface/transformers-all-latest-gpu
ci_event: push
report_repo_id: hf-internal-testing/transformers_ci_push
commit_sha: ${{ github.sha }}
models: ${{ needs.get_modified_models.outputs.matrix }}
secrets: inherit

View File

@ -29,7 +29,7 @@ jobs:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
@ -185,7 +185,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.models) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -239,9 +239,9 @@ jobs:
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -292,7 +292,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.quantizations) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -338,9 +338,9 @@ jobs:
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}

View File

@ -1,43 +1,56 @@
name: Self-hosted runner (nightly-ci)
name: Nvidia CI with nightly torch
on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
# triggered when the daily scheduled Nvidia CI is completed.
# This way, we can compare the results more easily.
workflow_run:
workflows: ["Nvidia CI"]
branches: ["main"]
types: [completed]
push:
branches:
- run_nightly_ci*
- run_ci_with_nightly_torch*
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
build_nightly_ci_images:
name: Build Nightly CI Docker Images
if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci'))
build_nightly_torch_ci_images:
name: Build CI Docker Images with nightly torch
uses: ./.github/workflows/build-nightly-ci-docker-images.yml
with:
job: latest-with-torch-nightly-docker
secrets: inherit
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
needs: [build_nightly_ci_images]
needs: build_nightly_torch_ci_images
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
docker: huggingface/transformers-all-latest-torch-nightly-gpu
ci_event: Nightly CI
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
needs: [build_nightly_ci_images]
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
# test deepspeed nightly build with the latest release torch
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Nightly CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
secrets: inherit

View File

@ -1,25 +0,0 @@
name: Self-hosted runner (AMD mi300 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi300
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci'))))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi300
secrets: inherit

View File

@ -31,12 +31,12 @@ jobs:
name: Setup
strategy:
matrix:
machine_type: [aws-g4dn-2xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
@ -131,12 +131,12 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g4dn-2xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
@ -169,9 +169,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -244,7 +244,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -282,9 +282,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -357,12 +357,12 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-2xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
@ -395,9 +395,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -467,7 +467,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -505,9 +505,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-2xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}

View File

@ -1,55 +0,0 @@
name: Self-hosted runner (AMD mi210 scheduled CI caller)
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi210
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi210
secrets: inherit

View File

@ -15,10 +15,11 @@ jobs:
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
torch-pipeline:
@ -26,10 +27,11 @@ jobs:
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
example-ci:
@ -37,10 +39,11 @@ jobs:
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit
deepspeed-ci:
@ -48,8 +51,9 @@ jobs:
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
slack_report_channel: "#transformers-ci-daily-amd"
runner: mi250
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi250
report_repo_id: optimum-amd/transformers_daily_ci
secrets: inherit

View File

@ -0,0 +1,67 @@
name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu scale set: amd-mi325-ci-1gpu
# 2gpu scale set: amd-mi325-ci-2gpu
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit

View File

@ -0,0 +1,63 @@
name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu : amd-mi355-ci-1gpu
# 2gpu : amd-mi355-ci-2gpu
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

View File

@ -1,5 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI
on:
repository_dispatch:
@ -7,18 +6,55 @@ on:
- cron: "17 2 * * *"
push:
branches:
- run_scheduled_ci*
- run_nvidia_ci*
workflow_dispatch:
inputs:
prev_workflow_run_id:
description: 'previous workflow run id to compare'
type: string
required: false
default: ""
other_workflow_run_id:
description: 'other workflow run id to compare'
type: string
required: false
default: ""
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-models"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
torch-pipeline:
@ -27,20 +63,10 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-torch"
runner: daily-ci
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
secrets: inherit
tf-pipeline:
name: TF pipeline CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_pipelines_tf_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-tf"
runner: daily-ci
docker: huggingface/transformers-tensorflow-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
example-ci:
@ -49,9 +75,10 @@ jobs:
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-examples"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
trainer-fsdp-ci:
@ -60,9 +87,11 @@ jobs:
with:
job: run_trainer_and_fsdp_gpu
slack_report_channel: "#transformers-ci-daily-training"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
runner_type: "a10"
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
deepspeed-ci:
@ -71,10 +100,11 @@ jobs:
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-training"
runner: daily-ci
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Daily CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
quantization-ci:
@ -83,7 +113,8 @@ jobs:
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-daily-quantization"
runner: daily-ci
docker: huggingface/transformers-quantization-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit

View File

@ -0,0 +1,342 @@
name: Self-hosted runner (scheduled-intel-gaudi)
on:
workflow_call:
inputs:
job:
required: true
type: string
slack_report_channel:
required: true
type: string
runner_scale_set:
required: true
type: string
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
env:
NUM_SLICES: 2
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Setup
runs-on: ubuntu-latest
outputs:
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
quantization_matrix: ${{ steps.set-matrix.outputs.quantization_matrix }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- id: set-matrix
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: tests
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
- id: set-matrix-quantization
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
name: Identify quantization method to test
working-directory: tests
run: |
echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT
run_models_gpu:
if: ${{ inputs.job == 'run_models_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
secrets: inherit
run_trainer_and_fsdp_gpu:
if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
run_pipelines_torch_gpu:
if: ${{ inputs.job == 'run_pipelines_torch_gpu' }}
name: Pipelines
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all pipeline tests on Intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
path: reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run examples tests on Intel Gaudi
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_examples_gpu_test_reports examples/pytorch -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_examples_gpu_test_reports
path: reports/${{ env.machine_type }}_run_examples_gpu_test_reports
run_torch_cuda_extensions_gpu:
if: ${{ inputs.job == 'run_torch_cuda_extensions_gpu' }}
name: Intel Gaudi deepspeed tests
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.20.0
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all deepspeed tests on intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Slack Report
needs:
[
setup,
run_models_gpu,
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_pipelines_torch_gpu,
run_trainer_and_fsdp_gpu,
]
if: ${{ always() }}
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
setup_status: ${{ needs.setup.result }}
slack_report_channel: ${{ inputs.slack_report_channel }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
report_repo_id: ${{ inputs.report_repo_id }}
ci_event: ${{ inputs.ci_event }}
secrets: inherit

View File

@ -0,0 +1,67 @@
name: Self-hosted runner (Intel Gaudi3 scheduled CI caller)
on:
repository_dispatch:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"
jobs:
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_models_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
pipeline-ci:
name: Pipeline CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_pipelines_torch_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_examples_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_torch_cuda_extensions_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
trainer-fsdp-ci:
name: Trainer/FSDP CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_trainer_and_fsdp_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit

View File

@ -1,4 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI (job definitions)
# Note that each job's dependencies go into a corresponding docker file.
#
@ -15,9 +15,6 @@ on:
slack_report_channel:
required: true
type: string
runner:
required: true
type: string
docker:
required: true
type: string
@ -28,6 +25,19 @@ on:
default: ''
required: false
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
runner_type:
required: false
type: string
models:
default: ""
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -45,16 +55,16 @@ env:
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
name: Setup
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
strategy:
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
@ -63,7 +73,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Cleanup
working-directory: /transformers
@ -82,7 +92,7 @@ jobs:
working-directory: /transformers/tests
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
@ -107,15 +117,17 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
secrets: inherit
run_trainer_and_fsdp_gpu:
@ -125,15 +137,17 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: [0, 1]
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
@ -143,7 +157,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -152,7 +166,7 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -177,9 +191,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -205,91 +219,22 @@ jobs:
name: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports
run_pipelines_tf_gpu:
if: ${{ inputs.job == 'run_pipelines_tf_gpu' }}
name: TensorFlow pipelines
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-tensorflow-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all pipeline tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports tests/pipelines
- name: Failure short reports
if: ${{ always() }}
run: |
cat /transformers/reports/${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_pipelines_tf_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -314,9 +259,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -349,7 +294,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -358,7 +303,7 @@ jobs:
steps:
- name: Update clone
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: ${{ inputs.working-directory-prefix }}/transformers
@ -411,9 +356,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -448,7 +393,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }}
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -466,7 +411,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -491,9 +436,9 @@ jobs:
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g4dn-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g4dn-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
@ -530,6 +475,7 @@ jobs:
uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install transformers
run: pip install transformers
@ -567,13 +513,12 @@ jobs:
run_models_gpu,
run_trainer_and_fsdp_gpu,
run_pipelines_torch_gpu,
run_pipelines_tf_gpu,
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_quantization_torch_gpu,
run_extract_warnings
]
if: ${{ always() }}
if: always() && !cancelled()
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
@ -584,15 +529,22 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
secrets: inherit
check_new_model_failures:
if: ${{ always() && inputs.ci_event == 'Daily CI' && inputs.job == 'run_models_gpu' && needs.send_results.result == 'success' }}
name: Check new model failures
check_new_failures:
if: ${{ always() && inputs.ci_event == 'Daily CI' && needs.send_results.result == 'success' }}
name: Check new failures
needs: send_results
uses: ./.github/workflows/check_failed_model_tests.yml
uses: ./.github/workflows/check_failed_tests.yml
with:
docker: ${{ inputs.docker }}
start_sha: ${{ github.sha }}
start_sha: ${{ inputs.commit_sha || github.sha }}
job: ${{ inputs.job }}
slack_report_channel: ${{ inputs.slack_report_channel }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
secrets: inherit

View File

@ -21,6 +21,13 @@ on:
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
@ -29,7 +36,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
if: always() && !cancelled()
steps:
- name: Preliminary job status
shell: bash
@ -38,9 +45,28 @@ jobs:
echo "Setup status: ${{ inputs.setup_status }}"
- uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- uses: actions/download-artifact@v4
- name: Prepare some setup values
run: |
if [ -f setup_values/prev_workflow_run_id.txt ]; then
echo "PREV_WORKFLOW_RUN_ID=$(cat setup_values/prev_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Send message to Slack
if: ${{ inputs.job != 'run_quantization_torch_gpu' }}
shell: bash
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
@ -49,20 +75,25 @@ jobs:
SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: ${{ inputs.ci_event }}
CI_SHA: ${{ github.sha }}
CI_WORKFLOW_REF: ${{ github.workflow_ref }}
# This `CI_TITLE` would be empty for `schedule` or `workflow_run` events.
CI_TITLE: ${{ github.event.head_commit.message }}
CI_SHA: ${{ inputs.commit_sha || github.sha }}
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
# For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
# empty string, and the called script still get one argument (which is the emtpy string).
run: |
sudo apt-get install -y curl
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ inputs.folder_slices }}"
if [ "${{ inputs.quantization_matrix }}" != "" ]; then
python utils/notification_service.py "${{ inputs.quantization_matrix }}"
else
python utils/notification_service.py "${{ inputs.folder_slices }}"
fi
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
- name: Failure table artifacts
@ -70,32 +101,3 @@ jobs:
with:
name: ci_results_${{ inputs.job }}
path: ci_results_${{ inputs.job }}
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
- name: Send message to Slack for quantization workflow
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
CI_EVENT: ${{ inputs.ci_event }}
CI_SHA: ${{ github.sha }}
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
# We pass `needs.setup.outputs.quantization_matrix` as the argument. A processing in `notification_service_quantization.py` to change
# `quantization/bnb` to `quantization_bnb` is required, as the artifact names use `_` instead of `/`.
run: |
sudo apt-get install -y curl
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service_quantization.py "${{ inputs.quantization_matrix }}"
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.
- name: Failure table artifacts
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
uses: actions/upload-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: ci_results_${{ inputs.job }}

4
.gitignore vendored
View File

@ -13,6 +13,7 @@ tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/
reports/
# Distribution / packaging
.Python
@ -167,3 +168,6 @@ tags
# ruff
.ruff_cache
# modular conversion
*.modular_backup

39
AGENTS.md Normal file
View File

@ -0,0 +1,39 @@
# AGENTS.md Guide for Hugging Face Transformers
This AGENTS.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
- `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
- `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style.
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

View File

@ -68,8 +68,7 @@ already reported** (use the search bar on GitHub under Issues). Your issue shoul
Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:
* Your **OS type and version** and **Python**, **PyTorch** and
**TensorFlow** versions when applicable.
* Your **OS type and version** and **Python**, and **PyTorch** versions when applicable.
* A short, self-contained, code snippet that allows us to reproduce the bug in
less than 30s.
* The *full* traceback if an exception is raised.
@ -165,8 +164,7 @@ You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main
mode with the `-e` flag.
Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a
failure with this command. If that's the case make sure to install the Deep Learning framework you are working with
(PyTorch, TensorFlow and/or Flax) then do:
failure with this command. If that's the case make sure to install Pytorch then do:
```bash
pip install -e ".[quality]"

View File

@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not use italics and bold text too much as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back it's always a good practice to quote just the relevant lines you're replying it. The `>` is used for quoting, or you can always use the menu to do so. For example your editor box will look like:

View File

@ -3,18 +3,24 @@
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples tests src utils
check_dirs := examples tests src utils scripts benchmark benchmark_v2
exclude_folders := ""
modified_only_fixup:
$(eval modified_py_files := $(shell python utils/get_modified_files.py $(check_dirs)))
@if test -n "$(modified_py_files)"; then \
echo "Checking/fixing $(modified_py_files)"; \
ruff check $(modified_py_files) --fix --exclude $(exclude_folders); \
ruff format $(modified_py_files) --exclude $(exclude_folders);\
@current_branch=$$(git branch --show-current); \
if [ "$$current_branch" = "main" ]; then \
echo "On main branch, running 'style' target instead..."; \
$(MAKE) style; \
else \
echo "No library .py files were modified"; \
modified_py_files=$$(python utils/get_modified_files.py $(check_dirs)); \
if [ -n "$$modified_py_files" ]; then \
echo "Checking/fixing files: $${modified_py_files}"; \
ruff check $${modified_py_files} --fix --exclude $(exclude_folders); \
ruff format $${modified_py_files} --exclude $(exclude_folders); \
else \
echo "No library .py files were modified"; \
fi; \
fi
# Update src/transformers/dependency_versions_table.py
@ -40,11 +46,13 @@ repo-consistency:
python utils/check_dummies.py
python utils/check_repo.py
python utils/check_inits.py
python utils/check_pipeline_typing.py
python utils/check_config_docstrings.py
python utils/check_config_attributes.py
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_docstrings.py
python utils/add_dates.py
# this target runs checks on all files
@ -81,6 +89,7 @@ fix-copies:
python utils/check_copies.py --fix_and_overwrite
python utils/check_modular_conversion.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
python utils/check_pipeline_typing.py --fix_and_overwrite
python utils/check_doctest_list.py --fix_and_overwrite
python utils/check_docstrings.py --fix_and_overwrite

View File

@ -44,13 +44,14 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Рortuguês</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
</p>
</h4>
@ -59,18 +60,27 @@ limitations under the License.
</h3>
<h3 align="center">
<a href="https://hf.co/course"><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/course_banner.png"></a>
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers is a library of pretrained text, computer vision, audio, video, and multimodal models for inference and training. Use Transformers to fine-tune models on your data, build inference applications, and for generative AI use cases across multiple modalities.
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal model, for both inference and training.
There are over 500K+ Transformers [model checkpoints](https://huggingface.co/models?library=transformers&sort=trending) on the [Hugging Face Hub](https://huggingface.com/models) you can use.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...),
and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
We pledge to help support new state-of-the-art models and democratize their usage by having their model definition be
simple, customizable, and efficient.
There are over 1M+ Transformers [model checkpoints](https://huggingface.co/models?library=transformers&sort=trending) on the [Hugging Face Hub](https://huggingface.com/models) you can use.
Explore the [Hub](https://huggingface.com/) today to find a model and use Transformers to help you get started right away.
## Installation
Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
@ -137,7 +147,7 @@ chat = [
{"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```
@ -183,7 +193,6 @@ pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.pn
<details>
<summary>Visual question answering</summary>
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>
@ -232,7 +241,7 @@ pipeline(
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts]((https://github.com/huggingface/transformers/tree/main/examples)) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
## 100 projects using Transformers
@ -270,8 +279,8 @@ Expand each modality below to see a few example models for various use cases.
- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)

View File

@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re
models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
### Remote code

View File

@ -6,7 +6,7 @@ developers, researchers, students, professors, engineers, and anyone else to bui
In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [gpt4all](https://github.com/nomic-ai/gpt4all)
@ -49,7 +49,7 @@ Keywords: LLMs, Large Language Models, Agents, Chains
[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
## [ParlAI](https://github.com/facebookresearch/ParlAI)
@ -257,7 +257,7 @@ Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusi
Keywords: Text-to-3D, Stable Diffusion
## [txtai](https://github.com/neuml/txtai)
[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
Keywords: Semantic search, LLM
@ -288,7 +288,7 @@ Keywords: Music understanding, Music generation
## [dalle-flow](https://github.com/jina-ai/dalle-flow)
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. Itt leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
DALL·E Flow is an interactive workflow for generating high-definition images from a text prompt. It leverages DALL·E-Mega, GLID-3 XL, and Stable Diffusion to generate image candidates, and then calls CLIP-as-service to rank the candidates w.r.t. the prompt.
The preferred candidate is fed to GLID-3 XL for diffusion, which often enriches the texture and background. Finally, the candidate is upscaled to 1024x1024 via SwinIR.
Keywords: High-definition image generation, Stable Diffusion, DALL-E Mega, GLID-3 XL, CLIP, SwinIR
@ -309,8 +309,8 @@ Keywords: OCR, LaTeX, Math formula
OpenCLIP is an open source implementation of OpenAI's CLIP.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
@ -526,7 +526,7 @@ Keywords: Model deployment, CLoud, Mobile, Edge
## [underthesea](https://github.com/undertheseanlp/underthesea)
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provides extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
[underthesea](https://github.com/undertheseanlp/underthesea) is a Vietnamese NLP toolkit. Underthesea is a suite of open source Python modules data sets and tutorials supporting research and development in Vietnamese Natural Language Processing. We provide extremely easy API to quickly apply pretrained NLP models to your Vietnamese text, such as word segmentation, part-of-speech tagging (PoS), named entity recognition (NER), text classification and dependency parsing.
Keywords: Vietnamese, NLP
@ -596,7 +596,7 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active
## [BentoML](https://github.com/bentoml/BentoML)
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
Keywords: BentoML, Framework, Deployment, AI Applications
@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training(fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen

1
benchmark/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
benchmark_results/

354
benchmark/benches/llama.py Normal file
View File

@ -0,0 +1,354 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
from logging import Logger
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
# Add the parent directory to Python path to import benchmarks_entrypoint
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import gpustat
import psutil
import psycopg2
from benchmarks_entrypoint import MetricsRecorder
# Optional heavy ML dependencies - only required when actually running the benchmark
try:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
TRANSFORMERS_AVAILABLE = True
except ImportError:
TRANSFORMERS_AVAILABLE = False
torch = None
AutoModelForCausalLM = None
AutoTokenizer = None
GenerationConfig = None
StaticCache = None
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
# Only set torch precision if torch is available
if TRANSFORMERS_AVAILABLE:
torch.set_float32_matmul_precision("high")
def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
p = psutil.Process(os.getpid())
while not continue_metric_collection.is_set():
with p.oneshot():
cpu_util = p.cpu_percent()
mem_megabytes = p.memory_info().rss / (1024 * 1024)
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_util = gpu_stats[0]["utilization.gpu"]
gpu_mem_megabytes = gpu_stats[0]["memory.used"]
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
sleep(0.01)
def run_benchmark(
logger: Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
metrics_recorder=None,
num_tokens_to_generate=100,
):
# Check if required ML dependencies are available
if not TRANSFORMERS_AVAILABLE:
logger.error("Transformers and torch are required to run the LLaMA benchmark. Please install them with:")
logger.error("pip install torch transformers")
logger.error("Skipping LLaMA benchmark due to missing dependencies.")
return
continue_metric_collection = Event()
metrics_thread = None
model_id = "meta-llama/Llama-2-7b-hf"
# If no metrics_recorder is provided, create one for backward compatibility
if metrics_recorder is None:
try:
metrics_recorder = MetricsRecorder(
psycopg2.connect("dbname=metrics"), logger, repository, branch, commit_id, commit_msg, True
)
should_close_recorder = True
except Exception as e:
logger.error(f"Failed to create metrics recorder: {e}")
return
else:
should_close_recorder = False
try:
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_name = gpu_stats[0]["name"]
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
metrics_thread = Thread(
target=collect_metrics,
args=[benchmark_id, continue_metric_collection, metrics_recorder],
)
metrics_thread.start()
logger.info("started background thread to fetch device metrics")
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
device = "cuda"
logger.info("downloading weights")
# This is to avoid counting download in model load time measurement
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16)
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
logger.info("loading model")
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=torch.float16, generation_config=gen_config
).eval()
model.to(device)
torch.cuda.synchronize()
end = perf_counter()
model_load_time = end - start
logger.info(f"loaded model in: {model_load_time}s")
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Why dogs are so cute?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Specify the max length (including both the prompt and the response)
# When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
# with sequence length = `max_length`. The longer the more you will re-use it
seq_length = inputs["input_ids"].shape[1]
model.generation_config.max_length = seq_length + num_tokens_to_generate
batch_size = inputs["input_ids"].shape[0]
# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
pivot = v.select(-1, -1).unsqueeze(-1)
logits = torch.where(logits < pivot, -float("Inf"), logits)
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
probs = logits_to_probs(logits[0, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs
# First eager forward pass
logger.info("running first eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_fwd_pass_time = end - start
logger.info(f"completed first eager forward pass in: {first_eager_fwd_pass_time}s")
# Second eager forward pass (should be faster)
logger.info("running second eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_fwd_pass_time = end - start
logger.info(f"completed second eager forward pass in: {second_eager_fwd_pass_time}s")
# First eager generation
logger.info("running first eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_generate_time = end - start
logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
# Second eager generation (should be faster)
logger.info("running second eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_generate_time = end - start
logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
logger.info("running generation timing loop")
input_pos = torch.arange(0, seq_length, device=device)
inputs = inputs["input_ids"]
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(inputs, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_first_token = end - start
input_pos = torch.tensor([seq_length], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_second_token = end - start
input_pos = torch.tensor([seq_length + 1], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_third_token = end - start
logger.info("running longer generation timing loop")
total_time = 0
for i in range(20):
input_pos = torch.tensor([seq_length + 2 + i], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
total_time += end - start
mean_time_to_next_token = total_time / 20
logger.info("running compilation benchmarks")
# Now compile the model
model = torch.compile(model, mode="max-autotune", fullgraph=True)
# StaticCache for generation
with torch.device(device):
model.setup_caches(max_batch_size=batch_size, max_seq_len=seq_length + num_tokens_to_generate)
input_pos = torch.arange(0, seq_length, device=device)
inputs = tokenizer(prompt, return_tensors="pt").to(device)["input_ids"]
logger.info("compiling model")
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, generation_config=gen_config)
model.to(device)
model = torch.compile(model, mode="max-autotune", fullgraph=True)
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 1st call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
first_compile_generate_time = end - start
logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 2nd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
second_compile_generate_time = end - start
logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 3rd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
third_compile_generate_time = end - start
logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 4th call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
fourth_compile_generate_time = end - start
logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
except Exception as e:
logger.error(f"Caught exception: {e}")
continue_metric_collection.set()
if metrics_thread is not None:
metrics_thread.join()
# Only close the recorder if we created it locally
if should_close_recorder:
metrics_recorder.close()

View File

@ -31,9 +31,7 @@ from contextlib import contextmanager
from pathlib import Path
from git import Repo
from huggingface_hub import HfApi
from optimum_benchmark import Benchmark
from optimum_benchmark_wrapper import main

View File

@ -1,15 +1,36 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import importlib.util
import json
import logging
import os
from typing import Dict
import sys
import uuid
from datetime import datetime
from psycopg2.extras import Json
from psycopg2.extensions import register_adapter
import pandas as pd
register_adapter(dict, Json)
try:
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json
register_adapter(dict, Json)
PSYCOPG2_AVAILABLE = True
except ImportError:
PSYCOPG2_AVAILABLE = False
class ImportModuleException(Exception):
@ -17,59 +38,273 @@ class ImportModuleException(Exception):
class MetricsRecorder:
def __init__(self, connection, logger: logging.Logger, branch: str, commit_id: str, commit_msg: str):
def __init__(
self,
connection,
logger: logging.Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
collect_csv_data: bool = True,
):
self.conn = connection
self.conn.autocommit = True
self.use_database = connection is not None
if self.use_database:
self.conn.autocommit = True
self.logger = logger
self.repository = repository
self.branch = branch
self.commit_id = commit_id
self.commit_msg = commit_msg
self.collect_csv_data = collect_csv_data
def initialise_benchmark(self, metadata: Dict[str, str]) -> int:
"""
Creates a new benchmark, returns the benchmark id
"""
# gpu_name: str, model_id: str
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO benchmarks (branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s) RETURNING benchmark_id",
(self.branch, self.commit_id, self.commit_msg, metadata),
# For CSV export - store all data in pandas DataFrames (only if CSV collection is enabled)
if self.collect_csv_data:
# Initialize empty DataFrames with proper schemas
self.benchmarks_df = pd.DataFrame(
columns=[
"benchmark_id",
"repository",
"branch",
"commit_id",
"commit_message",
"metadata",
"created_at",
]
)
benchmark_id = cur.fetchone()[0]
logger.debug(f"initialised benchmark #{benchmark_id}")
return benchmark_id
self.device_measurements_df = pd.DataFrame(
columns=["benchmark_id", "cpu_util", "mem_megabytes", "gpu_util", "gpu_mem_megabytes", "time"]
)
self.model_measurements_df = pd.DataFrame(
columns=[
"benchmark_id",
"time",
"model_load_time",
"first_eager_forward_pass_time_secs",
"second_eager_forward_pass_time_secs",
"first_eager_generate_time_secs",
"second_eager_generate_time_secs",
"time_to_first_token_secs",
"time_to_second_token_secs",
"time_to_third_token_secs",
"time_to_next_token_mean_secs",
"first_compile_generate_time_secs",
"second_compile_generate_time_secs",
"third_compile_generate_time_secs",
"fourth_compile_generate_time_secs",
]
)
else:
self.benchmarks_df = None
self.device_measurements_df = None
self.model_measurements_df = None
def collect_device_measurements(self, benchmark_id: int, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
def initialise_benchmark(self, metadata: dict[str, str]) -> str:
"""
Creates a new benchmark, returns the benchmark id (UUID)
"""
# Generate a unique UUID for this benchmark
benchmark_id = str(uuid.uuid4())
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO benchmarks (benchmark_id, repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s, %s)",
(benchmark_id, self.repository, self.branch, self.commit_id, self.commit_msg, metadata),
)
self.logger.debug(f"initialised benchmark #{benchmark_id}")
# Store benchmark data for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"repository": self.repository,
"branch": self.branch,
"commit_id": self.commit_id,
"commit_message": self.commit_msg,
"metadata": json.dumps(metadata),
"created_at": datetime.utcnow().isoformat(),
}
]
)
self.benchmarks_df = pd.concat([self.benchmarks_df, new_row], ignore_index=True)
mode_info = []
if self.use_database:
mode_info.append("database")
if self.collect_csv_data:
mode_info.append("CSV")
mode_str = " + ".join(mode_info) if mode_info else "no storage"
self.logger.debug(f"initialised benchmark #{benchmark_id} ({mode_str} mode)")
return benchmark_id
def collect_device_measurements(self, benchmark_id: str, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
"""
Collect device metrics, such as CPU & GPU usage. These are "static", as in you cannot pass arbitrary arguments to the function.
"""
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
(benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
# Store device measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"cpu_util": cpu_util,
"mem_megabytes": mem_megabytes,
"gpu_util": gpu_util,
"gpu_mem_megabytes": gpu_mem_megabytes,
"time": datetime.utcnow().isoformat(),
}
]
)
self.device_measurements_df = pd.concat([self.device_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
(benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
)
self.logger.debug(
f"inserted device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
f"collected device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
)
def collect_model_measurements(self, benchmark_id: int, measurements: Dict[str, float]):
with self.conn.cursor() as cur:
cur.execute(
"""
INSERT INTO model_measurements (
benchmark_id,
measurements
) VALUES (%s, %s)
""",
(
benchmark_id,
measurements,
),
def collect_model_measurements(self, benchmark_id: str, measurements: dict[str, float]):
# Store model measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame with flattened measurements
row_data = {"benchmark_id": benchmark_id, "time": datetime.utcnow().isoformat()}
# Flatten the measurements dict into the row
row_data.update(measurements)
new_row = pd.DataFrame([row_data])
self.model_measurements_df = pd.concat([self.model_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"""
INSERT INTO model_measurements (
benchmark_id,
measurements
) VALUES (%s, %s)
""",
(
benchmark_id,
measurements,
),
)
self.logger.debug(f"collected model measurements for benchmark #{benchmark_id}: {measurements}")
def export_to_csv(self, output_dir: str = "benchmark_results"):
"""
Export all collected data to CSV files using pandas DataFrames
"""
if not self.collect_csv_data:
self.logger.warning("CSV data collection is disabled - no CSV files will be generated")
return
if not os.path.exists(output_dir):
os.makedirs(output_dir)
self.logger.info(f"Created output directory: {output_dir}")
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
files_created = []
# Export using pandas DataFrames
self._export_pandas_data(output_dir, timestamp, files_created)
self.logger.info(f"CSV export complete! Created {len(files_created)} files in {output_dir}")
def _export_pandas_data(self, output_dir: str, timestamp: str, files_created: list):
"""
Export CSV files using pandas DataFrames
"""
# Export benchmarks
benchmarks_file = os.path.join(output_dir, f"benchmarks_{timestamp}.csv")
self.benchmarks_df.to_csv(benchmarks_file, index=False)
files_created.append(benchmarks_file)
self.logger.info(f"Exported {len(self.benchmarks_df)} benchmark records to {benchmarks_file}")
# Export device measurements
device_file = os.path.join(output_dir, f"device_measurements_{timestamp}.csv")
self.device_measurements_df.to_csv(device_file, index=False)
files_created.append(device_file)
self.logger.info(f"Exported {len(self.device_measurements_df)} device measurement records to {device_file}")
# Export model measurements (already flattened)
model_file = os.path.join(output_dir, f"model_measurements_{timestamp}.csv")
self.model_measurements_df.to_csv(model_file, index=False)
files_created.append(model_file)
self.logger.info(f"Exported {len(self.model_measurements_df)} model measurement records to {model_file}")
# Create comprehensive summary using pandas operations
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.csv")
self._create_summary(summary_file)
files_created.append(summary_file)
def _create_summary(self, summary_file: str):
"""
Create a comprehensive summary CSV using pandas operations
"""
if len(self.benchmarks_df) == 0:
# Create empty summary file
summary_df = pd.DataFrame()
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created empty benchmark summary at {summary_file}")
return
# Start with benchmarks as the base
summary_df = self.benchmarks_df.copy()
# Add model measurements (join on benchmark_id)
if len(self.model_measurements_df) > 0:
# Drop 'time' column from model measurements to avoid conflicts
model_df = self.model_measurements_df.drop(columns=["time"], errors="ignore")
summary_df = summary_df.merge(model_df, on="benchmark_id", how="left")
# Calculate device measurement aggregates using pandas groupby
if len(self.device_measurements_df) > 0:
device_agg = (
self.device_measurements_df.groupby("benchmark_id")
.agg(
{
"cpu_util": ["mean", "max", "std", "count"],
"mem_megabytes": ["mean", "max", "std"],
"gpu_util": ["mean", "max", "std"],
"gpu_mem_megabytes": ["mean", "max", "std"],
}
)
.round(3)
)
self.logger.debug(f"inserted model measurements for benchmark #{benchmark_id}: {measurements}")
# Flatten column names
device_agg.columns = [f"{col[0]}_{col[1]}" for col in device_agg.columns]
device_agg = device_agg.reset_index()
# Rename count column to be more descriptive
if "cpu_util_count" in device_agg.columns:
device_agg = device_agg.rename(columns={"cpu_util_count": "device_measurement_count"})
# Merge with summary
summary_df = summary_df.merge(device_agg, on="benchmark_id", how="left")
# Export the comprehensive summary
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created comprehensive benchmark summary with {len(summary_df)} records at {summary_file}")
def close(self):
self.conn.close()
if self.use_database and self.conn:
self.conn.close()
logger = logging.getLogger(__name__)
@ -82,12 +317,18 @@ handler.setFormatter(formatter)
logger.addHandler(handler)
def parse_arguments():
def parse_arguments() -> tuple[str, str, str, str, bool, str]:
"""
Parse command line arguments for the benchmarking CLI.
"""
parser = argparse.ArgumentParser(description="CLI for benchmarking the huggingface/transformers.")
parser.add_argument(
"repository",
type=str,
help="The repository name on which the benchmarking is performed.",
)
parser.add_argument(
"branch",
type=str,
@ -106,9 +347,21 @@ def parse_arguments():
help="The commit message associated with the commit, truncated to 70 characters.",
)
parser.add_argument("--csv", action="store_true", default=False, help="Enable CSV output files generation.")
parser.add_argument(
"--csv-output-dir",
type=str,
default="benchmark_results",
help="Directory for CSV output files (default: benchmark_results).",
)
args = parser.parse_args()
return args.branch, args.commit_id, args.commit_msg
# CSV is disabled by default, only enabled when --csv is used
generate_csv = args.csv
return args.repository, args.branch, args.commit_id, args.commit_msg, generate_csv, args.csv_output_dir
def import_from_path(module_name, file_path):
@ -122,22 +375,128 @@ def import_from_path(module_name, file_path):
raise ImportModuleException(f"failed to load python module: {e}")
def create_database_connection():
"""
Try to create a database connection. Returns None if connection fails.
"""
if not PSYCOPG2_AVAILABLE:
logger.warning("psycopg2 not available - running in CSV-only mode")
return None
try:
import psycopg2
conn = psycopg2.connect("dbname=metrics")
logger.info("Successfully connected to database")
return conn
except Exception as e:
logger.warning(f"Failed to connect to database: {e}. Running in CSV-only mode")
return None
def create_global_metrics_recorder(
repository: str, branch: str, commit_id: str, commit_msg: str, generate_csv: bool = False
) -> MetricsRecorder:
"""
Create a global metrics recorder that will be used across all benchmarks.
"""
connection = create_database_connection()
recorder = MetricsRecorder(connection, logger, repository, branch, commit_id, commit_msg, generate_csv)
# Log the storage mode
storage_modes = []
if connection is not None:
storage_modes.append("database")
if generate_csv:
storage_modes.append("CSV")
if not storage_modes:
logger.warning("Running benchmarks with NO data storage (no database connection, CSV disabled)")
logger.warning("Use --csv flag to enable CSV output when database is unavailable")
else:
logger.info(f"Running benchmarks with: {' + '.join(storage_modes)} storage")
return recorder
if __name__ == "__main__":
benchmarks_folder_path = os.path.dirname(os.path.realpath(__file__))
benches_folder_path = os.path.join(benchmarks_folder_path, "benches")
branch, commit_id, commit_msg = parse_arguments()
repository, branch, commit_id, commit_msg, generate_csv, csv_output_dir = parse_arguments()
for entry in os.scandir(benchmarks_folder_path):
try:
# Create a global metrics recorder
global_metrics_recorder = create_global_metrics_recorder(repository, branch, commit_id, commit_msg, generate_csv)
successful_benchmarks = 0
failed_benchmarks = 0
# Automatically discover all benchmark modules in benches/ folder
benchmark_modules = []
if os.path.exists(benches_folder_path):
logger.debug(f"Scanning for benchmarks in: {benches_folder_path}")
for entry in os.scandir(benches_folder_path):
if not entry.name.endswith(".py"):
continue
if entry.path == __file__:
if entry.name.startswith("__"): # Skip __init__.py, __pycache__, etc.
continue
logger.debug(f"loading: {entry.name}")
module = import_from_path(entry.name.split(".")[0], entry.path)
logger.info(f"running benchmarks in: {entry.name}")
module.run_benchmark(logger, branch, commit_id, commit_msg)
# Check if the file has a run_benchmark function
try:
logger.debug(f"checking if benches/{entry.name} has run_benchmark function")
module = import_from_path(entry.name.split(".")[0], entry.path)
if hasattr(module, "run_benchmark"):
benchmark_modules.append(entry.name)
logger.debug(f"discovered benchmark: {entry.name}")
else:
logger.debug(f"skipping {entry.name} - no run_benchmark function found")
except Exception as e:
logger.debug(f"failed to check benches/{entry.name}: {e}")
else:
logger.warning(f"Benches directory not found: {benches_folder_path}")
if benchmark_modules:
logger.info(f"Discovered {len(benchmark_modules)} benchmark(s): {benchmark_modules}")
else:
logger.warning("No benchmark modules found in benches/ directory")
for module_name in benchmark_modules:
module_path = os.path.join(benches_folder_path, module_name)
try:
logger.debug(f"loading: {module_name}")
module = import_from_path(module_name.split(".")[0], module_path)
logger.info(f"running benchmarks in: {module_name}")
# Check if the module has an updated run_benchmark function that accepts metrics_recorder
try:
# Try the new signature first
module.run_benchmark(logger, repository, branch, commit_id, commit_msg, global_metrics_recorder)
except TypeError:
# Fall back to the old signature for backward compatibility
logger.warning(
f"Module {module_name} using old run_benchmark signature - database connection will be created per module"
)
module.run_benchmark(logger, repository, branch, commit_id, commit_msg)
successful_benchmarks += 1
except ImportModuleException as e:
logger.error(e)
failed_benchmarks += 1
except Exception as e:
logger.error(f"error running benchmarks for {entry.name}: {e}")
logger.error(f"error running benchmarks for {module_name}: {e}")
failed_benchmarks += 1
# Export CSV results at the end (if enabled)
try:
if generate_csv:
global_metrics_recorder.export_to_csv(csv_output_dir)
logger.info(f"CSV reports have been generated and saved to the {csv_output_dir} directory")
else:
logger.info("CSV generation disabled - no CSV files created (use --csv to enable)")
logger.info(f"Benchmark run completed. Successful: {successful_benchmarks}, Failed: {failed_benchmarks}")
except Exception as e:
logger.error(f"Failed to export CSV results: {e}")
finally:
global_metrics_recorder.close()

View File

@ -19,7 +19,7 @@ backend:
model: meta-llama/Llama-2-7b-hf
cache_implementation: static
torch_compile: true
torch_dtype: float16
dtype: float16
torch_compile_config:
backend: inductor
mode: reduce-overhead

View File

@ -1,33 +0,0 @@
CREATE TABLE IF NOT EXISTS benchmarks (
benchmark_id SERIAL PRIMARY KEY,
branch VARCHAR(255),
commit_id VARCHAR(72),
commit_message VARCHAR(70),
metadata jsonb,
created_at timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS benchmarks_benchmark_id_idx ON benchmarks (benchmark_id);
CREATE INDEX IF NOT EXISTS benchmarks_branch_idx ON benchmarks (branch);
CREATE TABLE IF NOT EXISTS device_measurements (
measurement_id SERIAL PRIMARY KEY,
benchmark_id int REFERENCES benchmarks (benchmark_id),
cpu_util double precision,
mem_megabytes double precision,
gpu_util double precision,
gpu_mem_megabytes double precision,
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS device_measurements_branch_idx ON device_measurements (benchmark_id);
CREATE TABLE IF NOT EXISTS model_measurements (
measurement_id SERIAL PRIMARY KEY,
benchmark_id int REFERENCES benchmarks (benchmark_id),
measurements jsonb,
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS model_measurements_branch_idx ON model_measurements (benchmark_id);

View File

@ -1,342 +0,0 @@
from logging import Logger
import os
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
from benchmarks_entrypoint import MetricsRecorder
import gpustat
import psutil
import psycopg2
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
p = psutil.Process(os.getpid())
while not continue_metric_collection.is_set():
with p.oneshot():
cpu_util = p.cpu_percent()
mem_megabytes = p.memory_info().rss / (1024 * 1024)
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_util = gpu_stats[0]["utilization.gpu"]
gpu_mem_megabytes = gpu_stats[0]["memory.used"]
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
sleep(0.01)
def run_benchmark(logger: Logger, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100):
continue_metric_collection = Event()
metrics_thread = None
model_id = "meta-llama/Llama-2-7b-hf"
metrics_recorder = MetricsRecorder(psycopg2.connect("dbname=metrics"), logger, branch, commit_id, commit_msg)
try:
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_name = gpu_stats[0]["name"]
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
metrics_thread = Thread(
target=collect_metrics,
args=[benchmark_id, continue_metric_collection, metrics_recorder],
)
metrics_thread.start()
logger.info("started background thread to fetch device metrics")
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
device = "cuda"
logger.info("downloading weights")
# This is to avoid counting download in model load time measurement
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
logger.info("loading model")
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(
model_id, torch_dtype=torch.float16, generation_config=gen_config
).eval()
model.to(device)
torch.cuda.synchronize()
end = perf_counter()
model_load_time = end - start
logger.info(f"loaded model in: {model_load_time}s")
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Why dogs are so cute?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Specify the max length (including both the prompt and the response)
# When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
# with sequence length = `max_length`. The longer the more you will re-use it
seq_length = inputs["input_ids"].shape[1]
model.generation_config.max_length = seq_length + num_tokens_to_generate
batch_size = inputs["input_ids"].shape[0]
# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
pivot = v.select(-1, -1).unsqueeze(-1)
logits = torch.where(logits < pivot, -float("Inf"), logits)
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
probs = logits_to_probs(logits[:, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs
def decode_one_token(model, cur_token, cache_position, past_key_values):
logits = model(
cur_token,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)[0]
new_token = sample(logits, temperature=0.6, top_k=5)[0]
return new_token
#########
# Eager #
#########
with torch.no_grad():
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate,
)
cache_position = torch.arange(seq_length, device=device)
start = perf_counter()
model(
**inputs,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)
end = perf_counter()
first_eager_fwd_pass_time = end - start
logger.info(f"completed first eager fwd pass in: {first_eager_fwd_pass_time}s")
start = perf_counter()
output = model.generate(**inputs, do_sample=False)
end = perf_counter()
first_eager_generate_time = end - start
logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate,
)
cache_position = torch.arange(seq_length, device=device)
start = perf_counter()
model(
**inputs,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)
end = perf_counter()
second_eager_fwd_pass_time = end - start
logger.info(f"completed second eager fwd pass in: {second_eager_fwd_pass_time}s")
start = perf_counter()
model.generate(**inputs, do_sample=False)
end = perf_counter()
second_eager_generate_time = end - start
logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
torch.compiler.reset()
################
# Forward pass #
################
# `torch.compile(model, ...)` is not recommended as you compile callbacks
# and full generate. We recommend compiling only the forward for now.
# "reduce-overhead" will use cudagraphs.
generated_ids = torch.zeros(
(batch_size, num_tokens_to_generate + seq_length), dtype=torch.int, device=device
)
generated_ids[:, :seq_length] = inputs["input_ids"]
decode_one_token = torch.compile(decode_one_token, mode="reduce-overhead", fullgraph=True)
# model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
# TODO use decode_one_token(model, input_id.clone(), cache_position) for verification
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate + 10,
)
cache_position = torch.arange(seq_length, device=device)
all_generated_tokens = []
### First compile, prefill
start = perf_counter()
next_token = decode_one_token(
model, inputs["input_ids"], cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_first_token = end - start
logger.info(f"completed first compile generation in: {time_to_first_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
cache_position = torch.tensor([seq_length], device=device)
### First compile, decoding
start = perf_counter()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_second_token = end - start
logger.info(f"completed second compile generation in: {time_to_second_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
### Second compile, decoding
start = perf_counter()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_third_token = end - start
logger.info(f"completed third compile forward in: {time_to_third_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
### Using cuda graphs decoding
start = perf_counter()
for _ in range(1, num_tokens_to_generate):
all_generated_tokens += next_token.tolist()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
cache_position += 1
torch.cuda.synchronize()
end = perf_counter()
mean_time_to_next_token = (end - start) / num_tokens_to_generate
logger.info(f"completed next compile generation in: {mean_time_to_next_token}s")
logger.info(f"generated: {tokenizer.batch_decode(all_generated_tokens)}")
####################
# Generate compile #
####################
torch.compiler.reset()
# we will not compile full generate as it' s to intensive, tho we measure full forward!
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 1st call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
torch.cuda.synchronize()
end = perf_counter()
first_compile_generate_time = end - start
logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 2nd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
torch.cuda.synchronize()
end = perf_counter()
second_compile_generate_time = end - start
logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 3rd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
third_compile_generate_time = end - start
logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 4th call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
fourth_compile_generate_time = end - start
logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
except Exception as e:
logger.error(f"Caught exception: {e}")
continue_metric_collection.set()
if metrics_thread is not None:
metrics_thread.join()
metrics_recorder.close()

View File

@ -3,7 +3,11 @@ import subprocess
def main(config_dir, config_name, args):
subprocess.run(["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"] + ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"] + args)
subprocess.run(
["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"]
+ ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"]
+ args
)
if __name__ == "__main__":

View File

@ -2,4 +2,5 @@ gpustat==1.1.1
psutil==6.0.0
psycopg2==2.9.9
torch>=2.4.0
hf_transfer
hf_transfer
pandas>=1.5.0

View File

1
benchmark_v2/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
benchmark_results/

138
benchmark_v2/README.md Normal file
View File

@ -0,0 +1,138 @@
# Benchmarking v2
A comprehensive benchmarking framework for transformer models that supports multiple execution modes (eager, compiled, kernelized), detailed performance metrics collection, and structured output format.
## Quick Start
### Running All Benchmarks
```bash
# Run all benchmarks with default settings
python run_benchmarks.py
# Specify output directory
python run_benchmarks.py --output-dir my_results
# Run with custom parameters
python run_benchmarks.py \
--warmup-iterations 5 \
--measurement-iterations 10 \
--num-tokens-to-generate 200
```
### Uploading Results to HuggingFace Dataset
You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:
```bash
# Upload to a public dataset with auto-generated run ID
python run_benchmarks.py --upload-to-hub username/benchmark-results
# Upload with a custom run ID for easy identification
python run_benchmarks.py --upload-to-hub username/benchmark-results --run-id experiment_v1
# Upload with custom HuggingFace token (if not set in environment)
python run_benchmarks.py --upload-to-hub username/benchmark-results --token hf_your_token_here
```
**Dataset Directory Structure:**
```
dataset_name/
├── 2025-01-15/
│ ├── runs/ # Non-scheduled runs (manual, PR, etc.)
│ │ └── 123-1245151651/ # GitHub run number and ID
│ │ └── benchmark_results/
│ │ ├── benchmark_summary_20250115_143022.json
│ │ └── model-name/
│ │ └── model-name_benchmark_20250115_143022.json
│ └── benchmark_results_abc123de/ # Scheduled runs (daily CI)
│ ├── benchmark_summary_20250115_143022.json
│ └── model-name/
│ └── model-name_benchmark_20250115_143022.json
└── 2025-01-16/
└── ...
```
**Authentication for Uploads:**
For uploading results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence):
1. Command line: `--token hf_your_token_here`
3. Environment variable: `HF_TOKEN`
### Running Specific Benchmarks
```bash
# Include only specific benchmarks
python run_benchmarks.py --include llama
# Exclude specific benchmarks
python run_benchmarks.py --exclude old_benchmark
## Output Format
Results are saved as JSON files with the following structure:
```json
{
"model_name": "llama_2_7b",
"benchmark_scenarios": [
{
"scenario_name": "eager_variant",
"metadata": {
"timestamp": "2025-01-XX...",
"commit_id": "abc123...",
"hardware_info": {
"gpu_name": "NVIDIA A100",
"gpu_memory_total": 40960,
"cpu_count": 64
},
"config": {
"variant": "eager",
"warmup_iterations": 3,
"measurement_iterations": 5
}
},
"measurements": {
"latency": {
"mean": 2.45,
"median": 2.43,
"std": 0.12,
"min": 2.31,
"max": 2.67,
"p95": 2.61,
"p99": 2.65
},
"time_to_first_token": {
"mean": 0.15,
"std": 0.02
},
"tokens_per_second": {
"mean": 87.3,
"unit": "tokens/sec"
}
},
"gpu_metrics": {
"gpu_utilization_mean": 85.2,
"gpu_memory_used_mean": 12450
}
}
]
}
```
### Debug Mode
```bash
python run_benchmarks.py --log-level DEBUG
```
## Contributing
To add new benchmarks:
1. Create a new file in `benches/`
2. Implement the `ModelBenchmark` interface
3. Add a runner function (`run_<benchmark_name>` or `run_benchmark`)
4. run_benchmarks.py

View File

@ -0,0 +1 @@
# Benchmark implementations directory

View File

@ -0,0 +1,165 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
from typing import Any
import torch
from benchmark_framework import ModelBenchmark
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
class LLaMABenchmark(ModelBenchmark):
"""Simplified LLaMA model benchmark implementation using the ModelBenchmark base class."""
def __init__(self, logger: logging.Logger):
super().__init__(logger)
self._default_prompt = "Why dogs are so cute?" # Custom prompt for LLaMA
def get_scenario_configs(self) -> list[dict[str, Any]]:
"""
Get LLaMA-specific scenario configurations.
Returns:
List of scenario configuration dictionaries
"""
return [
# Eager variants
{"variant": "eager", "compile_mode": None, "use_cache": True, "description": "Eager execution with cache"},
# Compiled variants
{
"variant": "compiled",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Compiled with max autotune",
},
# Kernelized variant (if available)
{
"variant": "kernelized",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Kernelized execution",
},
]
def _is_kernelization_available(self) -> bool:
"""Check if kernelization is available for LLaMA."""
try:
from kernels import Mode, kernelize # noqa: F401
return True
except ImportError:
self.logger.debug("Kernelization not available: kernels module not found")
return False
def get_default_generation_config(self) -> dict[str, Any]:
"""Get LLaMA-specific generation configuration."""
return {
"do_sample": False,
"top_p": 1.0,
"temperature": 1.0,
"repetition_penalty": 1.0,
"max_new_tokens": None, # Will be set per scenario
}
def get_model_init_kwargs(self, config) -> dict[str, Any]:
"""Get LLaMA-specific model initialization kwargs."""
return {
"torch_dtype": getattr(torch, config.torch_dtype),
"attn_implementation": config.attn_implementation,
"use_cache": True,
}
def get_default_torch_dtype(self) -> str:
"""Get default torch dtype for LLaMA."""
return "float16" # LLaMA works well with float16
def get_default_device(self) -> str:
"""Get default device for LLaMA."""
return "cuda" # LLaMA prefers CUDA
def run_llama(logger, output_dir, **kwargs):
"""
Run LLaMA benchmark with the given configuration.
Args:
logger: Logger instance
output_dir: Output directory for results
**kwargs: Additional configuration options
Returns:
Path to output file if successful
"""
from benchmark_framework import BenchmarkRunner
# Extract parameters with defaults
model_id = kwargs.get("model_id", "meta-llama/Llama-2-7b-hf")
warmup_iterations = kwargs.get("warmup_iterations", 3)
measurement_iterations = kwargs.get("measurement_iterations", 5)
num_tokens_to_generate = kwargs.get("num_tokens_to_generate", 100)
include_sdpa_variants = kwargs.get("include_sdpa_variants", True)
device = kwargs.get("device", "cuda")
torch_dtype = kwargs.get("torch_dtype", "float16")
batch_size = kwargs.get("batch_size", 1)
commit_id = kwargs.get("commit_id")
logger.info(f"Starting LLaMA benchmark for model: {model_id}")
logger.info(
f"Configuration: warmup={warmup_iterations}, measurement={measurement_iterations}, tokens={num_tokens_to_generate}"
)
try:
# Create benchmark instance
benchmark = LLaMABenchmark(logger)
# Create scenarios
scenarios = benchmark.create_scenarios(
model_id=model_id,
warmup_iterations=warmup_iterations,
measurement_iterations=measurement_iterations,
num_tokens_to_generate=num_tokens_to_generate,
include_sdpa_variants=include_sdpa_variants,
device=device,
torch_dtype=torch_dtype,
batch_size=batch_size,
)
logger.info(f"Created {len(scenarios)} benchmark scenarios")
# Create runner and execute benchmarks
runner = BenchmarkRunner(logger, output_dir)
results = runner.run_benchmark(benchmark, scenarios, commit_id=commit_id)
if not results:
logger.warning("No successful benchmark results")
return None
# Save results
model_name = model_id.split("/")[-1] # Extract model name from ID
output_file = runner.save_results(model_name, results)
logger.info(f"LLaMA benchmark completed successfully. Results saved to: {output_file}")
return output_file
except Exception as e:
logger.error(f"LLaMA benchmark failed: {e}")
import traceback
logger.debug(traceback.format_exc())
raise

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,7 @@
numpy>=1.21.0
psutil>=5.8.0
gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
huggingface_hub>=0.16.0

495
benchmark_v2/run_benchmarks.py Executable file
View File

@ -0,0 +1,495 @@
#!/usr/bin/env python3
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Top-level benchmarking script that automatically discovers and runs all benchmarks
in the ./benches directory, organizing outputs into model-specific subfolders.
"""
import argparse
import importlib.util
import json
import logging
import os
import sys
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
def setup_logging(log_level: str = "INFO", enable_file_logging: bool = False) -> logging.Logger:
"""Setup logging configuration."""
numeric_level = getattr(logging, log_level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError(f"Invalid log level: {log_level}")
handlers = [logging.StreamHandler(sys.stdout)]
if enable_file_logging:
handlers.append(logging.FileHandler(f"benchmark_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"))
logging.basicConfig(
level=numeric_level, format="[%(levelname)s - %(asctime)s] %(name)s: %(message)s", handlers=handlers
)
return logging.getLogger(__name__)
def discover_benchmarks(benches_dir: str) -> list[dict[str, Any]]:
"""
Discover all benchmark modules in the benches directory.
Returns:
List of dictionaries containing benchmark module info
"""
benchmarks = []
benches_path = Path(benches_dir)
if not benches_path.exists():
raise FileNotFoundError(f"Benches directory not found: {benches_dir}")
for py_file in benches_path.glob("*.py"):
if py_file.name.startswith("__"):
continue
module_name = py_file.stem
try:
# Import the module
spec = importlib.util.spec_from_file_location(module_name, py_file)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Check if it has a benchmark runner function
if hasattr(module, f"run_{module_name}"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, f"run_{module_name}"),
}
)
elif hasattr(module, "run_benchmark"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, "run_benchmark"),
}
)
else:
logging.warning(f"No runner function found in {py_file}")
except Exception as e:
logging.error(f"Failed to import {py_file}: {e}")
return benchmarks
def run_single_benchmark(
benchmark_info: dict[str, Any], output_dir: str, logger: logging.Logger, **kwargs
) -> Optional[str]:
"""
Run a single benchmark and return the output file path.
Args:
benchmark_info: Dictionary containing benchmark module info
output_dir: Base output directory
logger: Logger instance
**kwargs: Additional arguments to pass to the benchmark
Returns:
Path to the output file if successful, None otherwise
"""
benchmark_name = benchmark_info["name"]
runner_func = benchmark_info["runner_function"]
logger.info(f"Running benchmark: {benchmark_name}")
try:
# Check function signature to determine what arguments to pass
import inspect
sig = inspect.signature(runner_func)
# Prepare arguments based on function signature
func_kwargs = {"logger": logger, "output_dir": output_dir}
# Add other kwargs if the function accepts them
for param_name in sig.parameters:
if param_name in kwargs:
func_kwargs[param_name] = kwargs[param_name]
# Filter kwargs to only include parameters the function accepts
# If function has **kwargs, include all provided kwargs
has_var_kwargs = any(param.kind == param.VAR_KEYWORD for param in sig.parameters.values())
if has_var_kwargs:
valid_kwargs = {**func_kwargs, **kwargs}
else:
valid_kwargs = {k: v for k, v in func_kwargs.items() if k in sig.parameters}
# Run the benchmark
result = runner_func(**valid_kwargs)
if isinstance(result, str):
# Function returned a file path
return result
else:
logger.info(f"Benchmark {benchmark_name} completed successfully")
return "completed"
except Exception as e:
logger.error(f"Benchmark {benchmark_name} failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return None
def generate_summary_report(
output_dir: str,
benchmark_results: dict[str, Any],
logger: logging.Logger,
benchmark_run_uuid: Optional[str] = None,
) -> str:
"""Generate a summary report of all benchmark runs."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.json")
summary_data = {
"run_metadata": {
"timestamp": datetime.utcnow().isoformat(),
"benchmark_run_uuid": benchmark_run_uuid,
"total_benchmarks": len(benchmark_results),
"successful_benchmarks": len([r for r in benchmark_results.values() if r is not None]),
"failed_benchmarks": len([r for r in benchmark_results.values() if r is None]),
},
"benchmark_results": benchmark_results,
"output_directory": output_dir,
}
with open(summary_file, "w") as f:
json.dump(summary_data, f, indent=2, default=str)
logger.info(f"Summary report saved to: {summary_file}")
return summary_file
def upload_results_to_hf_dataset(
output_dir: str,
summary_file: str,
dataset_name: str,
run_id: Optional[str] = None,
token: Optional[str] = None,
logger: Optional[logging.Logger] = None,
) -> Optional[str]:
"""
Upload benchmark results to a HuggingFace Dataset.
Based on upload_collated_report() from utils/collated_reports.py
Args:
output_dir: Local output directory containing results
summary_file: Path to the summary file
dataset_name: Name of the HuggingFace dataset to upload to
run_id: Unique run identifier (if None, will generate one)
token: HuggingFace token for authentication (if None, will use environment variables)
logger: Logger instance
Returns:
The run_id used for the upload, None if upload failed
"""
if logger is None:
logger = logging.getLogger(__name__)
import os
from huggingface_hub import HfApi
api = HfApi()
if run_id is None:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
run_id = f"{github_run_number}-{github_run_id}"
date_folder = datetime.now().strftime("%Y-%m-%d")
github_event_name = os.getenv("GITHUB_EVENT_NAME")
if github_event_name != "schedule":
# Non-scheduled runs go under a runs subfolder
repo_path = f"{date_folder}/runs/{run_id}/benchmark_results"
else:
# Scheduled runs go directly under the date
repo_path = f"{date_folder}/{run_id}/benchmark_results"
logger.info(f"Uploading benchmark results to dataset '{dataset_name}' at path '{repo_path}'")
try:
# Upload all files in the output directory
from pathlib import Path
output_path = Path(output_dir)
for file_path in output_path.rglob("*"):
if file_path.is_file():
# Calculate relative path from output_dir
relative_path = file_path.relative_to(output_path)
path_in_repo = f"{repo_path}/{relative_path}"
logger.debug(f"Uploading {file_path} to {path_in_repo}")
api.upload_file(
path_or_fileobj=str(file_path),
path_in_repo=path_in_repo,
repo_id=dataset_name,
repo_type="dataset",
token=token,
commit_message=f"Upload benchmark results for run {run_id}",
)
logger.info(
f"Successfully uploaded results to: https://huggingface.co/datasets/{dataset_name}/tree/main/{repo_path}"
)
return run_id
except Exception as upload_error:
logger.error(f"Failed to upload results: {upload_error}")
import traceback
logger.debug(traceback.format_exc())
return None
def main():
"""Main entry point for the benchmarking script."""
# Generate a unique UUID for this benchmark run
benchmark_run_uuid = str(uuid.uuid4())[:8]
parser = argparse.ArgumentParser(
description="Run all benchmarks in the ./benches directory",
epilog="""
Examples:
# Run all available benchmarks
python3 run_benchmarks.py
# Run with specific model and upload to HuggingFace Dataset
python3 run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf --upload-to-hf username/benchmark-results
# Run with custom run ID and upload to HuggingFace Dataset
python3 run_benchmarks.py --run-id experiment_v1 --upload-to-hf org/benchmarks
# Run only specific benchmarks with file logging
python3 run_benchmarks.py --include llama --enable-file-logging
""", # noqa: W293
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--output-dir",
type=str,
default="benchmark_results",
help="Base output directory for benchmark results (default: benchmark_results)",
)
parser.add_argument(
"--benches-dir",
type=str,
default="./benches",
help="Directory containing benchmark implementations (default: ./benches)",
)
parser.add_argument(
"--log-level",
type=str,
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
default="INFO",
help="Logging level (default: INFO)",
)
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup-iterations", type=int, default=3, help="Number of warmup iterations (default: 3)")
parser.add_argument(
"--measurement-iterations", type=int, default=5, help="Number of measurement iterations (default: 5)"
)
parser.add_argument(
"--num-tokens-to-generate",
type=int,
default=100,
help="Number of tokens to generate in benchmarks (default: 100)",
)
parser.add_argument("--include", type=str, nargs="*", help="Only run benchmarks matching these names")
parser.add_argument("--exclude", type=str, nargs="*", help="Exclude benchmarks matching these names")
parser.add_argument("--enable-file-logging", action="store_true", help="Enable file logging (disabled by default)")
parser.add_argument(
"--commit-id", type=str, help="Git commit ID for metadata (if not provided, will auto-detect from git)"
)
parser.add_argument(
"--push-to-hub",
type=str,
help="Upload results to HuggingFace Dataset (provide dataset name, e.g., 'username/benchmark-results')",
)
parser.add_argument(
"--run-id", type=str, help="Custom run ID for organizing results (if not provided, will generate a unique ID)"
)
parser.add_argument(
"--token",
type=str,
help="HuggingFace token for dataset uploads (if not provided, will use HF_TOKEN environment variable)",
)
args = parser.parse_args()
# Setup logging
logger = setup_logging(args.log_level, args.enable_file_logging)
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Benches directory: {args.benches_dir}")
# Create output directory
os.makedirs(args.output_dir, exist_ok=True)
try:
# Discover benchmarks
benchmarks = discover_benchmarks(args.benches_dir)
logger.info(f"Discovered {len(benchmarks)} benchmark(s): {[b['name'] for b in benchmarks]}")
if not benchmarks:
logger.warning("No benchmarks found!")
return 1
# Filter benchmarks based on include/exclude
filtered_benchmarks = benchmarks
if args.include:
filtered_benchmarks = [
b for b in filtered_benchmarks if any(pattern in b["name"] for pattern in args.include)
]
logger.info(f"Filtered to include: {[b['name'] for b in filtered_benchmarks]}")
if args.exclude:
filtered_benchmarks = [
b for b in filtered_benchmarks if not any(pattern in b["name"] for pattern in args.exclude)
]
logger.info(f"After exclusion: {[b['name'] for b in filtered_benchmarks]}")
if not filtered_benchmarks:
logger.warning("No benchmarks remaining after filtering!")
return 1
# Prepare common kwargs for benchmarks
benchmark_kwargs = {
"warmup_iterations": args.warmup_iterations,
"measurement_iterations": args.measurement_iterations,
"num_tokens_to_generate": args.num_tokens_to_generate,
}
if args.model_id:
benchmark_kwargs["model_id"] = args.model_id
# Add commit_id if provided
if args.commit_id:
benchmark_kwargs["commit_id"] = args.commit_id
# Run benchmarks
benchmark_results = {}
successful_count = 0
for benchmark_info in filtered_benchmarks:
result = run_single_benchmark(benchmark_info, args.output_dir, logger, **benchmark_kwargs)
benchmark_results[benchmark_info["name"]] = result
if result is not None:
successful_count += 1
# Generate summary report
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger, benchmark_run_uuid)
# Upload results to HuggingFace Dataset if requested
upload_run_id = None
if args.push_to_hub:
logger.info("=" * 60)
logger.info("UPLOADING TO HUGGINGFACE DATASET")
logger.info("=" * 60)
# Use provided run_id or fallback to benchmark run UUID
effective_run_id = args.run_id or benchmark_run_uuid
upload_run_id = upload_results_to_hf_dataset(
output_dir=args.output_dir,
summary_file=summary_file,
dataset_name=args.push_to_hub,
run_id=effective_run_id,
token=args.token,
logger=logger,
)
if upload_run_id:
logger.info(f"Upload completed with run ID: {upload_run_id}")
else:
logger.warning("Upload failed - continuing with local results")
# Final summary
total_benchmarks = len(filtered_benchmarks)
failed_count = total_benchmarks - successful_count
logger.info("=" * 60)
logger.info("BENCHMARK RUN SUMMARY")
logger.info("=" * 60)
logger.info(f"Total benchmarks: {total_benchmarks}")
logger.info(f"Successful: {successful_count}")
logger.info(f"Failed: {failed_count}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Summary report: {summary_file}")
if args.push_to_hub:
if upload_run_id:
logger.info(f"HuggingFace Dataset: {args.push_to_hub}")
logger.info(f"Run ID: {upload_run_id}")
logger.info(
f"View results: https://huggingface.co/datasets/{args.push_to_hub}/tree/main/{datetime.now().strftime('%Y-%m-%d')}/runs/{upload_run_id}"
)
else:
logger.warning("Upload to HuggingFace Dataset failed")
if failed_count > 0:
logger.warning(f"{failed_count} benchmark(s) failed. Check logs for details.")
return 1
else:
logger.info("All benchmarks completed successfully!")
return 0
except Exception as e:
logger.error(f"Benchmark run failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return 1
if __name__ == "__main__":
sys.exit(main())

View File

@ -16,6 +16,7 @@
# by pytest before any tests are run
import doctest
import os
import sys
import warnings
from os.path import abspath, dirname, join
@ -23,12 +24,18 @@ from os.path import abspath, dirname, join
import _pytest
import pytest
from transformers.testing_utils import HfDoctestModule, HfDocTestParser
from transformers.testing_utils import (
HfDoctestModule,
HfDocTestParser,
is_torch_available,
patch_testing_methods_to_collect_info,
patch_torch_compile_force_graph,
)
NOT_DEVICE_TESTS = {
"test_tokenization",
"test_processor",
"test_tokenization_mistral_common",
"test_processing",
"test_beam_constraints",
"test_configuration_utils",
@ -60,8 +67,6 @@ NOT_DEVICE_TESTS = {
"test_mismatched_shapes_have_properly_initialized_weights",
"test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist",
"test_model_is_small",
"test_tf_from_pt_safetensors",
"test_flax_from_pt_safetensors",
"ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device
"ModelTester::test_pipeline_",
"/repo_utils/",
@ -83,6 +88,8 @@ def pytest_configure(config):
config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment")
config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate")
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
def pytest_collection_modifyitems(items):
@ -127,3 +134,18 @@ class CustomOutputChecker(OutputChecker):
doctest.OutputChecker = CustomOutputChecker
_pytest.doctest.DoctestModule = HfDoctestModule
doctest.DocTestParser = HfDocTestParser
if is_torch_available():
import torch
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
# We set it to `False` for CI. See https://github.com/pytorch/pytorch/issues/157274#issuecomment-3090791615
torch.backends.cudnn.allow_tf32 = False
# patch `torch.compile`: if `TORCH_COMPILE_FORCE_FULLGRAPH=1` (or values considered as true, e.g. yes, y, etc.),
# the patched version will always run with `fullgraph=True`.
patch_torch_compile_force_graph()
if os.environ.get("PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS", "").lower() in ("yes", "true", "on", "y", "1"):
patch_testing_methods_to_collect_info()

View File

@ -1,15 +1,13 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
USER root
ARG REF=main
RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN git lfs install
RUN uv pip uninstall transformers

View File

@ -1,10 +1,10 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz
RUN tar xvf jumanpp-2.0.0-rc3.tar.xz
@ -15,12 +15,20 @@ RUN mv catch.hpp ../libs/
RUN cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
RUN make install -j 10
WORKDIR /
RUN uv pip install --no-cache --upgrade 'torch==2.6.0' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache --upgrade 'torch' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,spacy,ftfy,rjieba]" unidic unidic-lite
# spacy is not used so not tested. Causes to failures. TODO fix later
RUN python3 -m unidic download
RUN uv run python -m unidic download
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,13 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git
RUN apt-get install -y g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv
RUN uv pip install --no-cache-dir -U pip setuptools albumentations seqeval
RUN uv pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,12 +1,19 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,17 +1,24 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1-mesa-glx libgl1 g++ tesseract-ocr
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1 g++ tesseract-ocr git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps timm accelerate
RUN pip install -U --upgrade-strategy eager --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
RUN uv pip install -U --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
# RUN uv pip install --no-cache-dir natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[testing, vision]" 'scikit-learn' 'torch-stft' 'nose' 'dataset'
# RUN git clone https://github.com/facebookresearch/detectron2.git
# RUN python3 -m pip install --no-cache-dir -e detectron2
RUN uv pip install 'git+https://github.com/facebookresearch/detectron2.git@92ae9f0b92aba5867824b4f12aa06a22a60a45d3' --no-build-isolation
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,10 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,10 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake g++
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3" tensorflow_probability
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,11 +1,18 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir --upgrade 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]"
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers

View File

@ -1,9 +1,9 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y time git
RUN apt-get update && apt-get install -y time git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv
RUN pip install uv
RUN uv pip install --no-cache-dir -U pip setuptools GitPython "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ruff]" urllib3
RUN apt-get install -y jq curl && apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,12 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ pkg-config openssh-client git
RUN apt-get install -y cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,16 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-deps accelerate
RUN uv pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,audio,sklearn,sentencepiece,vision,testing]"
# RUN pip install --no-cache-dir "scipy<1.13" "transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,11 +1,17 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir --upgrade 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken,num2words,video]"
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers

View File

@ -1,19 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
RUN echo ${REF}
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch==2.6.0' 'torchaudio==2.6.0' 'torchvision==0.21.0' --index-url https://download.pytorch.org/whl/cpu
RUN git lfs install
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,sentencepiece,vision,testing]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3" librosa
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -9,11 +9,9 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.6.0'
# (not always a valid torch version)
ARG INTEL_TORCH_EXT='2.3.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu121'
ARG CUDA='cu126'
# Disable kernel mapping for now until all tests pass
ENV DISABLE_KERNEL_MAPPING=1
@ -28,13 +26,14 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -U tensorflow==2.13 protobuf==3.20.3 "tensorflow_text<2.16" "tensorflow_probability<0.22" && python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir -U timm
RUN python3 -m pip install --no-cache-dir intel_extension_for_pytorch==$INTEL_TORCH_EXT -f https://developer.intel.com/ipex-whl-stable-cpu
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN python3 -m pip install --no-cache-dir pytesseract
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -43,9 +42,11 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/pef
# For bettertransformer
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
# For kernels
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
# For video model testing
RUN python3 -m pip install --no-cache-dir av==9.2.0
RUN python3 -m pip install --no-cache-dir av
# Some slow tests require bnb
RUN python3 -m pip install --no-cache-dir bitsandbytes
@ -53,15 +54,14 @@ RUN python3 -m pip install --no-cache-dir bitsandbytes
# Some tests require quanto
RUN python3 -m pip install --no-cache-dir quanto
# After using A10 as CI runner, let's run FA2 tests
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
# TODO (ydshieh): check this again
# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests
# (`deformable_detr`, `rwkv`, `mra`)
RUN python3 -m pip uninstall -y ninja
# For `dinat` model
# The `XXX` part in `torchXXX` needs to match `PYTORCH` (to some extent)
# pin `0.17.4` otherwise `cannot import name 'natten2dav' from 'natten.functional'`
RUN python3 -m pip install --no-cache-dir natten==0.17.4+torch250cu121 -f https://shi-labs.com/natten/wheels
# For `nougat` tokenizer
RUN python3 -m pip install --no-cache-dir python-Levenshtein
@ -71,6 +71,9 @@ RUN python3 -m pip install --no-cache-dir g2p-en
# For Some bitsandbytes tests
RUN python3 -m pip install --no-cache-dir einops
# For Some tests with `@require_liger_kernel`
RUN python3 -m pip install --no-cache-dir liger-kernel
# `kernels` may give different outputs (within 1e-5 range) even with the same model (weights) and the same inputs
RUN python3 -m pip uninstall -y kernels

View File

@ -15,8 +15,8 @@ RUN apt update && \
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
RUN git clone https://github.com/NVIDIA/apex
RUN cd apex && \

View File

@ -0,0 +1,71 @@
FROM intel/deep-learning-essentials:2025.1.3-0-devel-ubuntu24.04 AS base
LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VERSION=3.12
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get update
RUN apt-get update && \
apt-get -y install \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
vim \
numactl \
gnupg2 \
gpg-agent \
python3-dev \
python3-opencv \
unzip \
ffmpeg \
tesseract-ocr \
espeak-ng \
wget \
ncurses-term \
google-perftools \
libjemalloc-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Use virtual env because Ubuntu:24 does not allowed pip on original python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VERSION} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip wheel
RUN pip install torch torchvision torchaudio torchcodec --index-url https://download.pytorch.org/whl/cpu --no-cache-dir
RUN pip install av pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sentence_transformers sacremoses nltk rouge_score librosa soundfile mpi4py pytorch_msssim
RUN pip install onnx optimum onnxruntime
RUN pip install autoawq
RUN pip install gptqmodel --no-build-isolation
RUN pip install -U datasets timm transformers accelerate peft diffusers opencv-python kenlm evaluate
RUN pip install -U intel-openmp
# install bitsandbytes
RUN git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/ && \
cmake -DCOMPUTE_BACKEND=cpu -S . && make && pip install . && cd ../
# CPU don't need triton
RUN pip uninstall triton -y
ENV LD_PRELOAD=${LD_PRELOAD}:/opt/venv/lib/libiomp5.so:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
ENV KMP_AFFINITY=granularity=fine,compact,1,0
RUN touch /entrypoint.sh
RUN chmod +x /entrypoint.sh
RUN echo "#!/bin/bash" >> /entrypoint.sh
RUN echo "/bin/bash" >> /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

View File

@ -1,59 +0,0 @@
ARG BASE_DOCKER_IMAGE
FROM $BASE_DOCKER_IMAGE
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs libaio-dev
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
ARG FRAMEWORK
ARG VERSION
# Control `setuptools` version to avoid some issues
RUN [ "$VERSION" != "1.10" ] && python3 -m pip install -U setuptools || python3 -m pip install -U "setuptools<=59.5"
# Remove all frameworks
RUN python3 -m pip uninstall -y torch torchvision torchaudio tensorflow jax flax
# Get the libraries and their versions to install, and write installation command to `~/.profile`.
RUN python3 ./transformers/utils/past_ci_versions.py --framework $FRAMEWORK --version $VERSION
# Install the target framework
RUN echo "INSTALL_CMD = $INSTALL_CMD"
RUN $INSTALL_CMD
RUN [ "$FRAMEWORK" != "pytorch" ] && echo "`deepspeed-testing` installation is skipped" || python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Remove `accelerate`: it requires `torch`, and this causes import issues for TF-only testing
# We will install `accelerate@main` in Past CI workflow file
RUN python3 -m pip uninstall -y accelerate
# Uninstall `torch-tensorrt` and `apex` shipped with the base image
RUN python3 -m pip uninstall -y torch-tensorrt apex
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -1,4 +1,4 @@
FROM rocm/dev-ubuntu-22.04:6.2.4
FROM rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -11,9 +11,6 @@ RUN apt update && \
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy
RUN python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4
RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
ARG REF=main
@ -23,9 +20,8 @@ WORKDIR /
ADD https://api.github.com/repos/huggingface/transformers/git/refs/heads/main version.json
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video]
RUN python3 -m pip uninstall -y tensorflow flax
# Install transformers
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video,audio]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
@ -33,3 +29,9 @@ RUN cd transformers && python3 setup.py develop
# Remove nvml and nvidia-ml-py as it is not compatible with ROCm. apex is not tested on NVIDIA either.
RUN python3 -m pip uninstall py3nvml pynvml nvidia-ml-py apex -y
# `kernels` may causes many failing tests
RUN python3 -m pip uninstall -y kernels
# On ROCm, torchcodec is required to decode audio files and 0.4 or 0.6 fails
RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"

View File

@ -48,3 +48,6 @@ RUN python3 -c "from deepspeed.launcher.runner import main"
# Remove nvml as it is not compatible with ROCm
RUN python3 -m pip uninstall py3nvml pynvml nvidia-ml-py apex -y
# `kernels` may causes many failing tests
RUN python3 -m pip uninstall -y kernels

View File

@ -4,7 +4,7 @@ LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
ARG PYTORCH='2.6.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
@ -21,7 +21,7 @@ RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'p
# Install latest release PyTorch
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate

View File

@ -19,7 +19,7 @@ RUN python3 -m pip uninstall -y torch torchvision torchaudio
# Install **nightly** release PyTorch (flag `--pre`)
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
# `datasets` requires pandas, pandas has some modules compiled with numpy=1.x causing errors
RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'pandas<2' 'numpy<2'

View File

@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -11,19 +11,19 @@ ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
# If set to nothing, will install the latest version
ARG PYTORCH='2.6.0'
ARG PYTORCH='2.8.0'
ARG TORCH_VISION=''
ARG TORCH_AUDIO=''
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu121'
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
ARG CUDA='cu126'
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video]
RUN python3 -m pip uninstall -y tensorflow flax
# Install torch stuff after ./transformers[dev-torch,testing,video], otherwise torch may be resolved to a previous
# version.
RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@ -0,0 +1,93 @@
FROM intel/deep-learning-essentials:2025.1.3-0-devel-ubuntu22.04 AS base
LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VER=3.11
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get remove -y python3.10 && apt-get autoremove -y
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get update && \
apt-get install -y python$PYTHON_VER python$PYTHON_VER-dev python3-pip && \
ln -sf /usr/bin/python$PYTHON_VER /usr/bin/python3 && \
ln -sf /usr/bin/python3 /usr/bin/python && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get -y install \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
vim \
numactl \
gnupg2 \
gpg-agent \
zlib1g-dev \
rsync \
sudo \
libnl-genl-3-200 \
xpu-smi \
unzip \
ffmpeg \
tesseract-ocr \
espeak-ng \
wget \
ncurses-term && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get install -y \
linux-headers-$(uname -r) \
linux-modules-extra-$(uname -r) \
flex bison \
intel-fw-gpu intel-i915-dkms xpu-smi \
intel-opencl-icd libze-intel-gpu1 libze1 \
intel-media-va-driver-non-free libmfx-gen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo intel-ocloc \
libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev libze-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install triton==3.3.0
RUN pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install evaluate torchdata pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree
RUN pip install gguf hqq compressed_tensors gptqmodel mergekit autoawq deepspeed torchao onnx
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft
RUN pip install git+https://github.com/linkedin/Liger-Kernel.git --extra-index-url https://download.pytorch.org/whl/test/xpu
# install bitsandbytes
RUN pip install git+https://github.com/bitsandbytes-foundation/bitsandbytes.git
ENV OCL_ICD_VENDORS=/etc/OpenCL/vendors
ENV FI_PROVIDER_PATH=${I_MPI_ROOT}/lib/libfabric/prov:/usr/lib/x86_64-linux-gnu/libfabric
ENV CCL_ROOT=/usr/local
ENV CCL_ATL_TRANSPORT=ofi
ENV I_MPI_ROOT=/usr/local
ENV CLASSPATH=${I_MPI_ROOT}/lib/mpi.jar
ENV PATH=${I_MPI_ROOT}/bin/libfabric:${PATH}
ENV LD_LIBRARY_PATH=${I_MPI_ROOT}/lib/libfabric:${LD_LIBRARY_PATH}
RUN touch /entrypoint.sh
RUN chmod +x /entrypoint.sh
RUN echo "#!/bin/bash" >> /entrypoint.sh
RUN echo "source /opt/intel/oneapi/setvars.sh --force && /bin/bash" >> /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

View File

@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -9,9 +9,9 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.6.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu121'
ARG CUDA='cu126'
# Disable kernel mapping for quantization tests
ENV DISABLE_KERNEL_MAPPING=1
@ -26,7 +26,7 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -46,16 +46,6 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/opt
# Add PEFT
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/peft@main#egg=peft
# Add aqlm for quantization testing
RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# Add vptq for quantization testing
RUN pip install vptq
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# Add hqq for quantization testing
RUN python3 -m pip install --no-cache-dir hqq
@ -63,21 +53,11 @@ RUN python3 -m pip install --no-cache-dir hqq
RUN python3 -m pip install --no-cache-dir gguf
# Add autoawq for quantization testing
# New release v0.2.8
RUN python3 -m pip install --no-cache-dir autoawq[kernels]
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir optimum-quanto
# Add eetq for quantization testing
RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add compressed-tensors for quantization testing
RUN python3 -m pip install --no-cache-dir compressed-tensors
@ -85,7 +65,10 @@ RUN python3 -m pip install --no-cache-dir compressed-tensors
RUN python3 -m pip install --no-cache-dir amd-quark
# Add AutoRound for quantization testing
RUN python3 -m pip install --no-cache-dir "auto-round>=0.5.0"
RUN python3 -m pip install --no-cache-dir auto-round
# Add torchao for quantization testing
RUN python3 -m pip install --no-cache-dir torchao
# Add transformers in editable mode
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch]
@ -93,6 +76,34 @@ RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch]
# `kernels` may give different outputs (within 1e-5 range) even with the same model (weights) and the same inputs
RUN python3 -m pip uninstall -y kernels
# Uninstall flash-attn installed by autoawq, it causes issues here : https://github.com/huggingface/transformers/actions/runs/15915442841/job/44892146131
RUN python3 -m pip uninstall -y flash-attn
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
# Low usage or incompatible lib, will enable later on
# # Add aqlm for quantization testing
# RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# # Add vptq for quantization testing
# RUN pip install vptq
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# # Add eetq for quantization testing
# RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add fp-quant for quantization testing
# Requires py3.11 but our CI runs on 3.9
# RUN python3 -m pip install --no-cache-dir "fp-quant>=0.1.6"

View File

@ -1,25 +0,0 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-tensorflow,testing]
# If set to nothing, will install the latest version
ARG TENSORFLOW='2.13'
RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSION='tensorflow'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y torch flax
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir -U "tensorflow_probability<0.22"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -20,22 +20,21 @@ To generate the documentation, you first have to build it. Several packages are
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
Then you need to install our special tool that builds the documentation:
```bash
pip install git+https://github.com/huggingface/doc-builder
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing for instance). You don't have to commit the built documentation.
---
> [!NOTE]
> You only need to generate the documentation to inspect it locally (if you're planning changes and want to
> check how they look before committing for instance). You don't have to commit the built documentation.
## Building the documentation
@ -72,12 +71,8 @@ doc-builder preview transformers docs/source/en/
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
> [!NOTE]
> The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
## Adding a new element to the navigation bar
@ -164,6 +159,9 @@ These classes should be added using our Markdown syntax. Usually as follows:
[[autodoc]] XXXConfig
```
> [!IMPORTANT]
> Always add a blank line after `[[autodoc]]` to ensure it passes the CI/CD checks.
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:
@ -278,7 +276,7 @@ Here's an example of a single value return:
```python
Returns:
`List[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
`list[int]`: A list of integers in the range [0, 1] --- 1 for a special token, 0 for a sequence token.
```
Here's an example of a tuple return, comprising several objects:

View File

@ -123,8 +123,6 @@
title: تشغيل التدريب على Amazon SageMaker
- local: serialization
title: التصدير إلى ONNX
- local: tflite
title: التصدير إلى TFLite
- local: torchscript
title: التصدير إلى TorchScript
- local: notebooks
@ -184,8 +182,6 @@
# title: التدريب الفعال على وحدة المعالجة المركزية (CPU)
# - local: perf_train_cpu_many
# title: التدريب الموزع لوحدة المعالجة المركزية (CPU)
# - local: perf_train_tpu_tf
# title: التدريب على (TPU) باستخدام TensorFlow
# - local: perf_train_special
# title: تدريب PyTorch على Apple silicon
# - local: perf_hardware
@ -203,8 +199,6 @@
# title: إنشاء نموذج كبير
# - local: debugging
# title: تصحيح الأخطاء البرمجية
# - local: tf_xla
# title: تكامل XLA لنماذج TensorFlow
# - local: perf_torch_compile
# title: تحسين الاستدلال باستخدام `torch.compile()`
# title: الأداء وقابلية التوسع
@ -260,8 +254,6 @@
# title: التكوين
# - local: main_classes/data_collator
# title: مجمع البيانات
# - local: main_classes/keras_callbacks
# title: استدعاءات Keras
# - local: main_classes/logging
# title: التسجيل
# - local: main_classes/model

View File

@ -115,8 +115,6 @@
## النموذج التلقائي (AutoModel)
<frameworkcontent>
<pt>
تسمح لك فئات `AutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -143,25 +141,4 @@
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</pt>
<tf>
أخيرًا، تسمح لك فئات `TFAutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام نفس نقطة التفتيش لتحميل بنية لمهمة مختلفة:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `TFAutoModelFor` لتحميل نسخ لنماذج مُدربة مسبقًا. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، ستتعرف على كيفية استخدام المُجزّئ اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</tf>
</frameworkcontent>

View File

@ -3,16 +3,16 @@
يُشهد في الآونة الأخيرة نمو مجال دراسي يُعنى باستكشاف آلية عمل نماذج المحولات الضخمة مثل BERT (والذي يُطلق عليها البعض اسم "BERTology"). ومن الأمثلة البارزة على هذا المجال ما يلي:
- BERT Rediscovers the Classical NLP Pipeline بواسطة Ian Tenney و Dipanjan Das و Ellie Pavlick:
https://arxiv.org/abs/1905.05950
- Are Sixteen Heads Really Better than One? بواسطة Paul Michel و Omer Levy و Graham Neubig: https://arxiv.org/abs/1905.10650
https://huggingface.co/papers/1905.05950
- Are Sixteen Heads Really Better than One? بواسطة Paul Michel و Omer Levy و Graham Neubig: https://huggingface.co/papers/1905.10650
- What Does BERT Look At? An Analysis of BERT's Attention بواسطة Kevin Clark و Urvashi Khandelwal و Omer Levy و Christopher D.
Manning: https://arxiv.org/abs/1906.04341
- CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure: https://arxiv.org/abs/2210.04633
Manning: https://huggingface.co/papers/1906.04341
- CAT-probing: A Metric-based Approach to Interpret How Pre-trained Models for Programming Language Attend Code Structure: https://huggingface.co/papers/2210.04633
لإثراء هذا المجال الناشئ، قمنا بتضمين بعض الميزات الإضافية في نماذج BERT/GPT/GPT-2 للسماح للناس بالوصول إلى التمثيلات الداخلية، والتي تم تكييفها بشكل أساسي من العمل الرائد لـ Paul Michel (https://arxiv.org/abs/1905.10650):
لإثراء هذا المجال الناشئ، قمنا بتضمين بعض الميزات الإضافية في نماذج BERT/GPT/GPT-2 للسماح للناس بالوصول إلى التمثيلات الداخلية، والتي تم تكييفها بشكل أساسي من العمل الرائد لـ Paul Michel (https://huggingface.co/papers/1905.10650):
- الوصول إلى جميع الحالات المخفية في BERT/GPT/GPT-2،
- الوصول إلى جميع أوزان الانتباه لكل رأس في BERT/GPT/GPT-2،
- استرجاع قيم ومشتقات مخرجات الرأس لحساب درجة أهمية الرأس وحذفه كما هو موضح في https://arxiv.org/abs/1905.10650.
- استرجاع قيم ومشتقات مخرجات الرأس لحساب درجة أهمية الرأس وحذفه كما هو موضح في https://huggingface.co/papers/1905.10650.
ولمساعدتك على فهم واستخدام هذه الميزات بسهولة، أضفنا مثالًا برمجيًا محددًا: [bertology.py](https://github.com/huggingface/transformers-research-projects/tree/main/bertology/run_bertology.py) أثناء استخراج المعلومات وتقليص من نموذج تم تدريبه مسبقًا على GLUE.

View File

@ -304,7 +304,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoint, dtype=torch.bfloat16, device_map="auto")
```python
messages = [

View File

@ -25,7 +25,7 @@ chat = [
import torch
from transformers import pipeline
pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipe(chat, max_new_tokens=512)
print(response[0]['generated_text'][-1]['content'])
```
@ -126,7 +126,7 @@ chat = [
]
# 1: تحميل النموذج والمحلل
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", torch_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# 2: تطبيق قالب الدردشة
@ -164,7 +164,7 @@ print("Decoded output:\n", decoded_output)
### اعتبارات الذاكرة
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "torch_dtype" كما فعلنا أعلاه.
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "dtype" كما فعلنا أعلاه.
ومن الممكن أيضًا النزول إلى أقل من 16 بت باستخدام "التكميم"، وهي طريقة لضغط أوزان النموذج بطريقة تفقد بعض المعلومات. يسمح هذا بضغط كل معلمة إلى 8 بتات أو 4 بتات أو حتى أقل. لاحظ أنه، خاصة في 4 بتات، قد تتأثر جودة ناتج النموذج سلبًا، ولكن غالبًا ما يكون هذا مقايضة تستحق القيام بها لتناسب نموذج محادثة أكبر وأكثر قدرة في الذاكرة. دعنا كيف يمكننا تطبيق ذلك باستخدام مكتبة `bitsandbytes`:

View File

@ -81,8 +81,6 @@ DistilBertConfig {
الخطوة التالية هي إنشاء [نموذج](main_classes/models). النموذج - ويُشار إليه أحيانًا باسم البنية - يُحدد وظيفة كل طبقة والعمليات الحسابية المُنفذة. تُستخدم خصائص مثل `num_hidden_layers` من التكوين لتحديد هذه البنية. تشترك جميع النماذج في فئة أساسية واحدة هي [`PreTrainedModel`] وبعض الوظائف المُشتركة مثل غيير حجم مُدخلات الكلمات وتقليص رؤوس آلية الانتباه الذاتي. بالإضافة إلى ذلك، فإن جميع النماذج هي فئات فرعية إما من [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)، [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) أو [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html) . هذا يعني النماذج متوافقة مع كل استخدام لإطار العمل الخاص بها.
<frameworkcontent>
<pt>
قم بتحميل خصائص التكوين المخصصة الخاصة بك في النموذج:
```py
@ -105,39 +103,11 @@ DistilBertConfig {
```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</pt>
<tf>
قم بتحميل خصائص التكوين المُخصصة الخاصة بك في النموذج:
```py
>>> from transformers import TFDistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```
هذا ينشئ نموذجًا بقيم عشوائية بدلاً من الأوزان المُدربة مسبقًا. لن يكون هذا النموذج مفيدًا حتى يتم تدريبه. تُعد عملية التدريب مكلفة وتستغرق وقتًا طويلاً. من الأفضل بشكل عام استخدام نموذج مُدرب مسبقًا للحصول على نتائج أفضل بشكل أسرع، مع استخدام جزء بسيط فقط من الموارد المطلوبة للتدريب.
قم بإنشاء نموذج مُدرب مسبقًا باستخدام [`~TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
عندما تقوم بتحميل الأوزان المُدربة مسبقًا،يتم تحميل إعدادات النموذج الافتراضي تلقائيًا إذا كان النموذج من مكتبة 🤗 Transformers. ومع ذلك، يمكنك أيضًا استبدال - بعض أو كل - إعدادات النموذج الافتراضية بإعداداتك الخاصة:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</tf>
</frameworkcontent>
### رؤوس النموذج
في هذه المرحلة، لديك نموذج DistilBERT الأساسي الذي يخرج *حالات الكامنة*. تُمرَّر هذه الحالات الكامنة كمدخلات لرأس النموذج لإنتاج المخرجات النهائية. توفر مكتبة 🤗 Transformers رأس نموذج مختلف لكل مهمة طالما أن النموذج يدعم المهمة (أي لا يمكنك استخدام DistilBERT لمهمة تسلسل إلى تسلسل مثل الترجمة).
<frameworkcontent>
<pt>
على سبيل المثال، [`DistilBertForSequenceClassification`] هو نموذج DistilBERT الأساس مزودًا برأس تصنيف تسلسلي. يُشكّل رأس التصنيف التسلسلي طبقة خطية فوق المخرجات المجمعة.
```py
@ -153,25 +123,6 @@ DistilBertConfig {
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</pt>
<tf>
على سبيل المثال، [`TFDistilBertForSequenceClassification`] هو نموذج DistilBERT الأساسي برأس تصنيف تسلسل. رأس التصنيف التسلسلي هو طبقة خطية أعلى المخرجات المجمعة.
```py
>>> from transformers import TFDistilBertForSequenceClassification
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام هذا نقطة التحقق لمهمة أخرى عن طريق التبديل إلى رأس نموذج مختلف. لمهمة الإجابة على الأسئلة، ستستخدم رأس النموذج [`TFDistilBertForQuestionAnswering`]. رأس الإجابة على الأسئلة مشابه لرأس التصنيف التسلسلي باستثناء أنه طبقة خطية أعلى حالات الإخراج المخفية.
```py
>>> from transformers import TFDistilBertForQuestionAnswering
>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</tf>
</frameworkcontent>
## مجزئ النصوص

Some files were not shown because too many files have changed in this diff Show More