Compare commits

..

1068 Commits

Author SHA1 Message Date
9e14f68097 fix remaining 2025-10-01 10:47:55 +02:00
46b90ac28e [docs] update tips syntax 2025-10-01 10:46:35 +02:00
f22cb1e868 fix qwen text config (#41158)
* fix qwen text config

* fix tests

* fix one more test

* address comments
2025-09-30 17:23:44 +00:00
374ded5ea4 Fix white space in documentation (#41157)
* Fix white space

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 09:41:03 -07:00
16a141765c [docs] Fix tp_plan (#41205)
remove manual
2025-09-30 09:27:50 -07:00
5d1e853032 [Trainer] deprecate num_train_tokens (#41165)
* dep

* fix

* fix
2025-09-30 15:53:16 +00:00
cecd92849e [v5] Remove train kwargs (#41127)
* rm train kwargs

* fix
2025-09-30 17:43:25 +02:00
103fa6d235 [v5] Remove deprecated prediction loop (#41123)
* rem deprecated

* more

* rm all instances of legacy arg
2025-09-30 17:43:01 +02:00
aa3e8798ba [v5] Remove tokenizer from Trainer (#41128)
* tokenizer deprecated

* style

* forgot this

* style
2025-09-30 17:42:10 +02:00
e99dee6470 Remove old sagemaker api support (#41161)
* fix

* fix
2025-09-30 17:41:52 +02:00
dded9fd112 [v5] More Training Args cleaning (#41131)
clean
2025-09-30 17:38:07 +02:00
6fb6117abe Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)
* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix
2025-09-30 17:37:42 +02:00
5bdb70450d Fix sliding window attn mask (#41228)
* Fix sliding window attn mask

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 17:22:53 +02:00
a61fc6a0b9 Fix typing of train_args (#41142)
* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:28:02 +00:00
919a4845fb Unify is_torchvision_v2_available with is_torchvision_available (#41227)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 15:21:49 +01:00
8e7b0655f1 update code owners (#41221)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-30 16:21:19 +02:00
2dd175e6bb Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)
Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <frozen@Mac.local>
2025-09-30 14:19:57 +00:00
cf0887f62c Remove old Python code (#41226)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:15:59 +00:00
52f5eca7c9 🚨 [v5] Remove headmasking (#41076)
* first attempt at removing

* copies

* last bits in core

* quick fixes

* tests purge

* docs and examples

* some fixes

* more

* another round of cleanups

* fix

* fix a bunch of models

* fix dummy bert

* fix

* fix new model

* fix signature change

* fix

* fix style/copies

* new models

* fix copies didnt find that damn

* test

* this shouldnt have happened during model addition
2025-09-30 16:04:57 +02:00
a80f05dfcb [generate] cache missing custom generate file (#41216)
* cache missing custom generate file

* make fixup
2025-09-30 13:32:24 +00:00
1f1e93e095 Align pull request template to bug report template (#41220)
The only difference is that I don't users to https://discuss.huggingface.co/ for hub issues.
2025-09-30 14:25:41 +02:00
2a596f5b2f [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)
add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 12:06:47 +00:00
3edd8048b0 Trainer: Pass num_items_in_batch to compute_loss in prediction_step (#41183)
* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 09:45:17 +00:00
59035fd0e1 Avoid assumption that model has config attribute in deepspeed (#41207)
Avoid assumption that model has config in deepspeed
2025-09-30 11:42:50 +02:00
d97397787e Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)
* Update trainer.py

* fix

* fix format

* move barrier, delete redundant
2025-09-30 11:41:03 +02:00
06c04e0851 Deprecate half_precision_backend (#41134)
* deprecate

* remove

* rm apex

* fix

* fix

* fix doc
2025-09-30 11:36:44 +02:00
0e5a975608 Fix Qwen3-Omni audio_token_id serialization issue (#41192)
Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles

Fixes #41191
2025-09-30 11:15:56 +02:00
42c682514b docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)
* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review  remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 08:38:31 +00:00
aaf1269d83 Remove unnecessary Optional typing (#41198)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 08:38:05 +00:00
4a02bc7004 [docs] Fix links (#41110)
fix
2025-09-30 08:53:07 +02:00
def4a37e19 Embed interactive timeline in docs (#41015)
* embed timeline in docs (test web componentand Iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct syle and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review
2025-09-30 01:36:08 +00:00
3e975acc8b Fix docker quantization (#41201)
* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert
2025-09-29 16:36:30 +00:00
8635d8e796 Fix 8bit bnb loading (#41200)
* Fix 8bit

* oups forgot the case where it is not prequantized
2025-09-29 18:34:46 +02:00
1f0e9a4778 Fix EXAONE-4.0 dummy id (#41089)
* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-29 16:30:55 +00:00
bd37c45354 Add EdgeTAM (#39800)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promprencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PostioinEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstringspy --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessarry changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-29 11:54:54 -04:00
c1db38686a [Kernels Attention] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)
* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style
2025-09-29 17:10:59 +02:00
5426edecab Make quantizers good citizens loading-wise (#41138)
* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo
2025-09-29 17:04:45 +02:00
399c589dfa Separate docker images for Nvidia and AMD in benchmarking (#41119)
Separate docker images for Nvidia and AMD
2025-09-29 17:03:27 +02:00
52cbc7c868 Fix attention sink implementation in flex attention (#41083)
* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raisae error when return_lse is False yet s_aux is providewd

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-29 14:33:03 +00:00
de9a75f5b0 fix(trainer): Avoid moving model with device_map (#41032)
* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized with a model that has a `hf_device_map` that simulates offloading without raising an error.

* Added the logger warning for the move model

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-09-29 14:31:42 +00:00
bcc0dae77c enable flex attention ut cases on XPU (#40989)
* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 14:30:49 +00:00
fcd483f0ff Bump hfh prerelease version (#41175) 2025-09-29 16:28:36 +02:00
a3fa1d3993 Fix inaccurate train_tokens_per_second when resuming from checkpoint (#41156)
* fix(trainer): Fix the issue of inaccurate token count in training sessions

During the training process, the initial token count was not saved, leading to inaccurate speed calculation. Now, the initial token count is saved and the increment during the session is calculated, ensuring that the speed metric accurately reflects the performance of the current training session.

* 修复错误

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 16:22:35 +02:00
ad74fba085 [v5] Remove model_parallel deprecated feature (#41166)
* fix

* remove model parallel

* style

* removed a bit too much

* rm comments

* fix
2025-09-29 16:14:03 +02:00
38a08b6e8a More typing fixes (#41102)
* Fix noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* remove noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix chars

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-29 13:11:53 +00:00
4fade1148f [tests] CausalLMTester automatically infers other test classes from base_model_class 🐛 🔫 (#41066)
* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special
2025-09-29 15:05:08 +02:00
cdba28c344 [XPU] Add MXFP4 support for XPU (#41117)
* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 12:10:41 +02:00
2dcb20dcec CI Runners - move amd runners mi355 and 325 to runner group (#41193)
* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config
2025-09-29 11:14:19 +02:00
d0d574b1e4 Modernbert fix (#41056)
* Add FA to docker

* Fixed padding for mdernbert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency
2025-09-29 10:52:44 +02:00
071eb5334f handle flash slow tests (#41072)
* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>
2025-09-26 16:24:31 +00:00
50d2448a1a Enable fa in amd docker (#41069)
* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailled  the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 13:57:58 +02:00
10f6891fc5 Remove data from examples (#41168)
Remove telemetry
2025-09-26 13:52:45 +02:00
97ca0b4712 Fix flash-attn for paged_attention when no kernels (#41078)
* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 10:41:21 +02:00
53838edde7 Improve add_dates script (#41167)
* utils/add_dates.py

* put lfm2-vl in correct category
2025-09-25 16:00:05 -04:00
449533af73 Add language specifiers to code blocks of markdown files (#41114)
* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update nemotron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phimoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 10:29:57 -07:00
e691f84412 Force new vision models addition to include a fast image processor (#40802)
* add test

* fix test and change cutoff date

* Add documentation to test
2025-09-25 15:58:18 +00:00
e54bb62a73 Simplify and improve model loading logic (#41103)
* remove unexpected keys from inputs (they have nothing to do there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix
2025-09-25 17:28:27 +02:00
6dc9ed87a0 Fix format of compressed_tensors.md (#41155)
* Fix table format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 14:50:15 +00:00
a579de7f5e Add Parakeet (#39062)
* first commit

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update to handle masking for bs>1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tests and docs

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update model ids

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update docs and improve style

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update librosa location

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* import guard torch too

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff code checks fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff format check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* updated to parakeet names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tokenizer decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove other model dependency

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* clean tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* linting

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff lint warnings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to seperate folders

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet ctc model code

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* simplify encoder structure

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update documentation

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet to toctree

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet doc

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Address comments

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix encoding format

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix minor ctc decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* tset update

* diverse fixes

* 100% macthing greedy batched

* Update conversion script.

* Refactor docs.

* Reafactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formating and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PretrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* udpate tests feature integration

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling udpate

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* atttention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_positions_embeddings handling

* nits

* change atol faeture extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
2025-09-25 13:52:24 +00:00
1dd22a234c extend gemma3n integration ut cases on XPU (#41071)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-25 13:46:37 +00:00
05fb90c969 Fix single quotes in markdown (#41154)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 13:03:26 +00:00
44682e7131 Adapt and test huggingface_hub v1.0.0 (#40889)
* Adapt and test huggingface_hub v1.0.0.rc0

* forgot to bump hfh

* bump

* code quality

* code quality

* relax dependency table

* fix has_file

* install hfh 1.0.0.rc0 in circle ci jobs

* repostiryo

* push to hub now returns a commit url

* catch HfHubHTTPError

* check commit on branch

* add it back

* fix ?

* remove deprecated test

* uncomment another test

* trigger

* no proxies

* many more small changes

* fix load PIL Image from httpx

* require 1.0.0.rc0

* fix mocked tests

* fix others

* unchange

* unchange

* args

* Update .circleci/config.yml

* Bump to 1.0.0.rc1

* bump kernels version

* fix deps
2025-09-25 11:13:50 +00:00
750dd2a401 Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)
Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts
2025-09-25 10:33:46 +00:00
7258ea44bc Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-24 16:44:42 +02:00
2c4caa19e7 dummy commit (#41133)
* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-24 16:31:46 +02:00
6d1875924c Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head
2025-09-24 13:13:18 +01:00
3ca43d34b1 Fixed MXFP4 model storage issue (#41118) 2025-09-24 12:11:51 +00:00
b33cb70097 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
2025-09-24 11:54:55 +00:00
b0c7034d58 Remove self-assignment (#41062)
* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-24 12:43:17 +01:00
04a0bb569c Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:34:12 +00:00
071c7b1423 Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:27:37 +00:00
80f20e0ff8 [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-24 11:18:27 +00:00
1d81247b0c [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors

* enable torchao safetensors support

* add more version checking
2025-09-24 12:32:47 +02:00
b533cec74d Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-24 10:17:41 +00:00
65dcd66cc8 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 12:01:27 +02:00
43a613c8da Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-09-24 06:37:21 +00:00
f64354e89a Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 16:20:01 -07:00
99b0995138 Remove bad test skips (#41109)
* remove bad skips

* remove more

* fix inits
2025-09-23 20:39:28 +02:00
00f3d90720 Fix _get_test_info for inherited tests (#41106)
* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 19:35:24 +02:00
cfa022e719 [tests] gpt2 + CausalLMModelTester (#41003)
* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder
2025-09-23 18:07:06 +01:00
869735d37d 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma
2025-09-23 16:20:00 +00:00
71717ce91c docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
2025-09-23 09:18:49 -07:00
946e5f95ea fix wrong height and width when read video use torchvision (#41091) 2025-09-23 12:35:44 +00:00
870add3daf Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:43:17 +00:00
ae60692821 Remove unused arguments (#40916)
* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:40:51 +00:00
f682797866 Fix typing (#40788)
* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:36:02 +00:00
f4a6c65951 Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:27:04 +00:00
89e0f472f4 Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:14:11 +00:00
62ce6fcb60 Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script

* Adjust vars
2025-09-23 13:05:27 +02:00
257fe5eea8 Switch to python:3.10-slim for CircleCI docker images (#41067)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 12:48:48 +02:00
0ec0325781 Minor addition, no split modules for VideoMAEE (#41051)
* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-23 11:53:51 +02:00
577fa6f167 fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2025-09-23 09:50:23 +00:00
03c92884b5 Update team member list for some CI workflows (#41094)
* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 09:48:40 +00:00
cbb290ec23 Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files
2025-09-22 10:36:20 -07:00
8048c614bf [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions
2025-09-22 09:51:39 -07:00
aa30e0642e Update quantization CI (#41068)
* fix

* new everything

* fix
2025-09-22 18:10:16 +02:00
1bb69cce82 Fix CI jobs being all red 🔴 (false positive) (#41059)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 16:51:00 +02:00
f15258dec2 Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 14:29:50 +00:00
2ec37649e2 Ci utils (#40978)
* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License
2025-09-22 16:16:19 +02:00
b9d337b6f3 Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload

* Address review comments

* Address review comments
2025-09-22 14:13:46 +00:00
646ff51d1a Simplify unnecessary Optional typing (#40839)
Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:50 +00:00
c9939b3ab6 Remove repeated import (#40937)
* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:13 +00:00
4f36011545 [testing] Fix seed_oss (#41052)
* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-22 14:54:30 +02:00
2b8a7e82b5 Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
2025-09-22 13:42:34 +01:00
226667ec2f Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 13:42:26 +01:00
6eff44bb8d Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:38:07 +00:00
9ff47a71e4 Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-09-22 12:21:38 +00:00
ae9ef2e151 docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 13:21:15 +01:00
f3c481ed87 Use torch.autocast (#40975)
* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:18:24 +00:00
37152f8446 Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:31:46 +00:00
8a52288dba Remove optax (#41030)
Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:30:39 +00:00
5f891b36cd Fix typing of tuples (#41028)
* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:29:07 +00:00
c05f9d2f0e [testing] Fix qwen2_audio (#41018)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 10:45:31 +00:00
55a1eaf6f0 Fix Qwen video tests (#41049)
fix test
2025-09-22 12:28:11 +02:00
db802aafa4 Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-09-22 10:06:59 +00:00
8a2f24a321 Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved

* redundant if condition fix
2025-09-22 09:47:34 +00:00
ebbcf00ad1 Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-09-21 23:46:27 +02:00
67097bf340 Fix benchmark runner argument name (#41012) 2025-09-20 10:53:56 +02:00
8076e755e5 Update after #41007 (#41014)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 21:55:46 +02:00
022c882e14 Fix Glm4v test (#41011)
fix
2025-09-19 18:54:26 +02:00
966b3dbcbe Fix PhimoeIntegrationTest (#41007)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:43:46 +00:00
04bf4112f2 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-19 16:41:22 +01:00
dfc230389c 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point

* update references to transformers-cli
2025-09-19 14:40:27 +00:00
8010f5d1d9 Patch more unittest.case.TestCase.assertXXX methods (#41008)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:38:12 +02:00
5bf633b32a [tests] update test_left_padding_compatibility (and minimize overwrites) (#40980)
* update test (and overwrites)

* better test comment

* 0 as a default for
2025-09-19 15:36:26 +01:00
df12617914 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
2025-09-19 14:36:12 +00:00
2a538b2ed4 fix dict like init for ModelOutput (#41002)
* fix dict like init

* style
2025-09-19 16:14:44 +02:00
96a3e898cd RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:50:26 +00:00
98c8523434 Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree

* fix new models
2025-09-19 09:47:28 -04:00
767f8a4c75 Fix typoes in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:18:38 +00:00
9d9c4d24c5 Make EfficientLoFTRModelTest faster (#41000)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 12:51:05 +00:00
b4ba4e1da0 [RMSNorm] Fix rms norm init for models that center around 1 (#40796)
* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check
2025-09-19 12:15:36 +00:00
fce746512b [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
2025-09-19 12:04:12 +01:00
ddfa3d4402 blt wip (#38579)
* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
2025-09-19 11:55:55 +02:00
46ea7e613d [testing] test num_hidden_layers being small in model tester (#40992)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 11:45:07 +02:00
ebdc17b8e5 ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
2025-09-19 10:39:21 +01:00
e2dbde280f Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs

* more
2025-09-19 11:28:34 +02:00
155f7e2e62 🔴[Attention] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style
2025-09-19 11:23:58 +02:00
61eff450d3 Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description
2025-09-19 08:54:49 +00:00
5f6e278a51 Remove set_model_tester_for_less_flaky_tests (#40982)
remove
2025-09-18 18:56:10 +02:00
4df2529d79 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oups forgot conflict

* continue

* still grinding

* always more

* in tje zone

* never stop

* should fix doc

* fic

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
d9d7f6a6b9 Revert change in compile_friendly_resize (#40645)
fix
2025-09-18 16:25:45 +01:00
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
dd7ac4cd59 [tests] Really use small models in all fast tests (#40945)
* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency
2025-09-18 15:24:12 +02:00
2ce35a248f Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
2025-09-18 13:22:19 +00:00
6e51ac31ef [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling
2025-09-18 14:09:08 +01:00
9378f874c1 [Trainer] Fix DP loss (#40799)
* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-18 13:07:20 +00:00
7cf1f5ced0 Use skip_predictor=True in vjepa2 get_vision_features (#40966)
use skip_predictor in vjepa2 `get_vision_features`
2025-09-18 11:51:45 +00:00
f6104189fd Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 11:49:14 +00:00
c532575795 Add new model LFM2-VL (#40624)
* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
2025-09-18 11:01:58 +00:00
564fde14f1 FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-18 09:57:21 +00:00
5748352c27 Update expected values for one more test_speculative_generation after #40949 (#40967)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 11:47:14 +02:00
438343d93f Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 09:05:50 +00:00
449da6bb30 Add FlexOlmo model (#40921)
* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`
2025-09-18 09:04:06 +00:00
3bb1b4867c Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models

* PR review
2025-09-18 08:45:04 +00:00
58e13b9f12 Update expected values for some test_speculative_generation (#40949)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 20:50:38 +02:00
529d3a2b06 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 19:53:59 +02:00
a2ac4de8b0 Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports
2025-09-17 13:34:30 -04:00
8e837f6ae2 Consistent naming for images kwargs (#40834)
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print
2025-09-17 18:40:25 +02:00
eb04363a0d Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning

* add timm

* remove
2025-09-17 18:23:37 +02:00
ecc1d778ce Fix Glm4vMoeIntegrationTest (#40930)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 18:21:18 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
14f01aee39 docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941) 2025-09-17 08:48:38 -07:00
26b65fb516 Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-17 15:42:30 +00:00
66f97d3f64 [models] remove unused import torch.utils.checkpoint (#40934) 2025-09-17 16:37:56 +01:00
3853bfe4d5 [DOC] Add missing dates in model cards (#40922)
add missing dates
2025-09-17 11:17:06 -04:00
6cade29278 Add LongCat-Flash (#40730)
* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism
2025-09-17 14:48:10 +02:00
48a5565179 Add support for Florence-2 training (#40914)
* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-17 11:49:56 +00:00
89949c5d2d Minor fix for #40727 (#40929)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 11:42:13 +02:00
c830fc1207 Adding activation kernels (#40890)
* first commit

* add mode

* revert modeling

* add compile

* rm print
2025-09-17 11:36:09 +02:00
f6999b00c3 [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 11:20:50 +02:00
8428c7b9c8 Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 09:15:55 +00:00
ddd4caf066 [Llama4] Remove image_sizes arg and deprecate vision_feature_layer (#40832)
* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment
2025-09-17 09:14:13 +00:00
b82cd1c240 Processor load with multi-processing (#40786)
push
2025-09-17 09:46:49 +02:00
6e50a8afb2 [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-16 11:31:28 -07:00
cccef4be91 Fix dtype in Paligemma (#40912)
* fix dtypes

* fix copies

* delete unused attr
2025-09-16 16:07:56 +00:00
beb09cbd5a 🔴Make center_crop fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
2025-09-16 16:01:38 +00:00
d4af0d9f03 [generate] misc fixes (#40906)
misc fixes
2025-09-16 15:18:06 +01:00
3b3f6cd0c1 [gemma3] Gemma3ForConditionalGeneration compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase
2025-09-16 15:08:48 +01:00
88ba0f107e disable test_fast_is_faster_than_slow (#40909)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:34:04 +02:00
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
df03fc1f9c Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-16 13:11:48 +00:00
96bc19bcdf remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-16 12:56:11 +00:00
d0af4269ec Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test
2025-09-16 13:28:23 +02:00
65f9ede359 Set seed for Glm4vIntegrationTest (#40905)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 13:01:51 +02:00
0c1839d609 [cache] Only use scalars in get_mask_sizes (#40907)
* remove tensor ops

* style

* style
2025-09-16 12:48:58 +02:00
3688a977d0 Harmonize CacheLayer names (#40892)
* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert
2025-09-16 12:14:12 +02:00
087775d10e [cache] Merge static sliding and static chunked layer (#40893)
* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle
2025-09-16 11:41:20 +02:00
1aff033ec9 Fix flaky Gemma3nAudioFeatureExtractionTest::test_dither (#40902)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 11:00:07 +02:00
65adc3aaa3 Fix getter regression (#40824)
* test things

* style

* move tests to a sane place
2025-09-16 10:57:13 +02:00
8e1a12bbee Fixing the call to kernelize (#40628)
* fix

* style

* overload train and eval

* add getter and setter
2025-09-16 10:50:54 +02:00
21c8379fb0 Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 10:21:48 +02:00
5af248b3e3 [generate] remove docs of a feature that no longer exists (#40895) 2025-09-15 19:22:31 +01:00
20ee3a73f0 🌐 [i18n-KO] Translated imageprocessor.md to Korean (#39557)
* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:07:16 -07:00
2141a5b764 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:06:57 -07:00
2a83792165 Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 17:38:13 +02:00
04d1c8f3d4 Fix deta loading & dataclass (#40878)
* fix

* fix 2
2025-09-15 17:23:13 +02:00
ff26fe8302 Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-15 15:03:43 +00:00
6254bb4a68 Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:54:14 +00:00
e674e9dadb Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:51:22 +00:00
0957999f7f 🔴 Move variable output controls to _prepare_generation_config (#40715)
* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches
2025-09-15 11:08:00 +00:00
5e9ec59d0c Fix modular consistency (#40883)
* reapply modular

* add missing one
2025-09-15 13:07:08 +02:00
3442b2f300 [VaultGemma] Update expectations in integration tests (#40855)
* fix tests

* style
2025-09-15 12:46:30 +02:00
c0dbe095b0 Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-09-15 12:46:18 +02:00
fc5f9105da [Qwen3 Next] Use numerically stable rsqrt (#40848)
use numerically stable inverse
2025-09-15 12:45:13 +02:00
96d3795cfc Update model tags and integration references in bug report (#40881) 2025-09-15 12:08:29 +02:00
f5e1641857 fix: XIELU act parameters not being casted to correct dtype (#40812) 2025-09-15 11:05:55 +02:00
ada64ce452 fix florence kwargs (#40826) 2025-09-15 11:05:47 +02:00
93f810e6fa [docstrings / type hints] Update outdated annotations for past_key_values (#40803)
* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes
2025-09-15 10:52:32 +02:00
c65fea0b92 [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-15 10:46:32 +02:00
9c804f7ec4 Redirect MI355 CI results to dummy dataset (#40862) 2025-09-14 18:42:49 +02:00
02ea2b3433 Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-14 15:35:42 +00:00
d42e96a2a7 Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-13 00:49:19 +00:00
6eb3255842 [generate] Always use decoder config to init cache (#40772)
* mega derp

* fix

* always use the decoder
2025-09-12 18:24:22 +02:00
e682f90f60 [tests] move generative tests away from test_modeling_common.py (#40854)
move tests
2025-09-12 16:12:27 +00:00
8d8459132a [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve
2025-09-12 18:07:48 +02:00
291772b6b5 add: differential privacy research model (#40851)
* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 17:36:03 +02:00
8502b41bf1 [Sam2Video] Fix video inference with batched boxes and add test (#40797)
fix video inference with batched boxes and add test
2025-09-12 14:33:28 +00:00
f384bb8ad5 [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)
* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular
2025-09-12 14:21:22 +00:00
4cb41ad2a2 [tests] re-enable aria fast tests (#40846)
* rise from the dead

* test
2025-09-12 15:14:54 +01:00
ef053939ca Fixes for continuous batching (#40828)
* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style
2025-09-12 15:35:31 +02:00
98a8078127 Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-12 14:08:01 +02:00
77aa35ee9c Replace image classification loss functions to self.loss_function (#40764) 2025-09-12 12:59:37 +01:00
797859c9b8 Update no split modules in T5Gemma model (#40810)
* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-12 10:44:57 +00:00
6e69b60806 Adds Causal Conv 1D kernel for mamba models (#40765)
* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit
2025-09-12 12:22:25 +02:00
827b65c42c Add VideoProcessors to auto-backend requirements (#40843)
* add it

* fix existing ones

* add perception to auto_mapping...
2025-09-12 12:21:12 +02:00
5e2e77fb45 Improve torch_dtype checks (#40808)
* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-12 09:57:59 +00:00
c81f426f9a 🌐 [i18n-KO] Translated clipseg.md to Korean (#39903)
* docs: ko: model_doc/clipseg.md

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-11 17:07:24 -07:00
cf084f5b40 [Jetmoe] Fix RoPE (#40819)
* fix

* remove prints

* why was this there...
2025-09-11 18:41:11 +02:00
dfae7dd98d Push generation config along with checkpoints (#40804) 2025-09-11 17:33:16 +02:00
c264c0ee7e add general hub test for Fast Image Processors in test_image_processing_utils (#40086)
* build unittest for ViTImageProcessorFast

* remove redundant test case

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-09-11 14:31:37 +00:00
895b3ebe41 Fix typos in src (#40782)
Fix typoes in src

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-11 13:15:15 +01:00
6d369124ad Align torch implementation of Gated DeltaNet in Qwen3-Next with fla library. (#40807)
* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-11 13:10:15 +02:00
0f1b128d33 ⚠️ 🔴 Add ministral model (#40247)
* add ministral model

* docs, tests

* nits

* fix tests

* run modular after merge

* opsie

* integration tests

* again

* fff

* dtype

* rerun modular

* arthur review

* ops

* review
2025-09-11 10:30:39 +02:00
02f1d7c091 Fix config dtype parsing for Emu3 edge case (#40766)
* fix emu3 config

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* address comment

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add comments

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-11 08:26:45 +00:00
de01a22aff Fix edge case for tokenize (#36277) (#36555)
* Fix edge case for tokenize (#36277)

* Fix tokenizing dtype for float input cases

* add test for empty input string

* deal empty list of list like [[]]

* add tests for tokenizer for models with input that is not plain text
2025-09-11 09:57:30 +02:00
ec532f20fb feature: Add robust token counting with padding exclusion (#40416)
* created robust token counting by using existing include_num_input_tokens_seen variable and kept bool for backward compatibility and added string also to ensure everything goes well and kept default as is. also robust test cases are created

* some codebase mismatched in my local and remote, commiting to solve it and also solved code quality issue

* ci: retrigger tests

* another attemp to trigger CI for checks
2025-09-11 09:16:06 +02:00
df67cd35f0 Fix DeepSpeed mixed precision precedence over Accelerate defaults (#39856)
* Fix DeepSpeed mixed precision precedence over Accelerate defaults

Resolves issue where Accelerate would default to bf16 mixed precision
when a DeepSpeed config specifies fp16, causing a ValueError. The fix
ensures DeepSpeed config takes precedence over TrainingArguments defaults
while preserving explicit user settings.

Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices

Fixes #39849

* Add tests for DeepSpeed mixed precision precedence fix

- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR #39856 mixed precision precedence fix
2025-09-11 09:12:15 +02:00
549ba5b8b6 [Docs] Add missing class documentation for optimizer_schedules (#31870, #23010) (#40761)
* Add missing class documentation for optimizer_schedules (#31870, #23010)

* Add section level header to the optimizer schedules
2025-09-10 14:58:21 -07:00
dae1ccfb98 fix_image_processing_fast_for_glm4v (#40483)
* fix_image_processing_fast_for_glm4v

* fix(format): auto-ruff format

* add test image processing glm4v

* fix quality

---------

Co-authored-by: Your Name <you@example.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-10 21:05:27 +00:00
7d57b31e16 Remove use_ipex option from Trainer (#40784)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 17:00:15 +00:00
3378e7dabf Move num_items_in_batch to correct device before accelerator.gather (#40773)
add device
2025-09-10 18:49:42 +02:00
e5ecb03c92 Fix the issue that csm model cannot work with pipeline mode. (#39349)
* Fix the issue that csm model cannot work with pipeline mode.

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove batching inference

Signed-off-by: yuanwu <yuan.wu@intel.com>

* csm output is list of tensor

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update src/transformers/pipelines/text_to_audio.py

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Use different waveform key for different model

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix make style errors

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add csm tests

Signed-off-by: yuanwu <yuanwu@habana.ai>

* Update src/transformers/models/auto/tokenization_auto.py

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Signed-off-by: yuanwu <yuanwu@habana.ai>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-10 16:17:35 +00:00
abbed7010b Fix dotted model names (#40745)
* Fix module loading for models with dots in names

* quality check

* added test

* wrong import

* Trigger CI rerun after making test model public

* Update src/transformers/dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Update tests/utils/test_dynamic_module_utils.py

* Move test

* make fixup

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-09-10 14:34:56 +00:00
75202b0928 Read config pattern for Qwen3Next (#40792)
read it
2025-09-10 15:18:51 +02:00
7401cfa57c Use functools.cached_property (#40607)
* cached_property is avaiable in functools

Signed-off-by: cyy <cyyever@outlook.com>

* Remove cached_property

Signed-off-by: cyy <cyyever@outlook.com>

* Fix docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:15:40 +00:00
8ab2448707 Fix invalid PipelineParallel member (#40789)
Fix invalid enum member

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 12:06:36 +00:00
6c9f412105 Fix typos in tests and util (#40780)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-10 11:45:40 +00:00
0997c2f2ab Fix doc for PerceptionLMForConditionalGeneration forward. (#40733)
* Fix doc for PerceptionLMForConditionalGeneration forward.

* fix last nit

---------

Co-authored-by: raushan <raushan@huggingface.co>
2025-09-10 11:57:19 +02:00
a72e5a4b9d 🚨 Fix Inconsistant input_feature length and attention_mask length in WhisperFeatureExtractor (#39221)
* Update feature_extraction_whisper.py

* Reformat

* Add feature extractor shape test

* reformat

* fix omni

* fix new failing whisper test

* Update src/transformers/models/whisper/feature_extraction_whisper.py

* make style

* revert omni test changes

* add comment

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-09-10 09:38:47 +00:00
a5ecd94a3f Enable ruff on benchmark and scripts (#40634)
* Enable ruff on benchmark and scripts

Signed-off-by: cyy <cyyever@outlook.com>

* Cover benchmark_v2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* correct

* style

* style

---------

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-10 11:38:06 +02:00
08edec9f7d [processors] Unbloating simple processors (#40377)
* modularize processor - step 1

* typos

* why raise error, super call check it also

* tiny update

* fix copies

* fix style and test

* lost an import / fix copies

* fix tests

* oops deleted accidentally
2025-09-10 10:37:19 +02:00
c52889bd51 Remove reference of video_load_backend and video_fps for processor (#40719)
* Remove reference of video_load_backend and video_fps for processor

Signed-off-by: cyy <cyyever@outlook.com>

* Restore changes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-10 08:37:11 +00:00
3340ccbd40 Fix gpt-oss router_indices in EP (#40545)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix router indice

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix mod

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix masking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add safety cheking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix checking

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* enable 1 expert per rank

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix skip

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add ep plan in config

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add update ep plan

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm ep_plan and add comments

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-10 10:30:55 +02:00
b9282355be Adding Support for Qwen3-Next (#40771)
* Add Qwen3-Next.

* fix

* style

* doc

* simplify

* fix name

* lazy cache init to allow multi-gpu inference

* simplify

* fix config to support different hybrid ratio.

* remove last commit (redundant)

* tests

* fix test

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-09 23:46:57 +02:00
79fdbf2a4a [docs] CPU install (#40631)
* init

* feedback
2025-09-09 12:51:54 -07:00
37c14430c9 [pipeline] ASR pipeline kwargs are forwared to generate (#40375)
* tmp commit

* add test

* PR suggestion
2025-09-09 17:29:25 +00:00
d09fdf5e52 Fix crash when executing MambaCache sample code (#40557)
* Fix the sample code of MambaCache

* Update automatically generated code

* Fix FalconMambaCache documents

* minor doc fixes

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
2025-09-09 16:44:49 +00:00
d33c189e5a [RoPE] run RoPE tests when the model uses RoPE (#40630)
* enable rope tests

* no manual rope test parameterization

* Apply suggestions from code review

* Update tests/models/hunyuan_v1_dense/test_modeling_hunyuan_v1_dense.py

* PR comment: use generalist torch code to find the rope layer
2025-09-09 17:11:02 +01:00
71ac7ea048 [tests] update test_past_key_values_format and delete overwrites (#40701)
* tmp

* rm some overwrites
2025-09-09 16:40:04 +01:00
7aaef98cbe rm src/transformers/convert_pytorch_checkpoint_to_tf2.py (#40718)
* rm src/transformers/convert_pytorch_checkpoint_to_tf2.py

* doctest skip
2025-09-09 16:34:54 +01:00
de5cbe8b79 [deprecations] Remove generate-related deprecations up to v4.56 (#40729)
remove generate-related deprecations up to v4.56
2025-09-09 16:32:41 +01:00
1cdbbb3e9d Support sliding window in CB (#40688)
* CB example: better compare feature

* Cache managers, still issue w/ effective length

* WIP -- fix for effective length

* Renames

* Wroking, need better parity checks, we mind be missing 1 token

* Small fixes

* Fixed wrong attn mask and broke cache into pieces

* Warmup is slowing down things, disabling it

* Cache was too big, fixed

* Simplified index objects

* Added a profile option to the example

* Avoid calls to memory reporing tools

* Restore full attention read indices for better latency

* Adressed some TODOS and style

* Docstrings for cache managers

* Docstrings for Schedulers

* Refactor scheudlers

* [Important] Cache fix for sliding window, check with small sw size

* Updated doc for cache memory compute and cache as a whole

* Moved a todo

* Nits and style

* Fix for when sliding window is smaller than max batch per token

* Paged interface update

* Support for FLash in new API

* Fix example CB

* Fix bug in CB for paged

* Revert example

* Style

* Review compliance

* Style

* Styleeeee

* Removed NO_SLIDING_WINDOW

* Review #2 compliance

* Better art

* Turn cum_seqlens_k in a dict

* Attn mask is now a dict

* Update examples/pytorch/continuous_batching.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Adressed McPatate pro review

* Style and fix

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-09-09 15:51:11 +02:00
ed100211cb [generate] PromptLookupCandidateGenerator won't generate forbidden tokens (#40726)
* no longer flaky :)

* PR comments

* any token-blocking logits processor works

* ?

* default

* -_-

* create fake tensors once
2025-09-09 11:04:01 +00:00
82d66e5dd0 Fix: swanlab public.cloud.experiment_url api error (#40763)
fix
2025-09-09 09:28:13 +00:00
a871f6f58d Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing (#40215)
* Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing

* Fix fast processor output format and add comprehensive tests

* Fix trailing whitespace in test file

* Apply ruff formatting to test file

* simplify pair validation logic

* add superglue tests to fast image processor

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-08 21:08:02 +00:00
aee5000f16 Fix Bark failing tests (#39478)
* Fix vocab size for Bark generation.

* Fix Bark processor tests.

* Fix style.

* Address comments.

* Fix formatting.

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-08 20:24:51 +02:00
126264d015 🌐 [i18n-KO] Translated 'xclip.md' to Korean (#39594)
* feat: nmt draft

* fix: manual edits

* docs: ko: xclip.md

* feat: nmt draft

* fix: manual edits

* fix: Modify _toctree.yml file to reflect review

* fix: Modify _toctree.yml file to reflect review

* jungnerd_suggestion_modified_01 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* jungnerd_suggestion_modified_02 ko_xclip.md

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-09-08 11:19:10 -07:00
5a468e56b7 Fix continue_final_message in apply_chat_template to prevent substring matching issues (#40732)
* Fix continue_final_message parameter in apply_chat_template

* after run fixup

* Handle trim in the template

* after fixup

* Update src/transformers/utils/chat_template_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-08 17:25:12 +00:00
e8db153599 Fix inconsistency in SeamlessM4T and SeamlessM4Tv2 docs (#39364) 2025-09-08 10:01:44 -07:00
fd2a29d468 Fix more typos (#40627)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 16:05:40 +00:00
bb8e9cd675 Remove unnecessary tildes from documentation (#40748) 2025-09-08 08:56:35 -07:00
a9b313a0c2 docs: add continuous batching to serving (#40758)
* docs: tmp

* docs: add continuous batching to serving

* docs: reword after @lysandrejik review
2025-09-08 15:50:28 +00:00
2077f17547 feat: err when unsupported attn impl is set w/ --continuous_batching (#40618)
* feat: err when unsupported attn impl is set w/ `--continuous_batching`

* refactor: move defaults and support list to CB code

* feat: add action item in error msg

* fix(serve): add default attn implementation

* feat(serve): add log when `attn_implementation` is `None`

* feat: raise Exception when attn_implementation is not supported by CB
2025-09-08 14:31:49 +00:00
dc262ee6f5 remove FSDP prefix when using save_pretrained with FSDP2 (#40207)
* remove FSDP prefix when using save_pretrained with FSDP2

* Fix: use removeprefix correctly

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 14:52:31 +02:00
9ab6078323 remove gemmas eager training warning (#40744)
* removed warning

* removed remaining warnings
2025-09-08 14:41:52 +02:00
2a1eb5b508 Add BF16 support check for MUSA backend (#40576)
add musa bf16 supported

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-08 12:39:14 +00:00
7b8d40ea7a Set accepts_loss_kwargs to False for ConvNext(|V2)ForImageClassification (#40746) 2025-09-08 14:25:43 +02:00
def7558f74 Fix np array typing (#40741)
Fix typing

Signed-off-by: cyy <cyyever@outlook.com>
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-08 11:30:40 +00:00
44b3888d2a Fix order of mask functions when using and/or_mask_function (#40753)
fix order
2025-09-08 12:31:42 +02:00
3f7bda4209 [Continous Batching] fix do_Sample=True in continuous batching (#40692)
* fix do_Sample=True in continous batching

* added test

* fix top_p

* test

* Update examples/pytorch/continuous_batching.py
2025-09-08 10:30:15 +02:00
bb45d3631e refactor(serve): move request_id to headers (#40722)
* refactor(serve): move `request_id` to headers

* fix(serve): typo in middleware fn name

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-05 17:50:04 +02:00
12b8e10dbf Skip VitMatteImageProcessingTest::test_fast_is_faster_than_slow (#40713)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 17:36:20 +02:00
6b232618b6 Keypoint matching docs (#40541)
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
2025-09-05 17:24:56 +02:00
948bc0fa34 [Gemma Embedding] Fix SWA (#40700)
* fix gemma embedding flash attention

* fix sdpa

* fix atttempt number 2

* alternative gemma fix

* fix modular
2025-09-05 17:12:00 +02:00
828044cadb Add Optional typing (#40686)
* Add Optional typing

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Format

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 15:05:51 +00:00
e9d6a6907b [tests] remove overwrites of removed test (#40720)
rm tests from method moved to hub
2025-09-05 16:04:22 +01:00
96a5774f2e [serve] re-enable tests (#40717)
run tests
2025-09-05 15:15:34 +01:00
c76387e580 Fix arguments (#40605)
* Fix invalid arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typing

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self

Signed-off-by: cyy <cyyever@outlook.com>

* Add missing self and other fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

*  More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 13:50:04 +00:00
21f09032db 🔴 Update Glm4V to use config values (#40712)
* update to use config

* just fix it

* fixup want this to be reformatted
2025-09-05 13:19:50 +00:00
b62e5b6051 Fix parent classes of AllKwargsForChatTemplate (#40685)
Fix parent classes of AllKwargsForChatTemplate because the *Kwargs are members

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 11:08:51 +00:00
313effa7ad [onnx] use logical or for grounding dino mask (#40625)
* change |= operator to use torch logical or for friendly export to different backends

* change |= operator to use torch logical or for friendly export to different backends in grounding dino model

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
2025-09-05 10:55:20 +00:00
f3211b5db7 [moduar] Add missing self in post-process methods (#40711) 2025-09-05 10:49:52 +00:00
a2a8a3ca1e [tests] fix blip2 edge case (#40699) 2025-09-05 11:35:29 +01:00
4e195f1949 🚨 Allow check_model_inputs in core VLMs (#40342)
* allow `check_model_inputs` in core VLMs

* address comments

* fix style

* why this didnt fail prev?

* chec for Noneness instead

* batch update vlms

* fix some tests

* fix copies

* oops delete

* fix efficientloftr

* fix copies

* i am stupid, fix idefics

* fix GC

* return type and other comments

* we shouldn't manually change attention anymore

* fix style

* fix copies

* fix the test
2025-09-05 10:05:56 +00:00
93df343def Fix parent classes of ProcessingKwargs (#40676)
FIx parent classes of ProcessingKwargs

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-05 10:01:16 +00:00
89e103c15e feat(serve): add healthcheck test (#40697) 2025-09-05 11:56:34 +02:00
a2fffa505d Fetch more test data with hf_hub_download (#40710)
[test-all] tests

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-05 09:49:31 +00:00
4a88e81532 Add Fast Image Processor for ImageGPT (#39592)
* initial commit

* initial setup

* Overiding imageGPT specific functions

* imported is_torch_available and utilized it for importing torch in imageGPT fast

* Created init and ImageGPTFastImageProcessorKwargs

* added return_tensors, data_format, and input_data_format to ImageGPTFastImageProcessorKwargs

* set up arguments and process and _preprocess definitions

* Added arguments to _preprocess

* Added additional optional arguments

* Copied logic over from base imageGPT processor

* Implemented 2nd draft of fast imageGPT preprocess using batch processing

* Implemented 3rd draft of imageGPT fast _preprocessor. Pulled logic from BaseImageProcessorFast

* modified imageGPT test file to properly run fast processor tests

* converts images to torch.float32 from torch.unit8

* fixed a typo with self.image_processor_list in the imagegpt test file

* updated more instances of image_processing = self.image_processing_class in the test file to test fast processor

* standardized normalization to not use image mean or std

* Merged changes from solution2 branch

* Merged changes from solution2 test file

* fixed testing through baseImageGPT processor file

* Fixed check_code_quality test. Removed unncessary list comprehension.

* reorganized imports in image_processing_imagegpt_fast

* formatted image_processing_imagegpt_fast.py

* Added arg documentation

* Added FastImageProcessorKwargs class + Docs for new kwargs

* Reformatted previous

* Added F to normalization

* fixed ruff linting and cleaned up fast processor file

* implemented requested changes

* fixed ruff checks

* fixed formatting issues

* fix(ruff after merging main)

* simplify logic and reuse standard equivalenec tests

---------

Co-authored-by: Ethan Ayaay <ayaayethan@gmail.com>
Co-authored-by: chris <christine05789@gmail.com>
Co-authored-by: Ethan Ayaay <98191976+ayaayethan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-04 22:45:06 +00:00
9db11b728b Fetch one missing test data (#40703)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 23:05:23 +02:00
acd820561f Align assisted generate for unified signature in decoding methods (#40657)
* Squashed previous branch

* unify assisted generate to common decoding method signature

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review
2025-09-04 22:47:44 +02:00
16b821c542 Avoid T5GemmaModelTest::test_eager_matches_sdpa_inference being flaky (#40702)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 20:44:40 +00:00
519c2524af Fix broken Llama4 accuracy in MoE part (#40609)
* Fix broken Llama4 accuracy in MoE part

Llama4 accuracy is broken by a bug in
https://github.com/huggingface/transformers/pull/39501 . It forgot to
transpose the router_scores before applying it to routed_in, causing
Llama4 to generate garbage output.

This PR fixes that issue by adding back the transpose() and adding some
comments explaining why the transpose() is needed.

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>

* remove comment

---------

Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-04 22:14:44 +02:00
586dc5d06e [Glm4.5V] fix vLLM support (#40696)
* fix

* add a test case
2025-09-04 22:09:20 +02:00
ad2da3ea83 Fix self.dropout_p is not defined for SamAttention/Sam2Attention (#40667)
Fix dropout_p is not defined for SamAttention/Sam2Attention
2025-09-04 19:32:39 +02:00
e39f222096 Fix backward compatibility with accelerate in Trainer (#40668) 2025-09-04 18:15:15 +02:00
d8f670583e Change docker image to preview for the MI355 CI (#40693)
* Change docker image to preview for the MI355 CI

* Use pushed image
2025-09-04 17:23:09 +02:00
4cbca0d1af Fixing bug in Voxtral when merging text and audio embeddings (#40671)
* Fixing bug when replacing text-audio token placeholders with audio embeddings

* apply changes

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-09-04 15:11:23 +00:00
9a6c6568db feat: support request cancellation (#40599)
* feat: support request cancellation

* test: add cancellation test

* refactor: use exisitng fn to check req cancellation

* feat(cb): make cancellation thread safe

* refactor(serve): update test to use `requests` instead of `httpx`
2025-09-04 17:01:29 +02:00
87f38dbfce add: embedding model (#40694)
* Gemma 3 for Embeddings

* Style fixes

* Rename conversion file for consistency

* Default padding side emb vs gen

* Corrected 270m config

* style fixes

* EmbeddingGemma config

* TODO for built-in prompts

* Resolving the sentence similarity bug and updating the architecture

* code style

* Add query prompt for SentenceTransformers

* Code quality

* Fixing or_mask_function return types

* Adding placeholder prompts for document and passage

* Finalizing prompt templates

* Adding Retrieval ro preconfigured prompts

* Add Gemma 3 270M Config

* Correcting num_linear_layers flag default

* Export Sentence Transformer in correct dtype

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
2025-09-04 16:16:15 +02:00
5b0c01b5e2 Final test data cache - inside CI docker images (#40689)
* run

* build

* build

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 13:12:49 +00:00
1f3cc935cc Load a tiny video to make CI faster (#40684)
* load a tiny video to make CI faster

* add video in url_to_local_path
2025-09-04 14:49:00 +02:00
669230a86f fix broken offline mode when loading tokenizer from hub (#40669)
* fix broken offline mode when loading tokenizer from hub

* formatting

* make quality

* fix import order
2025-09-04 12:15:56 +00:00
91b34be9cf Add codebook_dim attribute to DacVectorQuantize for DacResidualVectorQuantize.from_latents() (#40665)
* Add instance attribute to DacVectorQuantize for use in DacResidualVectorQuantize.from_latents

* add from_latent tests

* style fix

* Fix style for test_modeling_dac.py
2025-09-04 11:29:53 +00:00
25b4a0d8ae Add sequence classification support for small Gemma 3 text models (#40562)
* add seq class for gemma3 text model

* add Gemma3TextForSequenceClassification to modeling file

* After run make fixup

* let's just check

* thiis is why it was crashing, tests were just failing...

* skip it, tested only for seq clf

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-04 09:44:59 +00:00
30a4b8707d CircleCI docker images cleanup / update / fix (#40681)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:42:18 +02:00
7f92e1f91a Mark Aimv2ModelTest::test_eager_matches_sdpa_inference_04_fp16_pad_right_sdpa_kernels as flaky (#40683)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 10:30:14 +02:00
ca9b36a9c1 Avoid night torch CI not run because of irrelevant docker image failing to build (#40677)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 09:06:37 +02:00
d40e7ea52d Skip more fast v.s slow image processor tests (#40675)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-04 06:35:44 +02:00
34595cf296 Even more test data cached (#40636)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 21:20:37 +00:00
f22ec7f174 Benchmarking V2: framework impl (#40486)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments

* Benchmarking v2

* Fix llama bench parameters

* Working checkpoint

* Readme touchups

* Remove unnecessary test

* Massage the framework a bit

* Small cleanup

* Remove unnecessary flushes

* Remove references to mock benchmark

* Take commit ID from CLI

* Address review comments

* Use Events for thread comms

* Tiny renaming
2025-09-03 22:26:32 +02:00
459c1fa47a refactor: use tolist instead of list comprehension calling .item() (#40646) 2025-09-03 19:25:29 +02:00
afd1393df1 Remove overwritten GitModelTest::test_beam_search_generate (#40666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:55:45 +02:00
68b9cbb7f5 Skip test_prompt_lookup_decoding_matches_greedy_search for qwen2_audio (#40664)
* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

* Skip `test_prompt_lookup_decoding_matches_greedy_search` for `qwen2_audio`

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 18:43:35 +02:00
55676d7d4c Fix warning for output_attentions=True (#40597)
* Fix attn_implementation for output_attentions

* remove setting attention, just raise warning

* improve message

* Update src/transformers/utils/generic.py
2025-09-03 16:25:13 +00:00
b67608f587 Skip test_fast_is_faster_than_slow for Owlv2ImageProcessingTest (#40663)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:49:10 +02:00
30d66dc3bc Update check_determinism inside test_determinism (#40661)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 17:30:39 +02:00
3f40ebf620 Allow custom args in custom_generate Callables and unify generation args structure (#40586)
* Squashed commit of the following:

commit beb2b5f7a04ea9e12876696db66f3589fbae10c5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 16:03:25 2025 +0200

    also standardize _get_stopping_criteria

commit 15c25663fa991e0a215a7f3cdcf13a9d3a989faa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 15:48:38 2025 +0200

    watch super.generate() usages

commit 67dd845be2202d191a54b2872f1cb3f71b74b7d6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:44:32 2025 +0200

    ops

commit 4655dfa28fd59d5dc083a41d8396de042d99858c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:41:36 2025 +0200

    wrong merge

commit 46478143994e7b27d51c972a7881e0fea3cb6e3c
Merge: a72c2c4b2f 8564e210ca
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:36:15 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into fix-custom-gen-from-function2

commit a72c2c4b2f9c0e09fe6ec7992d4d02bfa279da2a
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 14:04:59 2025 +0200

    ops5

commit e72f91411b961979bb3d271810f57905cee5b577
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 12:06:19 2025 +0200

    ops4

commit 12ca97b1078a42167143e0243036f6ef87d5fdac
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:58:59 2025 +0200

    ops3

commit 8cac6c60a318dd381793d4bf1ef3775823f3c95b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:43:03 2025 +0200

    ops2

commit 4681a7d5dc6c8b96a515d9d79f06380c096b9a9f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:40:51 2025 +0200

    ops

commit 0d72aa6cbd99a5933c5a95a39bea9088ee21e50f
Merge: e0d47e980e 5bb6186b8e
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:37:28 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 5bb6186b8efbd5fdb8e3464a22f958343b9c450c
Merge: 44973dac7d b0db5a02f3
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:36:30 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit 44973dac7df4b4e2111c71f5fac918be21f3de52
Merge: 1ddab4bee1 893d89e5e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 11:29:48 2025 +0200

    Merge commit '893d89e5e6fac7279fe4292bfa3b027172287162' into remove-constrained-bs

commit e0d47e980e26d32b028c2b402ccb71262637a7a7
Merge: 88128e4563 1ddab4bee1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:52:50 2025 +0200

    Merge branch 'remove-constrained-bs' into fix-custom-gen-from-function2

commit 88128e4563c0be583728e1d3c639bc93143c4029
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Mon Sep 1 10:44:38 2025 +0200

    fix custom generate args, refactor gen mode args

commit 1ddab4bee159f6c20722e7ff5cd41d5041fab0aa
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Sun Aug 31 21:03:53 2025 +0200

    fix

commit 6095fdda677ef7fbeb06c05f4f914a11b45257b4
Merge: 4a8b6d2ce1 04addbc9ec
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:49:16 2025 +0200

    Merge branch 'remove-constrained-bs' of github.com:manueldeprada/transformers into remove-constrained-bs

commit 4a8b6d2ce18b3a8b52c5261fea427e2416f65187
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 17:48:25 2025 +0200

    restore and deprecate beam obkects

commit 04addbc9ec62dd4f59d15128e8cd9499e2cda3bb
Merge: e800c7841e becab2c601
Author: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Date:   Thu Aug 28 14:38:29 2025 +0200

    Merge branch 'main' into remove-constrained-bs

commit e800c7841e5c46ce5698fc9be309d0808f85d23c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:38:10 2025 +0200

    tests gone after green

commit 33971d21ac40aef76a7e1122f4a98ef28beadbe8
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 14:07:11 2025 +0200

    tests green, changed handling of deprecated methods

commit ab303835c184d0a87789da7aed7d8de5ba85d867
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:58:01 2025 +0200

    tests fix

commit ec74274ca52a6aa0b5f300374fda838609680506
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 12:32:05 2025 +0200

    ops

commit 0fb19004ccd285dcad485fce0865b355ce5493e0
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:45:16 2025 +0200

    whoops

commit c946bea5e45aea021c8878c57fcabc2a13f06fe5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:35:36 2025 +0200

    testing...

commit 924c0dec6d9ea6b4890644fe7f711dc778f820bb
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:22:46 2025 +0200

    sweeep ready for tests

commit b05aa771d3994b07cd460cda74b274c9e4f315e6
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Thu Aug 28 11:13:01 2025 +0200

    restore and deprecate constraints

commit 9c7962d10efa7178b69d3c99e69663756e1cd979
Merge: fceeb383f9 c17bf304d5
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 20:44:21 2025 +0200

    Merge branch 'remove-group-bs' into remove-constrained-bs

commit c17bf304d5cf33af7f34f9f6057915d5f5821dae
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 17:00:50 2025 +0200

    fix test

commit d579aeec6706b77fcc24c1f6806cd7277d7db56e
Merge: 822efd8c3c ed5dd2999c
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 16:04:31 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 822efd8c3cf475d079e64293aa06e4ab59740fd7
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 15:59:51 2025 +0200

    aaand remove tests after all green!!

commit 62cb274a4acb9f24201902242f1b0dc4e46daac1
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:48:19 2025 +0200

    fix

commit c89c892e7b24a7d71831f2b35264456005030925
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Wed Aug 27 11:45:20 2025 +0200

    testing that hub works the same

commit fceeb383f99e4a836679d67b1d2a8520152eaf49
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 20:06:59 2025 +0200

    draft

commit 6a9b384078f3798587ba865ac7ddfefc9a79e41c
Merge: 8af3af13ab 58cebc848b
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 15:00:05 2025 +0200

    Merge branch 'main' of github.com:huggingface/transformers into remove-group-bs

commit 8af3af13abb85ca60e795d0390832f398a56c34f
Author: Manuel de Prada Corral <manueldeprada@gmail.com>
Date:   Tue Aug 26 11:55:45 2025 +0200

    Squashed commit remove-constrastive-search

* ops

* fix

* ops

* review

* fix

* fix dia

* review
2025-09-03 17:30:09 +02:00
a8f400367d Avoid attention_mask copy in qwen2.5 (#40658)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-03 15:17:22 +00:00
57f5668d0b Fix Metaclip modular conversion (#40660)
* Fix Metaclip modular conversion

* manually run check_copies
2025-09-03 16:13:50 +01:00
238a8274b4 feat(serving): add healthcheck (#40653) 2025-09-03 16:43:12 +02:00
f2416b4fd2 fix pipeline dtype (#40638)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 16:05:48 +02:00
5ea5c8179b Mark LongformerModelTest::test_attention_outputs as flaky (#40655)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 13:19:02 +00:00
fe1a9e0dba Remove TF/Flax examples (#40654)
* Remove TF/Flax examples

* Remove check_full_copies

* Trigger CI
2025-09-03 14:15:57 +01:00
5e2e496149 fix MetaCLIP 2 wrong link & wrong model names in the docstrings (#40565)
* fix MetaCLIP 2 wrong link & wrong model names in the documentation and docstrings

* ruff reformatted

* update files generated by modular

* update meta_clip2 to metaclip_2 to match the original

* _supports_flash_attn = False

---------

Co-authored-by: Yung-Sung Chuang <yungsung@meta.com>
2025-09-03 13:53:56 +01:00
03708ccf6f add DeepseekV3ForTokenClassification (#40641)
* add DeepseekV3ForTokenClassification

* fix typo

---------

Co-authored-by: json.bourne <json.bourne@kakaocorp.com>
2025-09-03 12:30:09 +00:00
c485c52db4 Skip test_prompt_lookup_decoding_matches_greedy_search for voxtral (#40643)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 11:45:29 +00:00
2bbf98a83d Fix: PIL image load in Processing utils apply_chat_template (#40622) 2025-09-03 13:06:05 +02:00
acc968c581 [CP] Add attention_mask to the buffer when the mask is causal (#40619)
Fix attention mask validation for context parallelism

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-03 10:19:35 +00:00
cb54ce4ec6 [auto-model] propagate kwargs (#40491)
propagate kwargs
2025-09-03 09:59:20 +00:00
ye
0f5e45a6d1 fix: gas for gemma fixed (#40591)
* fix: gas for gemma fixed

* feat: run fix-copies

* feat: added issue label
2025-09-03 08:44:14 +00:00
e690fe61e8 Fix too many requests in TestMistralCommonTokenizer (#40623)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-03 05:05:03 +02:00
00a8364271 🌐 [i18n-KO] Translated deepseek_v3.md to Korean (#39649)
* docs: ko: deepseek_v3.md

* feat: nmt draft

* fix: manual edits

* fix: glossary edits

* docs : 4N3MONE recommandced modified contents

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/deepseek_v3.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* add_toctree.yml

---------

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
2025-09-02 13:35:56 -07:00
ed49376a42 Remove random flag (#40629)
remove flag
2025-09-02 19:10:02 +02:00
d47ad91c3c Support TF32 flag for MUSA backend (#33187)
* Support MUSA (Moore Threads GPU) backend in transformers
Add accelerate version check, needs accelerate>=0.33.0

* Support TF32 flag for MUSA backend

* fix typo
2025-09-02 16:27:10 +00:00
a470f21396 Enable more ruff UP rules (#40579)
* Import Sequence from collections.abc

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff UP rules

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 17:29:59 +02:00
37103d6f22 Fix invalid typing (#40612)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 13:10:22 +00:00
4f542052b9 Remove unnecessary pillow version check (#40604)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-02 12:59:22 +00:00
8c60a7c385 Add collated reports job to Nvidia CI (#40470)
* Add collated reports job to Nvidia CI

* machine_type

* Move collated reports job to model_jobs

* Propagate repo id variable

* assifgn runner_type is self-scheduled-caller
2025-09-02 14:25:22 +02:00
97266dfd50 Fix flaky JambaModelTest.test_load_balancing_loss (#40617)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 13:58:16 +02:00
91be12bdc6 Avoid too many request caused by AutoModelTest::test_dynamic_saving_from_local_repo (#40614)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 12:08:52 +02:00
bbd8085b0b Fix processor chat template (#40613)
fix tests
2025-09-02 10:59:48 +02:00
b2b1c30b1b fix: continuous batching in transformers serve (#40479)
* fix: continuous batching in `transformers serve`

* fix: short circuit inner gen loop when prepare_next_batch prepared nothing

* docs: add comment explaining FastAPI lifespan

* test: add CB serving tests

* refactor: remove gen cfg max new tokens override bc unnecessary

* docs: add docstring for `ServeCommand::run`

* feat: use new `DecodeStream` API
2025-09-02 10:45:05 +02:00
8a091cc07c Disable cache for TokenizerTesterMixin temporarily (#40611)
* try no cache

* try no cache

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-02 08:40:04 +02:00
514b3e81b7 Multiple fixes to FA tests in AMD (#40498)
* Expectations for gemma3

* Fixes for Qwen2_5_VL tests

* Added expectation but underlying pb is still there

* Better handling of mrope section for Qwen2_5_vl

* Fixes for FA2 tests and reformat batch test for Qwen2_5_Omni

* Fix multi-device error in qwen2_5_omni

* Styel and repo-consistency

* Removed inherited test because fix in common

* slow tests fixes

* Style

* Fixes for qwen2_5_vl or omni for FA test
2025-09-01 20:49:50 +02:00
b3655507bb Pin torchcodec to 0.5 in AMD docker (#40598) 2025-09-01 20:39:55 +02:00
4da03d7f57 Reduce more test data fetch (#40595)
* example

* fix

* fix

* add to fetch script

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 18:07:18 +02:00
abf5900a76 [Tests] Fixup duplicated mrope logic (#40592)
cleanup duplicated logic
2025-09-01 17:22:34 +02:00
3beac9c659 Fix quite a lot of FA tests (#40548)
* fix_rope_change

* fix

* do it dynamically

* style

* simplify a lot

* better fix

* fix

* fix

* fix

* fix

* style

* fix
2025-09-01 16:42:50 +02:00
21e708c8fd Fix for missing default values in encoder decoder (#40517)
* Added default_value for is_updated and type check

* Forgot one

* Repo consistency
2025-09-01 16:11:23 +02:00
c99d43e6ec Fix siglip flaky test_eager_matches_sdpa_inference (#40584)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:17:25 +02:00
3c3dac3c12 Add Copilot instructions (#40432)
* Add copilot-instructions.md

* Fix typo

* Update .github/copilot-instructions.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-01 14:09:54 +01:00
2b71c5b7a6 Fix inexistent imports (#40580)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 13:05:00 +00:00
8e0b2c8baf Skip TvpImageProcessingTest::test_slow_fast_equivalence (#40593)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 15:03:34 +02:00
a543095c99 Fix typos (#40585)
Signed-off-by: cyy <cyyever@outlook.com>
2025-09-01 12:58:23 +00:00
8564e210ca 🚨 Remove Constrained Beam Search decoding strategy (#40518)
* Squashed remove-constrastive-search

* sweeep ready for tests

* testing...

* whoops

* ops

* tests fix

* tests green, changed handling of deprecated methods

* tests gone after green

* restore and deprecate beam obkects

* restore and deprecate constraint objects

* fix ci

* review
2025-09-01 12:34:48 +00:00
564be6d895 Support batch size > 1 image-text inference (#36682)
* update make nested image list

* fix make flat list of images

* update type anno

* fix image_processing_smolvlm

* use first image

* add verbose comment

* fix images

* rollback

* fix ut

* Update image_processing_smolvlm.py

* Update image_processing_idefics3.py

* add tests and fix some processors

* fix copies

* fix after rebase

* make the test cover chat templates

* sjip udop, no point in fixing it

* fix after rebase

* fix a few more tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: raushan <raushan@huggingface.co>
2025-09-01 12:26:07 +00:00
3bccb02616 🚨 Remove Group Beam Search decoding strategy (#40495)
* Squashed remove-constrastive-search

* testing that tests pass using hub

* fix

* aaand remove tests after all green!!
2025-09-01 13:42:48 +02:00
90953d5bc1 Fix custom generate relative imports (#40480) 2025-09-01 13:38:56 +02:00
2537ed4477 Update get_*_features methods + update doc snippets (#40555)
* siglip

* clip

* aimv2

* metaclip_2

* align

* align fixup

* altclip

* blip2 (make consistent)

* chineese clip

* clipseg

* flava

* groupvit

* owlv2

* owlvit

* vision_encoder

* clap

* x_clip

* fixup

* fix siglip2

* blip2

* fix blip2 tests (revert to original)

* fix docs
2025-09-01 12:37:43 +01:00
48ebae975e Fix llava image processor (#40588)
fix
2025-09-01 13:32:57 +02:00
db6821b79c Allow remi-or to run-slow (#40590)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 12:30:53 +02:00
6546f288a1 Fix CircleCI step passes in the case of pytest worker crash at test collection time (#40552)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:33:23 +02:00
cfed99d310 Fix test_eager_matches_sdpa_inference not run for CLIP (#40581)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-01 11:21:56 +02:00
1d742644c0 [qwen-vl] fix position ids (#40490)
* fix position ids

* fixup

* adjust tests since they are failing on main as well

* add a comment to make it clear
2025-09-01 09:10:41 +00:00
0b24507379 processor tests - use dummy videos (#40537)
* use dummy videos

* failing on main, new model merged had conflicts
2025-09-01 09:04:47 +00:00
b0db5a02f3 Set test_all_params_have_gradient=False for DeepseekV2ModelTest (#40566)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 22:46:31 +02:00
1363fceeec remove the redundant non maintained jieba and use rjieba instead (#40383)
* porting not maintained jieba to rjieba

* Fix format

* replaced the line with rjieba instead of removing it

* cut_all is not included as a parameter. cut_all is a seperate function rjieba

* rev

* jieba remove installation

* Trigger tests

* Update tokenization_cpm.py

* Update tokenization_cpm_fast.py

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-30 13:28:52 +02:00
36fddebcee pin pytest-rerunfailures<16.0 (#40561)
ping pytest-rerunfailures<16.0

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-30 12:58:44 +02:00
2d3b8863e8 Fix collated reports upload filename (#40556) 2025-08-30 09:35:51 +02:00
ce48e9cac0 Dev version 2025-08-29 20:17:34 +02:00
155fd926d2 Fix GptOssModelTest::test_assisted_decoding_matches_greedy_search_1_same (#40551)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
2025-08-29 15:53:53 +00:00
1067577ad2 fix gpt-oss out shape (#40535)
* fix out shape

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* reset gpt-oss modeling

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix copies

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 15:20:33 +00:00
7efb4c87ca Flaky CI is annoying (#40543)
* mark flaky

* and the non batch one
2025-08-29 16:47:44 +02:00
828a27fd32 Fix gpt-oss rope warning (#40550)
* fix

* fix print

* rm

* real fix

* fix

* style
2025-08-29 14:40:33 +00:00
74a24217f5 Add bfloat16 support detection for MPS in is_torch_bf16_gpu_available() (#40458)
* Add bfloat16 support detection for MPS (Apple Silicon) in is_torch_bf16_gpu_available

bfloat16 seems to have been supported for a few years now in Metal and torch.mps.

Make sure to allow it and not throw on bf16 usage with "Your setup doesn't support bf16/gpu." from TrainingArguments.

* Check bf16 support for MPS using torch method

Actually seems method exists: 5859edf113/torch/_dynamo/device_interface.py (L519)

It simply checks if you are on MacOs 14 or higher.

* Document Metal emulation for bf16 support

Add note about Metal emulation for bf16 support on M1/M2.

* Update bf16 support check for MPS backend

is_bf16_supported() not exposed even if defined on MPSInterface, use same approach as in accelerate pr.

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-29 14:37:15 +00:00
ffdd10fced Allow compression on meta device (#39039)
* disable gradient calculation for int weights

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* Update src/transformers/quantizers/quantizer_compressed_tensors.py

Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>

* updated model procession before/after weight loading

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* reformat

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

* fix style

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>

---------

Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
2025-08-29 15:49:15 +02:00
f0e778112f Clean-up kernel loading and dispatch (#40542)
* clean

* clean imporrts

* fix imports

* oups

* more imports

* more imports

* more

* move it to integrations

* fix

* style

* fix doc
2025-08-29 14:14:38 +02:00
f68eb5f135 Redundant code removal (#40534)
redundant code
2025-08-29 11:30:23 +00:00
d888bd435d Fix typos (#40511)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-29 11:25:33 +00:00
11a6b95553 Oupsy (#40544)
fix bump!
2025-08-29 12:59:49 +02:00
b07144ac27 tokenizers bump tokenizers version (#40540)
* bump tokenizers version

* use rc0

* ?

* fml

* update
2025-08-29 12:34:41 +02:00
008c0ba8e2 Fix SeamlessM4Tv2ModelWithTextInputTest::test_retain_grad_hidden_states_attentions (#40532)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 23:30:59 +02:00
89ef1b6e0b Set test_all_params_have_gradient=False for HunYuanMoEV1ModelTest (#40530)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 22:32:51 +02:00
2e0f1d6a37 [Qwen Omni/VL] Fix fa tests (#40528)
* fix

* style

* flaky flaky

* flaky flaky

* oopsie, we need the out of place for sure

* flaky flaky

* flaky flaky
2025-08-28 21:07:22 +02:00
68013c505a Improve Gemma3n model and tests (#39764) 2025-08-28 20:25:42 +02:00
ffcb344612 Lazy import torchcodec (#40526)
* lazy import

* parse version

* omg, we need to guard version parse as well
2025-08-28 18:57:14 +02:00
8c7f685079 Fix typo: 'casual' to 'causal' (#40374)
fix typo: 'casual' to 'causal'

Co-authored-by: demo <vamshika0210@gamil.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-28 09:17:37 -07:00
d61fab1549 skip some padding_matches_padding_free_with_position_ids for FA2 (#40521)
skip 1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 17:20:07 +02:00
31336ab750 Fix mistral3 tests after "[Kosmos 2.5] Rename checkpoints" (#40523)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 16:29:54 +02:00
851b8f281d [kernels] If flash attention2 is not installed / fails to import (cc on our cluster) default to kernels (#40178)
* first step if flash not installed but you set to use it

* try importing

* now default to using it

* update our tests as well

* wow yesterday I was not awake

* fixup

* style

* lol the fix was very very simple

* `RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
` for updated dockers

* push review comments

* fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-28 16:20:25 +02:00
de9e2d7a2e Skip some flex attn tests (#40519)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-28 15:43:38 +02:00
7e1aee4db6 [FA] Remaining Cleanup (#40424)
* fa cleanup

* flaky tests

* readd removed test and changeup comments to reflect the purpose

* flaky tests
2025-08-28 15:01:19 +02:00
893d89e5e6 [omni modality] support composite processor config (#38142)
* dump ugly option to check again tomorrow

* tiny update

* do not save as nested dict yet!

* fix and add tests

* fix dia audio tokenizers

* rename the flag and fix new model Evolla

* fix style

* address comments

* broken from different PRp

* fix saving layoutLM

* delete print

* delete!
2025-08-28 14:40:27 +02:00
becab2c601 Use the config for DynamicCache initialization in all modelings (#40420)
* update all

* remove the most horrible old code

* style
2025-08-28 14:32:30 +02:00
8acbbdcadf [serve] fix request_id unexpected (#40501)
* fix request-id in serving

* style

* fix
2025-08-28 14:16:28 +02:00
2300be3b41 sped up gguf tokenizer for nemotron test (#40509)
sped up tokenizer for nemotron test
2025-08-28 12:10:49 +00:00
b2b654afbf correct kes to keys. (#40489)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
2025-08-28 12:00:22 +00:00
476cd7bab1 [vision] Improve keypoint-matching models docs (#40497)
fix options and add inference_mode
2025-08-28 12:31:21 +01:00
1499f9e356 [Kosmos 2.5] Rename checkpoints (#40338) 2025-08-28 13:30:41 +02:00
10ddfb0be5 Add more missing arguments (#40354)
Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-28 12:21:51 +02:00
d10603f701 Add Apertus (#39381)
* init swissai model

* AutoModelForCausalLM

* AutoModelForCausalLM mapping

* qk norm and post ln optional

* fix wrong shape of qk norm: megatron uses head_dim

* automodel fixes

* minor fix in forward

* fix rope validation to accept llama3 scaling

* `SwissAIForTokenClassification` support

* Align `SwissAI` to v4.52.4

* Align `SwissAI` to v4.53.1

* Init CUDA xIELU

* `SwissAI*`->`Apertus*`

* ci fix

* check_docstring ignore ApertusConfig

* Licensing and placeholder tests

* Placeholder doc

* XIELU syntax

* `_xielu_python` optimization

* Fix xIELU

* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint

* Modular `Apertus`

* CUDA xIELU logging

* ci fix

* ci fix

* ci fix

* Update license

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update tests/models/apertus/test_modeling_apertus.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* `.utils.import_utils.is_torchdynamo_compiling`

* `Apertus` class ordering

* `past_key_value{->s}`, `make fix-copies`

* ci fix

* Remove unused configuration parameters

* `{beta,eps}` saved in checkpoint

* `{beta,eps}` Temporarily on CPU

* Suggestions

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* ci fix

* remove fx_compatible (deprecated)

* remove `rotary_embedding_layer`

As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.

* fully removing `Mask4DTestHard` class

Not needed (for now)

* switch to `dtype` instead of `torch_dtype`

Following this:
https://github.com/huggingface/transformers/pull/39782

* remove unused imports

* remove `cache_implementation="static"`

* +Apertus to `docs/source/en/_toctree.yml` for the doc builder

---------

Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
2025-08-28 11:55:43 +02:00
f9b9a5e884 Update quantization overview for XPU (#40331)
* update xpu quantization overview

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix aqlm tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update gguf support

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gguf tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu gguf precision error

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* replace deprecated models

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix import org

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update xpu ggml tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* revert wrong change

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix xpu tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* xpu optimum-quanto goes green

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-28 09:52:59 +00:00
b824f4986f fix typo (#40484)
* fix typo

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* csm & qwen omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* format

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

* Apply style fixes

* omni

Signed-off-by: guochenxu <guochenxu@modelbest.cn>

---------

Signed-off-by: guochenxu <guochenxu@modelbest.cn>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-28 08:31:25 +00:00
c9ff166718 Various AMD expectations (#40510)
* AMD expectations for qwen2

* Added more detailled excpectation to smolvlm

* Added AMD expectations to TableTransformer

* Style
2025-08-28 10:15:21 +02:00
721d4aee81 Include machine type in collated reports filename (#40514) 2025-08-28 09:28:12 +02:00
98289c5546 [modular] Classes can now be defined and referenced in arbitrary order (without bringing unwanted dependencies) (#40507)
* remove future class from dependency graph

* convert all
2025-08-27 23:06:10 +02:00
e3d8fd730e docs(pixtral): Update Pixtral model card to new format (#40442)
* docs(pixtral): Update Pixtral model card to new format

* docs(pixtral): Change cuda into auto for device_map

* docs(pixtral): Apply suggestions from review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review, changing mistral-community into Mistral AI

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Apply suggestions from review [!TIP] part

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Finalize model card with tested code examples

This commit finalizes the update for the Pixtral model card.

* Fix the hfoption by the right one

* @BryanBradfo docs(pixtral): Changing the redirection of bitsandbytes

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Add of ` to highlight the tokens

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* docs(pixtral): Move image block per final review

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-27 11:38:51 -07:00
821384d5d4 Fix the CI workflow of merge to main (#40503)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 18:35:12 +02:00
304225aa15 Collated reports: no need to upload artifact (#40502)
No need to upload collated reports as gh artifact
2025-08-27 18:31:55 +02:00
3c343c6601 [Whisper] Add rocm expected results to certain tests (#40482)
* Add rocm expected results to certain tests

* Specify rocm version in expectations so we know origin. Improved var names

* Update test var names
2025-08-27 16:19:11 +00:00
6350636964 Fix qwen2_moe tests (#40494)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 16:22:04 +02:00
52aaa3f500 [EfficientLoFTR] dynamic image size support (#40329)
* fix: reverted efficientloftr embeddings computation to inference time with lru cache

* fix: added dtype and device for torch ones and zeros creation

* fix: fixed embed height and width computation with aggregation

* fix: make style

* fix error message

* fix fa2 tests

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-27 15:05:08 +01:00
ed5dd2999c [ESM] support attention API (#40370)
* ESM supports attention API

* supports flags

* fix tests

* fix copiees

* another fixup needed after fixing tests

* fix tests and make sure Evolla copied everything

* fix

* order

* forgot about "is_causal" for fa2

* cross attention can't be causal
2025-08-27 15:39:04 +02:00
8b804311ba [modular] Remove ambiguity in all calls to parent class methods + fix dependency graph (#40456)
* fix in modular

* remove leftover print

* fix everything except when it's in assignment

* fix assignment as well

* more general

* better

* better

* better comment

* docstring

* cleaner

* remove base

* doc
2025-08-27 14:51:28 +02:00
a3afebbbbe [modular] Use multi-processing + fix model import issue (#40481)
* add mp and simplify a bit

* improve

* fix

* fix imports

* nit
2025-08-27 14:51:12 +02:00
75d6f17de6 Validate GptOssConfig rope config after it's fully initialized (#40474)
* Validate GptOssConfig rope config after it's fully initialized

Fixes #40461

* Remove whitespaces
2025-08-27 10:16:58 +01:00
80f4c0c6a0 CI when PR merged to main (#40451)
* up

* up

* up

* up

* up

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-27 10:56:18 +02:00
ff8b88a948 Fix nightly torch CI (#40469)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 22:02:15 +02:00
74ad608a2b Not to shock AMD team by the cancelled workflow run notification ❤️ 💖 (#40467) 2025-08-26 20:53:24 +02:00
c8c7623f20 Update SegFormer model card (#40417)
* Update SegFormer model card

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/segformer.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update the segformer model card

* Remove quantization example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-26 08:27:25 -07:00
78f32c3917 [pipeline] Add Keypoint Matching pipeline (#39970)
* feat: keypoint-matcher pipeline

* docs: added keypoint-matcher pipeline in docs

* fix: added missing statements for repo consistency

* docs: updated SuperGlue, LightGlue and EfficientLoFTR docs

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* test: fixed run_pipeline_test

* update pipeline typing and docs

* update tests

* update docs snippets

* Fix import error

* fix: pipeline init

* pt framework

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-26 15:26:57 +01:00
6451294f6f [RoPE] explicit factor > implicit factor in YaRN (#40320)
explicit factor > implicit factor
2025-08-26 14:58:28 +01:00
5a8ba87ecf [fast_image_processor] fix image normalization for resize (#40436) 2025-08-26 13:49:51 +00:00
VED
0ce6709e70 deci gguf support (#38669)
* deci gguf support

* make style

* tests for deci

* try except removed

* style

* try except removed
2025-08-26 13:43:17 +00:00
263d06fedc Fix extra template loading (#40455)
* Fix extra template loading

* Reformat

* Trigger tests
2025-08-26 14:01:01 +01:00
58cebc848b flash_paged: s_aux may not exist (#40434)
Some implementations (i.e.,
https://huggingface.co/kernels-community/vllm-flash-attn3) support an
`s_aux` arg for attention sinks, but others
(https://huggingface.co/kernels-community/flash-attn) do not. If s_aux
is present in the kwargs, we forward it, otherwise we don't.

The user will still get an error if they use a model like gpt-oss-20b
with an implementation that does not support `s_aux`, but models that
don't use it won't error out. For example, [this is currently
failing](399cd5c04b/examples/pytorch/continuous_batching.py (L16))
because we are sending `s_aux: None` in the dict.
2025-08-26 13:15:59 +02:00
34108a2230 Continuous batching refactor (#40426)
* Rework of the CB example

* Further rework of CB example

* Refactor PA cache, slice on tokens, add debug prints -- WIP

* Slice cache -- WIP

* Added a mechanism to check batched outputs in CB script

* Less logging, debug flag for slice, !better reset! -- WIP

* QOL and safety margins

* Refactor and style

* Better saving of cb example

* Fix

* Fixes and QOL

* Mor einformations about metrics

* Further logging

* Style

* Licenses

* Removed some comments

* Add a slice input flag

* Fix in example

* Added back some open-telemetry deps

* Removed some aux function

* Added FA2 option to example script

* Fixed math (all of it)

* Added a simple example

* Renamed core to classes

* Made allocation of attention mask optionnal

* Style
2025-08-26 13:01:42 +02:00
49e168ff08 🚨 Remove Contrastive Search decoding strategy (#40428)
* delete go brrr

* fix tests

* review
2025-08-26 12:31:46 +02:00
b8184b7ce9 Make cache_config not mandatory (#40316)
* Relaxed assumptions on cache_config

* Review compliance

* Style

* Styyyle

* Removed default and added args

* Rebase mishapfix

* Propagate args to TorchExportableModuleForDecoderOnlyLM

* Fix the test I wanted  fixed in this PR

* Added some AMD expectation related to cache tests
2025-08-26 12:06:17 +02:00
32fcc24667 rename get_cuda_warm_up_factor to get_accelerator_warm_up_factor (#40363)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-26 09:56:35 +00:00
f690a2a1e0 [video processors] decode only sampled videos -> less RAM and faster processing (#39600)
* draft update two models for now

* batch update all VLMs first

* update some more image processors

* update

* fix a few tests

* just make CI green for now

* fix copies

* update once more

* update

* unskip the test

* fix these two

* fix torchcodec audio loading

* maybe

* yay, i fixed torchcodec installation and now can actually test it

* fix copies deepseek

* make sure the metadata is returrned when users request it

* add docs

* update

* fixup

* Update src/transformers/audio_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/glm4v/video_processing_glm4v.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* update

* what if we set some metadata attr to `None`

* fix CI

* fix one test

* fix 4 channel test

* fix glm timestemps

* rebase gone wrong

* raise warning once

* fixup

* typo

* fix copies

* ifx smolvlm test

* this is why torch's official benchmark was faster, set threads to `0`

* Apply style fixes

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-26 11:38:02 +02:00
64ae6e6b1d fix qwen25-vl grad acc (#40333)
* fix qwen25—vl grad acc

* fix Qwen2_5_VLForConditionalGeneration for accepts_loss_kwargs

* fix ci

* fix ci

* fix typo

* fix CI
2025-08-26 09:30:06 +00:00
6d2bb1e04d [Trainer] accelerate contextparallel support in trainer (#40205)
* initial context_parallel_size support in trainer

* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens

* use parallelism_config.cp_enabled

* add parallelism_config to trainer state

* warn when auto-enabling FSDP

* fix some reviews

* WIP: somewhat matching loss

* Feat: add back nested_gather

* Feat: cleanup

* Fix: raise on non-sdpa attn

* remove context_parallel_size from TrainingArguments

* if we have parallelism_config, we defer to get_state_dict from accelerate

* fix form review

* Feat: add parallelism config support

* Chore: revert some unwanted formatting changes

* Fix: check None

* Check none 2

* Fix: remove duplicate import

* Update src/transformers/trainer.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/training_args.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Fin

* require accerelate 1.10.1 and higer

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-26 09:28:48 +00:00
63caaea1fb Refactor ViT-like models (#39816)
* refactor vit

* fix

* fixup

* turn off FX tests

* AST

* deit

* dinov2

* dinov2_with_registers

* dpt

* depth anything (nit)

* depth pro (nit)

* ijepa

* ijepa (modular)

* prompt_depth_anything (nit)

* vilt (nit)

* zoedepth (nit)

* videomae

* vit_mae

* vit_msn

* vivit

* yolos

* eomt

* vitpose

* update auto backbone

* disable `fx` and export tests (dnov2, dpt, ijepa, vit, vitpose)

* fix kwargs for backbone

* fix

* convnext

* fixup

* update convnext layernorm

* fix-copies layer_norm

* convnextv2

* explicit output_hidden_states for models with backbones

* explicit hidden states collection for dinov2

* tests fixed

* fix DPT as well

* fix dinov2 with registers

* add comment
2025-08-26 11:14:06 +02:00
922e65b3fc Fix non FA2 tests after FA2 installed in CI docker image (#40430)
* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

* up

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-26 10:36:50 +02:00
e68146fbe7 Fix collated reports model name entry (#40441) 2025-08-25 20:36:01 +00:00
8ce633cc75 InternVL MI325 test expectations (#40387)
* Adjust ROCm expectations

* MI355

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-25 22:00:35 +02:00
7637d298b3 Fix collated reports uploading (#40440) 2025-08-25 21:49:59 +02:00
fa59cf9c9f Fix https://github.com/huggingface/transformers/issues/40292 (#40439)
* Fix https://github.com/huggingface/transformers/issues/40292

* Trigger tests

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-25 20:12:57 +01:00
f0e87b436d Fix collated reports model directory traversal (#40437)
Fix model dir traversal
2025-08-25 18:01:58 +00:00
ef406902bf Gemma3 text fixes: Add expectations for MI325 (#40384)
* Add expectations for MI325

* Ruff

* Adjust CUDA expectations as well

* Another attempt for CUDA expectations
2025-08-25 19:57:50 +02:00
c81723d31b 🌐 [i18n-KO] Translated models.md to Korean (#39518)
* docs: ko: models.md

* feat: nmt draft

* fix: manual edits

* Resolved _toctree.yaml conflict during merge from main

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Apply suggestions from code review

* fix: update toctree

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-25 09:17:08 -07:00
6b5eab70e4 Remove working-dir from collated reports job (#40435) 2025-08-25 18:14:35 +02:00
1763ef2951 [docs] remove last references to transformers TF classes/methods (#40429)
* halfway through tasks

* complete

* Update utils/check_docstrings.py
2025-08-25 16:30:59 +01:00
eac4f00bdf Fix typo and improve GPU kernel check error message in MXFP4 quantization (#40349) (#40408)
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:55 +00:00
d8f2edcc46 Add tokenizer_kwargs argument to the text generation pipeline (#40364)
* Add `tokenizer_kwargs`  arg to text generation pipeline.

* chore: re-run CI

* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline

* Fix `tokenizer_encode_kwargs` doc string.

* Fix note related to `tokenizer _kwargs` in text generation pipeline

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-25 15:21:19 +00:00
1a35d07f56 Update collated reports working directory and --path (#40433) 2025-08-25 15:18:26 +00:00
399cd5c04b Fix modular for modernbert-decoder (#40431)
* fix the modular

* CI
2025-08-25 16:50:49 +02:00
ea8d9c8f06 🚨 Remove DoLa decoding strategy (#40082)
* remove dola generation strategy

* add fast test
2025-08-25 16:33:27 +02:00
6bf6f8490c [Mxfp4] Add a way to save with a quantization method (#40176)
* add a test

* tempdir

* fix import issue[

* wow I am tired

* properly init

* i am not super familiar with quantizer api :|

* set to TRUE fro now

* full support

* push current changes

* will clean this later but the imports are a shitshow here

* this correctly saves the block and scales but forward seems broken

* quanitze was not correct

* fix storage

* why were bias even included

* finally!

* style

* fix style

* remove print

* lazy import

* up

* not sure what happens this works now?

* holy molly it was not so far

* okay this seems to work!

* workings!!!

* allow save_pretrained to create PR

* Apply suggestions from code review

* fixup

* add deqyabtze fakse as wek

* working new

* fix

* rm swizzle and unswizzle during saving

* rm print

* Update src/transformers/modeling_utils.py

* fix

* style

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
2025-08-25 16:27:19 +02:00
04c2bae3a8 Fix label smoothing incompatibility with multi-label classification (#40296)
* Fix label smoothing incompatibility with multi-label classification (#40258)

* Improve label smoothing multi-label check based on reviewer feedback

- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
2025-08-25 14:23:31 +00:00
3b5b9f6518 Fix processing tests (#40379)
* fix tests

* skip failing test in generation as well

* grounding dino was overwritten

* one more overwritten code

* clear comment
2025-08-25 14:50:54 +02:00
a0a37b3250 Gpt oss optim (#40304)
* enable fast index selecting

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix gpt-oss tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check tensor

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-08-25 14:36:33 +02:00
d73181b3fc Fix UnboundLocalError in WER metric computation (#40402)
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.

Co-authored-by: pranam-gf <pranam@goodfin.com>
2025-08-25 12:02:22 +00:00
11e12a715a Fix typo: 'seperator' to 'separator' in variable names (#40389)
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py

These typos were in variable names used for parsing path components in weight conversion scripts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-25 11:56:30 +00:00
40299134a8 Fix CI (hunyuan moe does not support fullgraph) (#40423)
fix flag
2025-08-25 12:01:28 +02:00
a2b37bfd58 Fix typo: 'casual' -> 'causal' in code and documentation (#40371) (#40407) 2025-08-25 09:32:15 +00:00
0031c044f8 [docs] flax/jax purge (#40372)
flax/jax purge
2025-08-25 10:25:00 +01:00
14b89fed24 fix to accept cumulative_seqlens from TransformersKwargs in FA (#40194)
* fix to the typings which are unmatched to FA function signature

cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching

It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`

* format changes by ruff

* Update src/transformers/integrations/flash_paged.py

unused function arg in `PagedAttentionCache.update`

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* revert continuous_batching signiture, which is more meaningful

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-25 11:00:13 +02:00
ba095d387d 🧹 🧹 🧹 Get set decoder cleanup (#39509)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* this should be explicit

* fix Nth ESM exception

* tryout decoder

* this as well

* revert again

* 🧠

* aaah ESM has two modelings aaah

* broom broom

* format

* wrong copies

* copies

* modular cleanups

* format

* modularities

* wrong mergefix

* seriously

* align with new model

* new model
2025-08-25 10:57:56 +02:00
2c55c7fc94 Reactivate a lot of tests skipped for no reason anymore (#40378)
* reactivate all the tests

* some tests still failing
2025-08-25 10:44:43 +02:00
4f9b4e62bc Run FA2 tests in CI (#40397)
up

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-23 12:30:18 +02:00
28ca27cb2b HF papers in doc (#40381)
* HF papers

* clean

* Update src/transformers/models/gemma3n/configuration_gemma3n.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:07:08 -07:00
7d88f57fc6 Update README_zh-hans.md (#40380)
Fix a typo.
2025-08-22 18:22:26 +00:00
29ddcacea3 Rework the Cache documentation (#40373)
* start working the doc

* remove gemma2

* review
2025-08-22 17:06:28 +02:00
dab66f15a1 Chat Template Doc Fixes (#40173)
* draft commit

* draft commit

* Fixup chat_extras too

* Update conversations.md

* Update the toctree and titles

* Update the writing guide!

* Use @zucchini-nlp's suggestion

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-22 15:48:33 +01:00
0a21e870c7 Bug Fix: Dynamically set return_lse flag in FlexAttention (#40352)
* bug fix - return_lse dynamically set

* addressed compatibility with return type - flex_attention_forward

* rename variables

* revert changes to commits
2025-08-22 13:49:26 +00:00
894b2d84b6 Add GptOssForTokenClassification for GPT-OSS models (#40190)
* Add GptOssForTokenClassification for GPT-OSS models

* After run make fixup
2025-08-22 15:14:46 +02:00
56d68c6706 Addiing ByteDance Seed Seed-OSS (#40272)
add seed oss
2025-08-22 14:54:28 +02:00
8a6908c10d fix(example): align parameter names with the latest function definition for gdino (#40369) 2025-08-22 12:27:58 +00:00
7db228a92a [configuration] allow to overwrite kwargs from subconfigs (#40241)
allow to overwrite kwargs from subconfigs
2025-08-22 13:31:25 +02:00
19ffe0219d [processor] move commonalities to mixin (#40339)
* move commonalities to mixin

* revert - unrelated

* fix copies

* fix style

* comments
2025-08-22 13:04:43 +02:00
d8f6d3790a ⚠️⚠️ Use dtype instead of torch_dtype everywhere! (#39782)
* update everywhere

* style

* pipelines

* switch it everywhere in tests

* switch it everywhere in docs

* switch in converters everywhere

* update in examples

* update in model docstrings

* style

* warnings

* style

* Update configuration_utils.py

* fix

* Update configuration_utils.py

* fixes and add first test

* add pipeline tests

* Update test_pipelines_common.py

* add config test

* Update test_modeling_common.py

* add new ones

* post rebase

* add new

* post rebase adds
2025-08-22 12:34:16 +02:00
9c25820978 [pipelines] add support to skip_special_tokens in the main text generation pipelines (#40356)
* add support to skip_special_tokens in pipelines

* add test

* rm redundant
2025-08-22 10:12:46 +00:00
5c40e7a225 Change multimodal data links to HF hub (#40309)
change multimodal data links to HF hub
2025-08-22 11:50:04 +02:00
e018b77c89 wav2vec2 fixes (#40341)
* Changed datasets to avoid a datasets error

* Changed back split to test
2025-08-22 11:32:29 +02:00
d7fe3111ff Fix idefics3 vision embeddings indices dtype (#40360)
fix idefics3 vision embeddings

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 11:10:45 +02:00
cf487cdf1f HunYuan opensource (#39606)
* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused paramter&add licence

* add licence

* remove unused paramter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (#5)

* fix attention

* use llamamodel

* fix code

* Fix qk (#6)

* fix qk_norm

* fix

* fix modual

* Fix moe (#7)

* fix some moe code

* fix einsum

* try top1

* use top1

* Fix rotary (#8)

* fix rotary

* fix modeling

* fix modular

* fix testcode

* remove A13B unit test

* Fix moe v1 (#9)

fix moe & gate

* Fix gate norm (#10)

* add norm_topk_prob

* Fix testcase (#11)

* fix&skip test

* Fix testcase (#12)


* skip testcase

* Fix norm topk (#13)

* hardcode norm_topk_prob

* fix testcase

---------

Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Mingji Han <mingjihan@tencent.com>
2025-08-22 07:59:58 +00:00
8365f70e92 DOCS: Clarification on the use of label_names as an argument to TrainingArguments (#40353)
* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/ warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-21 17:19:04 -07:00
7c1169e21f [4/N]more docs to device agnostic (#40355)
* more docs to device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* more

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 1

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* 2

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-08-21 10:22:26 -07:00
9568b506ed [generate] handle support for cache classes when num enc layers != num dec layers (#40277)
* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o
2025-08-21 17:35:18 +01:00
7f38068ae0 Qwen2.5-VL test fixes for ROCm (#40308) 2025-08-21 18:13:07 +02:00
cb1df4d26a [FA] Fix some model tests (#40350)
* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit
2025-08-21 18:08:21 +02:00
f46f29dd7c Remove more PyTorch 2.2 compatible code (#40337)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 15:19:53 +00:00
128f42d370 [detection] use consistent dtype for Conditional and DAB DETR positional embeddings (#40300)
fix: use consistent dtype for sine positional embeddings
2025-08-21 15:49:56 +01:00
2121d09239 [serve] add cors warnings (#40112)
* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-21 14:32:36 +01:00
b40b834ab1 Clean up XCodec and other codecs (#40348)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-08-21 15:32:00 +02:00
75aa7c7252 [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification (#35991)
* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion
2025-08-21 15:16:03 +02:00
04b751f07d Fix attention vizualizer (#40285)
* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans
2025-08-21 13:13:35 +00:00
cyn
1e1db12304 (small) fix conditional for input_ids and input_embeds in marian (#40045)
* (small) fix conditional for input_ids and input_embeds in marian

* address comment
2025-08-21 15:13:14 +02:00
7f2f53424e Update test_spm_converter_bytefallback_warning (#40284)
fff

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-21 14:09:28 +02:00
11a49dd9e3 T5 test and target device fixes (#40313)
* Fix cache setup related issues

* Fix target-device-related issues

* Ruff

* Address review comments
2025-08-21 14:07:29 +02:00
c4513a9fe6 Fix links in Glm4vMoe configuration classes to point to the correct H… (#40310)
* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository

* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
2025-08-21 11:42:53 +00:00
c7e6f9a485 Fix an infinite loop bug in recursive search of relative imports (#40326)
Fix bug in recursive search of relative imports
2025-08-21 11:39:43 +00:00
e95441bdb5 add type hints (#40319)
* add basic type hints to import module

* run make fixup

* remove optional

* fixes

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-21 12:19:59 +01:00
5c88d8fbcc Fix: Only call Trainer.align_special_tokens if model has "config" attribute (#40322)
* Only call Trainer.align_special_tokens if model has "config" attribute

* Add efficient test for training a model without model.config

* Reformat
2025-08-21 12:06:42 +01:00
c031f6f994 [docs] remove TF references from /en/model_doc (#40344)
* models up to F

* models up to M

* all models
2025-08-21 11:53:21 +01:00
7b060e5eb7 Add missing arguments to class constructors (#40068)
* Add missing arguments

Signed-off-by: cyy <cyyever@outlook.com>

* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 10:22:38 +00:00
6ad7f29461 Fix deprecation warning version (#40343)
fix
2025-08-21 12:18:23 +02:00
adf84aec21 Add DeepseekV3ForSequenceClassification for Deepseek V3 models (#40200)
* Add Sequence Classification Support for Deepseek v3 model DeepseekV3ForSequenceClassification

* After run make fixup
2025-08-21 12:01:33 +02:00
1e2e28f3c8 Change Qwen2RMSNorm to RMSNorm from PyTorch (#40066)
* Unify Qwen2RMSNorm definitions and use RMSNorm from PyTorch

Signed-off-by: cyy <cyyever@outlook.com>

* subclass RMSNorm

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-21 11:58:35 +02:00
022af24fcc Fix qwen-omni processor text only mode (#40336)
* Fix qwen-omni processor text only mode

* remove try except

---------

Co-authored-by: yuekaiz <yuekaiz@mgmt1-login.cm.cluster>
2025-08-21 11:57:32 +02:00
c99ed492c7 [docs] remove flax references from /en/model_doc (#40311)
* 1st commit

* all models up to D

* all models up to G

* all models up to M

* all remaining models
2025-08-21 10:52:54 +01:00
c2e3cc24e0 Fix chunked attention mask with left-padding (#40324)
* add fix

* add test

* raise proper warning for older versions

* fix

* fix and add 2nd test

* fix for flex and torch 2.5
2025-08-21 10:52:49 +02:00
242bb2cafc One cache class to rule them all (#40276)
* remove all classes

* fix generate

* start replacing everywhere

* finish removing everywhere

* typo

* typo

* fix

* typo

* remove num_layers=1

* CI

* fix all docstrings

* review

* style
2025-08-20 19:36:11 +02:00
1054494dd6 Update notification service amd_daily_ci_workflows definition (#40314) 2025-08-20 17:49:46 +02:00
139cd91713 Fix: Apply get_placeholder_mask in Ovis2 (#40280)
* Refactor special image mask

* Refactor get_placeholder_mask method

* Revert "Refactor special image mask"

This reverts commit 9eb1828ae930329656d6f323a510c5e6033e1f85.

* Fix

* Revert "Refactor get_placeholder_mask method"

This reverts commit 07aad6484bb08d6351d5b605e9db574d28edcd15.
2025-08-20 17:12:10 +02:00
5d906740d2 Update CI with nightly torch workflow file (#40306)
* fix nightly ci

* Apply suggestions from code review

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-08-20 16:59:00 +02:00
4977ec2ae8 [GPT OSS] Refactor the tests as it was not properly checking the outputs (#40288)
* it was long due!

* use the official kernel

* more permissive

* update the kernel as well

* mmm should it be this?

* up pu

* fixup

* Update test_modeling_gpt_oss.py

* style

* start with 20b
2025-08-20 16:47:41 +02:00
3b7230124b No more natten (#40287)
get rid off natten

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 16:10:15 +02:00
2df0c323cb byebye torch 2.1 (#40317)
* Bump minimum torch version to 2.2

* Remove is_torch_greater_or_equal_than_2_2

* update versions table

* Deprecate is_torch_sdpa_available (except for backward compat), remove require_torch_sdpa
2025-08-20 15:03:46 +01:00
c50f140be2 Add back _tp_plan attribute (#39944)
* Update modeling_utils.py

* make sure we update with the module's plan

* use public api

* oups

* update

* fix failing test

* Update src/transformers/integrations/tensor_parallel.py

* Update src/transformers/integrations/tensor_parallel.py

* fix

* make the API more friendly!

* fix tests

* fix styling

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-20 15:29:55 +02:00
a97213d131 Qwen2.5-Omni test fixes (#40307)
Updated expectations, and mp tests
2025-08-20 14:48:30 +02:00
ca543f822f Add support for Florence-2 (#38188)
* init

* add modular

* fixup

* update configuration

* add processing file

* update auto files

* update

* update modular

* green setup_and_quality ci

* it works

* fix some tests

* commit florence2

* update test

* make test cases done - 16 left

* style

* fix few test cases

* fix some tests

* fix init test

* update florence2 vision style

* hope is green

* fix init test

* fix init

* update modular

* refactor vision module

* fix: channel attention use dynamic scale

* update modular

* update

* update attention mask

* update

* fix naming

* Update src/transformers/models/florence2/processing_florence2.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* spatial block works

* more beautiful

* more more beautiful

* merge main

* merge main and fixup

* fix typing hint

* update modeling

* fix eager matches sdpa

* fix style

* fix compile test - all green

* remove florence2 language

* remove Florence2LanguageModel things

* fix style

* update florence2 model

* override prepare encoder_decoder for generation

* add weight conversion script

* rewrite channel attention to use sdpa

* eleminate 1 tranpose op

* support fa2

* fix quality check

* chore: reformat `test_modeling_florence2.py`

* some refactor for processor

* some refactor for processor

* update naming convention and remove BC

* make it pass the test

* fix: correct Embedding Cosine

* update comments and docstring

* support input_embeds

* support input embeds ideally

* fix style

* fix style

* fix style again :D

* add test prcoessor

* refactor processor and add test for processor

* reformat test processor

* make fixup

* fix schema check

* remove image_token

* ensure image token in tokenizer and fix integration tests

* fix processor test

* add more integration tests for large model and rename test_processor to test_processing

* test_assisted_decoding_sample should pass

* update doc and make model work with image text to text pipeline

* docs: add sdpa bagde

* resolve cyril's comments

* fix import torch error

* add helper get_placeholder_mask

* inherit from llava

* florence2 may not _supports_attention_backend because of bart ...

* move florence2 model card to multimodal

* let base model always return_dict

* fix style

* tiny update doc

* set   _checkpoint_conversion_mapping = {}

* fix code quality

* support flex and compile graph and move external func to internal func

* remove condition because it always true

* remove window funcs

* move post processor config out

* fix ci

* new intro to trigger test

* remove `kernel_size` argument

---------

Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 14:28:06 +02:00
959239debc Remove unnecessary contiguous calls for modern torch (#40315) 2025-08-20 12:24:14 +00:00
7d2aa5d6e6 🚨 [Flash Attention] Fix sliding window size (#40163)
* swa fix

* add comment, make fix symmetrical

* modify fa inference test to force swa correctness check

* fixup comment
2025-08-20 14:23:14 +02:00
3128db6927 chore: fix typo in find_executable_batch_size to match new 0.9 ratio (#40206) 2025-08-20 12:18:06 +00:00
ca0aaa8c74 [fix] Pass adamw optimizer parameters to StableAdamW (#40184)
* fix: pass adamw optimizer parameters to StableAdamW

* add test for stable_adamw initialization with trainer arguments

* address copilot suggestion

* fix: update weight_decay handling in stable_adamw kwargs

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-20 11:52:23 +00:00
a01f38b364 Fix GOT-OCR2 and Cohere2Vision image processor patches caculation (#40312)
fix got-ocr patches caculation

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-20 13:13:58 +02:00
a5f0b505a0 Remove OTel SDK dependencies (#40305) 2025-08-20 12:31:44 +02:00
d0f1a6ec36 Clean up X-Codec. (#40271)
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.
2025-08-20 12:16:28 +02:00
da9452a592 [docs] delete more TF/Flax docs (#40289)
* delete some TF docs

* update documentation checks to ignore tf/flax

* a few more removals

* nit

* Update utils/check_repo.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-08-20 10:44:14 +01:00
a4e1fee44d [FA] Fix dtype in varlen with position ids (#40295)
fix
2025-08-20 11:15:55 +02:00
126bc03b4e Allow to be able to run torch.compile tests with fullgraph=True (#40164)
* fix

* address comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-20 10:42:33 +02:00
1d46091737 Add MetaCLIP 2 (#39826)
* First draft

* Make fixup

* Use eos_token_id

* Improve tests

* Update clip

* Make fixup

* Fix processor tests

* Add conversion script

* Update docs

* Update tokenization_auto

* Make fixup

* Use check_model_inputs

* Rename to lowercase

* Undo CLIP changes

* Address comment

* Convert all checkpoints

* Update auto files

* Rename checkpoints
2025-08-20 09:25:43 +02:00
0f9c9088d0 [3/3] make docs device agnostic, all en docs for existing models done (#40298)
docs to device agnostic cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-08-19 21:01:27 -07:00
eaa48c81e9 make model docs device agnostic (2) (#40256)
* doc cont.

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* more models

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/quicktour.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mixtral.md

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 13:10:03 -07:00
42fe769928 SmolVLM test fixes (#40275)
* Fix SmolVLM tests

* Add the proper CUDA expectations as well

* Split 'A10 and A100 expectations

* Ruff

---------

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-19 21:22:06 +02:00
4c017465bd Adjust ROCm test output expectations (#40279)
Adjust ROCm output expectations
2025-08-19 21:21:45 +02:00
0f9ce43687 Standardize BertGeneration model card (#40250)
* Standardize BertGeneration model card: new format, usage examples, quantization

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/bert-generation.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply reviewer feedback: update code examples

* Add missing code example

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:22:13 -07:00
6ceb13fb22 SmolVLM and InternVL: Ensure pixel values are converted to the correct dtype for fp16/bf16 (#40121)
* Ensure pixel values are converted to the correct dtype for fp16/bf16

* add to modular
2025-08-19 10:39:08 -07:00
92f40da608 Update model card for gpt neox japanese (#39862)
* Update GPT-NeoX-Japanese model card

* Apply suggestions from code review

* Update gpt_neox_japanese.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 09:18:46 -07:00
3a4b2756cf docs: Update TrOCR model card to new format (#40240)
* docs: Update TrOCR model card to new format

* Updated Sugegestions
2025-08-19 09:17:45 -07:00
46d38546f3 Standardize RAG model card (#40222)
* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge

* Standardize RAG model card

Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig
2025-08-19 09:16:10 -07:00
bd96e1e1cc docs(layoutlm): add missing id=usage to <hfoptions> tag in LayoutLM model card (#40273)
docs(layoutlm): add missing 'id=usage' to <hfoptions> tag in LayoutLM model card
2025-08-19 09:14:43 -07:00
8636b309e6 Fix chat CLI GPU loading and request_id validation issues (#40230) (#40232)
* Fix chat CLI GPU loading and request_id validation issues (#40230)

This commit addresses two critical bugs in the transformers chat CLI:

1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
   - Chat CLI now automatically uses GPU when available instead of defaulting to CPU
   - Matches the behavior of the underlying serving infrastructure

2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
   - Fixes "Unexpected keys in the request: {'request_id'}" error on second message
   - Allows request_id to be properly sent and validated by the server

Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message

* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming
2025-08-19 15:33:44 +00:00
bebeccb06a fix which routing method (#40283) 2025-08-19 16:35:13 +02:00
249d7c6929 Update image_processing_perception_lm_fast.py to allow for proper override of vision_input_type (#40252)
* Update image_processing_perception_lm_fast.py

Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.

* Update processing_perception_lm.py to match kwargs vision input type

* Update image_processing_perception_lm_fast.py kwargs to signature args
2025-08-19 11:41:27 +00:00
r0
57bb6db6ee Skipping pytree registration in case fsdp is enabled (#40075)
* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Skipping pytree registration in case fsdp is enabled

* Beauty changes

* Beauty changes

* Moved the is_fsdp_available function to import utils

* Moved is_fsdp_available to integrations.fsdp

* Added pytree registration inside dynamic cache class

* Making ci/cd lords happy

* Adding a check if DynamicCache is already a leaf

* Adding try/catch for multiple initializations of DynamicCache in test suites

* Moving dynamic cache pytree registration to executorch

* Adding try catch back
2025-08-19 11:58:05 +02:00
5b3b7ea472 Add Kosmos-2.5 (#31711)
Add Microsoft Kosmos-2.5

---------

Co-authored-by: kirp@umich.edu <tic-top>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-19 11:56:03 +02:00
c93594e286 [detection] fix correct k_proj weight and bias slicing in D-FINE (#40257)
Fix: correct k_proj weight and bias conversion in D-FINE
2025-08-19 09:44:37 +00:00
2f1a8ad4ba Fix setting attention for multimodal models (#39984)
* fix

* use non-explicit `None`

* keep previously set attn if exists
2025-08-19 11:35:11 +02:00
a2e76b908b 🚨🚨 Switch default compilation to fullgraph=False (#40137)
* switch default

* docstring

* docstring

* rework tests and remove outdated restrictions

* simplify

* we need a check for static cache

* fix

* rename var

* fix

* revert

* style

* rename test
2025-08-19 11:26:22 +02:00
2b59207a72 Fix slow static cache export tests (#40261) 2025-08-19 11:24:07 +02:00
56c44213b3 [detection] fix attention mask for RT-DETR-based models (#40269)
* Fix get_contrastive_denoising_training_group attention

* Add bool attention_mask conversion
2025-08-19 09:15:56 +00:00
5d9a715e30 set inputs_embeds to None while generate to avoid audio encoder forward in generation process (#40248)
* set inputs_embeds to None while generate to avoid audio encoder forward in generation process

* set input_features to none instead

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-08-19 08:45:57 +00:00
28746cdc7b Remove MI300 CI (#40270)
Remove MI300 CI (in history if we need it back)
2025-08-19 08:23:39 +00:00
debc92e60a Skip broken tests (#40157)
skip these tests
2025-08-19 10:04:08 +02:00
6b5bd11723 docs: Update OLMo model card (#40233)
* Updated OLMo model card

* Update OLMo description

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli typo

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix cli example

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Add bitsandbytes info

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 13:35:39 -07:00
e472efb9ac Fix benchmark workflow (#40254)
Correct init_db.sql path

Co-authored-by: Akos Hadnagy <akoshuggingface@mi325x8-123.atl1.do.cpe.ice.amd.com>
2025-08-18 18:14:16 +00:00
59862209ca Correct typo and update notes in docs Readme (#40234)
* Correct typo and update notes in docs readme

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:31:12 -07:00
a7eabf1dde Model card for NLLB (#40074)
* initializing branch and draft PR

* updated model card .md file

* minor

* minor

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* resolving comments + adding visuals

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

suggestion

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/nllb.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* NllbTokenizerFast and NllbTokenizer added

* endline

* minor

* Update nllb.md

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-18 10:05:59 -07:00
01c03bf4ee fix: Catch correct ConnectionError for additional_chat_templates (#39874)
* fix: Catch correct ConnectionError for additional_chat_templates

* fix: don't catch timeout

* fix: formatting
2025-08-18 17:25:47 +01:00
2bcf9f6c7e Fixes for EncoderDecoderCache (#40008)
* Add expectation to t5 for rocm 9.4

* Made EncoderDecoderCache compatible with nn.DataParallel

* Fixed t5gemma EncoderDecoderCache

* Added todos in autoformer

* Ruff

* Init is self-contained

* Review compliance

* Fixed kwargs init of EncoderDecoderCache
2025-08-18 17:51:05 +02:00
aa45824919 [CI] Fix repo consistency (#40249)
* fix

* doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 17:32:17 +02:00
d6fad86d23 [serve] guard imports (#39825)
guard imports
2025-08-18 16:28:10 +01:00
MQY
7a0ba0d7d8 [typing] fix type annotation error in DepthPro model image processor (#40238)
* fix type annotation error in DepthPro model image processor

* fix

* run make fix-copies
2025-08-18 15:42:13 +01:00
00b4dfb786 Add chat_template (jinja2) as an extra dependency (#40128)
* add jinja2 as a dependency

* Make jinja2 a core dependency in install_requires

- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* Make jinja2 a core dependency in install_requires

* Make jinja2 an extra dependency instead of adding a core dep

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-08-18 14:31:40 +00:00
f417a1aad4 remove transpose_for_scores call in ESM-2 (#40210)
* remove transpose_for_scores call

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

* fix copied evolla code

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-08-18 14:28:59 +00:00
a36d51e801 🚨 Always return Cache objects in modelings (to align with generate) (#39765)
* watch the world burn

* fix models, pipelines

* make the error a warning

* remove kwargs and return_legacy_cache

* fix reformer
2025-08-18 16:26:35 +02:00
57e230cdb2 Fix more pylint warnings (#40204)
Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 14:17:16 +00:00
47938f8f8d Add Ovis2 model and processor implementation (#37088)
* Add Ovis2 model and processor implementation

* Apply style fixes

* Add unit tests for Ovis2 image processing and processor

* Refactor image processing functions for clarity and efficiency

* Add Ovis2 ImageProcessorFast

* Refactor Ovis2 code

* Refactor Ovis2 model components and update processor functionality

* Fix repo consistency issues for Ovis2: docstring, config cleanup

* Update Ovis2 model integration tests

* Update Ovis2 configuration and processing classes for improved documentation

* Remove duplicate entry for 'ovis2' in VLM_CLASS_NAMES

* Fix conflict

* Fix import order

* Update image processor class names

* Update Ovis2 model structure

* Refactor Ovis2 configuration

* Fix typos

* Refactor Ovis2 model classes and remove unused code

* Fix typos

* Refactor Ovis2 model initialization

* Fiix typos

* Remove Ovis2 model mapping from MODEL_MAPPING_NAMES in modeling_auto.py

* Add license and update type hints

* Refactor token function and update docstring handling

* Add license

* Add Ovis2 model support and update documentation

* Refactor Ovis2 model structure and enhance multimodal capabilities

* Update Ovis2 weight mapping for consistency and clarity in key patterns

* Remove unused 'grids' parameter from Ovis2 model and Update processing logic to handle image grids more efficiently.

* Refactor Ovis2 model test structure to include Ovis2Model

* Add optional disable_grouping param to Ovis2ImageProcessorFast

* Refactor type hints in Ovis2 modules

* Add licensing information in Ovis2 modules and tests

* Refactor Ovis2 model by removing unused methods

* Refactor Ovis2 model tests by renaming test classes and removing skipped tests

* Refactor Ovis2 model output classes

* Refactor Ovis2 weight conversion and Update model embedding classes

* Refactor Ovis2 model imports and remove unused functions

* Enhance vision configuration extraction in Ovis2 weight conversion

* Refactor Ovis2 model's forward method to remove interpolation option

* Update Ovis2 model documentation

* Refactor Ovis2 model input handling and tokenizer configuration

* Update return type hints in Ovis2 model

* Remove commented-out code

* fix config for tests and remove key mappings

* Update tokenizer configuration to use add_special_tokens method

* skip torchscript

* Fix image placeholder generation in Ovis2Processor

* Refactor Ovis2 model to rename visual_table to visual_embeddings_table

* Enhance Ovis2 model by adding vision_feature_select_strategy parameter

* Refactor Ovis2 model weights conversion and architecture

* Refactor Ovis2 model by removing vision_feature_select_strategy parameter

* Update Ovis2 model examples

* Refactor Ovis2 model

* Update Ovis2 model

* Update Ovis2 model configuration

* Refactor Ovis2 model test setup

* Refactor flash attention support

* Refactor

* Fix typo

* Refactor

* Refactor model classes

* Update expected output in Ovis2

* Refactor docstrings

* Fix

* Fix

* Fix

* Update input in tests

* Fix

* Fix get_decoder method

* Refactor

* Refactor Ovis2

* Fix

* Fix

* Fix test

* Add get_placeholder_mask

* Refactor Ovis2 model tests

* Fix

* Refactor

* Fix

* Fix

* Fix Ovis2 test

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-18 16:05:49 +02:00
2fe43376cd AMD scheduled CI ref env file (#40243)
* Reference env-file to be used in docker running the CI

* Disable MI300 CI for now
2025-08-18 15:23:27 +02:00
e4bd2c858d Fix ESM token_dropout crash when using inputs_embeds instead of input_ids (#40181)
* fix: Error after calling ESM model with input embeddings not input ids

* propagate changes to other models
2025-08-18 13:22:10 +00:00
6333eb986a Fix more typos (#40212)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-18 12:52:12 +00:00
e5886f9194 [SAM 2] Change checkpoints in docs and tests (#40213)
* change checkpoints in docs and tests

* add notebook
2025-08-18 11:21:34 +02:00
eb2f9da096 fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function (#40130)
* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* fix similar errer at qwen2_vl and do make fix-copies

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>

* Apply style fixes

---------

Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-18 08:59:25 +00:00
6ce8f05375 Use correct model_input_names for PixtralImageProcessor (#40226)
add image_sizes to model_input_names
2025-08-18 08:06:52 +00:00
2914ceca20 Revert "Pin torch to 2.7.1 on CircleCI for now" + Final fix for too long with no output (#40201)
* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"

This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-18 08:40:53 +02:00
cd22550692 docs: Update LayoutLM model card according to new standardized format (#40129)
* docs: Update LayoutLM model card with standardized format

* Apply suggestions from code review

This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Address remaining review comments

* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header

* Update layoutlm.md

* Update layoutlm.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-15 09:33:47 -07:00
05000aefe1 Fix GPT-OSS swiglu_limit not passed in for MXFP4 (#40197)
Add swiglu_limit = 7.0
2025-08-15 17:04:25 +02:00
3f4c85fef0 Add X-Codec model (#38248)
* add working x-codec

* nit

* fix styling + copies

* fix docstring

* fix docstring and config attribute

* Update args + config

* update convertion script

* update docs + cleanup

* Ruff fix

* fix doctrings
2025-08-15 16:24:12 +02:00
29e4e35927 Benchmarking improvements (#39768)
* Start revamping benchmarking

* Start refactoring benchmarking

* Use Pandas for CSV

* import fix

* Remove benchmark files

* Remove sample data

* Address review comments
2025-08-15 15:59:11 +02:00
de437d0d7a Update: add type hints to check_tokenizers.py (#40094)
* Update check_tokenizers.py

chore(typing): add type hints to check_tokenizers script

- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)

* Update check_tokenizers.py

chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py

- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic
2025-08-15 12:41:28 +00:00
28a03fb78a Fix various Pylint warnings (#40107)
Tidy code

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:40:12 +00:00
ec85d2c44f Avoid CUDA stream sync (#40060)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:37:15 +00:00
c7afaa5b44 Remove _prepare_flash_attention_from_position_ids (#40069)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:35:03 +00:00
c167faa081 Fix typos (#40175)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-15 12:10:26 +00:00
5068fcd9a8 Add repr to EncoderDecoderCache (#40195)
* add repr

* oups
2025-08-15 12:57:49 +02:00
421175685d Fix fsdp for generic-task models (#40191)
* remove abc inheritance

* add fast test
2025-08-15 12:28:16 +02:00
4912d5b490 fix to avoid modifying a view in place (#40162)
* fix to avoid modifying a view in place

* add backward test in tensor parallel

* add test to test_modelig_gpt_oss.py

* linting
2025-08-15 10:30:49 +02:00
cc9997878a make model doc device agnostic (#40143)
* make model doc device agnostic

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update align.md

* Update aya_vision.md

* Update byt5.md

* refine

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update granitevision.md

* Update src/transformers/pytorch_utils.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* add doc

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* 3 more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 23:31:31 -07:00
85fce2e54c [MINOR:TYPO] Update base.py (#40169)
* [MINOR:TYPO] Update base.py

All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)

Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns:  `ValueError: The task does not provide any default models for options ('EN', 'FR')`

It might be a good idea to allow for uppercase, but that's for another issue.

* [MINOR:TYPO] Update __init__.py
2025-08-14 22:53:57 -07:00
52c6c1bb6e Update dynamic attnt setter for multimodals (#39908)
* update

* fix the test for DepthPro

* PR comments

* wait, I didn't delete this in prev commit?

* fix

* better way

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-08-14 21:46:13 +02:00
31b6e6e1da Pin torch to 2.7.1 on CircleCI for now (#40174)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-14 20:19:35 +02:00
b02f2d8b6a Add dates to the model docs (#39320)
* added dates to the models with a single hf papers link

* added the dates for models with multiple papers

* half of no_papers models done

* rest of no_papers models also done, only the exceptions left

* added copyright disclaimer to sam_hw, cohere, cohere2 + dates

* some more fixes, hf links + typo

* some new models + a rough script

* the script looks robust, changed all paper links to hf

* minor change to handle technical reports along with blogs

* ran make fixup to remove the white space

* refactor
2025-08-14 10:08:46 -07:00
8a658ac119 Standardize BARTpho model card: badges, new examples, fixed broken im… (#40051)
* Standardize BARTpho model card: badges, new examples, fixed broken image section, and links (#36979)Update bartpho.md

* Update bartpho.md

Removed non-required/unsupported sections: Quantization, Attention visualizer, and Resources (plus stray tokenizer header).

Added code snippets which were suggested

* Update bartpho.md

Updated with necessary tags

* Update bartpho.md

* Update bartpho.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-14 09:55:27 -07:00
2b6cbedeb2 Add GptOssForSequenceClassification for GPT-OSS models (#40043)
* Add GptOssForSequenceClassification

* Tiny fix

* make fixup

* trigger CI rerun

* Check config type instead

---------

Co-authored-by: Yuefeng Zhan <yuefzh@microsoft.com>
2025-08-14 18:32:14 +02:00
b834cb8138 build: Add fast image processor tvp (#39529)
* build: add TvpImageProcessorFast

- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.

* fix: TvpImageProcessorFast with new resize method and update processing logic

* build: add TvpImageProcessorFast

* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests

- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.

* fix: Enhance TvpFastImageProcessorKwargs and update documentation

- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.

* fix: tested now with python 3.9

* fix: remove tvp kwargs from docs

* simplify processing

* remove import and fix tests

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-14 15:48:18 +00:00
6f259bc83e Fix docs typo (#40167)
* DINOv3 model

* working version

* linter revert

* linter revert

* linter revert

* fix init

* remove flex and add convert to hf script

* DINOv3 convnext

* working version of convnext

* adding to auto

* Dinov3 -> DINOv3

* PR feedback

* complete convert checkpoint

* fix assertion

* bf16 -> fp32

* add fast image processor

* fixup

* change conversion script

* Use Pixtral attention

* minor renaming

* simplify intermediates capturing

* refactor DINOv3ViTPatchEmbeddings

* Refactor DINOv3ViTEmbeddings

* [WIP] rope: remove unused params

* [WIP] rope: rename period -> inv_freq for consistency

* [WIP] rope: move augs

* change inv_freq init (not persistent anymore)

* [WIP] rope: move coords to init

* rope - done!

* use default LayerScale

* conversion: truncate expected outputs

* remove commented code

* Refactor MLP layers

* nit

* clean up config params

* nit docs

* simplify embeddings

* simplify compile compat lru_cache

* fixup

* dynamic patch coords

* move augmentation

* Fix docs

* fixup and type hints

* fix output capturing

* fix tests

* fixup

* fix auto mappings

* Add draft docs

* fix dtype cast issue

* add push to hub

* add image processor tests

* fixup

* add modular

* update modular

* convert and test convnext

* update conversion script

* update prefix

* Update LayerNorm

* refactor DINOv3ConvNextLayer

* rename

* refactor convnext model

* fix doc check

* fix docs

* fix convnext config

* tmp fix for check docstring

* remove unused arg

* fix tests

* (nit) change init

* standardize gated MLP

* clear namings and sat493m

* fix tensors on different devices

* revert linter

* pr

* pr feedbak ruff format

* missing headers

* fix code snippet and collection link in docs

* DINOv3 description

* fix checkpoints in tests

* not doc fixes in configs

* output_hidden_states

* x -> features

* remove sequential

---------

Co-authored-by: Cijo Jose <cijose@meta.com>
2025-08-14 17:29:53 +02:00
41980ce93e [bugfix] fix flash-attention2 unavailable error for Ascend NPU (#40151)
* [bugfix] fix flash-attention2 unavailable error for Ascend NPU

* remove redundant apply_rotary_emb usage

* fix ruff check error

* pad_input and unpad_input use same implementation as fa2

* rollback redundant codes

* fix ruff check error

* optimize fa2 judgement logic
2025-08-14 14:21:39 +02:00
eba1d62091 [FA2] Fix it finally - revert fa kwargs preparation (#40161)
revert
2025-08-14 13:39:11 +02:00
1c5d2f7fb6 Replace self.tokenizer by self.processing_class (#40119) 2025-08-14 13:24:55 +02:00
cfe52ff4db [Continous Batching] set head_dim when config.head_dim is None (#40159)
* set head_dim when config.head_dim is None

* use model's actual TP setting
2025-08-14 13:23:27 +02:00
c47544b16f Fix CI: Use correct import in SAM for torchvision InterpolationMode (#40160)
fix ci
2025-08-14 10:53:23 +00:00
22e89e5385 [efficientloftr] fix bugs and follow original cross attn implementation strictly (#40141)
* fix: changed is_causal to be False

* fix: Added original cross attention bug

* fix: fixed the way bordel removal is computed

* fix: added missing normalization on coarse features

* test: fixed integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-14 10:42:59 +01:00
252364fd8e [Cohere2Vision] remove unused arg (#40103)
* remove unused arg

* remove the arg from test as well
2025-08-14 09:10:25 +00:00
e446372f76 Create self-scheduled-amd-mi355-caller.yml (#40134) 2025-08-14 01:33:45 +02:00
be1ab5103f Update Dockerfiles to install packages inside a virtual environment (#39098)
* Removed un-necessary virtual environment creation in Dockerfiles.

* Updated Dockerfiles to install packages in a virtual environment.

* use venv's python

* update

* build and trigger

* trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* build and trigger

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:51:52 +02:00
591708d9ce Add pytest marker: torch_compile_test and torch_export_test (#39950)
* new marker

* trigger CI

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-13 23:47:15 +02:00
12e49cda32 Fix quantized cache with only cache_implementation in generate (#40144)
* fix args

* comment
2025-08-13 23:21:41 +02:00
e651ae0a32 🌐 [i18n-KO] Translated gemma3.md to Korean (#39865)
* docs: ko: gemma3.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>

* fix: resolve suggestions

---------

Co-authored-by: Chaewon Song <chaewon1019@ewhain.net>
2025-08-13 13:25:20 -07:00
0f9c2595cd updated visualBERT modelcard (#40057)
* updated visualBERT modelcard

* fix: Review for VisualBERT card
2025-08-13 12:47:32 -07:00
412c9c3030 Remove an old badly designed test (#40142)
remove it
2025-08-13 20:47:00 +02:00
eb5768a86e [docs] Fix ko toctree (#40138)
Update _toctree.yml
2025-08-13 11:24:58 -07:00
68a13cd4a6 Add Segment Anything 2 (SAM2) (#32317)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promprencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PostioinEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstringspy --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitely process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* nit fixes + optimization

* split sam2 and sam2_video in two models

* PR review part 1

* fix None for default slow processor of sam2

* remove unecessary code path in sam2_video

* refactor/simplify RoPE

* replace embedding module list with embedding matrix

* fix tests

* remove kernel

* nit

* use lru_cache for sine_pos_embeddings

* reorder sam2_video methods

* simplify sam2_video

* PR review part 1

* simplify sam2 video a lot

* more simplification

* update integration tests with updated conftest

* more explicit config for hieradet

* do post_processing outside of sam2 video model

* Improve Sam2VideoVisionRotaryEmbedding

* fix tests

* update docs and fix mask2former/oneformer

* avoid unnecessary reshapes/permute

* fix device concatenating points

* small dtype fix

* PR review

* nit

* fix style and finish up doc

* fix style

* fix docstrings

* fix modular

---------

Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-13 14:18:05 -04:00
25ad9c8c92 Fix Janus (#40140)
fix
2025-08-13 20:12:21 +02:00
bec6926696 gpt oss is important (#40139) 2025-08-13 19:49:54 +02:00
ab9108517a 🌐 [i18n-KO] Translated pipelines.md to Korean (#39577)
* docs: ko: pipelines.md

* feat: gpt draft

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/main_classes/pipelines.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update _toctree.yml

* Update _toctree.yml

번역 문서 수정

* Update pipelines.md

ToC 수정

* Update pipelines.md

---------

Co-authored-by: xhaktm <tnwjd318@hs.ac.kr>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-13 10:26:17 -07:00
20c6b478cd 🚨 Use lru_cache for sine pos embeddings MaskFormer (#40007)
* use lru_cache for sine pos embeddings maskformer

* fix calls to pos embed

* change maxsize to 1
2025-08-13 17:05:22 +00:00
6b728f1830 🌐 [i18n-KO] Translated grounding-dino.md to Korean (#39861)
* docs: ko: grounding-dino.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* Update docs/source/ko/model_doc/grounding-dino.md

Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>

* docs: add AP explanation for better readability

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Kim Juwon <81630351+Kim-Ju-won@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-13 10:01:05 -07:00
127e33f759 🌐 [i18n-KO] Translated optimizers.md to Korean (#40011)
* docs: ko: optimizers.md

* feat: optimizers draft

* fix: manual edits

* docs: ko: update optimizers.md

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

* Update docs/source/ko/optimizers.md

Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>

* docs: ko: final updates to optimizers and toctree

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
Co-authored-by: Jaehyeon Shin <108786184+skwh54@users.noreply.github.com>
2025-08-13 10:00:47 -07:00
ac52c77a66 🌐 [i18n-KO] Translated gpt2.md to Korean (#39808)
* docs: ko: bamba.md

* feat: nmt draft

* fix: manual edits

* docs: ko: gpt2.md

* feat: nmt draft

* fix: manual edits

* Remove bamba.md from docs/source/ko/model_doc/

* Update _toctree.yml
2025-08-13 10:00:25 -07:00
5337f3052d 🚨🚨 [generate] ignore cache_implementation="hybrid" hub defaults (#40135)
* working?

* fix tests
2025-08-13 17:57:41 +02:00
e4223fa915 🌐 [i18n-KO] Translated main_classes/optimizer_schedules.md to Korean (#39713)
* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
2025-08-13 08:23:09 -07:00
9e21e50241 🌐 [i18n-KO] Translated jamba.md to Korean (#39890)
* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>

---------

Co-authored-by: Minseo Kim <75977640+luckyvickyricky@users.noreply.github.com>
2025-08-13 08:22:28 -07:00
486844579b 🌐 [i18n-KO] Translated main_classes/processors.md to Korean (#39519)
* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2025-08-13 08:21:38 -07:00
f445caeb0f Fix hidden torchvision>=0.15 dependency issue (#39928)
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit
2025-08-13 15:13:42 +00:00
11537c3e0c [trainer] handle case where EOS token is None in generation_config (#40127)
* handle case where EOS token is None in gen config

* update eli5 dataset
2025-08-13 15:57:17 +01:00
8ef5cd6579 DOCS: Add missing space in SECURITY.md (#40087) 2025-08-13 12:57:37 +00:00
ebceef343a Collated reports (#40080)
* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report
2025-08-13 14:48:15 +02:00
e78571f5ce decoding_method argument in generate (#40085)
* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-13 12:45:50 +00:00
8d19231bca [serve] allow array content inputs for LLMs (#39829)
fix bug; add tests
2025-08-13 11:26:19 +01:00
34a1fc6426 Fix QuantoQuantizedCache import issues (#40109)
* fix quantoquantized
2025-08-13 10:22:59 +00:00
060b86e21d changed xLSTMRMSNorm to RMSNorm (#40113)
* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <nikita@Nikitas-MacBook-Pro.local>
2025-08-13 11:10:42 +02:00
849c3778c6 [bugfix] Fix tensor device in Idefics2, Idefics3, and SmolVLM (#39975)
* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda
2025-08-13 09:58:50 +02:00
85d536a93b 🌐 [i18n-KO] Translated tiny_agents.md to Korean (#39913)
* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
2025-08-12 22:54:16 -07:00
31ab7168ff remove sequence parallel in llama4 (#40084) 2025-08-13 00:12:45 +02:00
a1a4fcd03e Add model card for MobileViT (#40033)
* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-12 11:36:59 -07:00
e5e73e4b95 [docs] Add reference to HF-maintained custom_generate collections (#39894)
decoding -> generation; add collections
2025-08-12 17:38:00 +01:00
0ce24f5a88 Fix Causality Handling in Flash Attention to Support Bidirectional Attention (#39707)
Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-08-12 16:16:28 +00:00
83dbebc429 [trainer] ensure special tokens in model configs are aligned with tokenizer at train time (#38441)
* tmp commit

* add test

* make fixup

* reset warns/info in test
2025-08-12 16:32:07 +01:00
9977cf1739 [Flash Attention] Fix flash attention integration (#40002)
* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment
2025-08-12 15:24:10 +00:00
b6ba595543 Default to dequantize if cpu in device_map for mxfp4 (#39993)
* default to dq if cpu

* an other check

* style

* revert some changes
2025-08-12 16:48:52 +02:00
a5fac1c394 Fix error on importing unavailable torch.distributed (#40038)
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
2025-08-12 16:30:51 +02:00
085e02383c Fix Qwen3 MoE GGUF architecture mismatch (#39976)
* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Jinuk Kim <jusjinuk@snu.ac.kr>
2025-08-12 13:38:48 +00:00
2ce0dae390 Switch the order of args in StaticCache (for BC and future logic) (#40100)
* switch order for BC and future logic

* in generate as well
2025-08-12 15:30:44 +02:00
f7cbd5f3ef Fix regression in mllama vision encoder (#40083)
fix mllama vision encoder

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-12 15:29:45 +02:00
35dc88829c Replace logger.warning with logger.warning_once in GradientCheckpointingLayer (#40091) 2025-08-12 15:26:47 +02:00
b1b46555cd Re-apply make style (#40106)
make style
2025-08-12 15:02:16 +02:00
a07b5e90f2 feat: add is_fast to ImageProcessor (#39603)
* feat: add `is_fast` to ImageProcessor

* test_image_processing_common.py 업데이트

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* feat: add missing BaseImageProcessorFast import

* fix: `issubclass` for discriminating subclass of BaseImageProcessorFast

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-08-12 12:14:57 +00:00
952fac100d Enable SIM rules (#39806)
* Enable SIM rules

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-08-12 12:14:26 +00:00
41d1717882 New DynamicSlidingWindowLayer & associated Cache (#40039)
* start adding the layer

* style

* improve

* modular

* fix

* fix

* improve

* generate integration

* comment

* remove old one

* remove

* fix

* fix

* fix

* fix all recompiles

* fix

* doc

* fix

* add text config check

* fix encoderdecoder cache

* add it for all models with sliding/hybrid support

* revert

* start fixing

* prophetnet

* fsmt

* fix ddp_data

* add test for mistral

* improve mistral test and add gemma2 test

* docstrings
2025-08-12 14:09:52 +02:00
ab455e0d88 Audio encodings now match conv2d weight dtype in Gemma3nAudioSSCPConvBlock (#39743)
audio encodings now match conv weight dtype in Gemma3nAudioSSCPConvBlock
2025-08-12 12:08:28 +00:00
4b3a1a62cc Causal loss for ForConditionalGeneration (#39973)
* feat: add ForConditionalGeneration loss to LOSS_MAPPING

* consistent spelling of "recognized"
2025-08-12 14:03:09 +02:00
f6b6e17719 Add glm4.5&&glm4.5V doc (#40095)
* Docs: GLM-4-MoE & GLM-4V-MoE pages

* Docs: polish GLM-4V-MoE intro, remove placeholders; pin image

* Docs

---------

Co-authored-by: wujiahan <lambert@gmail.com>
2025-08-12 11:44:53 +00:00
1c5e17c025 Update Glm4V processor and add tests (#39988)
* update GLm4V and add tests

* Update tests/models/glm4v/test_processor_glm4v.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* remove min/max pixels for BC

* fix video tests

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-08-12 13:40:54 +02:00
913c0a8c33 [docs] Zero Shot Object Detection Task (#40096)
* refactor zsod task docs

* keeping the image guided od section

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update docs/source/en/tasks/zero_shot_object_detection.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
2025-08-12 11:43:38 +01:00
c6fbfab61b [fix] batch inference for llava_onevision (#40021)
* [fix] llava onevision batch inference

* style

* cannot pass inconsistent list & handle text-only case
2025-08-12 11:01:00 +02:00
86bb1fcd26 Revert FA2 kwargs construction (#40029)
* revert

* use imports

* went way too high in imports level

* style
2025-08-12 10:48:35 +02:00
3ff2e984d2 Fix PerceptionLM image preprocessing for non-tiled image input. (#40006)
* Fix PerceptionLM image preprocessing for non-tiled image input.

* Add test for single tile vanilla image processing.

* ruff format

* recover missing test skip

* Simplify test.

* minor test name fix
2025-08-12 08:40:22 +00:00
4668ef1459 Update notification service MI325 (#40078)
add mi325 to amd_daily_ci_workflows
2025-08-12 10:22:52 +02:00
1cea763ba4 feat: extract rev in attn_implementation kernels via @ (#40009)
* feat: extract rev in attn_implementation kernels via @

* fix: adjust for ruff

* fix: update regex and add explanatory comment

* fix: move attn_implementation kernel doc

* fix: remove extra line
2025-08-11 15:14:13 -04:00
e29919f993 [GPT Big Code] Fix attention scaling (#40041)
* fix

* update integration tests

* fmt

* add regression test
2025-08-11 19:01:31 +00:00
eca703026e chore: standardize DeBERTa model card (#37409)
* chore: standardize DeBERTa model card

* Apply suggestions from code review in docs

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: Update deberta.md with code cleanup suggestions

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/deberta.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update deberta.md

* Update deberta.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-11 10:30:37 -07:00
43001fd3c6 Fix time_spent in notification_service.py. (#40081)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-11 18:30:58 +02:00
5521c62b89 added Textnet fast image processor (#39884)
* feat: add fast image processor implementation for TextNet model

* chore: override to_dict method to TextNetImageProcessorFast for slow processor compatibility tests

* chore: update init method

* chore: coding and style checks

* chore: fixed code quality issue

* chore: override resize to handle size_divisor, move all preprocessing logic to child class

* fix: autoImageProcessor issue for textnet

* chore: cleanup

* simplify resize

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-08-11 11:44:31 -04:00
6b70d79b61 Fix repo consistency (#40077)
fix
2025-08-11 15:26:22 +02:00
7dd82f307b guard on model.eval when using torch.compile + FSDP2 (#37413)
guard on model.eval

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-11 13:22:42 +02:00
68eb1a9a63 Remove deprecated cache-related objects (#40035)
remove them
2025-08-11 10:30:14 +02:00
480653d271 fix: move super().__init__ after vision_config init in Mistral3Config (#40063)
fix: move super().__init__ after vision_config init in Mistral3Config (#40062)
2025-08-11 09:21:54 +02:00
502f253e20 [gemma3] update conversion key mapping (#39778)
update conversion key mapping
2025-08-11 09:21:13 +02:00
3124d1b439 [qwen-vl] fix beam search with videos (#39726)
* fix

* fix copies
2025-08-11 09:21:04 +02:00
1372a5b8c4 fix: resolve triton version check compatibility on windows (#39986)
* fix: resolve triton version check compatibility on windows

* style: remove trailing space

* fix: fix typo

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-08-11 08:53:19 +02:00
99c747539e unpin torchcodec==0.5.0 and use torch 2.8 on daily CI (#40072)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-10 22:27:39 +02:00
b59140b696 Update HuBERT model card according to template (#39742)
* Update HuBERT model card according to template

Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section.

* Address review comments and changes requested to hubert.md

* Update hubert.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-10 11:32:45 -07:00
f4d57f2f0c Revert "fix notification_service.py about time_spent" (#40044)
Revert "fix `notification_service.py` about `time_spent` (#40037)"

This reverts commit d2ba153b29feb9cc0e9818c1ce63a07679b47250.
2025-08-08 22:32:24 +02:00
7b20915f4e GLM-4.5V Model Support (#39805)
* init

* update

* uupdate

* ruff

* t patch is 2 defalut not 1

* draft

* back

* back1

* update

* config update

* update using glm-41 format

* add self.rope_scaling = config.rope_scaling

* update config

* update

* remove the processor

* update

* fix tests

* update

* for test

* update

* update 2126

* self.rope_scaling is missing in GLM4MOE lets add it

* update

* update

* Update modular_glm4v_moe.py

* change config

* update apply_multimodal_rotary_pos_emb

* format

* update

* Delete 3-rollout_qas_thinking_answers.py

* use right name

* update with place holder

* update

* use right rotary

* Update image_processing_glm4v_fast.py

* rope_config_validation needs to rewrite the entire config file in modular

* update

* changed name

* update

* Update modeling_glm4v_moe.py

* _init_weights shoud be add in Glm4vMoePreTrainedModel

* remove use_qk_norm

* Update modular_glm4v_moe.py

* remove use_qk_norm as it is not use

* fix style

* deprecations are not needed on new models

* fix merge issues

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-08-08 17:39:52 +02:00
d2ba153b29 fix notification_service.py about time_spent (#40037)
temp

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-08 17:11:16 +02:00
f639c0c780 Bnb failling tests (#40026)
* initial commit

* style

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-08 16:28:00 +02:00
a96cccd0dd Tie weights recursively on all submodels (#39996)
* recursive call

* add missing keys

* remove bad keys
2025-08-08 16:03:16 +02:00
a78263dbb5 fix 2025-08-08 15:32:23 +02:00
dc11a3cbb2 [core] Refactor the Cache logic to make it simpler and more general (#39797)
* Simplify the logic quite a bit

* Update cache_utils.py

* continue work

* continue simplifying a lot

* style

* Update cache_utils.py

* offloading much simpler

* style

* Update cache_utils.py

* update inits

* Update cache_utils.py

* consistemncy

* Update cache_utils.py

* update generate

* style

* fix

* fix

* add early_initialization

* fix

* fix mamba caches

* update

* fix

* fix

* fix

* fix tests

* fix configs

* revert

* fix tests

* alright

* Update modeling_gptj.py

* fix the constructors

* cache tests

* Update test_cache_utils.py

* fix

* simplify

* back to before -> avoid compile bug

* doc

* mistral test

* llama4 test dtype

* Update test_modeling_llama4.py

* CIs

* Finally find a nice impl

* Update cache_utils.py

* Update cache_utils.py

* add lazy methods in autodoc

* typo

* better doc

* Add detailed docstring for lazy init

* CIs

* style

* fix
2025-08-08 14:47:21 +02:00
95510ab018 Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991) (#40024)
* Fix missing None default values for Gemma3n model in get_placeholder_mask (#39991)

* Switched definition of optional from| None to Optiona[] (Issue #39991)

---------

Co-authored-by: Laurenz Ruzicka <Laurenz.Ruzicka@ait.ac.at>
2025-08-08 10:43:42 +00:00
5c3fb7f731 Harmonize past_key_value to past_key_valueS everywhere (#39956)
* all modulars and llama

* apply modular

* bert and gpt2 copies

* fix imports

* do it everywhere

* fix import

* finalize it

* fix

* oups set it in modular

* style

* fix

* Add 1 version to deprecation cycle

* Update modeling_layers.py
2025-08-08 11:52:57 +02:00
2469cce621 Fix an annoying flaky test (#40000)
annoying flaky test
2025-08-08 10:32:51 +02:00
fe1bf82159 Higgs modules_to_not_convert standardization (#39989)
fix higgs
2025-08-08 10:22:59 +02:00
b374c3d12e Fix broken image inference for Fuyu model (#39915)
* fix fuyu

Signed-off-by: Isotr0py <2037008807@qq.com>

* oops

Signed-off-by: Isotr0py <2037008807@qq.com>

* run test on GPU

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* clean unused

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* revert

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

* add fuyu multimodal test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-08 07:21:49 +00:00
4d57c39007 pin torchcodec==0.5.0 for now with torch 2.7.1 on daily CI (#40013)
* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 23:05:39 +02:00
3e0333fa4a Update expected output values after #39885 (part 2) (#40015)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 22:52:53 +02:00
12f248bced Raising error when quantizing a quantized model (#39998)
* error when quantizing a quantized model

* style
2025-08-07 20:37:25 +00:00
efaf3714dc docs: fix duplication in 'en/optimizers.md' (#40014) 2025-08-07 13:28:43 -07:00
ca4cbb1e3f unpin torch<2.8 on circleci (#40012)
update

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 21:31:17 +02:00
78922577e9 FA2 can continue generation from cache (#39843)
* add fa2 support to continue generation from cache

* update q-len
2025-08-07 19:26:23 +02:00
9bfbdd2945 Fix default values of getenv (#39867)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-07 17:25:40 +00:00
692d336908 Fix HGNetV2 Model Card and Image Classification Pipeline Usage Tips (#39965)
* fix hgnet docs and image-classification pipeline

* use positional argument

* fix dit close hfoptions tag

* fix alphabet order

* fix hgnnet modular docstring

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hgnet_v2.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix: hgnet reference

* change hgnet to en doc

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-07 09:33:29 -07:00
0659214196 fix: remove CHAT_TEMPLATE import in tests for deepseek-vl (#40003)
* remove CHAT_TEMPLATE import in tests

* update and use prepare_processor_dict
2025-08-07 16:19:36 +00:00
27997eeb8d Fix missing video inputs for PerceptionLM. (#39971)
* Fix missing video inputs for PerceptionLM.

* Minor fix for vanilla input image (only C,H,W, no tiles dim).

* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)."

This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.
2025-08-07 15:54:45 +00:00
bf1bd6ac1f Fix int4 quantized model cannot work with cpu (#39724)
* Fix int4 quantized model cannot work with cpu

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Update the comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

* update

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-07 15:24:00 +00:00
43d3b1931a Update expected output values after #39885 (part 1) (#39990)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-07 16:00:28 +02:00
d5a0809707 Fix consistency (#39995)
* modular

* fix
2025-08-07 15:52:40 +02:00
b347e93567 [typing] Fix return typehint for decoder and inv_freq annotation (#39610)
* fix return typehint for decoder and annotate inv_freq

* fix modular

* Fix consistency

* Move annotation on class level

* missing annotations

* add comment
2025-08-07 14:10:22 +01:00
7188e2e28c Bump transformers from 4.48.0 to 4.53.0 in /examples/tensorflow/language-modeling-tpu (#39967)
Bump transformers in /examples/tensorflow/language-modeling-tpu

Bumps [transformers](https://github.com/huggingface/transformers) from 4.48.0 to 4.53.0.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v4.48.0...v4.53.0)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 4.53.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-08-07 12:13:48 +01:00
2b19a06692 Fix gemma3n feature extractor's incorrect squeeze (#39919)
* fix gemma3n squeeze

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-07 18:34:28 +08:00
555cbf5917 [Idefics] fix device mismatch (#39981)
fix
2025-08-07 11:12:04 +02:00
597ed1a11d Various test fixes for AMD (#39978)
* Add amd expectation in internvl

* Add amd expectation to llama

* Added bnb decorator for a llava test that requires bnb

* Added amd expectation for mistral3

* Style
2025-08-07 10:57:04 +02:00
6121e9e46c Support input_embeds in torch exportable decoders (#39836)
* Support input_embeds in torch exportable decoders

* Hybrid cache update

* Manually change some callsites

* AI changes the rest of the call sites

* Make either input_ids/inputs_embeds mandatory

* Clean up

* Ruff check --fix

* Fix test

* pr review

* Revert config/generation_config changes

* Ruff check
2025-08-07 08:51:31 +00:00
cdeaad96b7 [superglue] Fixed the way batch mask was applied to the scores before match assignment computation (#39968)
fix: mask filling to score was wrong
2025-08-07 09:49:39 +01:00
2593932f10 Gemma3 fixes (#39960)
* Fix multiple devices issue

* Added expectations for rocm 9.4

* Ruff
2025-08-07 09:57:21 +02:00
513f76853b Modular fix: remove the model name in find_file_type (#39897)
* remove the model name in the class name

* add comment
2025-08-06 23:31:07 +00:00
743bb5f52e chore: update Deformable_Detr model card (#39902)
* chore: update Deformable_Detr model card

* fix: added pipeline, automodel examples and checkpoints link

* Update deformable_detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-06 12:45:14 -07:00
ac0b468465 [bugfix] fix flash_attention_2 unavailable error on Ascend NPU (#39844) 2025-08-06 17:48:52 +00:00
cf243a1bf8 Fix fix_and_overwrite mode of utils/check_docstring.py (#39369)
* bug in fix mode of check_docstring
2025-08-06 19:37:25 +02:00
6902ffa505 remove triton_kernels dep with kernels instead (#39926)
* remove dep

* style

* rm import

* fix

* style

* simplify

* style
2025-08-06 19:31:20 +02:00
cb2e0df2ec [image processor] fix glm4v (#39964)
* fix glm4v image process

* Update src/transformers/models/glm4v/image_processing_glm4v.py

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-06 17:46:58 +01:00
9ab75fc428 fix typo (#39936)
* fix typo

* fix modular instead

* fix

---------

Co-authored-by: y.korobko <y.korobko@tbank.ru>
2025-08-06 16:21:24 +00:00
43b3f58875 Fix grammatical error in MoE variable name: expert_hitted → expert_hit, hitted_experts → hit_experts (#39959)
* Fix grammatical error: expert_hitted -> expert_hit in MoE implementations

* Fix grammatical error: hitted_experts -> hit_experts in MoE implementation
2025-08-06 15:45:19 +00:00
dff6185d61 docs: fix typo in 'quantization-aware training' (#39904) 2025-08-06 14:52:43 +00:00
c7844c7a8e Enable gpt-oss mxfp4 on older hardware (sm75+) (#39940)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-06 13:39:21 +00:00
dd70a8cb9d Fix MXFP4 quantizer validation to allow CPU inference with dequantize option (#39953)
* Fix MXFP4 quantizer validation to enable CPU dequantization

Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.

* Add tests for MXFP4 quantizer CPU dequantization validation

* fix: format mxfp4 test file with ruff
2025-08-06 15:20:41 +02:00
82eb67e62a [docs] ko toc fix (#39927) 2025-08-06 10:12:34 +00:00
9e76a6bb54 circleci: pin torch 2.7.1 until torchcodec is updated (#39951)
circleci torch 2.7.1

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-06 11:18:00 +02:00
910b319357 Fix CI: Tests failing on CPU due to torch.device('cpu').index being None (#39933)
replace routing_weights.device.index with a
2025-08-06 10:22:43 +02:00
369c99d0ce Avoid utils/check_bad_commit.py failing due to rate limit (requesting api.github.com) (#39918)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-05 21:52:20 +02:00
b771e476a8 [CI] post-GptOss fixes for green CI (#39929) 2025-08-05 20:04:59 +02:00
eb6e26acf3 Dev version 2025-08-05 18:09:30 +02:00
c54203a32e gpt_oss last chat template changes (#39925)
Last chat template changes
2025-08-05 18:08:08 +02:00
7c38d8fc23 Add GPT OSS model from OpenAI (#39923)
* fix

* nice

* where i am at

* Bro this works

* Update src/transformers/integrations/tensor_parallel.py

* cleanups

* yups that was breaking

* Update src/transformers/models/openai_moe/modeling_openai_moe.py

* gather on experts and not mlp

* add changes for latest convert branch

* adds options to get output_router_logits from config

* bring chat temlate + special tokens back into the script.

* initial commmit

* update

* working with shards

* add model.safetensors.index.json

* fix

* fix

* mxfp4 flag

* rm print

* Fix PAD/EOS/BOS (#18)

* fix pad/eos/bos

* base model maybe one day

* add some doc

* special tokens based on harmony.

* add in tokenizer config as well.

* prepare for rebase with main

* Fix for initialize_tensor_parallelism  now returning 4-tuple

```
[rank0]:   File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]:     model = AutoModelForCausalLM.from_pretrained(
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]:     return model_class.from_pretrained(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]:     tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```

* mxfp4

* mxfp4 draft

* fix

* fix import

* draft

* draft impl

* finally working !

* simplify

* add import

* working version

* consider blocks and scales

* device mesh fix

* initial commit

* add working dequant + quant logic

* update

* non nan, gibberish output

* working EP + quantization finally !

* start cleaning

* remove reversing process

* style

* some cleaning

* initial commmit

* more cleaning

* more cleaning

* simplify

* more cleaning

* rm duplicated function

* changing tp_plan

* update tp plan check

* add loading attribute

* dequantizing logic

* use subfunctions

* import cleaning

* update_param_name

* adds clamped swiglu

* add clamping to training path

* simplify dequant logic

* update

* Bad merge

* more simplifications & tests

* fix !

* fix registering custom attention

* fix order

* fixes

* some test nits

* nits

* nit

* fix

* Clamp sink logits

* Clean

* Soft-max trick

* Clean up

* p

* fix deepspeed

* update both modeling and modular for cleanup

* contiguous

* update tests

* fix top_k router call

* revert renaming

* test nits

* small fixes for EP

* fix path for our local tests

* update as I should not have broken that!

* fix the loss of mixtral

* revert part of the changes related to router_scores, kernel probably no ready for that!

* deleting a small nit

* update arch

* fix post processing

* update

* running version but not expected output

* moving to cuda

* initial commit

* revert

* erroring when loading on cpu

* updates

* del blocks, scales

* fix

* style

* rm comm

* comment

* add comment

* style

* remove duplicated lines

* Fix minor issue with weight_map conversion script

* fix sampling params

* rename to final name

* upate pre-final version of template

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* fix batched inference

* serve fixes

* swizzle !

* update final chat template by Matt.

* fix responses; pin oai

* sinplify

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* fix

* Use ROCm kernels from HUB

* Make kernel modes explicit

* update final chat template by Matt. x2

* Thanks Matt for his tireless efforts!

Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>

* Fix installation

* Update setup.py

Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>

* allow no content

* fix: update message handling in write_tokenizer function

* Fix template logic for user message role

* last nits for CB and flash_paged!

* there was one bad merge

* fix CB (hardcode for now, its just using kv groups instead)

* fix

* better fix for device_map

* minor device fix

* Fix flash paged

* updates

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* Revert "remove dtensors, not explicit (#39840)"

This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* fix merge

* fix

* Fix line break when custom model indentity

* nits testing

* to locals first and pass sliding window to flash paged

* register modes for MegaBlocksMoeMlp

* add integration test in fixtures -> now update the tests to use it!

* update integration tests

* initial fix

* style and update tests

* fix

* chore(gpt oss): remove mlp_bias from configuration

It was just a leftover.

* stats

* Integration tests

* whoops

* Shouldn't move model

* Ensure assistant messages without thinking always go to "final" channel

* More checks to ensure expected format

* Add pad_token_id to model configuration in write_model function (#51)

* Add oai fix fast tests (#59)

* Fix some fast tests

* Force some updates

* Remove unnecessary fixes

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py

* reasoning -> Reasoning

* Add additional integration tests

* fixup

* Slight fixes

* align chat template with harmony

* simplify

* Add comment

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* torch testing assert close

* Revert fixup

* skip 2 test remove todo

* merge

* padding side should be left for integration tests

* fix modular wrt to changes made to modeling

* style

* isort

* fix opies for the loss

* mmmm

---------

Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
2025-08-05 18:02:18 +02:00
738c1a3899 🌐 [i18n-KO] Translated cache_explanation.md to Korean (#39535)
* update: _toctree.yml

* docs: ko: cache_explanation.md

* feat: nmt draft

* fix: apply yijun-lee's comments

* fix: apply 4N3MONE's comments

* docs: update cache_position

* docs: update cache-storage-implementation

* update: add h2 tag in cache-position

---------

Co-authored-by: taehyeonjeon <xogus294@gmail.com>
2025-08-05 08:20:13 -07:00
d2ae766836 Export SmolvLM (#39614)
Export SmolVLM for ExecuTorch
2025-08-05 16:20:23 +02:00
c430047602 [docs] update object detection guide (#39909)
* Update object_detection.md

* Update object_detection.md
2025-08-05 14:07:21 +00:00
dedcbd6e3d run model debugging with forward arg (#39905)
* run model debugging a lot simpler

* fixup

* Update src/transformers/utils/generic.py

* fixup

* mode syle?

* guard a bit
2025-08-05 15:46:19 +02:00
20ce210ab7 Revert "remove dtensors, not explicit (#39840)" (#39912)
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.

* update

* style?
2025-08-05 15:12:14 +02:00
2589a52c5c Fix aria tests (#39879)
* fix aria tests

* awful bug

* fix copies

* fix tests

* fix style

* revert this
2025-08-05 13:48:47 +02:00
6e4a9a5b43 Fix eval thread fork bomb (#39717) 2025-08-05 10:50:32 +00:00
98a3c49135 Replace video_fps with fps in tests (#39898)
Signed-off-by: cyy <cyyever@outlook.com>
2025-08-05 10:39:55 +00:00
1af1071081 Fix misleading WandB error when WANDB_DISABLED is set (#39891)
When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment,
the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb."

This was confusing because wandb was actually installed, just disabled via the environment variable.

The fix detects this specific case and provides a clear, actionable error message explaining
the conflict and how to resolve it.
2025-08-05 10:18:18 +00:00
78ef84921b Avoid aliasing in cond's branches for torch 2.8 (#39488)
Avoid alaising in cond's branches

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 11:18:11 +02:00
9e676e6a0e [qwen] remove unnecessary CUDA sync in qwen2_5_vl (#39870)
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-05 08:54:16 +00:00
392be3b282 fix test_working_of_tp failure of accelerate ut (#39828)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-08-05 08:52:57 +00:00
cc5de36454 [Exaone4] Fixes the attn implementation! (#39906)
* fix

* fix config
2025-08-05 09:29:16 +02:00
00d47757bf Reorder serving docs (#39634)
* Slight reorg

* LLMs + draft VLMs

* Actual VLM examples

* Initial responses

* Reorder

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/tiny_agents.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/open_webui.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/cursor.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update docs/source/en/serving.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Responses API

* Address Pedro's comments

---------

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2025-08-05 08:43:06 +02:00
8c4ea670dc chore: update DETR model card (#39822)
* Update model card for DETR

* fix: applied suggested changes

* fix: simplified pipeline and modified notes and resources

* Update detr.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 12:25:53 -07:00
0bd91cc822 Add support for ModernBertForMultipleChoice (#39232)
* implement ModernBertForMultipleChoice

* fixup, style, repo consistency

* generate modeling_modernbert

* add tests + docs

* fix test
2025-08-04 20:45:43 +02:00
801e869b67 send some feedback when manually building doc via comment (#39889)
* fix

* fix

* fix

* Update .github/workflows/pr_build_doc_with_comment.yml

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-08-04 18:20:48 +00:00
ee7eb2d0b1 Update cohere2 vision test (#39888)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 20:08:18 +02:00
3bafa128dc [DOCS] : Improved mimi model card (#39824)
* [DOCS] : Improved mimi model card

* Removed additional header

* Review: addressed feedback

* Update mimi.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-08-04 10:07:06 -07:00
192acc2d0f Fix link to models in README (#39880)
Update README.md
2025-08-04 09:34:41 -07:00
7dca2ff8cf [typing] better return type hint for AutoModelForCausalLM and AutoModelForImageTextToText (#39881)
* Better return type hint for  AutoModelForCausalLM and AutoModelForImageTextToText

* fix imports

* fix
2025-08-04 15:03:53 +00:00
3edd14610e Set torch.backends.cudnn.allow_tf32 = False for CI (#39885)
* fix

* fix

* [test all]

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 16:55:16 +02:00
e3505cd4dc Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor (#39858)
Replace Tokenizer with PreTrainedTokenizerFast in ContinuousBatchProcessor
2025-08-04 16:39:19 +02:00
380b2a0317 Rework add-new-model-like with modular and make test filenames coherent (#39612)
* remove tf/flax

* fix

* style

* Update add_new_model_like.py

* work in progress

* continue

* more cleanup

* simplify and first final version

* fixes -> it works

* add linter checks

* Update add_new_model_like.py

* fix

* add modular conversion at the end

* Update add_new_model_like.py

* add video processor

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* fix

* Update image_processing_auto.py

* Update image_processing_auto.py

* fix post rebase

* start test filenames replacement

* rename all test_processor -> test_processing

* fix copied from

* add docstrings

* Update add_new_model_like.py

* fix regex

* improve wording

* Update add_new_model_like.py

* Update add_new_model_like.py

* Update add_new_model_like.py

* start adding test

* fix

* fix

* proper first test

* tests

* fix

* fix

* fix

* fix

* modular can be used from anywhere

* protect import

* fix

* Update add_new_model_like.py

* fix
2025-08-04 14:41:09 +02:00
5fb5b6cfaf Fix quant docker for fp-quant (#39641)
* fix quant docker

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-04 11:57:08 +00:00
16d6faef9a [core] Fix attn_implementation setter with missing sub_configs (#39855)
* fix

* add sub_configs

* remove case for attention setter

* fix None

* Add test

* Fix sub-configs

* fix tests_config

* fix consistency

* fix fsmt

* fix
2025-08-04 11:35:09 +01:00
2a9febd632 Add support for including in-memory videos (not just files/urls) in apply_chat_template (#39494)
* added code for handling video object ,as dictionary of frames and metadata, in chat template

* added new test where videos are passed as objects (dict of frames, metadata) in the chat template

* modified hardcoded video_len check that does not match with increased number of tests cases.

* Modify hardcoded video_len check that fails with increased number of tests

* update documentation of multi-modal chat templating with extra information about including video object in chat template.

* add array handling in load_video()

* temporary test video inlcuded

* skip testing smolvlm with videos that are list of frames

* update documentation & make fixup

* Address review comments
2025-08-04 11:49:42 +02:00
0d511f7a77 Use comment to build doc on PRs (#39846)
* try

* try

* try

* try

* try

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-08-04 10:24:45 +02:00
4819adbbaa Refactor label name handling for PEFT models in Trainer class (#39265)
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-08-04 06:29:57 +00:00
166fcad3f8 Improve is_wandb_available function to verify WandB installation (#39875)
Improve `is_wandb_available` function to verify WandB installation by checking for a key attribute
2025-08-04 08:22:52 +02:00
6dfd561d9c remove dtensors, not explicit (#39840)
* remove dtensors, not explicit

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>

* style

* fix test

* update

* as we broke saving try to fix

* output layouts should exit

* nit

* devicemesh exists if it was distributed

* use _device_mesh of self

* update

* lol

* fix

* nit

* update

* fix!

* this???

* grumble grumble

* ?

* fuck me

---------

Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
2025-08-01 22:02:47 +02:00
b727c2b20e Allow TrackioCallback to work when pynvml is not installed (#39851)
Allow TrackioCallback to work when pynvml is not installed
2025-08-01 18:57:25 +02:00
1ec0feccdd [image-processing] deprecate plot_keypoint_matching, make visualize_keypoint_matching as a standard (#39830)
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models

* refactor: added copied from

* fix: make style

* fix: repo consistency

* fix: make style

* docs: added missing method in SuperGlue docs
2025-08-01 16:29:57 +00:00
7b4d9843ba Add fast image processor Janus, Deepseek VL, Deepseek VL hybrid (#39739)
* add fast image processor Janus, deepseek_vl, deepseek_vl_hybrid

* fix after review
2025-08-01 12:20:08 -04:00
88ead3f518 Fix responses add tests (#39848)
* Quick responses fix

* [serve] Fix responses API and add tests

* Remove typo

* Remove typo

* Tests
2025-08-01 18:06:08 +02:00
6ea646a03a Update ux cb (#39845)
* clenaup

* nits

* updates

* fix logging

* push updates?

* just passexception

* update

* nits

* fix

* add tokencount

* style
2025-08-01 16:50:28 +02:00
3951d4ad5d Add MM Grounding DINO (#37925)
* first commit

Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.

TODO: Some tests are failing so that needs to be fixed.

* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model

* cleaned up a hack in the conversion script

* Fixed the expected values in integration tests

Cross att masking and cpu-gpu consistency tests are still failing however.

* changes for make style and quality

* add documentation

* clean up contrastive embedding

* add mm grounding dino to loss mapping

* add model link to config docstring

* hack fix for mm grounding dino consistency tests

* add special cases for unused config attr check

* add all models and update docs

* update model doc to the new style

* Use super_kwargs for modular config

* Move init to the _init_weights function

* Add copied from for tests

* fixup

* update typehints

* Fix-copies for tests

* fix-copies

* Fix init test

* fix snippets in docs

* fix consistency

* fix consistency

* update conversion script

* fix nits in readme and remove old comments from conversion script

* add license

* remove unused config args

* remove unnecessary if/else in model init

* fix quality

* Update references

* fix test

* fixup

---------

Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-01 15:43:23 +01:00
50145474b7 [typecheck] proper export of private symbols (#39729)
* Export private symbols

Signed-off-by: cyy <cyyever@outlook.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/__init__.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix format

Signed-off-by: cyy <cyyever@outlook.com>

* Add a comment for exported symbols

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-08-01 13:36:47 +01:00
c962f1515e [attn_implementation] remove recursive, allows custom kernels with wrappers (#39823)
* fix?

* fixme and style

* Update src/transformers/modeling_utils.py

* update

* update

* fix

* small fixees

* nit

* nits

* fix init check?

* fix

* fix default

* or fucks me

* nits

* include a small nit

* does this make it hapy?

* fixup

* fix the remaining ones
2025-08-01 12:18:28 +02:00
d3b8627b56 [VLMs] split out "get placeholder mask" to helper (#39777)
* batch upidate all models

* update

* forgot about llava onevision

* update

* fix tests

* delete file

* typo

* fix emu3 once and forever

* update cohere2 vision as well
2025-08-01 08:01:06 +00:00
a115b67392 Fix tp cb (#39838)
* fixes

* one more
2025-08-01 09:59:04 +02:00
2c0af41ce5 Fix bad markdown links (#39819)
Fix bad markdown links.
2025-07-31 09:14:14 -07:00
4fcf455517 Fix broken links (#39809)
Replace links in the form of `[text]((url))` to `[text](url)`. This is
the correct format of a url in the markdown.
2025-07-31 13:23:04 +00:00
b937d47455 [cohere2 vision] move doc to multimodal section (#39820)
move doc to multimodal section
2025-07-31 15:13:02 +02:00
6ba8a1ff45 Update documentation for Cohere2Vision models (#39817)
* Update docs with pipeline example

* Add Cohere2Vision to list of vision models

* Sort models
2025-07-31 11:58:45 +00:00
e1688d28d3 [Model] Cohere2 Vision (#39810)
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025

* update and add modualr file

* update processors and check with orig impl later

* delete unused files

* image processor reduce LOC and re-use GotOCR2

* update the config to use modular

* model tests pass

* processor fixes

* check model outputs decorator

* address one more comment

* Update tokens. Temp - need to read from tokenizer'

* fix for multi-gpu

* Fix image token handling

* upadte image token expansion logic

* fix a few issues with remote code loading

* not related but modular forces us to change all files now

* Add overview and code sample to cohere vision docs

* add scripts. TMP.

* Update inference script

* Create script

* set dtype in export script

* TO revert: modular export fix

* Fix scripts

* Revert "TO revert: modular export fix"

This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.

* Use modular weights

* Upload to hub

Removed OOD weights ad script

* Updated docs

* fix import error

Update docs

Added pipeline test

* Updated docs

* Run modular script

remove modular for config

Added patch_size

Added docstrings in modular

Fix OOM

Add docs, fixup integration tests. 8-gpu passing

* tiny updates

* address comments + fixup

* add test for chat template

* check model outputs workaround

* aya vision fix check model inputs

* Revert "add test for chat template"

This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.

* reveert more changes

* last revert

* skip and merge

* faulty copy from

---------

Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
2025-07-31 10:57:34 +00:00
6c3f27ba61 [docs] fix korean docs yet again (#39813)
fix korean docs yet again
2025-07-31 09:13:25 +00:00
cb289ad243 feat(tokenization): add encode_message to tokenize messages one by one (#39507)
* feat(tokenization): add encode_message to tokenize messages one by one

* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.

* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.

* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.

* Docs fix

* Revert changes in docstring of pad()

* Revert changes in docstring

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.

* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.

---------

Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-31 10:55:45 +02:00
4f93cc9174 fix: providing a tensor to cache_position in model.generate kwargs always crashes because of boolean test (#39300)
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous

* test cache_position

* move test

* propagate changes

---------

Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
2025-07-30 17:30:28 +00:00
9b3203f47b Add callback to monitor progress in whisper transcription (#37483)
* Add callback to monitor progress in whisper transcription

* Added `` around variables, rewording

* Add example of `monitor_progress`.

---------

Co-authored-by: Eric B <ebezzam@gmail.com>
2025-07-30 17:40:53 +02:00
7abb5d3992 Update mT5 model card (#39702)
* Update mt5 model card

* Fix casing of model title

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:35:04 -07:00
1019b00028 Update model card for Cohere2 (Command R7B) (#39604)
* Update model card for Cohere2 (Command R7B)

* fix: applied suggested changes
2025-07-30 08:34:26 -07:00
ecbb5ee194 standardized BARThez model card (#39701)
* standardized barthez model card according to template

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/barthez.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* suggested changes to barthez model card

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-30 08:33:13 -07:00
8e077a3e45 Fix re-compilations for cross attention cache (#39788)
fix recompilations for cross attn cache
2025-07-30 14:52:03 +02:00
1e0665a191 Simplify conditional code (#39781)
* Use !=

Signed-off-by: cyy <cyyever@outlook.com>

* Use get

Signed-off-by: cyy <cyyever@outlook.com>

* Format

* Simplify bool operations

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:32:10 +00:00
b94929eb49 Fix an invalid condition (#39762)
Fix an invalid judgement

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-30 12:19:17 +00:00
bb2ac66453 fix chameleonvision UT failure (#39646)
* fix chameleonvision UT failure

Signed-off-by: matrix.yao@intel.com <Yao Matrix>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: matrix.yao@intel.com <Yao Matrix>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: root <Yao Matrix>
2025-07-30 12:09:26 +00:00
5348445dfa Super tiny update (#39727)
super tiny update
2025-07-30 12:21:41 +02:00
54cbea5615 more info in model_results.json (#39783)
more info

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-30 11:43:10 +02:00
01d5f94695 [ASR pipline] fix with datasets 4.0 (#39504)
* fix

* handle edge case

* make
2025-07-30 08:13:40 +00:00
8ab21be570 enable static cache on vision encoder decoder (#39773)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-30 08:10:46 +00:00
67cfe11528 Fix Evolla and xLSTM tests (#39769)
* fix all evolla

* xlstm
2025-07-30 09:51:55 +02:00
ec4033457e Don't set run_name when none (#39695)
* Don't set run_name when none

* revert

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-30 01:39:29 +00:00
551a89a4a3 Standardize CLAP model card format (#39738)
* Standardize CLAP model card format

* Apply review feedback

* Remove Resources section
2025-07-29 14:13:04 -07:00
da70b1389a docs: Update EfficientLoFTR documentation (#39620)
* docs: Update EfficientLoFTR documentation

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-29 13:54:44 -07:00
ddd2100767 Fix OmDet test after arg deprecation (#39766)
fix arg name
2025-07-29 22:10:36 +02:00
4abb053b6c Remove python3.7 reference from doc link (#39706) 2025-07-29 09:17:13 -07:00
33aa49df9d [docs] Ko doc fixes after toc update (#39660)
* update docs

* doc builder working

* make fixup
2025-07-29 17:05:26 +01:00
c4e2069898 Fix Cache.max_cache_len max value for Hybrid models (#39737)
* fix gemma

* fix min

* fix quant init issue

* fix gemma 3n

* skip quant cache test

* fix modular

* new test for Gemma

* include cyril change

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-29 17:12:50 +02:00
075dbbceaa fix(trainer): Correct loss scaling for incomplete gradient accumulation steps (#39659)
* Fix issue[#38837]: wrong loss scaled in last step of epoch

* chore: trigger CI

* Update src/transformers/trainer.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/modeling_flash_attention_utils.py

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

---------

Co-authored-by: taihang <taihang@U-2RHYVWX7-2207.local>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-07-29 17:12:31 +02:00
1d061536cf 🌐 [i18n-KO] Translated how_to_hack_models.md to Korean (#39536)
* docs: ko: how_to_hack_models.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:09:16 -07:00
43fe41c0a8 🌐 [i18n-KO] Translated perf_train_gpu_one.md to Korean (#39552)
* docs: ko: perf_train_gpu_one.md

* feat: nmt draft

* fix: manual edits

* fix: Manually added missing backticks

* Update docs/source/ko/perf_train_gpu_one.md

fix: remove space between heading and GPU anchor

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: clarify table headers to indicate training speed boost and memory savings

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix: improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

* Update docs/source/ko/perf_train_gpu_one.md

fix : rephrase explanation of data preloading to improve readability

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-07-29 08:08:57 -07:00
9f38763731 🌐 [i18n-KO] Translated pipeline_gradio.md to Korean (#39520)
* docs: ko: pipeline_gradio.md

* feat: nmt draft

* fix: manual edits

* docs: ko: pipeline_gradio.md
2025-07-29 08:04:30 -07:00
f72311796b 🌐 [i18n-KO] Translated tokenizer.md to Korean (#39532)
* docs: ko: tokenizer.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestions

Co-authored-by: Yijun Lee <yijun-lee@users.noreply.github.com>

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* fix: resolve suggestions

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-29 08:04:14 -07:00
d346d46752 🌐 [i18n-KO] Translated tvp.md to Korean (#39578)
* docs: ko: tvp.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

* fix: manual edits

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
2025-07-29 08:04:00 -07:00
2f59c15b33 🌐 [i18n-KO] Translated albert.md to Korean (#39524)
* docs: ko: albert.md

* feat: nmt draft

* fix: manual edits
2025-07-29 08:03:40 -07:00
98386dcee9 🌐 [i18n-KO] Translated main_classes/peft.md (#39515)
* docs: ko: main_classes/peft.md

* feat: nmt draft

* docs: add missing TOC to documentation for `PeftAdapterMixin` section

Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).

* fix: Improve naturalness of purpose expression in Korean

Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.

* fix: Simplify plural form and make expression more concise

Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.

* fix: Replace technical term '주입' with more natural '적용'

Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:

'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise

'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.

* fix: update toctree path for PEFT to lowercase

Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.

* docs: update as per reviewer feedback after rebase
2025-07-29 08:03:17 -07:00
1ad216bd7d [modenbert] fix regression (#39750)
* fix regression

* add FA2 test
2025-07-29 16:58:59 +02:00
379209b603 add libcst to extras["testing"] in setup.py (#39761)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 16:58:51 +02:00
abf101af1f Fix version issue in modeling_utils.py (#39759)
fix version issue
2025-07-29 16:15:30 +02:00
8db4d79161 Enable xpu allocator on caching_allocator_warmup (#39654)
* add xpu allocator

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix typo

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix variable name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* rm useless default value

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-29 16:06:52 +02:00
fb141e2c90 Support loading Qwen3 MoE GGUF (#39638)
* support loading qwen3 gguf

* qwen3moe test cases

* fix whitespaces

* fix ggml tests
2025-07-29 13:44:44 +00:00
ccb2e0e03b Fix GPT2 with cross attention (#39754)
* fix

* use new mask API

* style

* fix copies and attention tests

* fix head pruning tests
2025-07-29 15:40:31 +02:00
dfd616e658 Avoid OOM when other tests are failing (#39758)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 15:35:44 +02:00
65df73aa88 AMD disable torchcodec (#39757)
Temporarily disable torchcodec installation because of bizarre segfault
2025-07-29 13:07:25 +00:00
63b3200779 Use --gpus all in workflow files (#39752)
gpu all

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 14:53:33 +02:00
95faabf0a6 Apply several ruff SIM rules (#37283)
* Apply ruff SIM118 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM910 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Apply ruff SIM101 fix

Signed-off-by: cyy <cyyever@outlook.com>

* Format code

Signed-off-by: cyy <cyyever@outlook.com>

* More fixes

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-29 11:40:34 +00:00
cf97f6cfd1 Fix mamba regression (#39728)
* fix mamba regression

* fix compile test
2025-07-29 12:44:28 +02:00
66984ed4f6 Update IMPORTANT_MODELS list (#39734) 2025-07-29 12:34:57 +02:00
de8d0cec30 update GemmaIntegrationTest::test_model_2b_bf16_dola again (#39731)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-29 11:42:55 +02:00
85d5aeb324 Fix: add back base model plan (#39733)
* Fix: add back base model plan

* Fix: typo

* fixup

* remove unused import

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-29 11:37:33 +02:00
2a90193dd8 [Fix] import two missing typos in models/__init__.py for typo checking (#39745)
* [Fix] import lost gemma3n for type checking in vscode

* [Fix] import missing qwen2_5_omni typo

* [Refactor] sort by ascii order
2025-07-29 11:35:22 +02:00
f2aca3eccc fix cache inheritance (#39748)
* fix cache inheritance

* styule
2025-07-29 11:24:44 +02:00
f3598a95c7 extend more trainer test cases to XPU, all pass (#39652)
extend more trainer test cases to XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-07-29 10:51:00 +02:00
75794792ad BLIPs clean-up (#35560)
* blips clean up

* update processor

* readability

* fix processor length

* fix copies

* tmp

* update and fix copies

* why keep these, delete?

* fix test fetcher

* irrelevant comment

* fix tests

* fix tests

* fix copies
2025-07-29 10:03:06 +02:00
4f8f51be4e Add Fast Segformer Processor (#37024)
* Add Fast Segformer Processor

* Modified the params according to segformer model

* modified test_image_processing_Segformer_fast args

- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class

* added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast

* fixed code_quality

* added recommended fixes and tests to make sure everything processess smoothly

* Fixed SegmentationMapsLogic

- modified the preprocessing of segmentation maps to use tensors
- added batch support

* fixed some mismatched files

* modified the tolerance for tests

* use modular

* fix ci

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 19:22:32 +00:00
c353f2bb5e Superpoint fast image processor (#37804)
* feat: superpoint fast image processor

* fix: reran fast cli command to generate fast config

* feat: updated test cases

* fix: removed old model add

* fix: format fix

* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* fix: ported to torch and made requested changes

* fix: removed changes to init

* fix: init fix

* fix: init format fix

* fixed testcases and ported to torch

* fix: format fixes

* failed
test case fix

* fix superpoint fast

* fix docstring

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-28 18:15:06 +00:00
14adcbd937 Fix AMD dockerfile for audio models (#39669) 2025-07-28 19:05:41 +02:00
1c6b47451d Fix cache-related tests (#39676)
* fix

* fix kyutai at last

* fix unrelated tests and copies

* update musicgen as well

* revert tensor

* fix old test failures

* why it wasn't added?
2025-07-28 17:30:11 +02:00
fc2bd1eac0 Fix Layer device placement in Caches (#39732)
* fix device placement

* style

* typo in comment
2025-07-28 16:37:11 +02:00
7623aa3e5f Fix Qwen2AudioForConditionalGeneration.forward() and test_flash_attn_kernels_inference_equivalence (#39503)
* Add missing cache_position argument.

* Pass cache_position to language model.

* Overwrite prepare_inputs_for_generation.

* Set model to half precision for Flash Attention test.

* Cast model to bfloat16.
2025-07-28 16:35:08 +02:00
28f2619868 skip Glm4MoeModelTest::test_torch_compile_for_training (#39670)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:30:40 +02:00
88aed92b59 Update QAPipelineTests::test_large_model_course after #39193 (#39666)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-28 16:26:49 +02:00
da823fc04e mllama outputs refactor (#39643)
* mllama outputs refactor

* forgot kwargs

* fix output

* add can_record_outputs

* correct @check_model_inputs placement

* ruff and copies

* rebase

* feedback

* only return hidden_states

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
2025-07-28 15:59:20 +02:00
686bb3b098 Remove all expired deprecation cycles (#39725)
* remove all deprecation cycles

* style

* fix

* remove

* remove

* fix

* Update modular_dpt.py

* back

* typo

* typo

* final fix

* remove all args
2025-07-28 15:43:41 +02:00
a0fa500a3d [CI] Add Eric to comment slow ci (#39601)
add to ci
2025-07-28 13:24:00 +00:00
4c7da9fedf PATCH: add back n-dim device-mesh + fix tp trainer saving (#39693)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

* Fix: saving of distributed model in trainer

* Fix: distributed saving with trainer

* Feat: add pure tp saving

* Only require tp dim if ndim > 1

* Fix: default to None

* Fix: better comments/errors

* Fix: properly check tp_size attribute

* Fix: properly check for None in tp_size

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-28 12:29:58 +00:00
cbede2969b Add self-hosted runner scale set workflow for mi325 CI (#39651) 2025-07-28 13:32:25 +02:00
b56d721397 [configuration] remove redundant classmethod (#38812)
* remove redundant classmethod

* warning message, add space between words

* fix tests

* fix copies
2025-07-28 10:38:48 +00:00
02ea23cbde update ernie model card (#39657)
* update ernie model doc

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address ruff format error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

* address check_repository_consistency error reported by ci

Signed-off-by: Zhang Jun <jzhang533@gmail.com>

---------

Signed-off-by: Zhang Jun <jzhang533@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-28 10:21:18 +00:00
8b237b8639 [processors] add tests for helper fn (#39629)
* add tests for helpers

* duplicate test for each model

* why llava next video has no helper

* oops must have been in the commit

* fix test after rebase

* add copy from
2025-07-28 09:41:58 +00:00
6638b3642d xpu optimization for generation case (#39573)
* xpu optimization for generation case

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-28 11:34:58 +02:00
5c15eb55d2 fix(tokenization): check token.content for trie (#39587)
fix: check token.content for trie
2025-07-28 11:28:56 +02:00
6a61e16626 Fix missing initialization of FastSpeech2Conformer (#39689)
* fix missing initialization of FastSpeech2Conformer

* switch order and reactivate tests

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-28 10:47:39 +02:00
a6393e7d28 fix missing model._tp_size from ep refactor (#39688)
* fix missing model._tp_size from ep refactor

* restore setting device_mesh too
2025-07-26 12:26:36 +02:00
18a7c29ff8 More robust tied weight test (#39681)
* Update test_modeling_common.py

* remove old ones

* Update test_modeling_common.py

* Update test_modeling_common.py

* add

* Update test_modeling_musicgen_melody.py
2025-07-25 22:03:21 +02:00
c3401d6fad dev version 4.55 2025-07-25 21:11:20 +02:00
97f8c71f52 Add padding-free to Granite hybrid moe models (#39677)
* start fixing kwarg handling

* fmt

* updates padding free tests

* docs

* add missing kwargs modeling_granitemoe.py

* run modular util

* rm unrelated changes from modular util
2025-07-25 20:10:50 +02:00
d6e9f71a6e Fix tied weight test (#39680)
Update test_modeling_common.py
2025-07-25 20:09:33 +02:00
5da6ad2731 fix break for ckpt without _tp_plan (#39658)
* fix break for ckpt without _tp_plan

* Update src/transformers/modeling_utils.py

* Update src/transformers/modeling_utils.py

---------

Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 20:03:48 +02:00
c06d4cd6ce Add EXAONE 4.0 model (#39129)
* Add EXAONE 4.0 model

* Refactor EXAONE 4.0 modeling code

* Fix cache slicing on SWA + FA2

* Fix cache slicing on FA2 + HybridCache

* Update EXAONE 4.0 modeling code for main branch

* Update o_proj for asymmetric projection

* Address PR feedback

* Add EXAONE 4.0 docs

* Update EXAONE 4.0 modeling code for main branch

* update

* fix updates

* updates

* fix

* fix

* fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-25 19:58:28 +02:00
3e4d584a5b Support typing.Literal as type of tool parameters or return value (#39633)
* support `typing.Literal` as type of tool parameters

* validate the `args` of `typing.Literal` roughly

* add test to get json schema for `typing.Literal` type hint

* fix: add `"type"` attribute to the parsed result of `typing.Literal`

* test: add argument `booleanish` to test multi-type literal

* style: auto fixup
2025-07-25 17:51:28 +00:00
300d42a43e Add ep (#39501)
* EP + updates

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>

* remove unrelated change

* not working yet but let's see where it goes!

* update the api a bit

* udpate

* where I am at for now

* fix ep

* refactor the API

* yups

* fix

* fixup

* clean modeling

* just support llama4 for now!

* properly avoid

* fix

* nits

* Update src/transformers/models/llama4/modeling_llama4.py

* Update src/transformers/integrations/tensor_parallel.py

* style

* ,,,,

* update

---------

Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
2025-07-25 19:46:17 +02:00
abaa043d60 bad_words_ids no longer slow on mps (#39556)
* fix: bad_words_ids no longer slow on mps

* fix: SequenceBiasLogitsProcessor slow `_prepare_bias_variables` method

* fix: re-adding a deleted comment

* fix: bug in no_bad_words_logits

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 19:45:41 +02:00
6630c5b714 Add xlstm model (#39665)
* Add xLSTM cleanly with optimizations.

* Fix style.

* Fix modeling test.

* Make xLSTM package optional.

* Fix: Update torch version check.

* Fix: Bad variable naming in test.

* Fix: Import structure cleaning with Ruff.

* Fix: Update docstrings.

* Fix: Mitigate unused config attr tests by explicit usage.

* Fix: Skip tests, if xlstm library is not installed.

* Feat: Enable longer context window for inference by chunking.

* Fix: Make training test pass by lowering target accuracy.

* Chore: Increase test verbosity for failing generation test.

* Update docs/source/en/model_doc/xlstm.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix: Make xlstm available even without CUDA.

* Chore: Remove unnecessary import.

* Fix: Remove BOS insertion.

* Chore: Improve xLSTMCache documentation.

* Integrate basic xLSTM fallback code.

* Chore: Remove unnecessary import.

* Chore: Remove duplicate LayerNorm.

* chore: update copyright, minor reformatting

* fix: refactor mLSTMStateType due to missing torch import

* fix: add missing import

* Chore: Replace einops.

* fix: apply ruff formatting

* fix: run `make fix-copies` to re-generate dummy_pt_objects.py

* fix: make type hints Python 3.9 compatible

* fix: remove obsolete import

* fix: remove obsolete method from docs

* chore: remove obsolete `force_bos_token_insert` from config

* Chore: Remove duplicated xLSTMCache class.

* Fix: Formatting of modeling_xlstm.py

* Chore: Remove xlstm package requirement from test. Re-add update_rnn_state.

* Fix: Update xLSTMCache docstring.

* Feat: Add proper initialization of xLSTM.

* Chore: Re-format files.

* Chore: Adapt format.

* Fix: xLSTMCache import restructuring.

* Fix: Add __all__ lists to modeling and configuration files.

* Chore: Reformat.

* Fix: Remove unnecessary update_rnn_state function.

* Fix: Undo test accuracy quickfix.

* Fix: Update copyright year, remvoe config copy.

* Chore: Flatten all internal configs to xLSTMConfig.

* Fix: Unused config variables check.

* Chore: Remove unnecessary imports.

* Fix: Unify xlstm cache argument from batch_size to max_batch_size.

* Chore: Remove bad default arg value for xLSTMCache.

* Chore: Rename core configuration arguments to HF default in xLSTM.

* Chore: Fix formatting.

* Fix: xLSTM Cache config access.

* Fix: Update xlstm tests for config update.

* Feat: Re-add embbeding_dim, num_blocks config options for compat with xLSTM-7B.

* Fix: Configuration xLSTM python3.9 syntax.

* Fix: Difference to main in test_utils.py assertion.

* Fix: Bad syntax in xlstm config for python3.9.

* Fix: xLSTMConfig docstring.

* Fix: xLSTMConfig docstring.

* Fix typing issues in xLSTM and BeiT, Paligemma.

* Fix: Exclude xLSTM from test cache utils.

* Chore: Fix style.

* Chore: Fix format.

* Chore: Remove unnecessary LayerNorm, NormLayer layer abstractions.

* Chore: Remove asserts and replace with ValueErrors.

* Chore: Update __init__.py structure of xLSTM.

* Chore: Clean xLSTM initialization of weights.

* Fix index names in modeling_xlstm.py

* Update xlstm model test typing annotations.

* Fix: Remove all asserts.

* Revert changes to the main __init__.py

* Fix: Move xLSTMCache to modeling_xlstm.py

* Fix: Remove xLSTMForCausalLM mapping from modeling_auto.py

* Remove xLSTMCache from dummy_pt_objects.py

* Fix: Remove extended torchdynamo compilation check integrating cuda graph captures.

* Revert test_cache_utils.py xLSTM change.

* Fix: Move xLSTM init functions before init call.

* Remove xLSTMCache from generation utils.

* Fix: Clean xLSTM init functionality for recursive calls.

* Fix: Move xLSTMCache before its first call.

* Fix formatting.

* Add partial docstring for xLSTMModel forward.

* Fix xLSTMCache docstring in xLSTMModel.

* Remove xLSTMCache from public documentation. Update auto_docstring.

* Remove all agressive shape comments

* style

* Fix names

* simplify

* remove output_hidden_states

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* Update test_modeling_xlstm.py

* Update modeling_xlstm.py

* Update modeling_xlstm.py

* fix

* fix

* style

* style

---------

Co-authored-by: Korbinian Poeppel <korbinian.poeppel@nx-ai.com>
Co-authored-by: Korbinian Pöppel <37810656+kpoeppel@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Sebastian Böck <sebastian.boeck@nx-ai.com>
Co-authored-by: Korbinian Poeppel <poeppel@ml.jku.at>
2025-07-25 19:39:17 +02:00
ed9a96bc6d Use auto_docstring for perception_lm fast image processor (#39679) 2025-07-25 17:32:48 +00:00
d913b39ef3 fix: HWIO to OIHW (#39200)
* fix: HWIO to OIHW

* Bug in attention type

* Conversion script docstring

* style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-07-25 19:23:15 +02:00
a26f0fabb8 Fix auto_docstring crashing when dependencies are missing (#39564)
* add try except to not crash auto_docstring when some dependency are missing

* safeguard None value in placeholder dict
2025-07-25 19:19:23 +02:00
69cff312f5 Add support for DeepseekAI's DeepseekVL (#36248)
* upload initial code

* update deepseek-vl adaptor

* update hierarchy of vision model classes

* udpate aligner model

* add text model

* Added Image Processor

* Added Image Processor

* Added Image Processor

* apply masks

* remove projection; add aligner

* remove interpolate_pos_encoding

* remove unused params in config

* cleaning

* Add the __init__ file

* added processing deepseek_vl class

* modified the deepseek-vl processor

* modified the deepseek-vl processor

* update __init__

* Update the image processor class name

* Added Deepseek to src/transformers/__init__.py file

* Added Deepseek to image_processing_auto.py

* update the __init__ file

* update deepseek_vl image processor

* Update Deepseek Processor

* upload fast image processor

* Revert "upload fast image processor"

This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.

* update image processor

* flatten heirarchy

* remove DeepseekVLModel

* major update (complete modeling)

* auto modeling and other files

* formatting

* fix quality

* replace torchvision in modeling

* set default do_normalize to False

* add fast image processor template using tool

* update image processors

* add fast image processor to other files

* update liscense

* Added deepseek image testcases

* update image test

* update processor

* write CHAT_TEMPLATE

* update model for processor

* fix processor

* minor fixes and formatting

* fix image processing and tests

* fix interpolation in sam

* fix output_attentions in DeepseekVLModel

* upload test_modeling

* fix tests because of vocab size

* set use_high_res_vision=False in tests

* fix all modeling tests

* fix styling

* remove explicit background_color from image processors

* added test_processor

* added test_processor

* fix processor tests

* update docs

* update docs

* update docs

* update conversion script

* Fixed typos

* minor fixes from review

- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update liscense 2021->2025

* fix type in config docstring

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* update get_image_features

* fix config

* improve DeepseekVLImageProcessor.preprocess

* return image_hidden_states

* use AutoTokenizer and AutoImageProcessor in Processor

* fix model outputs

* make num_image_tokens configurable

* fix docstring of processor

* move system prompt to chat template

* fix repo consistency

* fix return_dict

* replace SamVisionEncoder with SamVisionModel

* update to remove deepcopy

* 🛠️  Major Architectural Changes (Adds DeepseekVLHybrid)

* fix quality checks

* add missing hybrid in auto modeling

* run make style

* update sam_hq

* update high_res_size in test

* update docs following #36979

* update code with auto_docstring

* update conversion scripts

* fix style

* fix failing test because of tuple

* set weights_only=True in conversion script

* use safetensors.torch.load_file instead of torch.load in conversion script

* make output_dir optional in conversion script

* fix code snippets in docs (now the examples work fine)

* integration tests for DeepseekVL

* update expected texts

* make style

* integration tests for DeepseekVLHybrid

* fix class name

* update expected texts for hybrid

* run "make style"

* update since changes in main

* run make-style

* nits since changes in main

* undo changes in sam

* fix tests

* fix tests; update with main

* update with main: output_attention/output_hidden_states

* fix copied part in deepseek_vl

* run fix-copies

* fix output_hidden_states

* sam: fix _init_weigths

* use modular for DeepseekVL

* make image processor more modular

* modular: use JanusPreTrainedModel

* janus: provide kwargs in loss

* update processors in conversion script

* Revert "sam: fix _init_weigths"

This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.

* run fix-copies

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2025-07-25 19:18:50 +02:00
a98bbc294c Add missing flag for CacheLayer (#39678)
* fix

* Update cache_utils.py
2025-07-25 19:12:13 +02:00
45c7bfb157 Add evolla rebase main (#36232)
* add evolla

* adding protein encoder part

* add initial processing test

* save processor

* add docstring

* add evolla processor

* add two test

* change vision to protein

* change resampler to sequence_compressor

* change vision to protein

* initial update for llama

* add initial update for llamaForCausalLM

* add `test_processor`, `test_saprot_output`, `test_protein_encoder_output`

* change evolla, but still working on it

* add test_single_forward

* pass test_attention_outputs

* pass test_hidden_states_output

* pass test_save_load and test_from_pretrained_no_checkpoint

* pass test_cpu_offload

* skip some tests

* update new progress

* skip test_model_is_small

* pass test_model_weights_reload_no_missing_tied_weights

* pass test_model_get_set_embeddings

* pass test_cpu_offload

* skip test_resize_embeddings

* add pipeline_model_mapping

* remote old setUp

* pass processor save_pretrained and load_pretrained

* remove pooling layer

* pass test_inputs_embeds_matches_input_ids

* pass test_model_is_small

* pass test_attention_outputs

* pass test_initialization

* pass test_model_get_set_embeddings

* pass test_single_forward

* skip test_disk_offload_bin and test_disk_offload_safetensors

* fix most tests

* pass test_protein_encoder_output

* remove useless code

* add EvollaForProteinText2Text

* pass test_saprot_output

* pass all EvollaModelTest test and remove processor test

* add processor test to its own file

* skip is_training since esm skipped it and the saprot code causes error when setting is_training True

* pass processor tests

* solve all except config

* pass most cases

* change init

* add doc to `configuration_evolla.py`

* remove image_processing test

* remove extra processor test

* remove extra modules

* remove extra modules

* change all configs into one config

* pass all evolla test

* pass `make fixup`

* update short summary

* update Evolla-10B-hf

* pass check_dummies.py and check_code_quality

* fix  `tests/models/auto/test_tokenization_auto.py::AutoTokenizerTest::test_model_name_edge_cases_in_mappings`

* remove dummy codes

* change format

* fix llava issue

* update format

* update to solve llama3 access issue

* update to make forward right

* solve processor save load problem from instructblip solution

* remove unexpected file

* skip `test_generation_tester_mixin_inheritance`

* add `test_single_forward_correct` and `test_inference_natural_language_protein_reasoning`

* add `modular_evolla.py`

* solved issue #36362

* run `make fixup`

* update modular

* solve float32 training

* add fix

* solve `utils/check_docstrings.py`

* update

* update

* update

* remove other files and replace sequential and einsum

* add use case in document

* update the models

* update model

* change some wrong code

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Update src/transformers/models/evolla/modular_evolla.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix issues mentioned in PR

* update style and rearrange the placement

* fix return_dict argument issue

* solve SaProtConfig issue

* Solve EvollaSaProtRotaryEmbedding issue

* solve attention_mask issue

* solve almosst all issues

* make style

* update config

* remove unrelated pickle file

* delete pickle files

* fix config

* simplify a lot

* remove past k-v from encoder

* continue work

* style

* skip it from init

* fix init

* fix init

* simplify more

* fill in docstrings

* change test for generation

* skip test

* fix style

---------

Co-authored-by: Chenchen Han <13980209828@163.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-25 19:11:57 +02:00
2670da66ce update expected outputs for whisper after #38778 (#39304)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-25 16:48:10 +00:00
4b125e2993 fix kyutai tests (#39416)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-07-25 18:42:04 +02:00
4f17bf0572 Fixes the BC (#39636)
* fix

* update

* Update src/transformers/utils/generic.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* fixup

* fixes

* fix more models

* fix fix fix

* add embedding to more models

* update

* update

* fix

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-07-25 18:41:21 +02:00
ddb0546d14 Delete bad rebasing functions (#39672)
* remove outdated stuff

* remove comment

* use register

* remove finally clause (to allow further check if fallback to sdpa)

* general exception

* add wrapper

* revert check

* typo
2025-07-25 18:28:09 +02:00
a91653561e [Ernie 4.5] Post merge adaptations (#39664)
* ernie 4.5 fixes

* Apply style fixes

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-25 17:36:18 +02:00
5d0ba3e479 [CI] revert device in test_export_static_cache (#39662)
* revert device

* add todo
2025-07-25 15:36:12 +00:00
850bdeaa95 Fix ModernBERT Decoder model (#39671)
fix
2025-07-25 16:20:12 +01:00
17f02102c5 🚨[Fast Image Processor] Force Fast Image Processor for Qwen2_VL/2_5_VL + Refactor (#39591)
* init

* Force qwen2VL image proc to fast

* refactor qwen2 vl fast

* fix copies

* Update after PR review and update tests to use return_tensors="pt"

* fix processor tests

* add BC for min pixels/max pixels
2025-07-25 11:11:28 -04:00
f90de364c2 Rename huggingface_cli to hf (#39630)
* Rename huggingface_cli to hf

* hfh
2025-07-25 14:10:04 +02:00
3b3f9c0c46 fix(voxtral): correct typo in apply_transcription_request (#39572)
* fix(voxtral): correct typo in apply_transcription_request

* temporary wrapper: apply_transcrition_request

* Update processing_voxtral.py

* style: sort imports in processing_voxtral.py

* docs(voxtral): fix typo in voxtral.md

* make style

* doc update

---------

Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-07-25 12:09:44 +00:00
2a82cf06ad make fixup (#39661) 2025-07-25 11:27:45 +00:00
e3760501b0 [docs] fix ko cache docs (#39644)
fix ko docs
2025-07-25 10:06:03 +01:00
91f591f7bc Make pytorch examples UV-compatible (#39635)
* update release.py

* add uv headers in some pytorch examples

* rest of pytorch examples

* style
2025-07-25 10:46:22 +02:00
c46c17db57 revert change to cu_seqlen_k and max_k when preparing from position_ids (#39653) 2025-07-25 10:28:22 +02:00
4600c27c4f Fix: explicit not none check for tensors in flash attention (#39639)
fix: explicit not none check for tensors
2025-07-25 10:09:14 +02:00
c392d47c9b [attention] fix test for packed padfree masking (#39582)
* fix most tests

* skip a few more tests

* address comments

* fix chameleon tests

* forgot to uncomment

* qwen has its own tests with images, rename it as well
2025-07-25 07:44:52 +00:00
565c035a2e Add owlv2 fast processor (#39041)
* add owlv2 fast image processor

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class

* change references to owlVit to owlv2 in docstrings for post process methods

* change type hints from List, Dict, Tuple to list, dict, tuple

* remove unused typing imports

* add disable grouping argument to group images by shape

* run make quality and repo-consistency

* use modular

* fix auto_docstring

---------

Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-07-25 02:40:11 +00:00
5a81d7e0b3 revert behavior of _prepare_from_posids (#39622)
* revert behavior of _prepare_from_posids

* add back cu_seqlens_k and max_k for inference
2025-07-24 20:31:00 +02:00
ad6fd2da0e [Voxtral] values for A10 runners (#39605)
* values for A10 runners

* make

* as for Llava

* does not apply to Voxtral
2025-07-24 18:52:35 +02:00
4741e1f1b7 [timm] new timm pin (#39640) 2025-07-24 16:01:59 +00:00
12b612830d [efficientloftr] fix model_id in tests (#39621)
fix: wrong EfficientLoFTR model id in tests
2025-07-24 10:41:06 +01:00
947a37e8f5 Update recent processors for vLLM backend (#39583)
* update recent models and make sure it runs withh vLLM

* delete!
2025-07-24 10:29:27 +02:00
7b897fe583 [Docs] Translate audio_classification.md from English to Spanish (#39513)
* Docs: translate audio_classification to Spanish

* Update audio_classification.md

* Remove space
* Normalize backticks

* Update audio_classification.md

* Apply corrections recommended by aaronjimv

* Update _toctree.yml

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 15:55:13 -07:00
9b7244f189 standardized YOLOS model card according to template in #36979 (#39528)
* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* standardized YOLOS model card according to template in #36979

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/yolos.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* replaced YOLOS architecture image, deleted quantization and AttentionMaskVisualizer sections

* removed cli section

* Update yolos.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 11:00:25 -07:00
ec8a09a5fe Feature/standardize opt model card (#39568)
* docs: Standardize OPT model card with enhanced details

* Remove incorrect link from OPT model card

* Address review feedback on OPT model card

* Update opt.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-23 10:57:48 -07:00
c5a80dd6c4 🔴 Fix EnCodec internals and integration tests (#39431)
* EnCodec fixes and update integration tests.

* Apply padding mask when normalize is False.

* Update comment of copied function.

* Fix padding mask within modeling.

* Revert padding function.

* Simplify handling of padding_mask.

* Address variable codebook size.

* Add output for padding for consistency with original model, fix docstrings.

* last_frame_pad_length as int

* Update example code.

* Improve docstring/comments.

* Shorten expected output.

* Consistent docstring.

* Parameterize tests.

* Properties for derived variables.

* Update expected outputs from GitHub runner.

* Consistent outputs with runner GPUs.
2025-07-23 19:39:27 +02:00
7a4e2e7868 Fix DAC integration tests and checkpoint conversion. (#39313)
* Fix DAC (slow) integration tests.

* Fix DAC conversion.

* Address comments

* Sync with main, uncomment nn.utils.parametrizations.weight_norm.

* Update DAC integration tests with expected outputs.

* Added info about encoder/decoder error and longer decoder outputs.

* Parameterize tests.

* Set expected values to GitHub runners.
2025-07-23 19:21:26 +02:00
596a75f6e9 Move openai import (#39613) 2025-07-23 19:05:39 +02:00
a0e5a7d34b Transformers serve VLM (#39454)
* Add support for VLMs in Transformers Serve

* Raushan comments

* Update src/transformers/commands/serving.py

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Quick fix

* CPU -> Auto

* Update src/transformers/commands/serving.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fixup

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 17:03:18 +02:00
ea56eb6bed Fix important models CI (#39576)
* relax test boundaries and fix from config

* eager is always supported.
2025-07-23 16:24:29 +02:00
0fe03afeb8 Fix typos and grammar issues in documentation and code (#39598)
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-07-23 12:43:11 +00:00
82603b6cc2 Allow device_mesh have multiple dim (#38949)
* Feat: something

* Feat: initial changes

* tmp changes to unblock

* Refactor

* remove todo

* Feat: docstring

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:27:36 +00:00
10c990f7e2 enable triton backend on awq xpu (#39443)
* enable triton backend on awq xpu

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* Update src/transformers/quantizers/quantizer_awq.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* fix dtype check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix format

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-23 12:10:38 +00:00
e7e6efcbbd [idefics3] fix for vLLM (#39470)
* fix idefics3 for vllm tests

* fix copies
2025-07-23 14:00:43 +02:00
a62f65a989 fix moe routing_weights (#39581)
* fix moe routing_weights

* fix ernie4_5_moe routing_weights

* fix integration test

---------

Co-authored-by: llbdyiu66 <llbdyiu66@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-23 11:20:23 +00:00
623ab01039 FP-Quant support (#38696)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-07-23 11:41:10 +02:00
eb1a007f7f Rename supports_static_cache to can_compile_fullgraph (#39505)
* update all

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* apply suggestions

* fix copies

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-23 09:35:18 +00:00
b357cbb19d [Trackio] Allow single-gpu training and monitor power (#39595)
Allow not distributed and monitor power
2025-07-23 11:22:50 +02:00
019b74977d Generic task-specific base classes (#39584)
* first shot

* Update modeling_layers.py

* fix mro order

* finalize llama

* all modular and copied from from llama

* fix
2025-07-23 10:49:47 +02:00
5dba4bc7b2 Fix DynamicCache and simplify Cache classes a bit (#39590)
* fix

* use kwargs

* simplify

* Update cache_utils.py

* Update cache_utils.py

* Update test_cache_utils.py

* fix

* style
2025-07-23 10:13:45 +02:00
d9b35c635e Mask2former & Maskformer Fast Image Processor (#35685)
* add maskformerfast

* test

* revert do_reduce_labels and add testing

* make style & fix-copies

* add mask2former and make fix-copies
TO DO:
	add test for mask2former

* make fix-copies

* fill docstring

* enable mask2former fast processor

* python utils/custom_init_isort.py

* make fix-copies

* fix PR's comments

* modular file update

* add license

* make style

* modular file

* make fix-copies

* merge

* temp commit

* finish up maskformer mask2former

* remove zero shot examples

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-23 02:47:47 +00:00
6e9972962f 🎯 Trackio integration (#38814)
* First attempt

* fix

* fix

* Enhance TrackioCallback to log GPU memory usage and allocation

* Enhance Trackio integration in callbacks and training arguments documentation

* re order

* remove unused lines

* fix torch optional
2025-07-22 14:50:20 -07:00
c6d0500d15 [WIP] Add OneformerFastImageProcessor (#38343)
* [WIP] OneformerFastImageProcessor

* update init

* Fully working oneformer image processor fast

* change Nearest to Neares exact interpolation where needed

* fix doc

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-07-22 20:41:39 +00:00
4884b6bf41 Fix link in "Inference server backends" doc (#39589)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-22 16:44:08 +00:00
075a65657a Torchdec RuntimeError catch (#39580)
* fix

* fix

* maybe better

* style
2025-07-22 18:35:03 +02:00
2936902a76 [Paged-Attention] Handle continuous batching for repetition penalty (#39457)
* Handle continuous batching for repetition penalty

* fix last scores and with token mask creation

* add test

* Update src/transformers/generation/continuous_batching.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/generation/logits_process.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* remove unneeded cast

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-22 18:13:40 +02:00
cbcb8e6c1f updated mistral3 model card (#39531)
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 09:01:55 -07:00
601260fd96 Update docs/source/ko/_toctree.yml (#39516)
docs: update `docs/source/ko/_toctree.yml`
2025-07-22 09:00:42 -07:00
c338fd43b0 [cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2
2025-07-22 16:10:25 +02:00
b16688e96a General weight initialization scheme (#39579)
* general + modulars from llama

* all modular models

* style and fix musicgen

* fix

* Update configuration_musicgen.py

* Update modeling_utils.py
2025-07-22 16:04:20 +02:00
015b62bf3e Add AMD GPU expectations for LLaVA tests (#39486)
* Add AMD GPU expectation to llava tests

* FMT

* Remove debug print

* Address review  comments
2025-07-22 14:01:54 +00:00
efceeaf267 Kernels flash attn (#39474)
* use partial to wrap around `transformers` utils!

* try to refactor?

* revert one wrong change

* just a nit

* push

* reverter watever was wrong!

* some nits

* fixes when there is no attention mask

* bring the licence back

* some fixes

* nit

* style

* remove prints

* correct dtype

* fa flags for testing

* update

* use paged attention if requested!

* updates

* a clone was needed, not sure why

* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute

* simplify and improve?

* flash attention is kinda broken on recent cuda version so allow the opportunity to use something else

* fix!

* protect kernels import

* update

* properly parse generation config being passed

* revert and update

* add two tests

* some fixes

* fix test FA2

* takes comment into account

* fixup

* revert changes

* revert the clone, it is only needed because the metal kernel is not doing it?

* [docs] update attention implementation and cache docs (#39547)

* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applu suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix mps on our side for now

* Update src/transformers/integrations/flash_paged.py

* no qa

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:41:06 +02:00
b62557e712 Add AMD expectations to Mistral3 tests (#39481)
Add AMD expectations to mistral3 tests
2025-07-22 15:40:16 +02:00
1806583390 [docs] Create page on inference servers with transformers backend (#39550)
* draft docs on inference servers

* Update docs/source/en/_toctree.yml

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* update

* dic build failed

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/transformers_as_backend.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* apply last suggestions

---------

Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:31:10 +02:00
cd98c1fee3 [docs] update attention implementation and cache docs (#39547)
* update docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* applu suggestions

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:06:43 +02:00
ef99537f37 Add AMD test expectations to DETR model (#39539)
* Add AMD test expectations to DETR model

* Fix baseline expectation

* Address review comments

* Make formatting a bit more consistent
2025-07-22 12:07:10 +00:00
30567c28e8 [timm_wrapper] add support for gradient checkpointing (#39287)
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification

* ruff fix

* refactor + add test for not supported model

* ruff

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 11:07:52 +00:00
a44dcbe513 Fixes needed for n-d parallelism and TP (#39562)
Handle non-DTensors cases in TP Layers

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-22 10:24:59 +00:00
0cae633ce1 Bump AMD container for 2.7.1 PyTorch (#39458)
* Bump AMD container for 2.7.1 PyTorch

* Forgot to update pinned packages
2025-07-22 12:11:38 +02:00
a88ea9cbc8 Add EfficientLoFTR model (#36355)
* initial commit

* Apply suggestions from code review

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: various typos, typehints, refactors from suggestions

* fix: fine_matching method

* Added EfficientLoFTRModel and AutoModelForKeypointMatching class

* fix: got rid of compilation breaking instructions

* docs: added todo for plot

* fix: used correct hub repo

* docs: added comments

* fix: run modular

* doc: added PyTorch badge

* fix: model repo typo in config

* fix: make modular

* fix: removed mask values from outputs

* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor

* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list

* fix: reformat

* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride

* doc: added q, kv aggregation kernel size and stride doc to config

* refactor: converted efficientloftr implementation from modular to copied from mechanism

* tests: overwrote batching_equivalence for "keypoints" specific tests

* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils

* fix: make fix-copies

* fix: make style

* fix: update rope function to make meta tests pass

* fix: rename plot_keypoint_matching to visualize_output for clarity

* refactor: optimize image pair processing by removing redundant target size calculations

* feat: add EfficientLoFTRImageProcessor to image processor mapping

* refactor: removed logger and updated attention forward

* refactor: added auto_docstring and can_return_tuple decorators

* refactor: update type imports

* refactor: update type hints from List/Dict to list/dict for consistency

* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching

* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]

* fix typing

* fix some typing issues

* nit

* a few more typehint fixes

* Remove output_attentions and output_hidden_states from modeling code

* else -> elif to support efficientloftr

* nit

* tests: added EfficientLoFTR image processor tests

* refactor: reorder functions

* chore: update copyright year in EfficientLoFTR test file

* Use default rope

* Add docs

* Update visualization method

* fix doc order

* remove 2d rope test

* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py

* fix docs

* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py

* update gradient

* refactor: removed unused codepath

* Add motivation to keep postprocessing in modeling code

* refactor: removed unnecessary variable declarations

* docs: use load_image from image_utils

* refactor: moved stage in and out channels computation to configuration

* refactor: set an intermediate_size parameter to be more explicit

* refactor: removed all mentions of attention masks as they are not used

* refactor: moved position_embeddings to be computed once in the model instead of every layer

* refactor: removed unnecessary hidden expansion parameter from config

* refactor: removed completely hidden expansions

* refactor: removed position embeddings slice function

* tests: fixed broken tests because of previous commit

* fix is_grayscale typehint

* not refactoring

* not renaming

* move h/w to embeddings class

* Precompute embeddings in init

* fix: replaced cuda device in convert script to accelerate device

* fix: replaced stevenbucaille repo to zju-community

* Remove accelerator.device from conversion script

* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module

* fix: removed unused attributes in configuration

* fix: missing self

* fix: refactoring and tests

* fix: make style

---------

Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-07-22 10:53:16 +01:00
3bc726b381 [gemma3] fix bidirectional image mask (#39396)
* fix gemma3 mask

* make compile happy, and use only torch ops

* no full attention between images

* update tests

* fix tests

* add a fast test
2025-07-22 10:04:56 +02:00
fbeaf96f9e Update OLMoE model card (#39344)
* Update OLMoE model card

* Checks Test

* Add license and code

* Update docs/source/en/model_doc/olmoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update olmoe.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:41:01 -07:00
641aaed7c0 Update modernbertdecoder docs (#39453)
* update docs with paper and real model

* nit

* Apply suggestions from code review

Thanks to @stevhlui!

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Remove usage examples, add quantization

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-21 16:40:22 -07:00
049a674e68 [CI] Fix post merge ernie 4.5 (#39561)
fix repo consistency
2025-07-21 20:56:24 +02:00
b3ebc761e2 [Fast image processors] Improve handling of image-like inputs other than images (segmentation_maps) (#39489)
* improve handlike of other image-like inputs in fast image processors

* fix issues with _prepare_images_structure

* update sam image processor fast

* use dict update
2025-07-21 14:12:14 -04:00
b4115a426e [Ernie 4.5] Add ernie text models (#39228)
* init

* copied from remote

* add proper structure and llama like structure

* fixup

* revert to state that works

* get closer to llama

* slow and steady

* some removal

* masks work

* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm

* nice

* getting closer

* closer to transformers style

* let's simplify this, batching works now

* simplified

* working version with modular

* it is indeed the rotation per weights, make it complete llama style

* cleanup conversion, next to look at -> tokenizer

* remove llama artefacts

* fix modeling tests (common ones)

* style

* integration test + first look into tokenization (will need more work, focussing on modeling other models first)

* style

* working moe version, based on remote

* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)

* more cleanup

* refactor namings and remove addition forXXX classes

* our moe won't cut it it seems, correction bias seems to be missing in remote code version

* tokenization change (remote)

* our moe version works when adding normalization :D

* cleanup moe

* nits

* cleanup modeling -> let's get to modular next

* style

* modular v1

* minor things + attempt at conversion (which doesn't work)

* no conversion follow glm, fixup modular and other nits

* modular cleanup

* fixes

* tests, tests, tests + some moe dtype forcing

* simplify modular, fix fatal fa2 bug, remaining tests

* fix import issue?

* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float

* fix sdpa test, load on init dtype only

* fixup post merge

* style

* fix doc links

* tokenization cleanup beginnings

* simplify tokenizer by a lot as its basically llama

* tokenizer is full llama with different defaults + extra special tokens

* sync og special tokens of ernie

* fix decoding with numbers (also in remote done what a timing), begin of tok tests

* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)

* nits

* docs

* my daily post merge it is

* check

* tokenization update with explanations and conversion script

* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)

* post merge fixes

* fixup tokenization, llama fast is the way to go

* more fixups

* check

* import fixes

* correction bias following the paddle code

* fix

* fix TP plan, fix correction bias sharding during forward

* style

* whoops

* fix tied weights

* docs and last nit

* license

* flasky tests

* move repo id, update when merged on the hub
2025-07-21 19:51:49 +02:00
69b158260f Refactor embedding input/output getter/setter (#39339)
* simplify common get/set

* remove some noise

* change some 5 years old modeling utils

* update examples

* fix copies

* revert some changes

* fixes, gah

* format

* move to Mixin

* remove smolvlm specific require grad

* skip

* force defaults

* remodularise some stuff

* remodularise more stuff

* add safety for audio models

* style

* have a correct fallback, you daft donkey

* remove this argh

* change heuristic for audio models

* fixup

* revert

* this works

* revert again

* 🧠

* aaah ESM has two modelings aaah

* add informative but short comment

* add `input_embed_layer` mixin attribute

* style

* walrus has low precedence

* modular fix

* this was breaking parser
2025-07-21 18:18:14 +02:00
2da97f0943 🌐 [i18n-KO] Translated perf_infer_gpu_multi.md to Korean (#39441)
* docs: ko: perf_infer_gpu_many.md

* feat: nmt draft

* docs: refine KO translation and enhance naturalness

* docs: add missing TOC to documentation

* Align toctree and filename with original: perf_infer_gpu_multi

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Refine Korean translation

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/perf_infer_gpu_multi.md

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
2025-07-21 09:14:15 -07:00
82807e56b1 [Fast image processor] refactor fast image processor glm4v (#39490)
refactor fast image processor glm4v
2025-07-21 11:18:46 -04:00
4b4f04fcca fix ndim check of device_mesh for TP (#39538) 2025-07-21 13:09:33 +00:00
1aa7256f01 Refactor MambaCache to modeling_mamba.py (#38086)
* Refactor MambaCache to modeling_mamba.py (parity with Zamba)

* ruff

* fix dummies

* update

* update

* remove mamba ref in cache tests

* remove cache_implementation from tests

* update

* ruff

* ruff

* sneaky regression

* model consistency

* fix test_multi_gpu_data_parallel_forward

* fix falcon slow tests

* ruff

* ruff

* add sample false

* try to fix slow tests

* Revert "fix test_multi_gpu_data_parallel_forward"

This reverts commit 66b7162c7c5c5ce8a73ccf48cffc8a96343ebb33.

* fix tests on nvidia t4, remove dataparallel tests from mamba

* ruff

* remove DDP tests from mamba and falcon_mamba

* add explicit error for MambaCache

* mamba2 also needs to init cache in prepare_inputs_for_generation

* ruff

* ruff

* move MambaCache to its own file

* ruff

* unprotected import fix

* another attempt to fix unprotected imports

* Revert "another attempt to fix unprotected imports"

This reverts commit 2338354fcab630de5899321f5daced5fb312c2a2.

* fixing unprotected import, attempt 3

* Update src/transformers/cache_utils.py

* ruff's fault

* fix arthur review

* modular falcon mamba

* found a hack

* fix config docs

* fix docs

* add export info

* merge modular falcon branch

* oopsie

* fix fast path failing

* new approach

* oopsie

* fix types

* Revert new pragma in modular

This reverts commit 80b1cf160ee251536f07c40b8a0857d499e70db6.

* trying another modular workaround

* review & fix ci

* oopsie

* clear prepare_inputs on mamba/mamba2/falcon_mamba
2025-07-21 14:59:36 +02:00
a419a40234 Fix Docstring of BarkProcessor (#39546)
* Fix Docstring of BarkProcessor

* Fix typo

* Add type hint of return value for BarkProcessor.__call__
2025-07-21 12:56:44 +00:00
9323d0873c use the enable_gqa param in torch.nn.functional.scaled_dot_product_at… (#39412)
* use the enable_gqa param in torch.nn.functional.scaled_dot_product_attention

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ci failure fix

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* add check

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix ci failure

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code, extend to cuda

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine code

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix review comments

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* refine the PR

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:46:43 +02:00
6b3a1f2f51 Fix missing initializations for models created in 2023 (#39239)
* fix SwiftFormer

* fix Kosmos2

* fix Owlv2

* fix Sam

* fix Vits

* fix Pvt

* fix MobileViTV2

* fix PatchTST

* fix Bros

* fix Informer

* fix BridgeTower

* fix Mra and Yoso

* fix Rwkv

* fix EfficientNet

* fix NllbMoe

* fix Tvp

* fix Clap

* fix Autoformer

* fix SwiftFormer

* fix Mgpstr

* fix Align

* fix VitMatte

* fix SpeechT5

* add conditional check for parameters

* fix SpeechT5

* fix TimmBackbone and Clvp

* fix SwiftFormer

* fix SeamlessM4T and SeamlessM4Tv2

* fix Align

* fix Owlv2 and OwlViT

* add reviewed changes

* add reviewed changes

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 14:43:52 +02:00
970d9a75ce Raise TypeError instead of ValueError for invalid types (#38660)
* Raise TypeError instead of ValueError for invalid types.

* Removed un-necessary changes.

* Resolved conflicts

* Code quality

* Fix failing tests.

* Fix failing tests.
2025-07-21 12:42:00 +00:00
822c5e45b2 Fix pylint warnings (#39477)
* Fix pylint warnings

Signed-off-by: cyy <cyyever@outlook.com>

* Fix variable names

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-21 12:38:05 +00:00
dc017cd763 Fix Qwen Omni integration test (#39553)
fix
2025-07-21 14:11:46 +02:00
fdc0566e15 🚨🚨🚨 [Trainer] Enable average_tokens_across_devices by default in TrainingArguments (#39395)
Enable average_tokens_across_devices by default in TrainingArguments

Fixes #39392

This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default.

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-21 12:11:20 +00:00
8c102e2eb1 Rename _supports_flash_attn_2 in examples and tests (#39471)
* delete `_supports_flash_attn_2` from examples and tests

* simplify docs
2025-07-21 14:02:57 +02:00
3a152e3a5c Fix the check in flex test (#39548)
* fix the check

* fix flags

* flags
2025-07-21 13:29:44 +02:00
78fb2d2760 Fix bad tensor shape in failing Hubert test. (#39502)
Fix bad tensor shape in Hubert test.
2025-07-21 12:25:52 +01:00
39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* upodate new modular

* use LlamaAttention and avoid use  CohereAttention cause repeat norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer is need to ignore

* fix gradient error using with dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00
344012b3a6 [qwen2 vl] fix packing with all attentions (#39447)
* fix qwen2 vl packing in FA2

* why? delete!

* qwen2-5-vl seems to work now

* update

* fix tests

* start by adapting FA2 tests

* add similar tests for sdpa/eager

* address comments

* why is this even in conditional model and not base model?
2025-07-21 12:19:15 +02:00
e42681b48b [gemma3] support sequence classification task (#39465)
* add seq clf class

* fix docs and add in auto-map

* skip tests

* optional pixels
2025-07-21 11:03:20 +02:00
34133d0a79 Fix placeholders replacement logic in auto_docstring (#39433)
Fix and simplify placeholders replacement logic
2025-07-18 22:56:23 +00:00
433d2a23d7 Update SAM/SAM HQ attention implementation + fix Cuda sync issues (#39386)
* update attention implementation and improve inference speed

* modular sam_hq + fix integration tests on A10

* fixup

* fix after review

* softmax in correct place

* return attn_weights in sam/sam_hq
2025-07-18 18:46:27 -04:00
541bed22d6 Improve @auto_docstring doc and rename args_doc.py to auto_docstring.py (#39439)
* rename `args_doc.py` to `auto_docstring.py` and improve doc

* modifs after review
2025-07-18 18:00:34 +00:00
de0dd3139d Add fast image processor SAM (#39385)
* add fast image processor sam

* nits
2025-07-18 17:27:16 +00:00
561a79a2f4 Fix BatchEncoding.to() for nested elements (#38985) 2025-07-18 14:14:45 +01:00
f4d076561f [gemma3] Fix do_convert_rgb in image processors. (#39438)
* [gemma3] Fix do_convert_rgb in image processors.

* [gemma3] Fix do_convert_rgb in image processors.
2025-07-18 12:33:00 +00:00
bcc0091937 [chat template] return assistant mask in processors (#38545)
* messed up the git history, squash commits

* raise error if slow and refine tests

* index was off by one

* fix the test
2025-07-18 12:23:20 +00:00
328ca9cf1d [dependencies] Update datasets pin (#39500)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?

* datasets pin

* comment
2025-07-18 12:05:28 +00:00
fb58377700 Slack CI bot: set default result for non-existing artifacts (#39499)
* Set default result for non-existing artifacts

* FMT

* Address review comments
2025-07-18 11:45:47 +00:00
4ded9a4113 🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try

* Update modeling_utils.py

* Update modeling_utils.py

* big refactor

* Update modeling_utils.py

* style

* docstrings and simplify inner workings of configs

* remove all trace of _internal

* Update modeling_utils.py

* fix logic error

* Update modeling_utils.py

* recursive on config

* Update configuration_utils.py

* fix

* Update configuration_dpt.py

* Update configuration_utils.py

* Update configuration_utils.py

* Update modeling_idefics.py

* Update modeling_utils.py

* fix for old models

* more old models fixup

* Update modeling_utils.py

* Update configuration_utils.py

* Remove outdated test

* remove the deepcopy!! 🥵🥵

* Update test_modeling_gpt_bigcode.py

* fix qwen dispatch

* restrict to only models supporting it

* style

* switch name

* Update modeling_utils.py

* Update modeling_utils.py

* add tests!

* fix

* rypo

* remove bad copies

* fix

* Update modeling_utils.py

* additional check

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* Update modeling_utils.py

* fix

* skip
2025-07-18 13:41:54 +02:00
2b819ba4e3 [dependencies] temporary pyarrow pin (#39496)
* pyarrow pin

* make fixup

* test?

* like this?

* like this?

* like this?
2025-07-18 10:05:40 +00:00
967045082f Add voxtral (#39429)
* draft

* draft update (conversion working)

* mend

* draft update

* draft update: working generate

* refactor

* VoxtralProcessor draft

* processor update

* update convert_tekken_tokenizer

* refactor processor

* update convert

* make style

* better handle prefil

* make style

* add tests

* add mistral_common audio loading

* processor update

* revert changes

* audio utils update

* add audio to apply chat template mistral update

* voxtral processor update

* fix

* udpate converstion script

* make mistral tokenier from pretrain work from local dir

* fix udpates

* add integration tests

* add batched version

* processor docstring

* make style

* revert convert_tekken_tokenizer changes

* revert processing_qwen2.5 changes

* add multi-turn test

* processor improvements

* address review changes

* Update src/transformers/tokenization_mistral_common.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* update audio utils

* nits

* integration test update

* correct _support

* update tests

* test update

* update integration tests

* fix

* fix

* fix

* add test_apply_chat_template_with_audio

* add model doc

* model doc

* nit

* doc uptade

* nit

* processor improvement

* ensure default is 3B

* nits

* make

* make

* convert modular

* update checkpoint

* fix test

* make

* make

* autos

* make

* make

* nit

* nit

* nit

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-18 00:02:04 +00:00
73869f2e81 Fix typing order (#39467)
* fix type order

* change all Union[str, dict] to Union[dict, str]

* add hf_parser test && fix test order

* add deepspeed dependency

* replace deepspeed with accelerator
2025-07-17 15:47:31 +00:00
bda75b4011 Add unified logits_to_keep support to LLMClass (#39472)
* add supports for logits_to_keep for qwen25vl and glm4v

* Update relevant modular files
2025-07-17 17:07:12 +02:00
bf6c997685 [serve] Add speech to text (/v1/audio/transcriptions) (#39434)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

* experiment: base model from typed dict

* audio working

* fix bad rebase

* load audio with librosa

* implement timed models

* almost working

* make fixup

* fix tests

* transcription request type

* tokenizer -> processor

* add example in docs

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-07-17 14:29:57 +00:00
8b3de61a65 Update integration_utils.py (#39469)
* Update integration_utils.py

sanitize mlflow upload metric

* Update integration_utils.py

change import order to pass CI

* Update integration_utils.py

add comments

* Update integration_utils.py

Remove whitespace from blank line
2025-07-17 13:57:49 +00:00
7fd60047c8 fix: ImageTextToTextPipeline handles user-defined generation_config (#39374)
fix: ImageTextToTextPipeline handles user-defined generation_config passed to the pipeline

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-07-17 13:23:29 +00:00
60b5471da3 Enable some ruff checks for performance and readability (#39383)
* Fix inefficient sequence tests

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PERF102

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC1802

Signed-off-by: cyy <cyyever@outlook.com>

* Enable PLC0208

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:21:59 +00:00
fc700c2a26 Fix convert_and_export_with_cache failures for GPU models (#38976)
* Add the `device` option for `generate()`

* Add device for default tensors to avoid tensor mismatch

* [test] Enable test_static_cache_exportability for torch_device

* infer device from the prompt_token_ids

* Add device for generated tensor

* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU

* fix format

* infer device from the model
2025-07-17 13:12:32 +00:00
54680d75c9 Update GemmaIntegrationTest::test_model_2b_bf16_dola (#39362)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-17 14:06:23 +01:00
322400af58 fix a comment typo in utils.py (#39459) 2025-07-17 13:06:04 +00:00
43f07018cf Use newer typing notation (#38934)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-17 13:05:21 +00:00
565dd0bad7 Fix tests due to breaking change in accelerate (#39451)
* update values

* fix
2025-07-17 13:51:50 +01:00
26fed50460 fix max_length calculating using cu_seq_lens (#39341) 2025-07-17 10:54:23 +02:00
cdfe6164b3 fix(pipelines): QA pipeline returns fewer than top_k results in batch mode (#39193)
* fixing the bug

* Try a simpler approach

* make fixup

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2025-07-17 10:24:30 +02:00
b85ed49e0a Corrections to PR #38642 and enhancements to Wav2Vec2Processor __call__ and pad docstrings (#38822)
* Correcting PR #38642.  The PR removed references to the deprecated method "as_target_processor()" in the
__call__ and pad method docstrings, which is correct, but also removed all references to PreTrainedTokenizer,
which is incorrect.  This commit adds back the reference to PreTrainedTokenizer and also takes the
opportunity to enhance the docstrings with the invocation procedure post removal of "as_target_processor()"
and adds information on return values.

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/wav2vec2/processing_wav2vec2.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: René Tio <tor@Jammer.local>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-16 14:13:07 -07:00
787a0128a9 create ijepa modelcard (ref : PR #36979 ). (#39354)
* wip: adding first version of the IJEPA model card.

* refactor based on the @stevhliu feedbacks

* refactor:
- revert the accidental removal of the autodoc api description and the image reerece architecture

- general context updation.

* - changes of model for example quantization.
- merging the  quantization content.
2025-07-16 12:40:22 -07:00
48f2233cdf Improve grammar and clarity in perf_hardware.md (#39428) 2025-07-16 12:15:15 -07:00
e68ebb695f fix cached file error when repo type is dataset (#36909)
* fix cached file

* Update hub.py
2025-07-16 18:02:26 +02:00
35a416c400 Fix indentation bug in SmolVLM image processor causing KeyError (#39452)
Fix indentation bug in Idefics3 image processor

- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors

cc @yonigozlan

Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
2025-07-16 11:59:28 -04:00
2c58705dc2 Updated Megatron conversion script for gpt2 checkpoints (#38969)
* update script to support new megatron gpt format

* fixed quality failures

---------

Co-authored-by: Luke Friedrichs <LckyLke>
2025-07-16 15:54:29 +00:00
26be7f717e [CI] Fix partially red CI (#39448)
fix
2025-07-16 15:53:43 +02:00
0a88751940 Fixes #39204: add fallback if get_base_model missing (#39226)
* Fixes #39204: add fallback if get_base_model missing

* Inline try_get_base_model logic as suggested in PR review

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-16 15:51:30 +02:00
ba506f87db make the loss context manager easier to extend (#39321) 2025-07-16 15:47:24 +02:00
9f1ac6f185 Remove something that should have never been there (#38254)
* what the hell

* update

* style

* style

* typing

* fix init issue

* fix granite moe hybrid as well
2025-07-16 15:22:44 +02:00
a7ca5b5d67 Fix processor tests (#39450)
fix
2025-07-16 15:01:35 +02:00
71818f570b [Bugfix] [Quantization] Remove unused init arg (#39324)
remove unused arg from ct config init

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 14:57:42 +02:00
cc24b0378e Better typing for model.config (#39132)
* Apply to all models config annotation

* Update modular to preserve order

* Apply modular

* fix define docstring

* fix dinov2 consistency (docs<->modular)

* fix InstructBlipVideoForConditionalGeneration docs<->modular consistency

* fixup

* remove duplicate code

* Delete config_class attribute from the modeling code

* Add config_class attribute in base model

* Update init sub class

* Deprecated models update

* Update new models

* Fix remote code BC issue

* fixup

* fixing more corner cases

* fix new models

* add test

* modular docs update

* fix comment a bit

* fix for py3.9
2025-07-16 14:50:35 +02:00
4b258454a7 Fix typo in generation configuration for Janus model weight conversion (#39432)
* Fix typo in generation configuration for Janus model weight conversion

* Fix typo

* Update Janus model generation configuration

* Update Janus model to use generation_kwargs
2025-07-16 14:28:02 +02:00
de5ca373ac Responses API in transformers serve (#39155)
* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* Responses API (to be merged into #39155) (#39338)

* Scaffolding

* Explicit content

* Naïve Responses API streaming implementation

* Cleanup

* use openai

* validate request, including detecting unused fields

* dict indexing

* dict var access

* tmp commit (tests failing)

* add slow

* use oai output type in completions

* (little rebase errors)

* working spec?

* guard type hint

* type hints. fix state (CB can now load different models)

* type hints; fn names; error type

* add docstrings

* responses + kv cache

* metadata support; fix kv cache; error event

* add output_index and content_index

* docstrings

* add test_build_response_event

* docs/comments

* gate test requirements; terminate cb manager on model switch

* nasty type hints

* more type hints

* disable validation by default; enable force models

* todo

---------

Co-authored-by: Lysandre <hi@lysand.re>

* Slight bugfixes

* PR comments from #39338

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Joao Gante <joao@huggingface.co>
2025-07-16 14:16:16 +02:00
c8524aeb07 [cache] make all classes cache compatible finally (#38635)
* dump

* push other models

* fix simple greedy generation

* xmod

* add fmst and clean up some mentions of old cache format

* gpt-bigcode now follows standards

* delete tuple cache reference in generation

* fix some models

* fix some models

* fix mambas and support cache in tapas

* fix some more tests

* fix copies

* delete `_reorder_cache`

* another fix copies

* fix typos and delete unnecessary test

* fix rag generate, needs special cache reordering

* fix tapas and superglue

* reformer create special cache

* recurrent gemma `reorder_cache` was a no-op, delete

* fix-copies

* fix blio and musicgen pipeline tests

* fix reformer

* fix reformer, again...

* delete `_supports_cache_class`

* delete `supports_quantized_cache`

* fix failing tests

* fix copies

* some minor clean up

* style

* style

* fix copies

* fix tests

* fix copies

* create causal mask now needs positions?

* fixc copies

* style

* Update tests/test_modeling_common.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* clean-up of non-generative model after merging main

* check `is_decoder` for cache

* delete transpose for scores

* remove tuple cache from docs everywhere

* fix tests

* fix copies

* fix copies once more

* properly deprecate `encoder_attention_mask` in Bert-like models

* import `deprecate_kwarg` where needed

* fix copies again

* fix copies

* delete `nex_decoder_cache`

* fix copies asks to update for PLM

* fix copies

* rebasing had a few new models, fix them and merge asap!

* fix copies once more

* fix slow tests

* fix tests and updare PLM checkpoint

* add read token and revert accidentally removed line

* oh com -on, style

* just skip it, read token has no access to PLM yet

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-16 14:00:17 +02:00
6cb43defd0 docs: add missing numpy import to minimal example (#39444)
docs: add numpy import to minimal example
2025-07-16 11:57:13 +00:00
61163099f1 Remove runtime conditions for type checking (#37340)
Remove dynamic conditions for type checking

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 13:36:48 +02:00
bfc9ddf5c6 Add StableAdamW Optimizer (#39446)
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.

* Fixed issue with

* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers

---------

Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
2025-07-16 13:35:53 +02:00
b9ee528246 add test scanner (#39419)
* add test scanner

* add doc + license

* refactor for only 1 tree traversal

* add back test of only one method

* document single method scan

* format

* fixup generate tests

* minor fix

* fixup

* fixup doc
2025-07-16 12:45:46 +02:00
79941c61ce Fix missing definition of diff_file_url in notification service (#39445)
Fix missing definition of diff_file_url
2025-07-16 12:09:18 +02:00
e048d48bd0 Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer (#31870)
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer

* Update src/transformers/optimization.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update optimization.py

fix the error of the unclosed "("

* Update optimization.py

remove whitespace in line 402 in order to pass the quality test

* Update src/transformers/optimization.py

* Update src/transformers/optimization.py

* Apply style fixes

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-07-16 12:01:08 +02:00
0cf08e90dd Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor (#39372)
Change log level from warning to info for scheduled request logging in ContinuousBatchProcessor
2025-07-16 11:54:20 +02:00
ae4e306a40 Defaults to adamw_torch_fused for Pytorch>=2.8 (#37358)
* Defaults to adamw_torch_fused for latest Pytorch

Signed-off-by: cyy <cyyever@outlook.com>

* Fix test

Signed-off-by: cyy <cyyever@outlook.com>

---------

Signed-off-by: cyy <cyyever@outlook.com>
2025-07-16 09:52:33 +00:00
4524a68c66 Fix L270 - hasattr("moe_args") returning False error (#38715)
* Fix L270 - hasattr("moe_args") returning False error

* Update src/transformers/models/llama4/convert_llama4_weights_to_hf.py

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 09:45:58 +00:00
d33a1c389f [chat template] add a testcase for kwargs (#39415)
add a testcase
2025-07-16 11:31:35 +02:00
99c9763398 Fixed a bug calculating cross entropy loss in JetMoeForCausalLM (#37830)
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM

In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-16 11:22:00 +02:00
667ad02374 Remove double soft-max in load-balancing loss. Fixes #39055 . (#39056)
Remove double soft-max in load-balancing loss. Fixes #39055
2025-07-16 09:20:23 +00:00
31d81943c9 [Core] [Offloading] Fix saving offloaded submodules (#39280)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add clarifying comment

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test_save_offloaded_model_with_direct_params

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* fix merge conflict, add decorators

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-16 08:44:40 +00:00
add43c4d09 [autodocstring] add video and audio inputs (#39420)
* add  video and audio inputs in auto docstring

* fix copies
2025-07-16 09:41:50 +02:00
0dc2df5dda CI workflow for performed test regressions (#39198)
* WIP script to compare test runs for models

* Update line normalitzation logic

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-16 04:20:02 +02:00
1bc9ac5107 docs: update LightGlue docs (#39407)
* docs: update LightGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:50 -07:00
d9574f2fe3 docs: update SuperGlue docs (#39406)
* docs: update SuperGlue docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-15 12:40:26 -07:00
9f41f67135 [vlm] fix loading of retrieval VLMs (#39242)
* fix vlm with retrieval

* we can't use AutoModel because new ColQwen was released after refactor

* no need for colqwen

* tied weight keys are necessary, if using IMageTextToText

* need to apply renaming in tied weights, only for ColPali

* overwrite tied keys in ColPali

* fix copies, modular can't handle if-statements
2025-07-15 17:23:54 +02:00
b1d14086e4 handle training summary when creating modelcard but offline mode is set (#37095)
* handle training summary when creating modelcard but offline mode is set

* chore: lint
2025-07-15 17:21:15 +02:00
67f42928f0 Remove residual quantization attribute from dequantized models (#39373)
* fix: removing quantization trace attribute from dequantized model

Fixes #39295

* add: test `to(dtype=torch.float16)` after dequantization
2025-07-15 17:16:10 +02:00
30c508dbcb Remove deprecated audio utils functions (#39330)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-07-15 14:02:25 +00:00
d8e05951b8 Fix bugs in pytorch example run_clm when streaming is enabled (#39286) 2025-07-15 15:37:28 +02:00
a989bf8d84 Fix bugs from pipeline preprocessor overhaul (#39425)
* Correct load classes for VideoClassificationPipeline

* Correct load classes for the ASR pipeline
2025-07-15 14:28:59 +01:00
53c9dcd6fd refactor: remove set_tracer_provider and set_meter_provider calls (#39422) 2025-07-15 14:22:12 +02:00
f03b384149 Fix invalid property (#39384)
Signed-off-by: cyy <cyyever@outlook.com>
2025-07-15 12:11:37 +00:00
c4d41567fa set document_question_answering pipeline _load_tokenizer to True (#39411)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-15 12:05:49 +00:00
f56b49f48f Ignore extra position embeddings weights for ESM (#39063)
* Ignore extra position embeddings weights

* Slight name fix
2025-07-15 11:57:32 +00:00
2b79f14375 support loading qwen3 gguf (#38645)
* support loading qwen3 gguf

* Add qwen3 into GGUF_TO_FAST_CONVERTERS for tokenizer conversion

* Add testcase

* Fix formatting
2025-07-15 09:53:41 +00:00
0e4b7938d0 Add ModernBERT Decoder Models - ModernBERT, but trained with CLM! (#38967)
* working locally; need to style and test

* added docs and initial tests; need to debug and flesh out

* fixed tests

* working long context; batches

* working fa2 and eager

* update tests

* add missing confnigs

* remove default autoset

* fix spacing

* fix most tests

* fixed tests

* fix to init

* refactor to match new transformers updates

* remove static cache option

* fa2 fix

* fix docs

* in progress

* working on tests

* fixed issue with attn outputs

* remove debug

* fix local config attr

* update doc string

* fix docstring

* add docs to toc

* correct typo in toc

* add new updates from main w.r.t. ModernBERT RoPE

* fix local param

---------

Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
2025-07-15 10:40:41 +02:00
0b724114cf Fix typo in /v1/models output payload (#39414) 2025-07-15 08:59:25 +01:00
8d6259b0b8 [refactor] set attention implementation (#38974)
* update

* fix some tests

* init from config, changes it in-place, add deepcopy in tests

* fix modernbert

* don't delete thsi config attr

* update

* style and copies

* skip tests in generation

* fix style

* accidentally removed flash-attn-3, revert

* docs

* forgot about flags set to False

* fix copies

* address a few comments

* fix copies

* custom code BC
2025-07-15 09:34:06 +02:00
6017f5e8ed [siglip] fix pooling comment (#39378)
* feat(siglip2): add forward pass with pooled output logic in Siglip2TextModel

* test(siglip2): add test_text_model.py to verify pooled output behavior

* style(siglip2): fix formatting in test_text_model.py using Ruff

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* fix(siglip2): remove misleading 'sticky EOS' comment and sync modular-classic files

* chore(siglip2): regenerate classic model after modular change

* Update
2025-07-14 17:47:19 +00:00
8d40ca5749 Update phi4_multimodal.md (#38830)
* Update phi4_multimodal.md

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/phi4_multimodal.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

* Update phi4_multimodal.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-14 10:35:17 -07:00
3635415af2 [Docs] Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic (#39391)
Fix typo in CustomTrainer compute_loss method and adjust loss reduction logic
2025-07-14 09:25:06 -07:00
3a48e9534c Use np.pad instead of np.lib.pad. (#39346)
* Use np.pad instead of np.lib.pad.

* Update audio_utils.py

Formatting
2025-07-14 16:05:28 +00:00
3d8be20cd2 Totally rewrite how pipelines load preprocessors (#38947)
* Totally rewrite how pipelines load preprocessors

* Delete more mappings

* Fix conditionals, thanks Cyril!
2025-07-14 16:40:04 +01:00
903944a411 [examples] fix do_reduce_labels argument for run_semantic_segmentation_no_trainer (#39322)
* no use do_reduce_labels argument in model

* use do_reducer_labels in AutoImageProcessor
2025-07-14 10:16:49 +00:00
8165c703ab Fix Lfm2 and common tests (#39398)
* fix

* better fix

* typo
2025-07-14 12:02:59 +02:00
878d60a3cb Deprecate AutoModelForVision2Seq (#38900)
deprecate vision2seq
2025-07-14 11:42:06 +02:00
ad333d4852 [Qwen2.5-VL] Fix torch.finfo() TypeError for integer attention_mask_tensor (#39333)
* Update modeling_qwen2_5_vl.py

### 🐛 Bug Description

When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42cef21662c304583368ff6645116a45), the model crashes due to a type mismatch in the attention mask handling.

---

### 🔥 Error Traceback

* Fix dtype compatibility in attention mask processing

Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:

Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Update modeling_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* # Fix: Use appropriate function based on dtype

* Update modular_qwen2_5_vl.py

* Fix: Cast to float before applying torch.finfo

* Fix: Use appropriate function based on dtype

* Fix: Use appropriate function based on dtype

* Updatet modeling_glm4v.py

* Only apply conversion for floating point tensors (inverted masks)

* corrected the format issue

reformatted modeling_glm4v.py

All done!  🍰 
1 file reformatted

* Fix: Cast to float before applying torch.finfo

Corrected the format issue

* Fix torch.finfo() for integer attention mask

#39333

* Run make fix-copies and make style for CI compliance

- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC

* Fix torch.finfo() TypeError for

Fix torch.finfo() TypeError for integer attention_mask_tensor #39333

* Fix torch.finfo() TypeError for integer
2025-07-14 07:47:39 +00:00
c30af65521 [BLIP] remove cache from Qformer (#39335)
* remove cache from Qformer

* fix

* this was never correct...
2025-07-14 09:20:01 +02:00
66cd995618 [shieldgemma] fix checkpoint loading (#39348)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-14 08:34:58 +02:00
a1ad9197c5 Fix overriding Fast Image/Video Processors instance attributes affect other instances (#39363)
* fix and add tests

* nit
2025-07-12 23:39:06 +00:00
dc98fb3e5e update docker file to use latest timm (for perception_lm) (#39380)
update docker file for timm

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-12 23:19:37 +02:00
5c30f7e390 Update Model Card for Encoder Decoder Model (#39272)
* update model card.

* add back the model contributors for mamba and mamba2.

* update the model card.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update batches with correct alignment.

* update examples and remove quantization example.

* update the examples.

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* update example.

* correct the example.

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-11 11:23:08 -07:00
0d7efe3e4b fix gpt2 usage doc (#39351)
fix typo of gpt2 doc usage
2025-07-11 10:59:41 -07:00
a646fd55fd Updated CamemBERT model card to new standardized format (#39227)
* Updated CamemBERT model card to new standardized format

* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution

* Updated CamemBERT usage examples, quantization, badges, and format

* Updated CamemBERT badges

* Fixed CLI Section
2025-07-11 10:59:09 -07:00
af74ec65a7 Update Readme to Run Multiple Choice Script from Example Directory (#39323)
* Update Readme to run in current place

* Update Readme files to execute PyTorch examples from their respective folders
2025-07-11 10:58:26 -07:00
70e57e4710 Add mistral common support (#38906)
* wip: correct docstrings

* Add mistral-common support.

* quality

* wip: add requested methods

* wip: fix tests

* wip: add internally some methods not being supported in mistral-common

* wip

* wip: add opencv dependency and update test list

* wip: add mistral-common to testing dependencies

* wip: revert some test changes

* wip: ci

* wip: ci

* clean

* check

* check

* check

* wip: add hf image format to apply_chat_template and return pixel_values

* wip: make mistral-common non-installed safe

* wip: clean zip

* fix: from_pretrained

* fix: path and base64

* fix: path and import root

* wip: add docs

* clean

* clean

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-07-11 16:26:58 +00:00
665418dacc Remove device check in HQQ quantizer (#39299)
* Remove device check in HQQ quantizer

Fix https://github.com/huggingface/transformers/issues/38439

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-11 14:59:51 +00:00
601bea2c4e Verbose error in fix mode for utils/check_docstrings.py (#38915)
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.
2025-07-11 14:36:10 +00:00
24f771a043 fix failing test_sdpa_can_dispatch_on_flash (#39259)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-11 16:30:56 +02:00
ee74397d20 update cb TP (#39361)
* update cb TP

* safety
2025-07-11 15:54:25 +02:00
9bc675b3b6 Fix link for testpypi (#39360)
fix link
2025-07-11 15:34:01 +02:00
bf607f6d3b PerceptionLM (#37878)
* plm template

* A working plm with fixed image features

* hacked processor

* First version that reproduced PLM output using PE from timm.

* Simplify and fix tie_word_embeddings

* Use PIL resize. Simplify converstion.

* First version that works with video input.

* simplifed image preprocessing (not batched)

* Minor fixes after rebasing on main.

* Video processor based on new API.

* Revert to use _preprocess for image processor.

* refactor with modular

* fix tie_word_embedding

* Testing with timm PE

* check in missed converstion from modular to model.py

* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.

* address review comments

* Fixed batching if video and image examples mixed.

* Simplify PE configuration.

* Enable AutoModel for PerceptionEncoder.

* Update PE config style.

* update all headers

* Minor fixes.

* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.

* Fix for testing_modeling_perception_lm.py

* Image processing refactoring to use more common parts.

* Fix processor test.

* update tests to use model from hub

* More test fixes.

* integration test GT update after rebasing; probably due to video preprocessing

* update test media path to hub

* Stop tracking local scripts

* address some review comments

* refactor image processing.

* small fixes

* update documentation and minor fixes

* remove scripts

* Minor fix for CI

* Fix image processing

* CI and doc fix

* CI formatting fix

* ruff fix

* ruff formatting

* ran utils/sort_auto_mappings.py

* update docstring

* more docstring udpates

* add vision_input_type default fallback for image processing

* more verbose variable naming

* test update

* Remove PE and PEConfig use AutoModel(TimmWrapper) instead

* Minor cleanup.

* Minor Fix: remove any ref to PE. Ruff format and check.

* fix docstring

* Fix modular/model consistency.Improvex docstringfor  .

* Fix PerceptionLMForConditionalGenerationModelTest

* ruff fix

* fix for check_repo

* minor formatting

* dummy size arg to fix for processor test.

* Update docstring for PerceptionLMConfig

* Minor fixes from review feedback.

* Revert some minor changes per reviewer feedback.

* update base_model_prefix

* address reviewer feedback

* fix comment in modeling file

* address reviewer feedback

* ruff format

* Pre-merge test update.

* reapply modular and fix checkpoint name

* processor test path

* use modular a bit more

* remove dead code

* add token decorator

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-11 11:07:32 +02:00
4b47b2b8ea Updated Switch Transformers model card with standardized format (Issue #36979) (#39305)
* Updated Switch Transformers model card with standardized format (Issue #36979)

* Apply reviewer suggestions to the new standardised Switch Transformer's model card

* Update switch_transformers.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-10 15:34:10 -07:00
fe1a5b73e6 [modular] speedup check_modular_conversion with multiprocessing (#37456)
* Change topological sort to return level-based output (lists of lists)

* Update main for modular converter

* Update test

* update check_modular_conversion

* Update gitignore

* Fix missing conversion for glm4

* Update

* Fix error msg

* Fixup

* fix docstring

* update docs

* Add comment

* delete qwen3_moe
2025-07-10 19:07:59 +01:00
571a8c2131 Add a default value for position_ids in masking_utils (#39310)
* set default

* Update masking_utils.py

* add small test
2025-07-10 18:53:40 +02:00
bdc8028cb3 [Core] [Offloading] Enable saving offloaded models with multiple shared tensor groups (#39263)
* fix counting meta tensors, fix onloading meta tensors

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove unrelated fix

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* add test

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-07-10 18:33:30 +02:00
df49b399dc [tests] tag serve tests as slow (#39343)
* maybe they need more cpu resources?

* add todo
2025-07-10 15:40:08 +00:00
36e80a18da [modeling][lfm2] LFM2: Remove deprecated seen_tokens (#39342)
* [modeling][lfm2] remove deprecated seen_tokens

* [modular][lfm2] remove deprecated seen_tokens from modular file
2025-07-10 17:27:55 +02:00
9682d07f92 LFM2 (#39340)
* [modeling][lfm2] LFM2 model on 4.53.0 interface

* [configuration] hook in LFM2 keys

* [modeling][lfm2] update modeling interface for 4.53.1

* [modeling][lfm2] apply mask to hidden conv states

* [misc] ruff format/lint

* [modeling][lfm2] minor: NotImplemented legacy cache conversion

* Create lfm2.md

* create nice modular

* style

* Update modeling_auto.py

* clean and start adding tests

* style

* Update test_modeling_lfm2.py

* Update __init__.py

* small test model size

* config

* small fix

* fix

* remove useless config attrs -> block_dim and conv_dim are hiden_size

* fix prepare inputs

* fix config

* test

* typo

* skip tests accordingly

* config docstrings

* add doc to .md

* skip config docstring check

---------

Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-10 16:07:33 +02:00
38c3931362 [server] add tests and fix passing a custom generation_config (#39230)
* add tests; fix passing a custom generation_config

* tool integration test

* add install step

* add accelerate as dep to serving

* add todo
2025-07-10 13:41:38 +00:00
6b09c8eab0 Handle DAC conversion when using weight_norm with newer PyTorch versions (#36393)
* Update convert_dac_checkpoint.py

* Update convert_dac_checkpoint.py

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-07-10 10:36:58 +00:00
92043bde29 fix phi3 tests (#39312)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-10 11:51:55 +02:00
520b9dcb42 fix Glm4v batch videos forward (#39172)
* changes for video

* update modular

* change get_video_features

* update video token replacement

* update modular

* add test and fix typo

* lint

* fix order

* lint

* fix

* remove dependency

* lint

* lint

* remove todo

* resize video for test

* lint..

* fix test

* new a processor for video_test

* fix test
2025-07-10 10:44:28 +02:00
bc161d5d06 Delete deprecated stuff (#38838)
* delete deprecated stuff

* fix copies

* remove unused tests

* fix modernbert and fuyu

* Update src/transformers/cache_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* bye bye `seen_tokens`

* address comments

* update typings

* ecnoder decoder models follow same pattern as whisper

* fix copies

* why is it set to False?

* fix switch transformers

* fix encoder decoder models shared weight

* fix copies and RAG

* remove `next_cache`

* fix gptj/git

* fix copies

* fix copies

* style...

* another forgotten docsrting

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
c6ee0b1da8 Fix broken SAM after #39120 (#39289)
fix
2025-07-09 17:46:22 -04:00
aff7df8436 enable static cache on TP model (#39164)
* enable static cache on TP model

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* check tp size before init kv cache

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix docstring

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* add tp tests

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix comment

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix other cache head size

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
2ef59646b8 Fix max_length_q and max_length_k types to flash_attn_varlen_func (#37206)
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-07-09 23:12:39 +02:00
2d600a4363 Granite speech speedups (#39197)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum

* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.

* support audio.shape=(1,L)
2025-07-09 23:09:50 +02:00
5111c8ea2f Fix typo: langauge -> language (#39317) 2025-07-09 12:06:46 -07:00
2781ad092d docs: update LLaVA-NeXT model card (#38894)
* docs: update LLaVA-NeXT model card

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/llava_next.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [docs] Updated llava_next model card

* Update docs/source/en/model_doc/llava_next.md remove image sources

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [fix] Change Flash Attention to SDPA badge

* [doc] fixed quantization example

* docs: updated contribution details and badges

* Update llava_next.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 11:32:40 -07:00
16dd7f48d0 skip files in src/ for doctest (for now) (#39316)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:36:48 +02:00
d61c0d087c Updated the Model docs - for the MARIAN model (#39138)
* Update marian.md

This update improves the Marian model card to follow the Hugging Face standardized model card format. The changes include:

- Added a clear description of MarianMT, its architecture, and how it differs from other models.
- Provided usage examples for Pipeline and AutoModel.
- Added a quantization example for optimizing model inference.
- Included instructions and examples for multilingual translation with language codes.
- Added an Attention Mask Visualizer example.
- Added a Resources section with relevant links to papers, the Marian framework, language codes, tokenizer guides, and quantization documentation.
- Fixed formatting issues in the code blocks for correct rendering.

This update improves the readability, usability, and consistency of the Marian model documentation for users.

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

* Update docs/source/en/model_doc/marian.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update marian.md

* Update marian.md

* Update marian.md

* Update marian.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-09 10:23:03 -07:00
161cf3415e add stevhliu to the list in self-comment-ci.yml (#39315)
add

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 19:07:44 +02:00
3be10c6d19 Fix consistency and a few docstrings warnings (#39314)
* Update modeling_deepseek_v2.py

* fix docstrings

* fix

* fix
2025-07-09 18:40:37 +02:00
4652677c89 🌐 [i18n-KO] Translated quark.md to Korean (#39268)
* initial translation

* removed english parts

* maintain consistency

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/quantization/quark.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* add toctree

* fixed indentation

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-07-09 09:29:51 -07:00
c980904204 Add DeepSeek V2 Model into Transformers (#36400)
* add initial structure

* doc fixes, add model base logic

* update init files

* some fixes to config and modular

* some improvements for attention

* format

* remove unused attn

* some fixes for moe layer and for decoder

* adapt _compute_yarn_parameters for deepseek

* format

* small fix

* fix for decoder forward

* add tests, small refactoring

* fix dummies

* fix init

* fix doc

* fix config docs

* add sequce doc, fix init for gate

* fix issues in tests

* fix config doc

* remove unused args

* some fixes and refactoring after review

* fix doc for config

* small fixes for config args

* revert config refactoring

* small refactoring

* minor fixes after rebase

* small fix after merge

* fix modular

* remove rotaryembd from public init

* small test fix

* some rotary pos calculation improvement

* fix format

* some improvements and fixes

* fix config

* some refactoring

* adjust some unit tests

* skip test

* small fixes and tests adjustment

* reapply modular

* fix all tests except Integration

* fix integration testzs

* cleanup BC stuff

* rope

* fix integrations tests based on a10

* style

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
accbd8e0fe [sliding window] revert and deprecate (#39301)
* bring back and deprecate

* oops

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-09 16:10:38 +02:00
1cefb5d788 [modular] Allow method with the same name in case of @property decorator (#39308)
* fix

* add example

* fix

* Update modular_model_converter.py
2025-07-09 15:46:53 +02:00
4798c05c64 skip test_torchscript_* for now until the majority of the community ask for it (#39307)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-07-09 15:35:48 +02:00
3796 changed files with 253809 additions and 327220 deletions

View File

@ -16,10 +16,9 @@
import argparse
import copy
import os
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import glob
from typing import Any, Optional
import yaml
@ -82,15 +81,15 @@ class EmptyJob:
@dataclass
class CircleCIJob:
name: str
additional_env: Dict[str, Any] = None
docker_image: List[Dict[str, str]] = None
install_steps: List[str] = None
additional_env: dict[str, Any] = None
docker_image: list[dict[str, str]] = None
install_steps: list[str] = None
marker: Optional[str] = None
parallelism: Optional[int] = 0
pytest_num_workers: int = 8
pytest_options: Dict[str, Any] = None
pytest_options: dict[str, Any] = None
resource_class: Optional[str] = "xlarge"
tests_to_run: Optional[List[str]] = None
tests_to_run: Optional[list[str]] = None
num_test_files_per_worker: Optional[int] = 10
# This should be only used for doctest job!
command_timeout: Optional[int] = None
@ -109,7 +108,9 @@ class CircleCIJob:
self.docker_image[0]["image"] = f"{self.docker_image[0]['image']}:dev"
print(f"Using {self.docker_image} docker image")
if self.install_steps is None:
self.install_steps = ["uv venv && uv pip install ."]
self.install_steps = ["uv pip install ."]
# Use a custom patched pytest to force exit the process at the end, to avoid `Too long with no output (exceeded 10m0s): context deadline exceeded`
self.install_steps.append("uv pip install git+https://github.com/ydshieh/pytest.git@8.4.1-ydshieh")
if self.pytest_options is None:
self.pytest_options = {}
if isinstance(self.tests_to_run, str):
@ -128,6 +129,12 @@ class CircleCIJob:
def to_dict(self):
env = COMMON_ENV_VARIABLES.copy()
if self.job_name != "tests_hub":
# fmt: off
# not critical
env.update({"HF_TOKEN": "".join(["h", "f", "_", "H", "o", "d", "V", "u", "M", "q", "b", "R", "m", "t", "b", "z", "F", "Q", "O", "Q", "A", "J", "G", "D", "l", "V", "Q", "r", "R", "N", "w", "D", "M", "V", "C", "s", "d"])})
# fmt: on
# Do not run tests decorated by @is_flaky on pull requests
env['RUN_FLAKY'] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
env.update(self.additional_env)
@ -147,7 +154,7 @@ class CircleCIJob:
# Examples special case: we need to download NLTK files in advance to avoid cuncurrency issues
timeout_cmd = f"timeout {self.command_timeout} " if self.command_timeout else ""
marker_cmd = f"-m '{self.marker}'" if self.marker is not None else ""
junit_flags = f" -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
junit_flags = " -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
parallel = f' << pipeline.parameters.{self.job_name}_parallelism >> '
@ -175,14 +182,32 @@ class CircleCIJob:
"command": f"TESTS=$(circleci tests split --split-by=timings {self.job_name}_test_list.txt) && echo $TESTS > splitted_tests.txt && echo $TESTS | tr ' ' '\n'" if self.parallelism else f"awk '{{printf \"%s \", $0}}' {self.job_name}_test_list.txt > splitted_tests.txt"
}
},
{"run": {"name": "fetch hub objects before pytest", "command": "python3 utils/fetch_hub_objects_for_ci.py"}},
# During the CircleCI docker images build time, we might already (or not) download the data.
# If it's done already, the files are inside the directory `/test_data/`.
{"run": {"name": "fetch hub objects before pytest", "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py"}},
{"run": {
"name": "Run tests",
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"run":
{
"name": "Check for test crashes",
"when": "always",
"command": """if [ ! -f tests_output.txt ]; then
echo "ERROR: tests_output.txt does not exist - tests may not have run properly"
exit 1
elif grep -q "crashed and worker restarting disabled" tests_output.txt; then
echo "ERROR: Worker crash detected in test output"
echo "Found: crashed and worker restarting disabled"
exit 1
else
echo "Tests output file exists and no worker crashes detected"
fi"""
},
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"store_test_results": {"path": "test-results"}},
{"store_artifacts": {"path": "test-results/junit.xml"}},
{"store_artifacts": {"path": "reports"}},
@ -213,7 +238,7 @@ generate_job = CircleCIJob(
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) cause some issues
# TODO: remove this once it works directly
install_steps=["uv venv && uv pip install ."],
install_steps=["uv pip install ."],
marker="generate",
parallelism=6,
)
@ -244,13 +269,12 @@ custom_tokenizers_job = CircleCIJob(
docker_image=[{"image": "huggingface/transformers-custom-tokenizers"}],
)
examples_torch_job = CircleCIJob(
"examples_torch",
additional_env={"OMP_NUM_THREADS": 8},
docker_image=[{"image":"huggingface/transformers-examples-torch"}],
# TODO @ArthurZucker remove this once docker is easier to build
install_steps=["uv venv && uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
install_steps=["uv pip install . && uv pip install -r examples/pytorch/_tests_requirements.txt"],
pytest_num_workers=4,
)
@ -259,7 +283,7 @@ hub_job = CircleCIJob(
additional_env={"HUGGINGFACE_CO_STAGING": True},
docker_image=[{"image":"huggingface/transformers-torch-light"}],
install_steps=[
'uv venv && uv pip install .',
'uv pip install .',
'git config --global user.email "ci@dummy.com"',
'git config --global user.name "ci"',
],
@ -268,20 +292,6 @@ hub_job = CircleCIJob(
resource_class="medium",
)
onnx_job = CircleCIJob(
"onnx",
docker_image=[{"image":"huggingface/transformers-torch-tf-light"}],
install_steps=[
"uv venv",
"uv pip install .[testing,sentencepiece,onnxruntime,vision,rjieba]",
],
pytest_options={"k onnx": None},
pytest_num_workers=1,
resource_class="small",
)
exotic_models_job = CircleCIJob(
"exotic_models",
docker_image=[{"image":"huggingface/transformers-exotic-models"}],
@ -289,7 +299,6 @@ exotic_models_job = CircleCIJob(
pytest_options={"durations": 100},
)
repo_utils_job = CircleCIJob(
"repo_utils",
docker_image=[{"image":"huggingface/transformers-consistency"}],
@ -297,13 +306,12 @@ repo_utils_job = CircleCIJob(
resource_class="large",
)
non_model_job = CircleCIJob(
"non_model",
docker_image=[{"image": "huggingface/transformers-torch-light"}],
# networkx==3.3 (after #36957) cause some issues
# TODO: remove this once it works directly
install_steps=["uv venv && uv pip install ."],
install_steps=["uv pip install .[serving]"],
marker="not generate",
parallelism=6,
)
@ -321,7 +329,7 @@ doc_test_job = CircleCIJob(
additional_env={"TRANSFORMERS_VERBOSITY": "error", "DATASETS_VERBOSITY": "error", "SKIP_CUDA_DOCTEST": "1"},
install_steps=[
# Add an empty file to keep the test step running correctly even no file is selected to be tested.
"uv venv && pip install .",
"uv pip install .",
"touch dummy.py",
command,
"cat pr_documentation_tests_temp.txt",
@ -333,7 +341,7 @@ doc_test_job = CircleCIJob(
pytest_num_workers=1,
)
REGULAR_TESTS = [torch_job, hub_job, onnx_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
REGULAR_TESTS = [torch_job, hub_job, tokenization_job, processor_job, generate_job, non_model_job] # fmt: skip
EXAMPLES_TESTS = [examples_torch_job]
PIPELINE_TESTS = [pipelines_torch_job]
REPO_UTIL_TESTS = [repo_utils_job]

View File

@ -1,5 +1,6 @@
import re
import argparse
import re
def parse_pytest_output(file_path):
skipped_tests = {}

View File

@ -36,19 +36,23 @@ body:
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @gante and @Rocketknight1
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
@ -56,6 +60,7 @@ body:
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
Devices/Backends:
@ -69,19 +74,6 @@ body:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @Rocketknight1
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
placeholder: "@Username ..."

View File

@ -39,20 +39,23 @@ members/contributors who may be interested in your PR.
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @gante and @Rocketknight1
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @zach-huggingface, @SunMarc and @qgallouedec
- chat templates: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
@ -60,20 +63,16 @@ Integrations:
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
Devices/Backends:
- AMD ROCm: @ivarflakstad
- Intel XPU: @IlyasMoutawwakil
- Ascend NPU: @ivarflakstad
Documentation: @stevhliu
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @Rocketknight1
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
-->

39
.github/copilot-instructions.md vendored Normal file
View File

@ -0,0 +1,39 @@
# copilot-instructions.md Guide for Hugging Face Transformers
This copilot-instructions.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
- `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
- `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

View File

@ -13,14 +13,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import github
import json
from github import Github
import os
import re
from collections import Counter
from pathlib import Path
import github
from github import Github
def pattern_to_regex(pattern):
if pattern.startswith("/"):
start_anchor = True

View File

@ -7,8 +7,8 @@ docs/ @stevhliu
/docker/ @ydshieh @ArthurZucker
# More high-level globs catch cases when specific rules later don't apply
/src/transformers/models/*/processing* @molbap @yonigozlan @qubvel
/src/transformers/models/*/image_processing* @qubvel
/src/transformers/models/*/processing* @molbap @yonigozlan
/src/transformers/models/*/image_processing* @yonigozlan
/src/transformers/models/*/image_processing_*_fast* @yonigozlan
# Owners of subsections of the library
@ -186,65 +186,65 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/zamba/mod*_zamba* @ArthurZucker
# Vision models
/src/transformers/models/beit/mod*_beit* @amyeroberts @qubvel
/src/transformers/models/bit/mod*_bit* @amyeroberts @qubvel
/src/transformers/models/conditional_detr/mod*_conditional_detr* @amyeroberts @qubvel
/src/transformers/models/convnext/mod*_convnext* @amyeroberts @qubvel
/src/transformers/models/convnextv2/mod*_convnextv2* @amyeroberts @qubvel
/src/transformers/models/cvt/mod*_cvt* @amyeroberts @qubvel
/src/transformers/models/deformable_detr/mod*_deformable_detr* @amyeroberts @qubvel
/src/transformers/models/deit/mod*_deit* @amyeroberts @qubvel
/src/transformers/models/depth_anything/mod*_depth_anything* @amyeroberts @qubvel
/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @amyeroberts @qubvel
/src/transformers/models/deta/mod*_deta* @amyeroberts @qubvel
/src/transformers/models/detr/mod*_detr* @amyeroberts @qubvel
/src/transformers/models/dinat/mod*_dinat* @amyeroberts @qubvel
/src/transformers/models/dinov2/mod*_dinov2* @amyeroberts @qubvel
/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @amyeroberts @qubvel
/src/transformers/models/dit/mod*_dit* @amyeroberts @qubvel
/src/transformers/models/dpt/mod*_dpt* @amyeroberts @qubvel
/src/transformers/models/efficientformer/mod*_efficientformer* @amyeroberts @qubvel
/src/transformers/models/efficientnet/mod*_efficientnet* @amyeroberts @qubvel
/src/transformers/models/focalnet/mod*_focalnet* @amyeroberts @qubvel
/src/transformers/models/glpn/mod*_glpn* @amyeroberts @qubvel
/src/transformers/models/hiera/mod*_hiera* @amyeroberts @qubvel
/src/transformers/models/ijepa/mod*_ijepa* @amyeroberts @qubvel
/src/transformers/models/imagegpt/mod*_imagegpt* @amyeroberts @qubvel
/src/transformers/models/levit/mod*_levit* @amyeroberts @qubvel
/src/transformers/models/mask2former/mod*_mask2former* @amyeroberts @qubvel
/src/transformers/models/maskformer/mod*_maskformer* @amyeroberts @qubvel
/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @amyeroberts @qubvel
/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @amyeroberts @qubvel
/src/transformers/models/mobilevit/mod*_mobilevit* @amyeroberts @qubvel
/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @amyeroberts @qubvel
/src/transformers/models/nat/mod*_nat* @amyeroberts @qubvel
/src/transformers/models/poolformer/mod*_poolformer* @amyeroberts @qubvel
/src/transformers/models/pvt/mod*_pvt* @amyeroberts @qubvel
/src/transformers/models/pvt_v2/mod*_pvt_v2* @amyeroberts @qubvel
/src/transformers/models/regnet/mod*_regnet* @amyeroberts @qubvel
/src/transformers/models/resnet/mod*_resnet* @amyeroberts @qubvel
/src/transformers/models/rt_detr/mod*_rt_detr* @amyeroberts @qubvel
/src/transformers/models/segformer/mod*_segformer* @amyeroberts @qubvel
/src/transformers/models/seggpt/mod*_seggpt* @amyeroberts @qubvel
/src/transformers/models/superpoint/mod*_superpoint* @amyeroberts @qubvel
/src/transformers/models/swiftformer/mod*_swiftformer* @amyeroberts @qubvel
/src/transformers/models/swin/mod*_swin* @amyeroberts @qubvel
/src/transformers/models/swinv2/mod*_swinv2* @amyeroberts @qubvel
/src/transformers/models/swin2sr/mod*_swin2sr* @amyeroberts @qubvel
/src/transformers/models/table_transformer/mod*_table_transformer* @amyeroberts @qubvel
/src/transformers/models/textnet/mod*_textnet* @amyeroberts @qubvel
/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @amyeroberts @qubvel
/src/transformers/models/upernet/mod*_upernet* @amyeroberts @qubvel
/src/transformers/models/van/mod*_van* @amyeroberts @qubvel
/src/transformers/models/vit/mod*_vit* @amyeroberts @qubvel
/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @amyeroberts @qubvel
/src/transformers/models/vitdet/mod*_vitdet* @amyeroberts @qubvel
/src/transformers/models/vit_mae/mod*_vit_mae* @amyeroberts @qubvel
/src/transformers/models/vitmatte/mod*_vitmatte* @amyeroberts @qubvel
/src/transformers/models/vit_msn/mod*_vit_msn* @amyeroberts @qubvel
/src/transformers/models/vitpose/mod*_vitpose* @amyeroberts @qubvel
/src/transformers/models/yolos/mod*_yolos* @amyeroberts @qubvel
/src/transformers/models/zoedepth/mod*_zoedepth* @amyeroberts @qubvel
/src/transformers/models/beit/mod*_beit* @yonigozlan @molbap
/src/transformers/models/bit/mod*_bit* @yonigozlan @molbap
/src/transformers/models/conditional_detr/mod*_conditional_detr* @yonigozlan @molbap
/src/transformers/models/convnext/mod*_convnext* @yonigozlan @molbap
/src/transformers/models/convnextv2/mod*_convnextv2* @yonigozlan @molbap
/src/transformers/models/cvt/mod*_cvt* @yonigozlan @molbap
/src/transformers/models/deformable_detr/mod*_deformable_detr* @yonigozlan @molbap
/src/transformers/models/deit/mod*_deit* @yonigozlan @molbap
/src/transformers/models/depth_anything/mod*_depth_anything* @yonigozlan @molbap
/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @yonigozlan @molbap
/src/transformers/models/deta/mod*_deta* @yonigozlan @molbap
/src/transformers/models/detr/mod*_detr* @yonigozlan @molbap
/src/transformers/models/dinat/mod*_dinat* @yonigozlan @molbap
/src/transformers/models/dinov2/mod*_dinov2* @yonigozlan @molbap
/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @yonigozlan @molbap
/src/transformers/models/dit/mod*_dit* @yonigozlan @molbap
/src/transformers/models/dpt/mod*_dpt* @yonigozlan @molbap
/src/transformers/models/efficientformer/mod*_efficientformer* @yonigozlan @molbap
/src/transformers/models/efficientnet/mod*_efficientnet* @yonigozlan @molbap
/src/transformers/models/focalnet/mod*_focalnet* @yonigozlan @molbap
/src/transformers/models/glpn/mod*_glpn* @yonigozlan @molbap
/src/transformers/models/hiera/mod*_hiera* @yonigozlan @molbap
/src/transformers/models/ijepa/mod*_ijepa* @yonigozlan @molbap
/src/transformers/models/imagegpt/mod*_imagegpt* @yonigozlan @molbap
/src/transformers/models/levit/mod*_levit* @yonigozlan @molbap
/src/transformers/models/mask2former/mod*_mask2former* @yonigozlan @molbap
/src/transformers/models/maskformer/mod*_maskformer* @yonigozlan @molbap
/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @yonigozlan @molbap
/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @yonigozlan @molbap
/src/transformers/models/mobilevit/mod*_mobilevit* @yonigozlan @molbap
/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @yonigozlan @molbap
/src/transformers/models/nat/mod*_nat* @yonigozlan @molbap
/src/transformers/models/poolformer/mod*_poolformer* @yonigozlan @molbap
/src/transformers/models/pvt/mod*_pvt* @yonigozlan @molbap
/src/transformers/models/pvt_v2/mod*_pvt_v2* @yonigozlan @molbap
/src/transformers/models/regnet/mod*_regnet* @yonigozlan @molbap
/src/transformers/models/resnet/mod*_resnet* @yonigozlan @molbap
/src/transformers/models/rt_detr/mod*_rt_detr* @yonigozlan @molbap
/src/transformers/models/segformer/mod*_segformer* @yonigozlan @molbap
/src/transformers/models/seggpt/mod*_seggpt* @yonigozlan @molbap
/src/transformers/models/superpoint/mod*_superpoint* @yonigozlan @molbap
/src/transformers/models/swiftformer/mod*_swiftformer* @yonigozlan @molbap
/src/transformers/models/swin/mod*_swin* @yonigozlan @molbap
/src/transformers/models/swinv2/mod*_swinv2* @yonigozlan @molbap
/src/transformers/models/swin2sr/mod*_swin2sr* @yonigozlan @molbap
/src/transformers/models/table_transformer/mod*_table_transformer* @yonigozlan @molbap
/src/transformers/models/textnet/mod*_textnet* @yonigozlan @molbap
/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @yonigozlan @molbap
/src/transformers/models/upernet/mod*_upernet* @yonigozlan @molbap
/src/transformers/models/van/mod*_van* @yonigozlan @molbap
/src/transformers/models/vit/mod*_vit* @yonigozlan @molbap
/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @yonigozlan @molbap
/src/transformers/models/vitdet/mod*_vitdet* @yonigozlan @molbap
/src/transformers/models/vit_mae/mod*_vit_mae* @yonigozlan @molbap
/src/transformers/models/vitmatte/mod*_vitmatte* @yonigozlan @molbap
/src/transformers/models/vit_msn/mod*_vit_msn* @yonigozlan @molbap
/src/transformers/models/vitpose/mod*_vitpose* @yonigozlan @molbap
/src/transformers/models/yolos/mod*_yolos* @yonigozlan @molbap
/src/transformers/models/zoedepth/mod*_zoedepth* @yonigozlan @molbap
# Audio models
/src/transformers/models/audio_spectrogram_transformer/mod*_audio_spectrogram_transformer* @eustlb
@ -304,7 +304,7 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/donut/mod*_donut* @zucchini-nlp
/src/transformers/models/flava/mod*_flava* @zucchini-nlp
/src/transformers/models/git/mod*_git* @zucchini-nlp
/src/transformers/models/grounding_dino/mod*_grounding_dino* @qubvel
/src/transformers/models/grounding_dino/mod*_grounding_dino* @yonigozlan
/src/transformers/models/groupvit/mod*_groupvit* @zucchini-nlp
/src/transformers/models/idefics/mod*_idefics* @zucchini-nlp
/src/transformers/models/idefics2/mod*_idefics2* @zucchini-nlp
@ -326,10 +326,10 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/mgp_str/mod*_mgp_str* @zucchini-nlp
/src/transformers/models/mllama/mod*_mllama* @zucchini-nlp
/src/transformers/models/nougat/mod*_nougat* @NielsRogge
/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @qubvel @yonigozlan
/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @yonigozlan
/src/transformers/models/oneformer/mod*_oneformer* @zucchini-nlp
/src/transformers/models/owlvit/mod*_owlvit* @qubvel
/src/transformers/models/owlv2/mod*_owlv2* @qubvel
/src/transformers/models/owlvit/mod*_owlvit* @yonigozlan
/src/transformers/models/owlv2/mod*_owlv2* @yonigozlan
/src/transformers/models/paligemma/mod*_paligemma* @zucchini-nlp @molbap
/src/transformers/models/perceiver/mod*_perceiver* @zucchini-nlp
/src/transformers/models/pix2struct/mod*_pix2struct* @zucchini-nlp

View File

@ -48,7 +48,7 @@ jobs:
- name: Run database init script
run: |
psql -f benchmark/init_db.sql
psql -f benchmark/utils/init_db.sql
env:
PGDATABASE: metrics
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}

85
.github/workflows/benchmark_v2.yml vendored Normal file
View File

@ -0,0 +1,85 @@
name: Benchmark v2 Framework
on:
workflow_call:
inputs:
runner:
description: 'GH Actions runner group to use'
required: true
type: string
container_image:
description: 'Docker image to use'
required: true
type: string
container_options:
description: 'Container options to use'
required: true
type: string
commit_sha:
description: 'Commit SHA to benchmark'
required: false
type: string
default: ''
run_id:
description: 'Custom run ID for organizing results (auto-generated if not provided)'
required: false
type: string
default: ''
benchmark_repo_id:
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
benchmark-v2:
name: Benchmark v2
runs-on: ${{ inputs.runner }}
if: |
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark')) ||
(github.event_name == 'schedule')
container:
image: ${{ inputs.container_image }}
options: ${{ inputs.container_options }}
steps:
- name: Get repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install benchmark dependencies
run: |
python3 -m pip install -r benchmark_v2/requirements.txt
- name: Reinstall transformers in edit mode
run: |
python3 -m pip uninstall -y transformers
python3 -m pip install -e ".[torch]"
- name: Show installed libraries and their versions
run: |
python3 -m pip list
python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
nvidia-smi || true
- name: Run benchmark v2
working-directory: benchmark_v2
run: |
echo "Running benchmarks"
python3 run_benchmarks.py \
--commit-id '${{ inputs.commit_sha || github.sha }}' \
--run-id '${{ inputs.run_id }}' \
--push-to-hub '${{ inputs.benchmark_repo_id}}' \
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}

View File

@ -0,0 +1,21 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
container_image: huggingface/transformers-pytorch-gpu
container_options: --gpus all --privileged --ipc host --shm-size "16gb"
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

View File

@ -0,0 +1,21 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: amd-mi325-ci-1gpu
container_image: huggingface/transformers-pytorch-amd-gpu
container_options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

View File

@ -26,7 +26,7 @@ jobs:
strategy:
matrix:
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "tf-light", "exotic-models", "torch-tf-light", "jax-light", "examples-torch", "examples-tf"]
file: ["quality", "consistency", "custom-tokenizers", "torch-light", "exotic-models", "examples-torch"]
continue-on-error: true
steps:

View File

@ -2,6 +2,10 @@ name: Build docker images (Nightly CI)
on:
workflow_call:
inputs:
job:
required: true
type: string
push:
branches:
- build_nightly_ci_docker_image*
@ -12,7 +16,8 @@ concurrency:
jobs:
latest-with-torch-nightly-docker:
name: "Nightly PyTorch + Stable TensorFlow"
name: "Nightly PyTorch"
if: inputs.job == 'latest-with-torch-nightly-docker' || inputs.job == ''
runs-on:
group: aws-general-8-plus
steps:
@ -41,6 +46,7 @@ jobs:
nightly-torch-deepspeed-docker:
name: "Nightly PyTorch + DeepSpeed"
if: inputs.job == 'nightly-torch-deepspeed-docker' || inputs.job == ''
runs-on:
group: aws-g4dn-2xlarge-cache
steps:

View File

@ -16,7 +16,7 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de en es fr hi it ko pt tr zh ja te
languages: ar de en es fr hi it ja ko pt zh
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}

View File

@ -21,6 +21,9 @@ on:
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
@ -87,7 +90,7 @@ jobs:
- name: Update clone
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Get target commit
working-directory: /transformers/utils

43
.github/workflows/collated-reports.yml vendored Normal file
View File

@ -0,0 +1,43 @@
name: CI collated reports
on:
workflow_call:
inputs:
job:
required: true
type: string
report_repo_id:
required: true
type: string
machine_type:
required: true
type: string
gpu_name:
description: Name of the GPU used for the job. Its enough that the value contains the name of the GPU, e.g. "noise-h100-more-noise". Case insensitive.
required: true
type: string
jobs:
collated_reports:
name: Collated reports
runs-on: ubuntu-22.04
if: always()
steps:
- uses: actions/checkout@v4
- uses: actions/download-artifact@v4
- name: Collated reports
shell: bash
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_SHA: ${{ github.sha }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
run: |
pip install huggingface_hub
python3 utils/collated_reports.py \
--path . \
--machine-type ${{ inputs.machine_type }} \
--commit-hash ${{ env.CI_SHA }} \
--job ${{ inputs.job }} \
--report-repo-id ${{ inputs.report_repo_id }} \
--gpu-name ${{ inputs.gpu_name }}

View File

@ -31,7 +31,7 @@ jobs:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers

View File

@ -18,7 +18,7 @@ jobs:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
job_splits: ${{ steps.set-matrix.outputs.job_splits }}
split_keys: ${{ steps.set-matrix.outputs.split_keys }}

View File

@ -12,16 +12,22 @@ on:
slice_id:
required: true
type: number
runner_map:
required: false
type: string
docker:
required: true
type: string
commit_sha:
required: false
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
runner_type:
required: false
type: string
report_repo_id:
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -45,10 +51,12 @@ jobs:
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: ${{ fromJson(inputs.runner_map)[matrix.folders][inputs.machine_type] }}
group: '${{ inputs.machine_type }}'
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
machine_type: ${{ steps.set_machine_type.outputs.machine_type }}
steps:
- name: Echo input and matrix info
shell: bash
@ -70,7 +78,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -102,6 +110,7 @@ jobs:
run: pip freeze
- name: Set `machine_type` for report and artifact names
id: set_machine_type
working-directory: /transformers
shell: bash
run: |
@ -117,26 +126,58 @@ jobs:
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
echo "machine_type=$machine_type" >> $GITHUB_OUTPUT
- name: Create report directory if it doesn't exist
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
exit ${EXIT_CODE:-1}
- name: Failure short reports
if: ${{ failure() }}
# This step is only to show information on Github Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
run: |
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
collated_reports:
name: Collated Reports
if: ${{ always() }}
needs: run_models_gpu
uses: huggingface/transformers/.github/workflows/collated-reports.yml@main
with:
job: run_models_gpu
report_repo_id: ${{ inputs.report_repo_id }}
gpu_name: ${{ inputs.runner_type }}
machine_type: ${{ needs.run_models_gpu.outputs.machine_type }}
secrets: inherit

View File

@ -0,0 +1,134 @@
name: PR - build doc via comment
on:
issue_comment:
types:
- created
branches-ignore:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}-${{ startsWith(github.event.comment.body, 'build-doc') }}
cancel-in-progress: true
permissions: {}
jobs:
get-pr-number:
name: Get PR number
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
verity_pr_commit:
name: Verity PR commit corresponds to a specific event by comparing timestamps
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
runs-on: ubuntu-22.04
needs: get-pr-info
env:
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
steps:
- run: |
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
create_run:
name: Create run
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
permissions:
statuses: write
runs-on: ubuntu-22.04
steps:
- name: Create Run
id: create_run
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Custom doc building job" -f "context=custom-doc-build"
reply_to_comment:
name: Reply to the comment
if: ${{ needs.create_run.result == 'success' }}
needs: [get-pr-number, create_run]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=[Building docs for all languages...](${{ env.GITHUB_RUN_URL }})"
build-doc:
name: Build doc
needs: [get-pr-number, get-pr-info]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != '' }}
uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
with:
commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
package: transformers
languages: ar de en es fr hi it ko pt tr zh ja te
update_run_status:
name: Update Check Run Status
needs: [ get-pr-info, create_run, build-doc ]
permissions:
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.create_run.result) }}
steps:
- name: Get `build-doc` job status
run: |
echo "${{ needs.build-doc.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
- name: Update PR commit statuses
run: |
echo "${{ needs.build-doc.result }}"
echo "${{ env.STATUS }}"
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-pr-info.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Custom doc building job" -f "context=custom-doc-build"

View File

@ -16,28 +16,6 @@ jobs:
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
# We only need to verify the timestamp if the workflow is triggered by `issue_comment`.
verity_pr_commit:
name: Verity PR commit corresponds to a specific event by comparing timestamps
if: ${{ github.event.comment.created_at != '' }}
runs-on: ubuntu-22.04
needs: get-pr-info
env:
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_DATE: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_DATE }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
steps:
- run: |
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "PR_MERGE_COMMIT_DATE: $PR_MERGE_COMMIT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
if [ $COMMENT_TIMESTAMP -le $PR_MERGE_COMMIT_TIMESTAMP ]; then
echo "Last commit on the pull request is newer than the issue comment triggering this run! Abort!";
exit -1;
fi
get-jobs:
name: Get test files to run
runs-on: ubuntu-22.04

View File

@ -4,17 +4,6 @@ on:
push:
branches: [ main ]
env:
OUTPUT_SLACK_CHANNEL_ID: "C06L2SGMEEA"
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:
get_modified_models:
name: "Get all modified files"
@ -25,111 +14,144 @@ jobs:
- name: Check out code
uses: actions/checkout@v4
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c
- name: Get changed files using `actions/github-script`
id: get-changed-files
uses: actions/github-script@v7
with:
files: src/transformers/models/**
script: |
let files = [];
- name: Run step if only the files listed above change
if: steps.changed-files.outputs.any_changed == 'true'
id: set-matrix
// Only handle push events
if (context.eventName === 'push') {
const afterSha = context.payload.after;
const branchName = context.payload.ref.replace('refs/heads/', '');
let baseSha;
if (branchName === 'main') {
console.log('Push to main branch, comparing to parent commit');
// Get the parent commit of the pushed commit
const { data: commit } = await github.rest.repos.getCommit({
owner: context.repo.owner,
repo: context.repo.repo,
ref: afterSha
});
baseSha = commit.parents[0]?.sha;
if (!baseSha) {
throw new Error('No parent commit found for the pushed commit');
}
} else {
console.log(`Push to branch ${branchName}, comparing to main`);
baseSha = 'main';
}
const { data: comparison } = await github.rest.repos.compareCommits({
owner: context.repo.owner,
repo: context.repo.repo,
base: baseSha,
head: afterSha
});
// Include added, modified, and renamed files
files = comparison.files
.filter(file => file.status === 'added' || file.status === 'modified' || file.status === 'renamed')
.map(file => file.filename);
}
// Include all files under src/transformers/ (not just models subdirectory)
const filteredFiles = files.filter(file =>
file.startsWith('src/transformers/')
);
core.setOutput('changed_files', filteredFiles.join(' '));
core.setOutput('any_changed', filteredFiles.length > 0 ? 'true' : 'false');
- name: Parse changed files with Python
if: steps.get-changed-files.outputs.any_changed == 'true'
env:
ALL_CHANGED_FILES: ${{ steps.changed-files.outputs.all_changed_files }}
CHANGED_FILES: ${{ steps.get-changed-files.outputs.changed_files }}
id: set-matrix
run: |
model_arrays=()
for file in $ALL_CHANGED_FILES; do
model_path="${file#*models/}"
model_path="models/${model_path%%/*}"
if grep -qFx "$model_path" utils/important_models.txt; then
# Append the file to the matrix string
model_arrays+=("$model_path")
fi
done
matrix_string=$(printf '"%s", ' "${model_arrays[@]}" | sed 's/, $//')
echo "matrix=[$matrix_string]" >> $GITHUB_OUTPUT
test_modified_files:
python3 - << 'EOF'
import os
import sys
import json
# Add the utils directory to Python path
sys.path.insert(0, 'utils')
# Import the important models list
from important_files import IMPORTANT_MODELS
print(f"Important models: {IMPORTANT_MODELS}")
# Get the changed files from the previous step
changed_files_str = os.environ.get('CHANGED_FILES', '')
changed_files = changed_files_str.split() if changed_files_str else []
# Filter to only Python files
python_files = [f for f in changed_files if f.endswith('.py')]
print(f"Python files changed: {python_files}")
result_models = set()
# Specific files that trigger all models
transformers_utils_files = [
'modeling_utils.py',
'modeling_rope_utils.py',
'modeling_flash_attention_utils.py',
'modeling_attn_mask_utils.py',
'cache_utils.py',
'masking_utils.py',
'pytorch_utils.py'
]
# Single loop through all Python files
for file in python_files:
# Check for files under src/transformers/models/
if file.startswith('src/transformers/models/'):
remaining_path = file[len('src/transformers/models/'):]
if '/' in remaining_path:
model_dir = remaining_path.split('/')[0]
if model_dir in IMPORTANT_MODELS:
result_models.add(model_dir)
print(f"Added model directory: {model_dir}")
# Check for specific files under src/transformers/ or src/transformers/generation/ files
elif file.startswith('src/transformers/generation/') or \
(file.startswith('src/transformers/') and os.path.basename(file) in transformers_utils_files):
print(f"Found core file: {file} - including all important models")
result_models.update(IMPORTANT_MODELS)
break # No need to continue once we include all models
# Convert to sorted list and create matrix
result_list = sorted(list(result_models))
print(f"Final model list: {result_list}")
if result_list:
matrix_json = json.dumps(result_list)
print(f"matrix={matrix_json}")
# Write to GITHUB_OUTPUT
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write(f"matrix={matrix_json}\n")
else:
print("matrix=[]")
with open(os.environ['GITHUB_OUTPUT'], 'a') as f:
f.write("matrix=[]\n")
EOF
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
needs: get_modified_models
name: Slow & FA2 tests
runs-on:
group: aws-g5-4xlarge-cache
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
if: ${{ needs.get_modified_models.outputs.matrix != '[]' && needs.get_modified_models.outputs.matrix != '' && fromJson(needs.get_modified_models.outputs.matrix)[0] != null }}
strategy:
fail-fast: false
matrix:
model-name: ${{ fromJson(needs.get_modified_models.outputs.matrix) }}
steps:
- name: Check out code
uses: actions/checkout@v4
- name: Install locally transformers & other libs
run: |
apt install sudo
sudo -H pip install --upgrade pip
sudo -H pip uninstall -y transformers
sudo -H pip install -U -e ".[testing]"
MAX_JOBS=4 pip install flash-attn --no-build-isolation
pip install bitsandbytes
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Show installed libraries and their versions
run: pip freeze
- name: Run FA2 tests
id: run_fa2_tests
run:
pytest -rsfE -m "flash_attn_test" --make-reports=${{ matrix.model-name }}_fa2_tests/ tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: ${{ matrix.model-name }}_fa2_tests"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.model-name }}_fa2_tests
path: /transformers/reports/${{ matrix.model-name }}_fa2_tests
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the FA2 tests - ${{ matrix.model-name }}
status: ${{ steps.run_fa2_tests.conclusion}}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Run integration tests
id: run_integration_tests
if: always()
run:
pytest -rsfE -k "IntegrationTest" --make-reports=tests_integration_${{ matrix.model-name }} tests/${{ matrix.model-name }}/test_modeling_*
- name: "Test suite reports artifacts: tests_integration_${{ matrix.model-name }}"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: tests_integration_${{ matrix.model-name }}
path: /transformers/reports/tests_integration_${{ matrix.model-name }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ env.OUTPUT_SLACK_CHANNEL_ID }}
title: 🤗 Results of the Integration tests - ${{ matrix.model-name }}
status: ${{ steps.run_integration_tests.conclusion}}
slack_token: ${{ secrets.CI_SLACK_BOT_TOKEN }}
- name: Tailscale # In order to be able to SSH when a test fails
if: ${{ runner.debug == '1'}}
uses: huggingface/tailscale-action@v1
with:
authkey: ${{ secrets.TAILSCALE_SSH_AUTHKEY }}
slackChannel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
slackToken: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
waitForSSH: true
if: needs.get_modified_models.outputs.matrix != '' && needs.get_modified_models.outputs.matrix != '[]'
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-push"
docker: huggingface/transformers-all-latest-gpu
ci_event: push
report_repo_id: hf-internal-testing/transformers_ci_push
commit_sha: ${{ github.sha }}
models: ${{ needs.get_modified_models.outputs.matrix }}
secrets: inherit

View File

@ -29,7 +29,7 @@ jobs:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:

View File

@ -1,43 +1,56 @@
name: Self-hosted runner (nightly-ci)
name: Nvidia CI with nightly torch
on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
# triggered when the daily scheduled Nvidia CI is completed.
# This way, we can compare the results more easily.
workflow_run:
workflows: ["Nvidia CI"]
branches: ["main"]
types: [completed]
push:
branches:
- run_nightly_ci*
- run_ci_with_nightly_torch*
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
build_nightly_ci_images:
name: Build Nightly CI Docker Images
if: (github.event_name == 'schedule') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_nightly_ci'))
build_nightly_torch_ci_images:
name: Build CI Docker Images with nightly torch
uses: ./.github/workflows/build-nightly-ci-docker-images.yml
with:
job: latest-with-torch-nightly-docker
secrets: inherit
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
needs: [build_nightly_ci_images]
needs: build_nightly_torch_ci_images
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
docker: huggingface/transformers-all-latest-torch-nightly-gpu
ci_event: Nightly CI
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
needs: [build_nightly_ci_images]
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-past-future"
runner: ci
# test deepspeed nightly build with the latest release torch
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Nightly CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
secrets: inherit

View File

@ -1,25 +0,0 @@
name: Self-hosted runner (AMD mi300 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi300
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && (startsWith(github.ref_name, 'run_amd_push_ci_caller') || startsWith(github.ref_name, 'mi300-ci'))))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi300
secrets: inherit

View File

@ -36,7 +36,7 @@ jobs:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
@ -136,7 +136,7 @@ jobs:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
@ -362,7 +362,7 @@ jobs:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}

View File

@ -1,8 +1,8 @@
name: Self-hosted runner scale set (AMD mi300 scheduled CI caller)
name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu scale set: amd-mi300-ci-1gpu
# 2gpu scale set: amd-mi300-ci-2gpu
# For example, 1gpu scale set: amd-mi325-ci-1gpu
# 2gpu scale set: amd-mi325-ci-2gpu
on:
workflow_run:
@ -20,10 +20,11 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi300-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi300
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
torch-pipeline:
@ -32,10 +33,11 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi300-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi300
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
example-ci:
@ -44,10 +46,11 @@ jobs:
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi300-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi300
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit
deepspeed-ci:
@ -56,8 +59,9 @@ jobs:
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi300-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi300
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
env_file: /etc/podinfo/gha-gpu-isolation-settings
secrets: inherit

View File

@ -0,0 +1,63 @@
name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu : amd-mi355-ci-1gpu
# 2gpu : amd-mi355-ci-2gpu
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- run_amd_scheduled_ci_caller*
jobs:
model-ci:
name: Model CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
example-ci:
name: Example CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

View File

@ -1,5 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI
on:
repository_dispatch:
@ -7,7 +6,7 @@ on:
- cron: "17 2 * * *"
push:
branches:
- run_scheduled_ci*
- run_nvidia_ci*
workflow_dispatch:
inputs:
prev_workflow_run_id:
@ -53,7 +52,9 @@ jobs:
slack_report_channel: "#transformers-ci-daily-models"
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
torch-pipeline:
@ -65,6 +66,7 @@ jobs:
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
example-ci:
@ -76,6 +78,7 @@ jobs:
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
trainer-fsdp-ci:
@ -85,8 +88,10 @@ jobs:
job: run_trainer_and_fsdp_gpu
slack_report_channel: "#transformers-ci-daily-training"
docker: huggingface/transformers-all-latest-gpu
runner_type: "a10"
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
deepspeed-ci:
@ -99,6 +104,7 @@ jobs:
ci_event: Daily CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
quantization-ci:
@ -110,4 +116,5 @@ jobs:
docker: huggingface/transformers-quantization-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit

View File

@ -1,4 +1,4 @@
name: Self-hosted runner (scheduled)
name: Nvidia CI (job definitions)
# Note that each job's dependencies go into a corresponding docker file.
#
@ -28,7 +28,16 @@ on:
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
runner_type:
required: false
type: string
models:
default: ""
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -46,8 +55,8 @@ env:
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
name: Setup
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
strategy:
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
@ -55,17 +64,16 @@ jobs:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
runner_map: ${{ steps.set-matrix.outputs.runner_map }}
quantization_matrix: ${{ steps.set-matrix-quantization.outputs.quantization_matrix }}
steps:
- name: Update clone
working-directory: /transformers
run: |
git fetch && git checkout ${{ github.sha }}
git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Cleanup
working-directory: /transformers
@ -84,9 +92,8 @@ jobs:
working-directory: /transformers/tests
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
echo "runner_map=$(python3 ../utils/get_runner_map.py)" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
@ -110,15 +117,17 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
secrets: inherit
run_trainer_and_fsdp_gpu:
@ -135,8 +144,10 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
@ -155,7 +166,7 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -219,11 +230,11 @@ jobs:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus 0 --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -292,7 +303,7 @@ jobs:
steps:
- name: Update clone
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: ${{ inputs.working-directory-prefix }}/transformers
@ -400,7 +411,7 @@ jobs:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -464,6 +475,7 @@ jobs:
uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install transformers
run: pip install transformers
@ -506,7 +518,7 @@ jobs:
run_quantization_torch_gpu,
run_extract_warnings
]
if: ${{ always() }}
if: always() && !cancelled()
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
@ -518,6 +530,7 @@ jobs:
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
secrets: inherit
@ -528,7 +541,7 @@ jobs:
uses: ./.github/workflows/check_failed_tests.yml
with:
docker: ${{ inputs.docker }}
start_sha: ${{ github.sha }}
start_sha: ${{ inputs.commit_sha || github.sha }}
job: ${{ inputs.job }}
slack_report_channel: ${{ inputs.slack_report_channel }}
ci_event: ${{ inputs.ci_event }}

View File

@ -24,6 +24,10 @@ on:
report_repo_id:
required: true
type: string
commit_sha:
required: false
type: string
env:
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
@ -32,7 +36,7 @@ jobs:
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
if: always() && !cancelled()
steps:
- name: Preliminary job status
shell: bash
@ -41,6 +45,10 @@ jobs:
echo "Setup status: ${{ inputs.setup_status }}"
- uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- uses: actions/download-artifact@v4
- name: Prepare some setup values
@ -67,7 +75,9 @@ jobs:
SLACK_REPORT_CHANNEL: ${{ inputs.slack_report_channel }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: ${{ inputs.ci_event }}
CI_SHA: ${{ github.sha }}
# This `CI_TITLE` would be empty for `schedule` or `workflow_run` events.
CI_TITLE: ${{ github.event.head_commit.message }}
CI_SHA: ${{ inputs.commit_sha || github.sha }}
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}

4
.gitignore vendored
View File

@ -13,6 +13,7 @@ tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/
reports/
# Distribution / packaging
.Python
@ -167,3 +168,6 @@ tags
# ruff
.ruff_cache
# modular conversion
*.modular_backup

View File

@ -68,8 +68,7 @@ already reported** (use the search bar on GitHub under Issues). Your issue shoul
Once you've confirmed the bug hasn't already been reported, please include the following information in your issue so we can quickly resolve it:
* Your **OS type and version** and **Python**, **PyTorch** and
**TensorFlow** versions when applicable.
* Your **OS type and version** and **Python**, and **PyTorch** versions when applicable.
* A short, self-contained, code snippet that allows us to reproduce the bug in
less than 30s.
* The *full* traceback if an exception is raised.
@ -165,8 +164,7 @@ You'll need **[Python 3.9](https://github.com/huggingface/transformers/blob/main
mode with the `-e` flag.
Depending on your OS, and since the number of optional dependencies of Transformers is growing, you might get a
failure with this command. If that's the case make sure to install the Deep Learning framework you are working with
(PyTorch, TensorFlow and/or Flax) then do:
failure with this command. If that's the case make sure to install Pytorch then do:
```bash
pip install -e ".[quality]"
@ -280,13 +278,14 @@ are working on it).<br>
useful to avoid duplicated work, and to differentiate it from PRs ready to be merged.<br>
☐ Make sure existing tests pass.<br>
☐ If adding a new feature, also add tests for it.<br>
- If you are adding a new model, make sure you use
- If you are adding a new model, make sure you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` to trigger the common tests.
- If you are adding new `@slow` tests, make sure they pass using
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests and make sure
- If you are adding a new tokenizer, write tests and make sure
`RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` passes.
- CircleCI does not run the slow tests, but GitHub Actions does every night!<br>
- CircleCI does not run the slow tests, but GitHub Actions does every night!<br>
☐ All public methods must have informative docstrings (see
[`modeling_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py)
@ -330,11 +329,8 @@ By default, slow tests are skipped but you can set the `RUN_SLOW` environment va
`yes` to run them. This will download many gigabytes of models so make sure you
have enough disk space, a good internet connection or a lot of patience!
<Tip warning={true}>
Remember to specify a *path to a subfolder or a test file* to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!
</Tip>
> [!WARNING]
> Remember to specify a *path to a subfolder or a test file* to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!
```bash
RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model
@ -342,6 +338,7 @@ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/t
```
Like the slow tests, there are other environment variables available which are not enabled by default during testing:
- `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers.
More environment variables and additional information can be found in the [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py).

View File

@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not use italics and bold text too much as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back it's always a good practice to quote just the relevant lines you're replying it. The `>` is used for quoting, or you can always use the menu to do so. For example your editor box will look like:

View File

@ -3,7 +3,7 @@
# make sure to test the local checkout in scripts and not the pre-installed one (don't use quotes!)
export PYTHONPATH = src
check_dirs := examples tests src utils
check_dirs := examples tests src utils scripts benchmark benchmark_v2
exclude_folders := ""
@ -52,6 +52,7 @@ repo-consistency:
python utils/check_doctest_list.py
python utils/update_metadata.py --check-only
python utils/check_docstrings.py
python utils/add_dates.py
# this target runs checks on all files

View File

@ -44,13 +44,14 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ja.md">日本語</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_hd.md">हिन्दी</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ru.md">Русский</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Рortuguês</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_pt-br.md">Português</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
</p>
</h4>
@ -62,7 +63,6 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal model, for both inference and training.
@ -80,7 +80,7 @@ Explore the [Hub](https://huggingface.com/) today to find a model and use Transf
## Installation
Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
@ -147,7 +147,7 @@ chat = [
{"role": "user", "content": "Hey, can you tell me any fun things to do in New York?"}
]
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipeline(chat, max_new_tokens=512)
print(response[0]["generated_text"][-1]["content"])
```
@ -193,7 +193,6 @@ pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.pn
<details>
<summary>Visual question answering</summary>
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>
@ -242,7 +241,7 @@ pipeline(
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts]((https://github.com/huggingface/transformers/tree/main/examples)) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
## 100 projects using Transformers
@ -280,8 +279,8 @@ Expand each modality below to see a few example models for various use cases.
- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)

View File

@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re
models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
### Remote code

View File

@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training(fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen

1
benchmark/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
benchmark_results/

354
benchmark/benches/llama.py Normal file
View File

@ -0,0 +1,354 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import sys
from logging import Logger
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
# Add the parent directory to Python path to import benchmarks_entrypoint
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import gpustat
import psutil
import psycopg2
from benchmarks_entrypoint import MetricsRecorder
# Optional heavy ML dependencies - only required when actually running the benchmark
try:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
TRANSFORMERS_AVAILABLE = True
except ImportError:
TRANSFORMERS_AVAILABLE = False
torch = None
AutoModelForCausalLM = None
AutoTokenizer = None
GenerationConfig = None
StaticCache = None
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
# Only set torch precision if torch is available
if TRANSFORMERS_AVAILABLE:
torch.set_float32_matmul_precision("high")
def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
p = psutil.Process(os.getpid())
while not continue_metric_collection.is_set():
with p.oneshot():
cpu_util = p.cpu_percent()
mem_megabytes = p.memory_info().rss / (1024 * 1024)
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_util = gpu_stats[0]["utilization.gpu"]
gpu_mem_megabytes = gpu_stats[0]["memory.used"]
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
sleep(0.01)
def run_benchmark(
logger: Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
metrics_recorder=None,
num_tokens_to_generate=100,
):
# Check if required ML dependencies are available
if not TRANSFORMERS_AVAILABLE:
logger.error("Transformers and torch are required to run the LLaMA benchmark. Please install them with:")
logger.error("pip install torch transformers")
logger.error("Skipping LLaMA benchmark due to missing dependencies.")
return
continue_metric_collection = Event()
metrics_thread = None
model_id = "meta-llama/Llama-2-7b-hf"
# If no metrics_recorder is provided, create one for backward compatibility
if metrics_recorder is None:
try:
metrics_recorder = MetricsRecorder(
psycopg2.connect("dbname=metrics"), logger, repository, branch, commit_id, commit_msg, True
)
should_close_recorder = True
except Exception as e:
logger.error(f"Failed to create metrics recorder: {e}")
return
else:
should_close_recorder = False
try:
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_name = gpu_stats[0]["name"]
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
metrics_thread = Thread(
target=collect_metrics,
args=[benchmark_id, continue_metric_collection, metrics_recorder],
)
metrics_thread.start()
logger.info("started background thread to fetch device metrics")
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
device = "cuda"
logger.info("downloading weights")
# This is to avoid counting download in model load time measurement
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16)
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
logger.info("loading model")
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=torch.float16, generation_config=gen_config
).eval()
model.to(device)
torch.cuda.synchronize()
end = perf_counter()
model_load_time = end - start
logger.info(f"loaded model in: {model_load_time}s")
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Why dogs are so cute?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Specify the max length (including both the prompt and the response)
# When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
# with sequence length = `max_length`. The longer the more you will re-use it
seq_length = inputs["input_ids"].shape[1]
model.generation_config.max_length = seq_length + num_tokens_to_generate
batch_size = inputs["input_ids"].shape[0]
# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
pivot = v.select(-1, -1).unsqueeze(-1)
logits = torch.where(logits < pivot, -float("Inf"), logits)
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
probs = logits_to_probs(logits[0, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs
# First eager forward pass
logger.info("running first eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_fwd_pass_time = end - start
logger.info(f"completed first eager forward pass in: {first_eager_fwd_pass_time}s")
# Second eager forward pass (should be faster)
logger.info("running second eager forward pass")
start = perf_counter()
_ = model(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_fwd_pass_time = end - start
logger.info(f"completed second eager forward pass in: {second_eager_fwd_pass_time}s")
# First eager generation
logger.info("running first eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
first_eager_generate_time = end - start
logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
# Second eager generation (should be faster)
logger.info("running second eager generation")
start = perf_counter()
output = model.generate(**inputs)
torch.cuda.synchronize()
end = perf_counter()
second_eager_generate_time = end - start
logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
logger.info("running generation timing loop")
input_pos = torch.arange(0, seq_length, device=device)
inputs = inputs["input_ids"]
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(inputs, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_first_token = end - start
input_pos = torch.tensor([seq_length], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_second_token = end - start
input_pos = torch.tensor([seq_length + 1], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
time_to_third_token = end - start
logger.info("running longer generation timing loop")
total_time = 0
for i in range(20):
input_pos = torch.tensor([seq_length + 2 + i], device=device, dtype=torch.int)
next_token = next_token.clone()
start = perf_counter()
with torch.nn.attention.sdpa_kernel(torch.nn.attention.SDPBackend.MATH):
logits = model(next_token, position_ids=input_pos).logits
next_token, probs = sample(logits, temperature=0.6, top_k=5)
torch.cuda.synchronize()
end = perf_counter()
total_time += end - start
mean_time_to_next_token = total_time / 20
logger.info("running compilation benchmarks")
# Now compile the model
model = torch.compile(model, mode="max-autotune", fullgraph=True)
# StaticCache for generation
with torch.device(device):
model.setup_caches(max_batch_size=batch_size, max_seq_len=seq_length + num_tokens_to_generate)
input_pos = torch.arange(0, seq_length, device=device)
inputs = tokenizer(prompt, return_tensors="pt").to(device)["input_ids"]
logger.info("compiling model")
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.float16, generation_config=gen_config)
model.to(device)
model = torch.compile(model, mode="max-autotune", fullgraph=True)
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 1st call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
first_compile_generate_time = end - start
logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 2nd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
second_compile_generate_time = end - start
logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 3rd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
third_compile_generate_time = end - start
logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 4th call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
fourth_compile_generate_time = end - start
logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
except Exception as e:
logger.error(f"Caught exception: {e}")
continue_metric_collection.set()
if metrics_thread is not None:
metrics_thread.join()
# Only close the recorder if we created it locally
if should_close_recorder:
metrics_recorder.close()

View File

@ -31,9 +31,7 @@ from contextlib import contextmanager
from pathlib import Path
from git import Repo
from huggingface_hub import HfApi
from optimum_benchmark import Benchmark
from optimum_benchmark_wrapper import main

View File

@ -1,15 +1,36 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import importlib.util
import json
import logging
import os
import sys
from typing import Dict, Tuple
import uuid
from datetime import datetime
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json
import pandas as pd
register_adapter(dict, Json)
try:
from psycopg2.extensions import register_adapter
from psycopg2.extras import Json
register_adapter(dict, Json)
PSYCOPG2_AVAILABLE = True
except ImportError:
PSYCOPG2_AVAILABLE = False
class ImportModuleException(Exception):
@ -18,61 +39,272 @@ class ImportModuleException(Exception):
class MetricsRecorder:
def __init__(
self, connection, logger: logging.Logger, repository: str, branch: str, commit_id: str, commit_msg: str
self,
connection,
logger: logging.Logger,
repository: str,
branch: str,
commit_id: str,
commit_msg: str,
collect_csv_data: bool = True,
):
self.conn = connection
self.conn.autocommit = True
self.use_database = connection is not None
if self.use_database:
self.conn.autocommit = True
self.logger = logger
self.repository = repository
self.branch = branch
self.commit_id = commit_id
self.commit_msg = commit_msg
self.collect_csv_data = collect_csv_data
def initialise_benchmark(self, metadata: dict[str, str]) -> int:
"""
Creates a new benchmark, returns the benchmark id
"""
# gpu_name: str, model_id: str
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO benchmarks (repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s) RETURNING benchmark_id",
(self.repository, self.branch, self.commit_id, self.commit_msg, metadata),
# For CSV export - store all data in pandas DataFrames (only if CSV collection is enabled)
if self.collect_csv_data:
# Initialize empty DataFrames with proper schemas
self.benchmarks_df = pd.DataFrame(
columns=[
"benchmark_id",
"repository",
"branch",
"commit_id",
"commit_message",
"metadata",
"created_at",
]
)
benchmark_id = cur.fetchone()[0]
logger.debug(f"initialised benchmark #{benchmark_id}")
return benchmark_id
self.device_measurements_df = pd.DataFrame(
columns=["benchmark_id", "cpu_util", "mem_megabytes", "gpu_util", "gpu_mem_megabytes", "time"]
)
self.model_measurements_df = pd.DataFrame(
columns=[
"benchmark_id",
"time",
"model_load_time",
"first_eager_forward_pass_time_secs",
"second_eager_forward_pass_time_secs",
"first_eager_generate_time_secs",
"second_eager_generate_time_secs",
"time_to_first_token_secs",
"time_to_second_token_secs",
"time_to_third_token_secs",
"time_to_next_token_mean_secs",
"first_compile_generate_time_secs",
"second_compile_generate_time_secs",
"third_compile_generate_time_secs",
"fourth_compile_generate_time_secs",
]
)
else:
self.benchmarks_df = None
self.device_measurements_df = None
self.model_measurements_df = None
def collect_device_measurements(self, benchmark_id: int, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
def initialise_benchmark(self, metadata: dict[str, str]) -> str:
"""
Creates a new benchmark, returns the benchmark id (UUID)
"""
# Generate a unique UUID for this benchmark
benchmark_id = str(uuid.uuid4())
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO benchmarks (benchmark_id, repository, branch, commit_id, commit_message, metadata) VALUES (%s, %s, %s, %s, %s, %s)",
(benchmark_id, self.repository, self.branch, self.commit_id, self.commit_msg, metadata),
)
self.logger.debug(f"initialised benchmark #{benchmark_id}")
# Store benchmark data for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"repository": self.repository,
"branch": self.branch,
"commit_id": self.commit_id,
"commit_message": self.commit_msg,
"metadata": json.dumps(metadata),
"created_at": datetime.utcnow().isoformat(),
}
]
)
self.benchmarks_df = pd.concat([self.benchmarks_df, new_row], ignore_index=True)
mode_info = []
if self.use_database:
mode_info.append("database")
if self.collect_csv_data:
mode_info.append("CSV")
mode_str = " + ".join(mode_info) if mode_info else "no storage"
self.logger.debug(f"initialised benchmark #{benchmark_id} ({mode_str} mode)")
return benchmark_id
def collect_device_measurements(self, benchmark_id: str, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes):
"""
Collect device metrics, such as CPU & GPU usage. These are "static", as in you cannot pass arbitrary arguments to the function.
"""
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
(benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
# Store device measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame
new_row = pd.DataFrame(
[
{
"benchmark_id": benchmark_id,
"cpu_util": cpu_util,
"mem_megabytes": mem_megabytes,
"gpu_util": gpu_util,
"gpu_mem_megabytes": gpu_mem_megabytes,
"time": datetime.utcnow().isoformat(),
}
]
)
self.device_measurements_df = pd.concat([self.device_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"INSERT INTO device_measurements (benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes) VALUES (%s, %s, %s, %s, %s)",
(benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes),
)
self.logger.debug(
f"inserted device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
f"collected device measurements for benchmark #{benchmark_id} [CPU util: {cpu_util}, mem MBs: {mem_megabytes}, GPU util: {gpu_util}, GPU mem MBs: {gpu_mem_megabytes}]"
)
def collect_model_measurements(self, benchmark_id: int, measurements: dict[str, float]):
with self.conn.cursor() as cur:
cur.execute(
"""
INSERT INTO model_measurements (
benchmark_id,
measurements
) VALUES (%s, %s)
""",
(
benchmark_id,
measurements,
),
def collect_model_measurements(self, benchmark_id: str, measurements: dict[str, float]):
# Store model measurements for CSV export (if enabled)
if self.collect_csv_data:
# Add row to pandas DataFrame with flattened measurements
row_data = {"benchmark_id": benchmark_id, "time": datetime.utcnow().isoformat()}
# Flatten the measurements dict into the row
row_data.update(measurements)
new_row = pd.DataFrame([row_data])
self.model_measurements_df = pd.concat([self.model_measurements_df, new_row], ignore_index=True)
# Store in database if available
if self.use_database:
with self.conn.cursor() as cur:
cur.execute(
"""
INSERT INTO model_measurements (
benchmark_id,
measurements
) VALUES (%s, %s)
""",
(
benchmark_id,
measurements,
),
)
self.logger.debug(f"collected model measurements for benchmark #{benchmark_id}: {measurements}")
def export_to_csv(self, output_dir: str = "benchmark_results"):
"""
Export all collected data to CSV files using pandas DataFrames
"""
if not self.collect_csv_data:
self.logger.warning("CSV data collection is disabled - no CSV files will be generated")
return
if not os.path.exists(output_dir):
os.makedirs(output_dir)
self.logger.info(f"Created output directory: {output_dir}")
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
files_created = []
# Export using pandas DataFrames
self._export_pandas_data(output_dir, timestamp, files_created)
self.logger.info(f"CSV export complete! Created {len(files_created)} files in {output_dir}")
def _export_pandas_data(self, output_dir: str, timestamp: str, files_created: list):
"""
Export CSV files using pandas DataFrames
"""
# Export benchmarks
benchmarks_file = os.path.join(output_dir, f"benchmarks_{timestamp}.csv")
self.benchmarks_df.to_csv(benchmarks_file, index=False)
files_created.append(benchmarks_file)
self.logger.info(f"Exported {len(self.benchmarks_df)} benchmark records to {benchmarks_file}")
# Export device measurements
device_file = os.path.join(output_dir, f"device_measurements_{timestamp}.csv")
self.device_measurements_df.to_csv(device_file, index=False)
files_created.append(device_file)
self.logger.info(f"Exported {len(self.device_measurements_df)} device measurement records to {device_file}")
# Export model measurements (already flattened)
model_file = os.path.join(output_dir, f"model_measurements_{timestamp}.csv")
self.model_measurements_df.to_csv(model_file, index=False)
files_created.append(model_file)
self.logger.info(f"Exported {len(self.model_measurements_df)} model measurement records to {model_file}")
# Create comprehensive summary using pandas operations
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.csv")
self._create_summary(summary_file)
files_created.append(summary_file)
def _create_summary(self, summary_file: str):
"""
Create a comprehensive summary CSV using pandas operations
"""
if len(self.benchmarks_df) == 0:
# Create empty summary file
summary_df = pd.DataFrame()
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created empty benchmark summary at {summary_file}")
return
# Start with benchmarks as the base
summary_df = self.benchmarks_df.copy()
# Add model measurements (join on benchmark_id)
if len(self.model_measurements_df) > 0:
# Drop 'time' column from model measurements to avoid conflicts
model_df = self.model_measurements_df.drop(columns=["time"], errors="ignore")
summary_df = summary_df.merge(model_df, on="benchmark_id", how="left")
# Calculate device measurement aggregates using pandas groupby
if len(self.device_measurements_df) > 0:
device_agg = (
self.device_measurements_df.groupby("benchmark_id")
.agg(
{
"cpu_util": ["mean", "max", "std", "count"],
"mem_megabytes": ["mean", "max", "std"],
"gpu_util": ["mean", "max", "std"],
"gpu_mem_megabytes": ["mean", "max", "std"],
}
)
.round(3)
)
self.logger.debug(f"inserted model measurements for benchmark #{benchmark_id}: {measurements}")
# Flatten column names
device_agg.columns = [f"{col[0]}_{col[1]}" for col in device_agg.columns]
device_agg = device_agg.reset_index()
# Rename count column to be more descriptive
if "cpu_util_count" in device_agg.columns:
device_agg = device_agg.rename(columns={"cpu_util_count": "device_measurement_count"})
# Merge with summary
summary_df = summary_df.merge(device_agg, on="benchmark_id", how="left")
# Export the comprehensive summary
summary_df.to_csv(summary_file, index=False)
self.logger.info(f"Created comprehensive benchmark summary with {len(summary_df)} records at {summary_file}")
def close(self):
self.conn.close()
if self.use_database and self.conn:
self.conn.close()
logger = logging.getLogger(__name__)
@ -85,7 +317,7 @@ handler.setFormatter(formatter)
logger.addHandler(handler)
def parse_arguments() -> tuple[str, str, str, str]:
def parse_arguments() -> tuple[str, str, str, str, bool, str]:
"""
Parse command line arguments for the benchmarking CLI.
"""
@ -115,9 +347,21 @@ def parse_arguments() -> tuple[str, str, str, str]:
help="The commit message associated with the commit, truncated to 70 characters.",
)
parser.add_argument("--csv", action="store_true", default=False, help="Enable CSV output files generation.")
parser.add_argument(
"--csv-output-dir",
type=str,
default="benchmark_results",
help="Directory for CSV output files (default: benchmark_results).",
)
args = parser.parse_args()
return args.repository, args.branch, args.commit_id, args.commit_msg
# CSV is disabled by default, only enabled when --csv is used
generate_csv = args.csv
return args.repository, args.branch, args.commit_id, args.commit_msg, generate_csv, args.csv_output_dir
def import_from_path(module_name, file_path):
@ -131,22 +375,128 @@ def import_from_path(module_name, file_path):
raise ImportModuleException(f"failed to load python module: {e}")
def create_database_connection():
"""
Try to create a database connection. Returns None if connection fails.
"""
if not PSYCOPG2_AVAILABLE:
logger.warning("psycopg2 not available - running in CSV-only mode")
return None
try:
import psycopg2
conn = psycopg2.connect("dbname=metrics")
logger.info("Successfully connected to database")
return conn
except Exception as e:
logger.warning(f"Failed to connect to database: {e}. Running in CSV-only mode")
return None
def create_global_metrics_recorder(
repository: str, branch: str, commit_id: str, commit_msg: str, generate_csv: bool = False
) -> MetricsRecorder:
"""
Create a global metrics recorder that will be used across all benchmarks.
"""
connection = create_database_connection()
recorder = MetricsRecorder(connection, logger, repository, branch, commit_id, commit_msg, generate_csv)
# Log the storage mode
storage_modes = []
if connection is not None:
storage_modes.append("database")
if generate_csv:
storage_modes.append("CSV")
if not storage_modes:
logger.warning("Running benchmarks with NO data storage (no database connection, CSV disabled)")
logger.warning("Use --csv flag to enable CSV output when database is unavailable")
else:
logger.info(f"Running benchmarks with: {' + '.join(storage_modes)} storage")
return recorder
if __name__ == "__main__":
benchmarks_folder_path = os.path.dirname(os.path.realpath(__file__))
benches_folder_path = os.path.join(benchmarks_folder_path, "benches")
repository, branch, commit_id, commit_msg = parse_arguments()
repository, branch, commit_id, commit_msg, generate_csv, csv_output_dir = parse_arguments()
for entry in os.scandir(benchmarks_folder_path):
try:
# Create a global metrics recorder
global_metrics_recorder = create_global_metrics_recorder(repository, branch, commit_id, commit_msg, generate_csv)
successful_benchmarks = 0
failed_benchmarks = 0
# Automatically discover all benchmark modules in benches/ folder
benchmark_modules = []
if os.path.exists(benches_folder_path):
logger.debug(f"Scanning for benchmarks in: {benches_folder_path}")
for entry in os.scandir(benches_folder_path):
if not entry.name.endswith(".py"):
continue
if entry.path == __file__:
if entry.name.startswith("__"): # Skip __init__.py, __pycache__, etc.
continue
logger.debug(f"loading: {entry.name}")
module = import_from_path(entry.name.split(".")[0], entry.path)
logger.info(f"running benchmarks in: {entry.name}")
module.run_benchmark(logger, repository, branch, commit_id, commit_msg)
# Check if the file has a run_benchmark function
try:
logger.debug(f"checking if benches/{entry.name} has run_benchmark function")
module = import_from_path(entry.name.split(".")[0], entry.path)
if hasattr(module, "run_benchmark"):
benchmark_modules.append(entry.name)
logger.debug(f"discovered benchmark: {entry.name}")
else:
logger.debug(f"skipping {entry.name} - no run_benchmark function found")
except Exception as e:
logger.debug(f"failed to check benches/{entry.name}: {e}")
else:
logger.warning(f"Benches directory not found: {benches_folder_path}")
if benchmark_modules:
logger.info(f"Discovered {len(benchmark_modules)} benchmark(s): {benchmark_modules}")
else:
logger.warning("No benchmark modules found in benches/ directory")
for module_name in benchmark_modules:
module_path = os.path.join(benches_folder_path, module_name)
try:
logger.debug(f"loading: {module_name}")
module = import_from_path(module_name.split(".")[0], module_path)
logger.info(f"running benchmarks in: {module_name}")
# Check if the module has an updated run_benchmark function that accepts metrics_recorder
try:
# Try the new signature first
module.run_benchmark(logger, repository, branch, commit_id, commit_msg, global_metrics_recorder)
except TypeError:
# Fall back to the old signature for backward compatibility
logger.warning(
f"Module {module_name} using old run_benchmark signature - database connection will be created per module"
)
module.run_benchmark(logger, repository, branch, commit_id, commit_msg)
successful_benchmarks += 1
except ImportModuleException as e:
logger.error(e)
failed_benchmarks += 1
except Exception as e:
logger.error(f"error running benchmarks for {entry.name}: {e}")
logger.error(f"error running benchmarks for {module_name}: {e}")
failed_benchmarks += 1
# Export CSV results at the end (if enabled)
try:
if generate_csv:
global_metrics_recorder.export_to_csv(csv_output_dir)
logger.info(f"CSV reports have been generated and saved to the {csv_output_dir} directory")
else:
logger.info("CSV generation disabled - no CSV files created (use --csv to enable)")
logger.info(f"Benchmark run completed. Successful: {successful_benchmarks}, Failed: {failed_benchmarks}")
except Exception as e:
logger.error(f"Failed to export CSV results: {e}")
finally:
global_metrics_recorder.close()

View File

@ -19,7 +19,7 @@ backend:
model: meta-llama/Llama-2-7b-hf
cache_implementation: static
torch_compile: true
torch_dtype: float16
dtype: float16
torch_compile_config:
backend: inductor
mode: reduce-overhead

View File

@ -1,34 +0,0 @@
CREATE TABLE IF NOT EXISTS benchmarks (
benchmark_id SERIAL PRIMARY KEY,
repository VARCHAR(255),
branch VARCHAR(255),
commit_id VARCHAR(72),
commit_message VARCHAR(70),
metadata jsonb,
created_at timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS benchmarks_benchmark_id_idx ON benchmarks (benchmark_id);
CREATE INDEX IF NOT EXISTS benchmarks_branch_idx ON benchmarks (branch);
CREATE TABLE IF NOT EXISTS device_measurements (
measurement_id SERIAL PRIMARY KEY,
benchmark_id int REFERENCES benchmarks (benchmark_id),
cpu_util double precision,
mem_megabytes double precision,
gpu_util double precision,
gpu_mem_megabytes double precision,
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS device_measurements_branch_idx ON device_measurements (benchmark_id);
CREATE TABLE IF NOT EXISTS model_measurements (
measurement_id SERIAL PRIMARY KEY,
benchmark_id int REFERENCES benchmarks (benchmark_id),
measurements jsonb,
time timestamp without time zone NOT NULL DEFAULT (current_timestamp AT TIME ZONE 'UTC')
);
CREATE INDEX IF NOT EXISTS model_measurements_branch_idx ON model_measurements (benchmark_id);

View File

@ -1,346 +0,0 @@
from logging import Logger
import os
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
from benchmarks_entrypoint import MetricsRecorder
import gpustat
import psutil
import psycopg2
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, StaticCache
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
def collect_metrics(benchmark_id, continue_metric_collection, metrics_recorder):
p = psutil.Process(os.getpid())
while not continue_metric_collection.is_set():
with p.oneshot():
cpu_util = p.cpu_percent()
mem_megabytes = p.memory_info().rss / (1024 * 1024)
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_util = gpu_stats[0]["utilization.gpu"]
gpu_mem_megabytes = gpu_stats[0]["memory.used"]
metrics_recorder.collect_device_measurements(
benchmark_id, cpu_util, mem_megabytes, gpu_util, gpu_mem_megabytes
)
sleep(0.01)
def run_benchmark(
logger: Logger, repository: str, branch: str, commit_id: str, commit_msg: str, num_tokens_to_generate=100
):
continue_metric_collection = Event()
metrics_thread = None
model_id = "meta-llama/Llama-2-7b-hf"
metrics_recorder = MetricsRecorder(
psycopg2.connect("dbname=metrics"), logger, repository, branch, commit_id, commit_msg
)
try:
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_name = gpu_stats[0]["name"]
benchmark_id = metrics_recorder.initialise_benchmark({"gpu_name": gpu_name, "model_id": model_id})
logger.info(f"running benchmark #{benchmark_id} on {gpu_name} for {model_id}")
metrics_thread = Thread(
target=collect_metrics,
args=[benchmark_id, continue_metric_collection, metrics_recorder],
)
metrics_thread.start()
logger.info("started background thread to fetch device metrics")
os.environ["TOKENIZERS_PARALLELISM"] = "false" # silence warnings when compiling
device = "cuda"
logger.info("downloading weights")
# This is to avoid counting download in model load time measurement
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
gen_config = GenerationConfig(do_sample=False, top_p=1, temperature=1)
logger.info("loading model")
start = perf_counter()
model = AutoModelForCausalLM.from_pretrained(
model_id, torch_dtype=torch.float16, generation_config=gen_config
).eval()
model.to(device)
torch.cuda.synchronize()
end = perf_counter()
model_load_time = end - start
logger.info(f"loaded model in: {model_load_time}s")
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Why dogs are so cute?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
# Specify the max length (including both the prompt and the response)
# When calling `generate` with `cache_implementation="static" later, this is also used to create a `StaticCache` object
# with sequence length = `max_length`. The longer the more you will re-use it
seq_length = inputs["input_ids"].shape[1]
model.generation_config.max_length = seq_length + num_tokens_to_generate
batch_size = inputs["input_ids"].shape[0]
# Copied from the gpt-fast repo
def multinomial_sample_one_no_sync(probs_sort): # Does multinomial sampling without a cuda synchronization
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
v, _ = torch.topk(logits, min(top_k, logits.size(-1)))
pivot = v.select(-1, -1).unsqueeze(-1)
logits = torch.where(logits < pivot, -float("Inf"), logits)
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
probs = logits_to_probs(logits[:, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs
def decode_one_token(model, cur_token, cache_position, past_key_values):
logits = model(
cur_token,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)[0]
new_token = sample(logits, temperature=0.6, top_k=5)[0]
return new_token
#########
# Eager #
#########
with torch.no_grad():
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate,
)
cache_position = torch.arange(seq_length, device=device)
start = perf_counter()
model(
**inputs,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)
end = perf_counter()
first_eager_fwd_pass_time = end - start
logger.info(f"completed first eager fwd pass in: {first_eager_fwd_pass_time}s")
start = perf_counter()
output = model.generate(**inputs, do_sample=False)
end = perf_counter()
first_eager_generate_time = end - start
logger.info(f"completed first eager generation in: {first_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate,
)
cache_position = torch.arange(seq_length, device=device)
start = perf_counter()
model(
**inputs,
cache_position=cache_position,
past_key_values=past_key_values,
return_dict=False,
use_cache=True,
)
end = perf_counter()
second_eager_fwd_pass_time = end - start
logger.info(f"completed second eager fwd pass in: {second_eager_fwd_pass_time}s")
start = perf_counter()
model.generate(**inputs, do_sample=False)
end = perf_counter()
second_eager_generate_time = end - start
logger.info(f"completed second eager generation in: {second_eager_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
torch.compiler.reset()
################
# Forward pass #
################
# `torch.compile(model, ...)` is not recommended as you compile callbacks
# and full generate. We recommend compiling only the forward for now.
# "reduce-overhead" will use cudagraphs.
generated_ids = torch.zeros(
(batch_size, num_tokens_to_generate + seq_length), dtype=torch.int, device=device
)
generated_ids[:, :seq_length] = inputs["input_ids"]
decode_one_token = torch.compile(decode_one_token, mode="reduce-overhead", fullgraph=True)
# model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)
# TODO use decode_one_token(model, input_id.clone(), cache_position) for verification
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + num_tokens_to_generate + 10,
)
cache_position = torch.arange(seq_length, device=device)
all_generated_tokens = []
### First compile, prefill
start = perf_counter()
next_token = decode_one_token(
model, inputs["input_ids"], cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_first_token = end - start
logger.info(f"completed first compile generation in: {time_to_first_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
cache_position = torch.tensor([seq_length], device=device)
### First compile, decoding
start = perf_counter()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_second_token = end - start
logger.info(f"completed second compile generation in: {time_to_second_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
### Second compile, decoding
start = perf_counter()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
torch.cuda.synchronize()
end = perf_counter()
time_to_third_token = end - start
logger.info(f"completed third compile forward in: {time_to_third_token}s")
cache_position += 1
all_generated_tokens += next_token.tolist()
### Using cuda graphs decoding
start = perf_counter()
for _ in range(1, num_tokens_to_generate):
all_generated_tokens += next_token.tolist()
next_token = decode_one_token(
model, next_token.clone(), cache_position=cache_position, past_key_values=past_key_values
)
cache_position += 1
torch.cuda.synchronize()
end = perf_counter()
mean_time_to_next_token = (end - start) / num_tokens_to_generate
logger.info(f"completed next compile generation in: {mean_time_to_next_token}s")
logger.info(f"generated: {tokenizer.batch_decode(all_generated_tokens)}")
####################
# Generate compile #
####################
torch.compiler.reset()
# we will not compile full generate as it' s to intensive, tho we measure full forward!
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 1st call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
torch.cuda.synchronize()
end = perf_counter()
first_compile_generate_time = end - start
logger.info(f"completed first compile generation in: {first_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 2nd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
torch.cuda.synchronize()
end = perf_counter()
second_compile_generate_time = end - start
logger.info(f"completed second compile generation in: {second_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 3rd call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
third_compile_generate_time = end - start
logger.info(f"completed third compile generation in: {third_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
past_key_values = StaticCache(
model.config,
max_batch_size=batch_size,
device=device,
dtype=torch.float16,
max_cache_len=seq_length + 128,
)
# 4th call
start = perf_counter()
output = model.generate(**inputs, past_key_values=past_key_values)
end = perf_counter()
fourth_compile_generate_time = end - start
logger.info(f"completed fourth compile generation in: {fourth_compile_generate_time}s")
logger.info(f"generated: {tokenizer.batch_decode(output.cpu().tolist())}")
metrics_recorder.collect_model_measurements(
benchmark_id,
{
"model_load_time": model_load_time,
"first_eager_forward_pass_time_secs": first_eager_fwd_pass_time,
"second_eager_forward_pass_time_secs": second_eager_fwd_pass_time,
"first_eager_generate_time_secs": first_eager_generate_time,
"second_eager_generate_time_secs": second_eager_generate_time,
"time_to_first_token_secs": time_to_first_token,
"time_to_second_token_secs": time_to_second_token,
"time_to_third_token_secs": time_to_third_token,
"time_to_next_token_mean_secs": mean_time_to_next_token,
"first_compile_generate_time_secs": first_compile_generate_time,
"second_compile_generate_time_secs": second_compile_generate_time,
"third_compile_generate_time_secs": third_compile_generate_time,
"fourth_compile_generate_time_secs": fourth_compile_generate_time,
},
)
except Exception as e:
logger.error(f"Caught exception: {e}")
continue_metric_collection.set()
if metrics_thread is not None:
metrics_thread.join()
metrics_recorder.close()

View File

@ -3,7 +3,11 @@ import subprocess
def main(config_dir, config_name, args):
subprocess.run(["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"] + ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"] + args)
subprocess.run(
["optimum-benchmark", "--config-dir", f"{config_dir}", "--config-name", f"{config_name}"]
+ ["hydra/job_logging=disabled", "hydra/hydra_logging=disabled"]
+ args
)
if __name__ == "__main__":

View File

@ -3,3 +3,4 @@ psutil==6.0.0
psycopg2==2.9.9
torch>=2.4.0
hf_transfer
pandas>=1.5.0

View File

1
benchmark_v2/.gitignore vendored Normal file
View File

@ -0,0 +1 @@
benchmark_results/

138
benchmark_v2/README.md Normal file
View File

@ -0,0 +1,138 @@
# Benchmarking v2
A comprehensive benchmarking framework for transformer models that supports multiple execution modes (eager, compiled, kernelized), detailed performance metrics collection, and structured output format.
## Quick Start
### Running All Benchmarks
```bash
# Run all benchmarks with default settings
python run_benchmarks.py
# Specify output directory
python run_benchmarks.py --output-dir my_results
# Run with custom parameters
python run_benchmarks.py \
--warmup-iterations 5 \
--measurement-iterations 10 \
--num-tokens-to-generate 200
```
### Uploading Results to HuggingFace Dataset
You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:
```bash
# Upload to a public dataset with auto-generated run ID
python run_benchmarks.py --upload-to-hub username/benchmark-results
# Upload with a custom run ID for easy identification
python run_benchmarks.py --upload-to-hub username/benchmark-results --run-id experiment_v1
# Upload with custom HuggingFace token (if not set in environment)
python run_benchmarks.py --upload-to-hub username/benchmark-results --token hf_your_token_here
```
**Dataset Directory Structure:**
```
dataset_name/
├── 2025-01-15/
│ ├── runs/ # Non-scheduled runs (manual, PR, etc.)
│ │ └── 123-1245151651/ # GitHub run number and ID
│ │ └── benchmark_results/
│ │ ├── benchmark_summary_20250115_143022.json
│ │ └── model-name/
│ │ └── model-name_benchmark_20250115_143022.json
│ └── benchmark_results_abc123de/ # Scheduled runs (daily CI)
│ ├── benchmark_summary_20250115_143022.json
│ └── model-name/
│ └── model-name_benchmark_20250115_143022.json
└── 2025-01-16/
└── ...
```
**Authentication for Uploads:**
For uploading results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence):
1. Command line: `--token hf_your_token_here`
3. Environment variable: `HF_TOKEN`
### Running Specific Benchmarks
```bash
# Include only specific benchmarks
python run_benchmarks.py --include llama
# Exclude specific benchmarks
python run_benchmarks.py --exclude old_benchmark
## Output Format
Results are saved as JSON files with the following structure:
```json
{
"model_name": "llama_2_7b",
"benchmark_scenarios": [
{
"scenario_name": "eager_variant",
"metadata": {
"timestamp": "2025-01-XX...",
"commit_id": "abc123...",
"hardware_info": {
"gpu_name": "NVIDIA A100",
"gpu_memory_total": 40960,
"cpu_count": 64
},
"config": {
"variant": "eager",
"warmup_iterations": 3,
"measurement_iterations": 5
}
},
"measurements": {
"latency": {
"mean": 2.45,
"median": 2.43,
"std": 0.12,
"min": 2.31,
"max": 2.67,
"p95": 2.61,
"p99": 2.65
},
"time_to_first_token": {
"mean": 0.15,
"std": 0.02
},
"tokens_per_second": {
"mean": 87.3,
"unit": "tokens/sec"
}
},
"gpu_metrics": {
"gpu_utilization_mean": 85.2,
"gpu_memory_used_mean": 12450
}
}
]
}
```
### Debug Mode
```bash
python run_benchmarks.py --log-level DEBUG
```
## Contributing
To add new benchmarks:
1. Create a new file in `benches/`
2. Implement the `ModelBenchmark` interface
3. Add a runner function (`run_<benchmark_name>` or `run_benchmark`)
4. run_benchmarks.py

View File

@ -0,0 +1 @@
# Benchmark implementations directory

View File

@ -0,0 +1,165 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
from typing import Any
import torch
from benchmark_framework import ModelBenchmark
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
class LLaMABenchmark(ModelBenchmark):
"""Simplified LLaMA model benchmark implementation using the ModelBenchmark base class."""
def __init__(self, logger: logging.Logger):
super().__init__(logger)
self._default_prompt = "Why dogs are so cute?" # Custom prompt for LLaMA
def get_scenario_configs(self) -> list[dict[str, Any]]:
"""
Get LLaMA-specific scenario configurations.
Returns:
List of scenario configuration dictionaries
"""
return [
# Eager variants
{"variant": "eager", "compile_mode": None, "use_cache": True, "description": "Eager execution with cache"},
# Compiled variants
{
"variant": "compiled",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Compiled with max autotune",
},
# Kernelized variant (if available)
{
"variant": "kernelized",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Kernelized execution",
},
]
def _is_kernelization_available(self) -> bool:
"""Check if kernelization is available for LLaMA."""
try:
from kernels import Mode, kernelize # noqa: F401
return True
except ImportError:
self.logger.debug("Kernelization not available: kernels module not found")
return False
def get_default_generation_config(self) -> dict[str, Any]:
"""Get LLaMA-specific generation configuration."""
return {
"do_sample": False,
"top_p": 1.0,
"temperature": 1.0,
"repetition_penalty": 1.0,
"max_new_tokens": None, # Will be set per scenario
}
def get_model_init_kwargs(self, config) -> dict[str, Any]:
"""Get LLaMA-specific model initialization kwargs."""
return {
"torch_dtype": getattr(torch, config.torch_dtype),
"attn_implementation": config.attn_implementation,
"use_cache": True,
}
def get_default_torch_dtype(self) -> str:
"""Get default torch dtype for LLaMA."""
return "float16" # LLaMA works well with float16
def get_default_device(self) -> str:
"""Get default device for LLaMA."""
return "cuda" # LLaMA prefers CUDA
def run_llama(logger, output_dir, **kwargs):
"""
Run LLaMA benchmark with the given configuration.
Args:
logger: Logger instance
output_dir: Output directory for results
**kwargs: Additional configuration options
Returns:
Path to output file if successful
"""
from benchmark_framework import BenchmarkRunner
# Extract parameters with defaults
model_id = kwargs.get("model_id", "meta-llama/Llama-2-7b-hf")
warmup_iterations = kwargs.get("warmup_iterations", 3)
measurement_iterations = kwargs.get("measurement_iterations", 5)
num_tokens_to_generate = kwargs.get("num_tokens_to_generate", 100)
include_sdpa_variants = kwargs.get("include_sdpa_variants", True)
device = kwargs.get("device", "cuda")
torch_dtype = kwargs.get("torch_dtype", "float16")
batch_size = kwargs.get("batch_size", 1)
commit_id = kwargs.get("commit_id")
logger.info(f"Starting LLaMA benchmark for model: {model_id}")
logger.info(
f"Configuration: warmup={warmup_iterations}, measurement={measurement_iterations}, tokens={num_tokens_to_generate}"
)
try:
# Create benchmark instance
benchmark = LLaMABenchmark(logger)
# Create scenarios
scenarios = benchmark.create_scenarios(
model_id=model_id,
warmup_iterations=warmup_iterations,
measurement_iterations=measurement_iterations,
num_tokens_to_generate=num_tokens_to_generate,
include_sdpa_variants=include_sdpa_variants,
device=device,
torch_dtype=torch_dtype,
batch_size=batch_size,
)
logger.info(f"Created {len(scenarios)} benchmark scenarios")
# Create runner and execute benchmarks
runner = BenchmarkRunner(logger, output_dir)
results = runner.run_benchmark(benchmark, scenarios, commit_id=commit_id)
if not results:
logger.warning("No successful benchmark results")
return None
# Save results
model_name = model_id.split("/")[-1] # Extract model name from ID
output_file = runner.save_results(model_name, results)
logger.info(f"LLaMA benchmark completed successfully. Results saved to: {output_file}")
return output_file
except Exception as e:
logger.error(f"LLaMA benchmark failed: {e}")
import traceback
logger.debug(traceback.format_exc())
raise

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,7 @@
numpy>=1.21.0
psutil>=5.8.0
gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
huggingface_hub>=0.16.0

495
benchmark_v2/run_benchmarks.py Executable file
View File

@ -0,0 +1,495 @@
#!/usr/bin/env python3
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Top-level benchmarking script that automatically discovers and runs all benchmarks
in the ./benches directory, organizing outputs into model-specific subfolders.
"""
import argparse
import importlib.util
import json
import logging
import os
import sys
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
def setup_logging(log_level: str = "INFO", enable_file_logging: bool = False) -> logging.Logger:
"""Setup logging configuration."""
numeric_level = getattr(logging, log_level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError(f"Invalid log level: {log_level}")
handlers = [logging.StreamHandler(sys.stdout)]
if enable_file_logging:
handlers.append(logging.FileHandler(f"benchmark_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"))
logging.basicConfig(
level=numeric_level, format="[%(levelname)s - %(asctime)s] %(name)s: %(message)s", handlers=handlers
)
return logging.getLogger(__name__)
def discover_benchmarks(benches_dir: str) -> list[dict[str, Any]]:
"""
Discover all benchmark modules in the benches directory.
Returns:
List of dictionaries containing benchmark module info
"""
benchmarks = []
benches_path = Path(benches_dir)
if not benches_path.exists():
raise FileNotFoundError(f"Benches directory not found: {benches_dir}")
for py_file in benches_path.glob("*.py"):
if py_file.name.startswith("__"):
continue
module_name = py_file.stem
try:
# Import the module
spec = importlib.util.spec_from_file_location(module_name, py_file)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Check if it has a benchmark runner function
if hasattr(module, f"run_{module_name}"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, f"run_{module_name}"),
}
)
elif hasattr(module, "run_benchmark"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, "run_benchmark"),
}
)
else:
logging.warning(f"No runner function found in {py_file}")
except Exception as e:
logging.error(f"Failed to import {py_file}: {e}")
return benchmarks
def run_single_benchmark(
benchmark_info: dict[str, Any], output_dir: str, logger: logging.Logger, **kwargs
) -> Optional[str]:
"""
Run a single benchmark and return the output file path.
Args:
benchmark_info: Dictionary containing benchmark module info
output_dir: Base output directory
logger: Logger instance
**kwargs: Additional arguments to pass to the benchmark
Returns:
Path to the output file if successful, None otherwise
"""
benchmark_name = benchmark_info["name"]
runner_func = benchmark_info["runner_function"]
logger.info(f"Running benchmark: {benchmark_name}")
try:
# Check function signature to determine what arguments to pass
import inspect
sig = inspect.signature(runner_func)
# Prepare arguments based on function signature
func_kwargs = {"logger": logger, "output_dir": output_dir}
# Add other kwargs if the function accepts them
for param_name in sig.parameters:
if param_name in kwargs:
func_kwargs[param_name] = kwargs[param_name]
# Filter kwargs to only include parameters the function accepts
# If function has **kwargs, include all provided kwargs
has_var_kwargs = any(param.kind == param.VAR_KEYWORD for param in sig.parameters.values())
if has_var_kwargs:
valid_kwargs = {**func_kwargs, **kwargs}
else:
valid_kwargs = {k: v for k, v in func_kwargs.items() if k in sig.parameters}
# Run the benchmark
result = runner_func(**valid_kwargs)
if isinstance(result, str):
# Function returned a file path
return result
else:
logger.info(f"Benchmark {benchmark_name} completed successfully")
return "completed"
except Exception as e:
logger.error(f"Benchmark {benchmark_name} failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return None
def generate_summary_report(
output_dir: str,
benchmark_results: dict[str, Any],
logger: logging.Logger,
benchmark_run_uuid: Optional[str] = None,
) -> str:
"""Generate a summary report of all benchmark runs."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.json")
summary_data = {
"run_metadata": {
"timestamp": datetime.utcnow().isoformat(),
"benchmark_run_uuid": benchmark_run_uuid,
"total_benchmarks": len(benchmark_results),
"successful_benchmarks": len([r for r in benchmark_results.values() if r is not None]),
"failed_benchmarks": len([r for r in benchmark_results.values() if r is None]),
},
"benchmark_results": benchmark_results,
"output_directory": output_dir,
}
with open(summary_file, "w") as f:
json.dump(summary_data, f, indent=2, default=str)
logger.info(f"Summary report saved to: {summary_file}")
return summary_file
def upload_results_to_hf_dataset(
output_dir: str,
summary_file: str,
dataset_name: str,
run_id: Optional[str] = None,
token: Optional[str] = None,
logger: Optional[logging.Logger] = None,
) -> Optional[str]:
"""
Upload benchmark results to a HuggingFace Dataset.
Based on upload_collated_report() from utils/collated_reports.py
Args:
output_dir: Local output directory containing results
summary_file: Path to the summary file
dataset_name: Name of the HuggingFace dataset to upload to
run_id: Unique run identifier (if None, will generate one)
token: HuggingFace token for authentication (if None, will use environment variables)
logger: Logger instance
Returns:
The run_id used for the upload, None if upload failed
"""
if logger is None:
logger = logging.getLogger(__name__)
import os
from huggingface_hub import HfApi
api = HfApi()
if run_id is None:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
run_id = f"{github_run_number}-{github_run_id}"
date_folder = datetime.now().strftime("%Y-%m-%d")
github_event_name = os.getenv("GITHUB_EVENT_NAME")
if github_event_name != "schedule":
# Non-scheduled runs go under a runs subfolder
repo_path = f"{date_folder}/runs/{run_id}/benchmark_results"
else:
# Scheduled runs go directly under the date
repo_path = f"{date_folder}/{run_id}/benchmark_results"
logger.info(f"Uploading benchmark results to dataset '{dataset_name}' at path '{repo_path}'")
try:
# Upload all files in the output directory
from pathlib import Path
output_path = Path(output_dir)
for file_path in output_path.rglob("*"):
if file_path.is_file():
# Calculate relative path from output_dir
relative_path = file_path.relative_to(output_path)
path_in_repo = f"{repo_path}/{relative_path}"
logger.debug(f"Uploading {file_path} to {path_in_repo}")
api.upload_file(
path_or_fileobj=str(file_path),
path_in_repo=path_in_repo,
repo_id=dataset_name,
repo_type="dataset",
token=token,
commit_message=f"Upload benchmark results for run {run_id}",
)
logger.info(
f"Successfully uploaded results to: https://huggingface.co/datasets/{dataset_name}/tree/main/{repo_path}"
)
return run_id
except Exception as upload_error:
logger.error(f"Failed to upload results: {upload_error}")
import traceback
logger.debug(traceback.format_exc())
return None
def main():
"""Main entry point for the benchmarking script."""
# Generate a unique UUID for this benchmark run
benchmark_run_uuid = str(uuid.uuid4())[:8]
parser = argparse.ArgumentParser(
description="Run all benchmarks in the ./benches directory",
epilog="""
Examples:
# Run all available benchmarks
python3 run_benchmarks.py
# Run with specific model and upload to HuggingFace Dataset
python3 run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf --upload-to-hf username/benchmark-results
# Run with custom run ID and upload to HuggingFace Dataset
python3 run_benchmarks.py --run-id experiment_v1 --upload-to-hf org/benchmarks
# Run only specific benchmarks with file logging
python3 run_benchmarks.py --include llama --enable-file-logging
""", # noqa: W293
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--output-dir",
type=str,
default="benchmark_results",
help="Base output directory for benchmark results (default: benchmark_results)",
)
parser.add_argument(
"--benches-dir",
type=str,
default="./benches",
help="Directory containing benchmark implementations (default: ./benches)",
)
parser.add_argument(
"--log-level",
type=str,
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
default="INFO",
help="Logging level (default: INFO)",
)
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup-iterations", type=int, default=3, help="Number of warmup iterations (default: 3)")
parser.add_argument(
"--measurement-iterations", type=int, default=5, help="Number of measurement iterations (default: 5)"
)
parser.add_argument(
"--num-tokens-to-generate",
type=int,
default=100,
help="Number of tokens to generate in benchmarks (default: 100)",
)
parser.add_argument("--include", type=str, nargs="*", help="Only run benchmarks matching these names")
parser.add_argument("--exclude", type=str, nargs="*", help="Exclude benchmarks matching these names")
parser.add_argument("--enable-file-logging", action="store_true", help="Enable file logging (disabled by default)")
parser.add_argument(
"--commit-id", type=str, help="Git commit ID for metadata (if not provided, will auto-detect from git)"
)
parser.add_argument(
"--push-to-hub",
type=str,
help="Upload results to HuggingFace Dataset (provide dataset name, e.g., 'username/benchmark-results')",
)
parser.add_argument(
"--run-id", type=str, help="Custom run ID for organizing results (if not provided, will generate a unique ID)"
)
parser.add_argument(
"--token",
type=str,
help="HuggingFace token for dataset uploads (if not provided, will use HF_TOKEN environment variable)",
)
args = parser.parse_args()
# Setup logging
logger = setup_logging(args.log_level, args.enable_file_logging)
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Benches directory: {args.benches_dir}")
# Create output directory
os.makedirs(args.output_dir, exist_ok=True)
try:
# Discover benchmarks
benchmarks = discover_benchmarks(args.benches_dir)
logger.info(f"Discovered {len(benchmarks)} benchmark(s): {[b['name'] for b in benchmarks]}")
if not benchmarks:
logger.warning("No benchmarks found!")
return 1
# Filter benchmarks based on include/exclude
filtered_benchmarks = benchmarks
if args.include:
filtered_benchmarks = [
b for b in filtered_benchmarks if any(pattern in b["name"] for pattern in args.include)
]
logger.info(f"Filtered to include: {[b['name'] for b in filtered_benchmarks]}")
if args.exclude:
filtered_benchmarks = [
b for b in filtered_benchmarks if not any(pattern in b["name"] for pattern in args.exclude)
]
logger.info(f"After exclusion: {[b['name'] for b in filtered_benchmarks]}")
if not filtered_benchmarks:
logger.warning("No benchmarks remaining after filtering!")
return 1
# Prepare common kwargs for benchmarks
benchmark_kwargs = {
"warmup_iterations": args.warmup_iterations,
"measurement_iterations": args.measurement_iterations,
"num_tokens_to_generate": args.num_tokens_to_generate,
}
if args.model_id:
benchmark_kwargs["model_id"] = args.model_id
# Add commit_id if provided
if args.commit_id:
benchmark_kwargs["commit_id"] = args.commit_id
# Run benchmarks
benchmark_results = {}
successful_count = 0
for benchmark_info in filtered_benchmarks:
result = run_single_benchmark(benchmark_info, args.output_dir, logger, **benchmark_kwargs)
benchmark_results[benchmark_info["name"]] = result
if result is not None:
successful_count += 1
# Generate summary report
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger, benchmark_run_uuid)
# Upload results to HuggingFace Dataset if requested
upload_run_id = None
if args.push_to_hub:
logger.info("=" * 60)
logger.info("UPLOADING TO HUGGINGFACE DATASET")
logger.info("=" * 60)
# Use provided run_id or fallback to benchmark run UUID
effective_run_id = args.run_id or benchmark_run_uuid
upload_run_id = upload_results_to_hf_dataset(
output_dir=args.output_dir,
summary_file=summary_file,
dataset_name=args.push_to_hub,
run_id=effective_run_id,
token=args.token,
logger=logger,
)
if upload_run_id:
logger.info(f"Upload completed with run ID: {upload_run_id}")
else:
logger.warning("Upload failed - continuing with local results")
# Final summary
total_benchmarks = len(filtered_benchmarks)
failed_count = total_benchmarks - successful_count
logger.info("=" * 60)
logger.info("BENCHMARK RUN SUMMARY")
logger.info("=" * 60)
logger.info(f"Total benchmarks: {total_benchmarks}")
logger.info(f"Successful: {successful_count}")
logger.info(f"Failed: {failed_count}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Summary report: {summary_file}")
if args.push_to_hub:
if upload_run_id:
logger.info(f"HuggingFace Dataset: {args.push_to_hub}")
logger.info(f"Run ID: {upload_run_id}")
logger.info(
f"View results: https://huggingface.co/datasets/{args.push_to_hub}/tree/main/{datetime.now().strftime('%Y-%m-%d')}/runs/{upload_run_id}"
)
else:
logger.warning("Upload to HuggingFace Dataset failed")
if failed_count > 0:
logger.warning(f"{failed_count} benchmark(s) failed. Check logs for details.")
return 1
else:
logger.info("All benchmarks completed successfully!")
return 0
except Exception as e:
logger.error(f"Benchmark run failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return 1
if __name__ == "__main__":
sys.exit(main())

View File

@ -16,6 +16,7 @@
# by pytest before any tests are run
import doctest
import os
import sys
import warnings
from os.path import abspath, dirname, join
@ -23,12 +24,18 @@ from os.path import abspath, dirname, join
import _pytest
import pytest
from transformers.testing_utils import HfDoctestModule, HfDocTestParser
from transformers.testing_utils import (
HfDoctestModule,
HfDocTestParser,
is_torch_available,
patch_testing_methods_to_collect_info,
patch_torch_compile_force_graph,
)
NOT_DEVICE_TESTS = {
"test_tokenization",
"test_processor",
"test_tokenization_mistral_common",
"test_processing",
"test_beam_constraints",
"test_configuration_utils",
@ -57,11 +64,8 @@ NOT_DEVICE_TESTS = {
"test_load_save_without_tied_weights",
"test_tied_weights_keys",
"test_model_weights_reload_no_missing_tied_weights",
"test_mismatched_shapes_have_properly_initialized_weights",
"test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist",
"test_can_load_ignoring_mismatched_shapes",
"test_model_is_small",
"test_tf_from_pt_safetensors",
"test_flax_from_pt_safetensors",
"ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device
"ModelTester::test_pipeline_",
"/repo_utils/",
@ -83,6 +87,8 @@ def pytest_configure(config):
config.addinivalue_line("markers", "is_staging_test: mark test to run only in the staging environment")
config.addinivalue_line("markers", "accelerate_tests: mark test that require accelerate")
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
def pytest_collection_modifyitems(items):
@ -127,3 +133,18 @@ class CustomOutputChecker(OutputChecker):
doctest.OutputChecker = CustomOutputChecker
_pytest.doctest.DoctestModule = HfDoctestModule
doctest.DocTestParser = HfDocTestParser
if is_torch_available():
import torch
# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
# We set it to `False` for CI. See https://github.com/pytorch/pytorch/issues/157274#issuecomment-3090791615
torch.backends.cudnn.allow_tf32 = False
# patch `torch.compile`: if `TORCH_COMPILE_FORCE_FULLGRAPH=1` (or values considered as true, e.g. yes, y, etc.),
# the patched version will always run with `fullgraph=True`.
patch_torch_compile_force_graph()
if os.environ.get("PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS", "").lower() in ("yes", "true", "on", "y", "1"):
patch_testing_methods_to_collect_info()

View File

@ -1,15 +1,13 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
USER root
ARG REF=main
RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN git lfs install
RUN uv pip uninstall transformers

View File

@ -1,10 +1,10 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake wget xz-utils build-essential g++5 libprotobuf-dev protobuf-compiler git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN wget https://github.com/ku-nlp/jumanpp/releases/download/v2.0.0-rc3/jumanpp-2.0.0-rc3.tar.xz
RUN tar xvf jumanpp-2.0.0-rc3.tar.xz
@ -15,12 +15,20 @@ RUN mv catch.hpp ../libs/
RUN cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
RUN make install -j 10
WORKDIR /
RUN uv pip install --no-cache --upgrade 'torch' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,jieba,spacy,ftfy,rjieba]" unidic unidic-lite
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ja,testing,sentencepiece,spacy,ftfy,rjieba]" unidic unidic-lite
# spacy is not used so not tested. Causes to failures. TODO fix later
RUN python3 -m unidic download
RUN uv run python -m unidic download
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,13 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git
RUN apt-get install -y g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv
RUN uv pip install --no-cache-dir -U pip setuptools albumentations seqeval
RUN uv pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,12 +1,19 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git ffmpeg
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]" seqeval albumentations jiwer
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,17 +1,24 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1-mesa-glx libgl1 g++ tesseract-ocr
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git libgl1 g++ tesseract-ocr git-lfs curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir --no-deps timm accelerate
RUN pip install -U --upgrade-strategy eager --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
RUN uv pip install -U --no-cache-dir pytesseract python-Levenshtein opencv-python nltk
# RUN uv pip install --no-cache-dir natten==0.15.1+torch210cpu -f https://shi-labs.com/natten/wheels
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[testing, vision]" 'scikit-learn' 'torch-stft' 'nose' 'dataset'
# RUN git clone https://github.com/facebookresearch/detectron2.git
# RUN python3 -m pip install --no-cache-dir -e detectron2
RUN uv pip install 'git+https://github.com/facebookresearch/detectron2.git@92ae9f0b92aba5867824b4f12aa06a22a60a45d3' --no-build-isolation
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,10 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,10 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git cmake g++
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,tf-cpu,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3" tensorflow_probability
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,11 +1,18 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git pkg-config openssh-client git ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing]"
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers

View File

@ -1,9 +1,9 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y time git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv venv
RUN pip install uv
RUN uv pip install --no-cache-dir -U pip setuptools GitPython "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[ruff]" urllib3
RUN apt-get install -y jq curl && apt-get clean && rm -rf /var/lib/apt/lists/*

View File

@ -1,12 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ pkg-config openssh-client git
RUN apt-get install -y cmake
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --upgrade --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,testing,sentencepiece,tf-speech,vision]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,16 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-deps accelerate
RUN uv pip install --no-cache-dir 'torch' 'torchvision' 'torchaudio' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir "scipy<1.13" "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,audio,sklearn,sentencepiece,vision,testing]"
# RUN pip install --no-cache-dir "scipy<1.13" "transformers[flax,testing,sentencepiece,flax-speech,vision]"
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -1,11 +1,17 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs ffmpeg
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git-lfs ffmpeg curl
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN pip --no-cache-dir install uv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' 'torchcodec' --index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-deps timm accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir librosa "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[sklearn,sentencepiece,vision,testing,tiktoken,num2words,video]"
# fetch test data and hub objects within CircleCI docker images to reduce even more connections
# we don't need a full clone of `transformers` to run `fetch_hub_objects_for_ci.py`
# the data are downloaded to the directory `/test_data` and during CircleCI's CI runtime, we need to move them to the root of `transformers`
RUN mkdir test_data && cd test_data && curl -O https://raw.githubusercontent.com/huggingface/transformers/${REF}/utils/fetch_hub_objects_for_ci.py && python3 fetch_hub_objects_for_ci.py
RUN uv pip uninstall transformers

View File

@ -1,19 +0,0 @@
FROM python:3.9-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
RUN echo ${REF}
USER root
RUN apt-get update && apt-get install -y --no-install-recommends libsndfile1-dev espeak-ng time git g++ cmake pkg-config openssh-client git git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip --no-cache-dir install uv && uv venv && uv pip install --no-cache-dir -U pip setuptools
RUN uv pip install --no-cache-dir --no-deps accelerate --extra-index-url https://download.pytorch.org/whl/cpu
RUN uv pip install --no-cache-dir 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
RUN git lfs install
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[tf-cpu,sklearn,sentencepiece,vision,testing]"
RUN uv pip install --no-cache-dir "protobuf==3.20.3" librosa
RUN uv pip uninstall transformers
RUN apt-get clean && rm -rf /var/lib/apt/lists/* && apt-get autoremove && apt-get autoclean

View File

@ -9,7 +9,7 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.7.1'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# Disable kernel mapping for now until all tests pass
@ -26,11 +26,14 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA && python3 -m pip uninstall -y tensorflow tensorflow_text tensorflow_probability
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir -U timm
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN python3 -m pip install --no-cache-dir pytesseract
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -39,6 +42,8 @@ RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/pef
# For bettertransformer
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
# For kernels
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
# For video model testing
RUN python3 -m pip install --no-cache-dir av
@ -49,15 +54,14 @@ RUN python3 -m pip install --no-cache-dir bitsandbytes
# Some tests require quanto
RUN python3 -m pip install --no-cache-dir quanto
# After using A10 as CI runner, let's run FA2 tests
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
# TODO (ydshieh): check this again
# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests
# (`deformable_detr`, `rwkv`, `mra`)
RUN python3 -m pip uninstall -y ninja
# For `dinat` model
# The `XXX` part in `torchXXX` needs to match `PYTORCH` (to some extent)
# pin `0.17.4` otherwise `cannot import name 'natten2dav' from 'natten.functional'`
RUN python3 -m pip install --no-cache-dir natten==0.17.4+torch250cu121 -f https://shi-labs.com/natten/wheels
# For `nougat` tokenizer
RUN python3 -m pip install --no-cache-dir python-Levenshtein

View File

@ -15,8 +15,8 @@ RUN apt update && \
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
RUN git clone https://github.com/NVIDIA/apex
RUN cd apex && \

View File

@ -0,0 +1,71 @@
FROM intel/deep-learning-essentials:2025.1.3-0-devel-ubuntu24.04 AS base
LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VERSION=3.12
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get update
RUN apt-get update && \
apt-get -y install \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
vim \
numactl \
gnupg2 \
gpg-agent \
python3-dev \
python3-opencv \
unzip \
ffmpeg \
tesseract-ocr \
espeak-ng \
wget \
ncurses-term \
google-perftools \
libjemalloc-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Use virtual env because Ubuntu:24 does not allowed pip on original python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VERSION} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip wheel
RUN pip install torch torchvision torchaudio torchcodec --index-url https://download.pytorch.org/whl/cpu --no-cache-dir
RUN pip install av pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sentence_transformers sacremoses nltk rouge_score librosa soundfile mpi4py pytorch_msssim
RUN pip install onnx optimum onnxruntime
RUN pip install autoawq
RUN pip install gptqmodel --no-build-isolation
RUN pip install -U datasets timm transformers accelerate peft diffusers opencv-python kenlm evaluate
RUN pip install -U intel-openmp
# install bitsandbytes
RUN git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/ && \
cmake -DCOMPUTE_BACKEND=cpu -S . && make && pip install . && cd ../
# CPU don't need triton
RUN pip uninstall triton -y
ENV LD_PRELOAD=${LD_PRELOAD}:/opt/venv/lib/libiomp5.so:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
ENV KMP_AFFINITY=granularity=fine,compact,1,0
RUN touch /entrypoint.sh
RUN chmod +x /entrypoint.sh
RUN echo "#!/bin/bash" >> /entrypoint.sh
RUN echo "/bin/bash" >> /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]

View File

@ -1,59 +0,0 @@
ARG BASE_DOCKER_IMAGE
FROM $BASE_DOCKER_IMAGE
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs libaio-dev
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
ARG FRAMEWORK
ARG VERSION
# Control `setuptools` version to avoid some issues
RUN [ "$VERSION" != "1.10" ] && python3 -m pip install -U setuptools || python3 -m pip install -U "setuptools<=59.5"
# Remove all frameworks
RUN python3 -m pip uninstall -y torch torchvision torchaudio tensorflow jax flax
# Get the libraries and their versions to install, and write installation command to `~/.profile`.
RUN python3 ./transformers/utils/past_ci_versions.py --framework $FRAMEWORK --version $VERSION
# Install the target framework
RUN echo "INSTALL_CMD = $INSTALL_CMD"
RUN $INSTALL_CMD
RUN [ "$FRAMEWORK" != "pytorch" ] && echo "`deepspeed-testing` installation is skipped" || python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Remove `accelerate`: it requires `torch`, and this causes import issues for TF-only testing
# We will install `accelerate@main` in Past CI workflow file
RUN python3 -m pip uninstall -y accelerate
# Uninstall `torch-tensorrt` and `apex` shipped with the base image
RUN python3 -m pip uninstall -y torch-tensorrt apex
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -1,11 +1,8 @@
FROM rocm/pytorch:rocm6.4_ubuntu22.04_py3.10_pytorch_release_2.6.0
FROM rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
ARG TORCH_VISION='0.21.0'
ARG TORCH_AUDIO='2.6.0'
RUN apt update && \
apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip python3-dev ffmpeg git-lfs && \
apt clean && \
@ -23,10 +20,8 @@ WORKDIR /
ADD https://api.github.com/repos/huggingface/transformers/git/refs/heads/main version.json
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video]
RUN python3 -m pip uninstall -y tensorflow flax
# Install transformers
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video,audio]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
@ -37,3 +32,13 @@ RUN python3 -m pip uninstall py3nvml pynvml nvidia-ml-py apex -y
# `kernels` may causes many failing tests
RUN python3 -m pip uninstall -y kernels
# On ROCm, torchcodec is required to decode audio files and 0.4 or 0.6 fails
RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"
# Install flash attention from source. Tested with commit 6387433156558135a998d5568a9d74c1778666d8
RUN git clone https://github.com/ROCm/flash-attention/ -b tridao && \
cd flash-attention && \
GPU_ARCHS="gfx942" python setup.py install
RUN python3 -m pip install --no-cache-dir einops

View File

@ -4,7 +4,7 @@ LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
ARG PYTORCH='2.7.1'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'

View File

@ -11,7 +11,7 @@ ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
# If set to nothing, will install the latest version
ARG PYTORCH='2.7.1'
ARG PYTORCH='2.8.0'
ARG TORCH_VISION=''
ARG TORCH_AUDIO=''
# Example: `cu102`, `cu113`, etc.
@ -25,8 +25,6 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y tensorflow flax
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -9,9 +9,9 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.6.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu121'
ARG CUDA='cu126'
# Disable kernel mapping for quantization tests
ENV DISABLE_KERNEL_MAPPING=1
@ -30,31 +30,20 @@ RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio tor
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
# needed in bnb and awq
RUN python3 -m pip install --no-cache-dir einops
# Add bitsandbytes for mixed int8 testing
RUN python3 -m pip install --no-cache-dir bitsandbytes
# Add gptqmodel for gtpq quantization testing, installed from source for pytorch==2.6.0 compatibility
RUN python3 -m pip install lm_eval
RUN git clone https://github.com/ModelCloud/GPTQModel.git && cd GPTQModel && pip install -v . --no-build-isolation
# Add optimum for gptq quantization testing
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
# Add PEFT
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/peft@main#egg=peft
# Add aqlm for quantization testing
RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# needed in bnb and awq
RUN python3 -m pip install --no-cache-dir einops
# Add vptq for quantization testing
RUN pip install vptq
# Add bitsandbytes
RUN python3 -m pip install --no-cache-dir bitsandbytes
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# # Add gptqmodel
# RUN python3 -m pip install --no-cache-dir gptqmodel
# Add hqq for quantization testing
RUN python3 -m pip install --no-cache-dir hqq
@ -63,21 +52,11 @@ RUN python3 -m pip install --no-cache-dir hqq
RUN python3 -m pip install --no-cache-dir gguf
# Add autoawq for quantization testing
# New release v0.2.8
RUN python3 -m pip install --no-cache-dir autoawq[kernels]
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir optimum-quanto
# Add eetq for quantization testing
RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add compressed-tensors for quantization testing
RUN python3 -m pip install --no-cache-dir compressed-tensors
@ -85,7 +64,10 @@ RUN python3 -m pip install --no-cache-dir compressed-tensors
RUN python3 -m pip install --no-cache-dir amd-quark
# Add AutoRound for quantization testing
RUN python3 -m pip install --no-cache-dir "auto-round>=0.5.0"
RUN python3 -m pip install --no-cache-dir auto-round
# Add torchao for quantization testing
RUN python3 -m pip install --no-cache-dir torchao
# Add transformers in editable mode
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch]
@ -99,3 +81,28 @@ RUN python3 -m pip uninstall -y flash-attn
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
# Low usage or incompatible lib, will enable later on
# # Add aqlm for quantization testing
# RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# # Add vptq for quantization testing
# RUN pip install vptq
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# # Add eetq for quantization testing
# RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add fp-quant for quantization testing
# Requires py3.11 but our CI runs on 3.9
# RUN python3 -m pip install --no-cache-dir "fp-quant>=0.1.6"

View File

@ -1,25 +0,0 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-tensorflow,testing]
# If set to nothing, will install the latest version
ARG TENSORFLOW='2.13'
RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSION='tensorflow'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y torch flax
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir -U "tensorflow_probability<0.22"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -20,22 +20,21 @@ To generate the documentation, you first have to build it. Several packages are
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
Then you need to install our special tool that builds the documentation:
```bash
pip install git+https://github.com/huggingface/doc-builder
```
---
**NOTE**
You only need to generate the documentation to inspect it locally (if you're planning changes and want to
check how they look before committing for instance). You don't have to commit the built documentation.
---
> [!NOTE]
> You only need to generate the documentation to inspect it locally (if you're planning changes and want to
> check how they look before committing for instance). You don't have to commit the built documentation.
## Building the documentation
@ -72,12 +71,8 @@ doc-builder preview transformers docs/source/en/
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
> [!NOTE]
> The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
## Adding a new element to the navigation bar
@ -164,6 +159,9 @@ These classes should be added using our Markdown syntax. Usually as follows:
[[autodoc]] XXXConfig
```
> [!IMPORTANT]
> Always add a blank line after `[[autodoc]]` to ensure it passes the CI/CD checks.
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:

View File

@ -50,7 +50,7 @@ Begin translating the text!
1. Start with the `_toctree.yml` file that corresponds to your documentation chapter. This file is essential for rendering the table of contents on the website.
- If the `_toctree.yml` file doesnt exist for your language, create one by copying the English version and removing unrelated sections.
- If the `_toctree.yml` file doesn't exist for your language, create one by copying the English version and removing unrelated sections.
- Ensure it is placed in the `docs/source/LANG-ID/` directory.
Heres an example structure for the `_toctree.yml` file:

View File

@ -123,8 +123,6 @@
title: تشغيل التدريب على Amazon SageMaker
- local: serialization
title: التصدير إلى ONNX
- local: tflite
title: التصدير إلى TFLite
- local: torchscript
title: التصدير إلى TorchScript
- local: notebooks
@ -184,8 +182,6 @@
# title: التدريب الفعال على وحدة المعالجة المركزية (CPU)
# - local: perf_train_cpu_many
# title: التدريب الموزع لوحدة المعالجة المركزية (CPU)
# - local: perf_train_tpu_tf
# title: التدريب على (TPU) باستخدام TensorFlow
# - local: perf_train_special
# title: تدريب PyTorch على Apple silicon
# - local: perf_hardware
@ -203,8 +199,6 @@
# title: إنشاء نموذج كبير
# - local: debugging
# title: تصحيح الأخطاء البرمجية
# - local: tf_xla
# title: تكامل XLA لنماذج TensorFlow
# - local: perf_torch_compile
# title: تحسين الاستدلال باستخدام `torch.compile()`
# title: الأداء وقابلية التوسع
@ -260,8 +254,6 @@
# title: التكوين
# - local: main_classes/data_collator
# title: مجمع البيانات
# - local: main_classes/keras_callbacks
# title: استدعاءات Keras
# - local: main_classes/logging
# title: التسجيل
# - local: main_classes/model

View File

@ -115,8 +115,6 @@
## النموذج التلقائي (AutoModel)
<frameworkcontent>
<pt>
تسمح لك فئات `AutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -133,35 +131,11 @@
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
<Tip warning={true}>
بالنسبة لنماذج PyTorch، تستخدم طريقة `from_pretrained()` `torch.load()` التي تستخدم داخليًا `pickle` والتي يُعرف أنها غير آمنة. بشكل عام، لا تقم مطلقًا بتحميل نموذج قد يكون مصدره مصدرًا غير موثوق به، أو قد يكون تم العبث به. يتم تخفيف هذا الخطر الأمني جزئيًا للنماذج العامة المستضافة على Hub Hugging Face، والتي يتم [فحصها بحثًا عن البرامج الضارة](https://huggingface.co/docs/hub/security-malware) في كل ارتكاب. راجع [توثيق Hub](https://huggingface.co/docs/hub/security) للحصول على أفضل الممارسات مثل [التحقق من التوقيع](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) باستخدام GPG.
لا تتأثر نقاط تفتيش TensorFlow و Flax، ويمكن تحميلها داخل بنيات PyTorch باستخدام `from_tf` و `from_flax` kwargs لطريقة `from_pretrained` للتحايل على هذه المشكلة.
</Tip>
> [!WARNING]
> بالنسبة لنماذج PyTorch، تستخدم طريقة `from_pretrained()` `torch.load()` التي تستخدم داخليًا `pickle` والتي يُعرف أنها غير آمنة. بشكل عام، لا تقم مطلقًا بتحميل نموذج قد يكون مصدره مصدرًا غير موثوق به، أو قد يكون تم العبث به. يتم تخفيف هذا الخطر الأمني جزئيًا للنماذج العامة المستضافة على Hub Hugging Face، والتي يتم [فحصها بحثًا عن البرامج الضارة](https://huggingface.co/docs/hub/security-malware) في كل ارتكاب. راجع [توثيق Hub](https://huggingface.co/docs/hub/security) للحصول على أفضل الممارسات مثل [التحقق من التوقيع](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) باستخدام GPG.
>
> لا تتأثر نقاط تفتيش TensorFlow و Flax، ويمكن تحميلها داخل بنيات PyTorch باستخدام `from_tf` و `from_flax` kwargs لطريقة `from_pretrained` للتحايل على هذه المشكلة.
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</pt>
<tf>
أخيرًا، تسمح لك فئات `TFAutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام نفس نقطة التفتيش لتحميل بنية لمهمة مختلفة:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `TFAutoModelFor` لتحميل نسخ لنماذج مُدربة مسبقًا. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، ستتعرف على كيفية استخدام المُجزّئ اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</tf>
</frameworkcontent>

View File

@ -230,11 +230,10 @@ The sun.</s>
من هنا، استمر في التدريب كما تفعل مع مهمة نمذجة اللغة القياسية، باستخدام عمود `formatted_chat`.
<Tip>
بشكل افتراضي ، تضيف بعض *tokenizers* رموزًا خاصة مثل `<bos>` و `<eos>` إلى النص الذي تقوم بتقسيمه إلى رموز. يجب أن تتضمن قوالب المحادثة بالفعل جميع الرموز الخاصة التي تحتاجها ، وبالتالي فإن الرموز الخاصة الإضافية ستكون غالبًا غير صحيحة أو مُكررة ، مما سيؤثر سلبًا على أداء النموذج .
لذلك ، إذا قمت بتنسيق النص باستخدام `apply_chat_template(tokenize=False)` ، فيجب تعيين المعامل `add_special_tokens=False` عندما تقوم بتقسيم ذلك النص إلى رموز لاحقًا . إذا كنت تستخدم `apply_chat_template(tokenize=True)` ، فلن تحتاج إلى القلق بشأن ذلك !
</Tip>
> [!TIP]
> بشكل افتراضي ، تضيف بعض *tokenizers* رموزًا خاصة مثل `<bos>` و `<eos>` إلى النص الذي تقوم بتقسيمه إلى رموز. يجب أن تتضمن قوالب المحادثة بالفعل جميع الرموز الخاصة التي تحتاجها ، وبالتالي فإن الرموز الخاصة الإضافية ستكون غالبًا غير صحيحة أو مُكررة ، مما سيؤثر سلبًا على أداء النموذج .
>
> لذلك ، إذا قمت بتنسيق النص باستخدام `apply_chat_template(tokenize=False)` ، فيجب تعيين المعامل `add_special_tokens=False` عندما تقوم بتقسيم ذلك النص إلى رموز لاحقًا . إذا كنت تستخدم `apply_chat_template(tokenize=True)` ، فلن تحتاج إلى القلق بشأن ذلك !
## متقدّم: مدخلات إضافية لِقوالب الدردشة
@ -304,7 +303,7 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
model = AutoModelForCausalLM.from_pretrained(checkpoint, dtype=torch.bfloat16, device_map="auto")
```python
messages = [
@ -361,9 +360,8 @@ print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
The current temperature in Paris, France is 22.0 ° Celsius.<|im_end|>
```
<Tip>
لا تستخدم جميع نماذج استخدام الأدوات جميع ميزات استدعاء الأدوات الموضحة أعلاه. يستخدم البعض معرفات استدعاء الأدوات، بينما يستخدم البعض الآخر ببساطة اسم الدالة ويقارن استدعاءات الأدوات بالنتائج باستخدام الترتيب، وهناك عدة نماذج لا تستخدم أيًا منهما ولا تصدر سوى استدعاء أداة واحد في كل مرة لتجنب الارتباك. إذا كنت تريد أن يكون رمزك متوافقًا مع أكبر عدد ممكن من النماذج، فإننا نوصي بهيكلة استدعاءات الأدوات الخاصة بك كما هو موضح هنا، وإعادة نتائج الأدوات بالترتيب الذي أصدرها النموذج. يجب أن تتعامل قوالب الدردشة على كل نموذج مع الباقي.
</Tip>
> [!TIP]
> لا تستخدم جميع نماذج استخدام الأدوات جميع ميزات استدعاء الأدوات الموضحة أعلاه. يستخدم البعض معرفات استدعاء الأدوات، بينما يستخدم البعض الآخر ببساطة اسم الدالة ويقارن استدعاءات الأدوات بالنتائج باستخدام الترتيب، وهناك عدة نماذج لا تستخدم أيًا منهما ولا تصدر سوى استدعاء أداة واحد في كل مرة لتجنب الارتباك. إذا كنت تريد أن يكون رمزك متوافقًا مع أكبر عدد ممكن من النماذج، فإننا نوصي بهيكلة استدعاءات الأدوات الخاصة بك كما هو موضح هنا، وإعادة نتائج الأدوات بالترتيب الذي أصدرها النموذج. يجب أن تتعامل قوالب الدردشة على كل نموذج مع الباقي.
### فهم مخططات الأدوات
@ -514,9 +512,8 @@ print(gen_text)
إن مُدخل documents للتوليد القائم على الاسترجاع غير مدعوم على نطاق واسع، والعديد من النماذج لديها قوالب دردشة تتجاهل هذا المُدخل ببساطة.
للتحقق مما إذا كان النموذج يدعم مُدخل `documents`، يمكنك قراءة بطاقة النموذج الخاصة به، أو `print(tokenizer.chat_template)` لمعرفة ما إذا كان مفتاح `documents` مستخدمًا في أي مكان.
<Tip>
ومع ذلك، فإن أحد فئات النماذج التي تدعمه هي [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) و [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-pluse-08-2024) من Cohere، من خلال قالب الدردشة rag الخاص بهم. يمكنك رؤية أمثلة إضافية على التوليد باستخدام هذه الميزة في بطاقات النموذج الخاصة بهم.
</Tip>
> [!TIP]
> ومع ذلك، فإن أحد فئات النماذج التي تدعمه هي [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) و [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-pluse-08-2024) من Cohere، من خلال قالب الدردشة rag الخاص بهم. يمكنك رؤية أمثلة إضافية على التوليد باستخدام هذه الميزة في بطاقات النموذج الخاصة بهم.
## متقدم: كيف تعمل قوالب الدردشة؟
يتم تخزين قالب الدردشة للنموذج في الخاصية `tokenizer.chat_template`. إذا لم يتم تعيين قالب دردشة، فسيتم استخدام القالب الافتراضي لفئة النموذج هذه بدلاً من ذلك. دعونا نلقي نظرة على قالب دردشة `Zephyr`، ولكن لاحظ أن هذا القالب مُبسّط قليلاً عن القالب الفعلي!
@ -587,9 +584,8 @@ tokenizer.push_to_hub("model_name") # تحميل القالب الجديد إل
يتم استدعاء الدالة [`~PreTrainedTokenizer.apply_chat_template`] الذي نستخدم قالب الدردشة الخاص بك بواسطة فئة [`TextGenerationPipeline`] لذلك بمجرد تعيين قالب الدردشة الصحيح، سيصبح نموذجك متوافقًا تلقائيًا مع [`TextGenerationPipeline`].
<Tip>
إذا كنت تُجري ضبطًا دقيقًا لنموذج للدردشة، بالإضافة إلى تعيين قالب دردشة، فربما يجب عليك إضافة أي رموز تحكم دردشة جديدة كرموز خاصة في المجزىء اللغوي. لا يتم تقسيم الرموز الخاصة أبدًا، مما يضمن معالجة رموز التحكم الخاصة بك دائمًا كرموز فردية بدلاً من تجزئتها إلى أجزاء. يجب عليك أيضًا تعيين خاصية `eos_token` للمجزىء اللغوي إلى الرمز الذي يُشير إلى نهاية توليدات المساعد في قالبك. سيضمن هذا أن أدوات توليد النصوص يمكنها تحديد وقت إيقاف توليد النص بشكل صحيح.
</Tip>
> [!TIP]
> إذا كنت تُجري ضبطًا دقيقًا لنموذج للدردشة، بالإضافة إلى تعيين قالب دردشة، فربما يجب عليك إضافة أي رموز تحكم دردشة جديدة كرموز خاصة في المجزىء اللغوي. لا يتم تقسيم الرموز الخاصة أبدًا، مما يضمن معالجة رموز التحكم الخاصة بك دائمًا كرموز فردية بدلاً من تجزئتها إلى أجزاء. يجب عليك أيضًا تعيين خاصية `eos_token` للمجزىء اللغوي إلى الرمز الذي يُشير إلى نهاية توليدات المساعد في قالبك. سيضمن هذا أن أدوات توليد النصوص يمكنها تحديد وقت إيقاف توليد النص بشكل صحيح.
### لماذا تحتوي بعض النماذج على قوالب متعددة؟
تستخدم بعض النماذج قوالب مختلفة لحالات استخدام مختلفة. على سبيل المثال، قد تستخدم قالبًا واحدًا للدردشة العادية وآخر لاستخدام الأدوات، أو التوليد القائم على الاسترجاع. في هذه الحالات، تكون `tokenizer.chat_template` قاموسًا. يمكن أن يتسبب هذا في بعض الارتباك، وحيثما أمكن، نوصي باستخدام قالب واحد لجميع حالات الاستخدام. يمكنك استخدام عبارات Jinja مثل `if tools is defined` وتعريفات `{% macro %}` لتضمين مسارات تعليمات برمجية متعددة بسهولة في قالب واحد.
@ -640,10 +636,8 @@ I'm doing great!<|im_end|>
## متقدم: نصائح لكتابة القوالب
<Tip>
أسهل طريقة للبدء في كتابة قوالب Jinja هي إلقاء نظرة على بعض القوالب الموجودة. يمكنك استخدام `print(tokenizer.chat_template)` لأي نموذج دردشة لمعرفة القالب الذي يستخدمه. بشكل عام، تحتوي النماذج التي تدعم استخدام الأدوات على قوالب أكثر تعقيدًا بكثير من النماذج الأخرى - لذلك عندما تبدأ للتو، فمن المحتمل أنها مثال سيئ للتعلم منه! يمكنك أيضًا إلقاء نظرة على [وثائق Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) للحصول على تفاصيل حول تنسيق Jinja العام وتركيبه.
</Tip>
> [!TIP]
> أسهل طريقة للبدء في كتابة قوالب Jinja هي إلقاء نظرة على بعض القوالب الموجودة. يمكنك استخدام `print(tokenizer.chat_template)` لأي نموذج دردشة لمعرفة القالب الذي يستخدمه. بشكل عام، تحتوي النماذج التي تدعم استخدام الأدوات على قوالب أكثر تعقيدًا بكثير من النماذج الأخرى - لذلك عندما تبدأ للتو، فمن المحتمل أنها مثال سيئ للتعلم منه! يمكنك أيضًا إلقاء نظرة على [وثائق Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) للحصول على تفاصيل حول تنسيق Jinja العام وتركيبه.
تُطابق قوالب Jinja في `transformers` قوالب Jinja في أي مكان آخر. الشيء الرئيسي الذي يجب معرفته هو أن سجل الدردشة سيكون متاحًا داخل قالبك كمتغير يسمى `messages`. ستتمكن من الوصول إلى `messages` في قالبك تمامًا كما يمكنك في Python، مما يعني أنه يمكنك التكرار خلاله باستخدام `{% for message in messages %}` أو الوصول إلى رسائل فردية باستخدام `{{ messages[0] }}`، على سبيل المثال.
@ -680,11 +674,8 @@ I'm doing great!<|im_end|>
- **الرموز الخاصة** مثل `bos_token` و `eos_token`. يتم استخراجها من `tokenizer.special_tokens_map`. ستختلف الرموز الدقيقة المتاحة داخل كل قالب اعتمادًا على المجزىء اللغوي الأصلي.
<Tip>
يمكنك في الواقع تمرير أي `kwarg` إلى `apply_chat_template`، وستكون متاحة داخل القالب كمتغير. بشكل عام، نوصي بمحاولة الالتزام بالمتغيرات الأساسية المذكورة أعلاه، لأن ذلك سيجعل نموذجك أكثر صعوبة في الاستخدام إذا كان على المستخدمين كتابة تعليمات برمجية مخصصة لتمرير `kwargs` خاصة بالنموذج. ومع ذلك، فنحن نُدرك أن هذا المجال يتحرك بسرعة، لذلك إذا كانت لديك حالة استخدام جديدة لا تتناسب مع واجهة برمجة التطبيقات الأساسية، فلا تتردد في استخدام `kwarg` معامل جديد لها! إذا أصبح `kwarg` المعامل الجديد شائعًا، فقد نقوم بترقيته إلى واجهة برمجة التطبيقات الأساسية وإنشاء وتوثيق الخاص به.
</Tip>
> [!TIP]
> يمكنك في الواقع تمرير أي `kwarg` إلى `apply_chat_template`، وستكون متاحة داخل القالب كمتغير. بشكل عام، نوصي بمحاولة الالتزام بالمتغيرات الأساسية المذكورة أعلاه، لأن ذلك سيجعل نموذجك أكثر صعوبة في الاستخدام إذا كان على المستخدمين كتابة تعليمات برمجية مخصصة لتمرير `kwargs` خاصة بالنموذج. ومع ذلك، فنحن نُدرك أن هذا المجال يتحرك بسرعة، لذلك إذا كانت لديك حالة استخدام جديدة لا تتناسب مع واجهة برمجة التطبيقات الأساسية، فلا تتردد في استخدام `kwarg` معامل جديد لها! إذا أصبح `kwarg` المعامل الجديد شائعًا، فقد نقوم بترقيته إلى واجهة برمجة التطبيقات الأساسية وإنشاء وتوثيق الخاص به.
### دوال قابلة للاستدعاء

View File

@ -25,7 +25,7 @@ chat = [
import torch
from transformers import pipeline
pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", dtype=torch.bfloat16, device_map="auto")
response = pipe(chat, max_new_tokens=512)
print(response[0]['generated_text'][-1]['content'])
```
@ -126,7 +126,7 @@ chat = [
]
# 1: تحميل النموذج والمحلل
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", torch_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# 2: تطبيق قالب الدردشة
@ -164,7 +164,7 @@ print("Decoded output:\n", decoded_output)
### اعتبارات الذاكرة
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "torch_dtype" كما فعلنا أعلاه.
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "dtype" كما فعلنا أعلاه.
ومن الممكن أيضًا النزول إلى أقل من 16 بت باستخدام "التكميم"، وهي طريقة لضغط أوزان النموذج بطريقة تفقد بعض المعلومات. يسمح هذا بضغط كل معلمة إلى 8 بتات أو 4 بتات أو حتى أقل. لاحظ أنه، خاصة في 4 بتات، قد تتأثر جودة ناتج النموذج سلبًا، ولكن غالبًا ما يكون هذا مقايضة تستحق القيام بها لتناسب نموذج محادثة أكبر وأكثر قدرة في الذاكرة. دعنا كيف يمكننا تطبيق ذلك باستخدام مكتبة `bitsandbytes`:
@ -188,11 +188,8 @@ pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", device
### اعتبارات الأداء
<Tip>
للحصول على دليل أكثر شمولاً حول أداء نموذج اللغة والتحسين، راجع [تحسين استدلال LLM](./llm_optims).
</Tip>
> [!TIP]
> للحصول على دليل أكثر شمولاً حول أداء نموذج اللغة والتحسين، راجع [تحسين استدلال LLM](./llm_optims).
كقاعدة عامة، ستكون نماذج المحادثة الأكبر حجمًا أبطأ في توليد النصوص بالإضافة إلى احتياجها لذاكرة أكبرة. من الممكن أن تكون أكثر تحديدًا بشأن هذا: إن توليد النص من نموذج دردشة أمر غير عادي في أنه يخضع لقيود **سعة الذاكرة** بدلاً من قوة الحوسبة، لأن كل معلمة نشطة يجب قراءتها من الذاكرة لكل رمز ينشئه النموذج. وهذا يعني أن عدد الرموز في الثانية التي يمكنك توليدها من نموذج الدردشة يتناسب بشكل عام مع إجمالي حجم الذاكرة التي بوجد بها ا، مقسومًا على حجم النموذج.

View File

@ -72,17 +72,14 @@ DistilBertConfig {
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
```
<Tip>
يمكنك أيضًا حفظ ملف التكوين كقاموس أو حتى كفرق بين خصائص التكوين المُعدّلة والخصائص التكوين الافتراضية! راجع وثائق [التكوين](main_classes/configuration) لمزيد من التفاصيل.
</Tip>
> [!TIP]
> يمكنك أيضًا حفظ ملف التكوين كقاموس أو حتى كفرق بين خصائص التكوين المُعدّلة والخصائص التكوين الافتراضية! راجع وثائق [التكوين](main_classes/configuration) لمزيد من التفاصيل.
## النموذج
الخطوة التالية هي إنشاء [نموذج](main_classes/models). النموذج - ويُشار إليه أحيانًا باسم البنية - يُحدد وظيفة كل طبقة والعمليات الحسابية المُنفذة. تُستخدم خصائص مثل `num_hidden_layers` من التكوين لتحديد هذه البنية. تشترك جميع النماذج في فئة أساسية واحدة هي [`PreTrainedModel`] وبعض الوظائف المُشتركة مثل غيير حجم مُدخلات الكلمات وتقليص رؤوس آلية الانتباه الذاتي. بالإضافة إلى ذلك، فإن جميع النماذج هي فئات فرعية إما من [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)، [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) أو [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html) . هذا يعني النماذج متوافقة مع كل استخدام لإطار العمل الخاص بها.
<frameworkcontent>
<pt>
قم بتحميل خصائص التكوين المخصصة الخاصة بك في النموذج:
```py
@ -105,39 +102,11 @@ DistilBertConfig {
```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</pt>
<tf>
قم بتحميل خصائص التكوين المُخصصة الخاصة بك في النموذج:
```py
>>> from transformers import TFDistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```
هذا ينشئ نموذجًا بقيم عشوائية بدلاً من الأوزان المُدربة مسبقًا. لن يكون هذا النموذج مفيدًا حتى يتم تدريبه. تُعد عملية التدريب مكلفة وتستغرق وقتًا طويلاً. من الأفضل بشكل عام استخدام نموذج مُدرب مسبقًا للحصول على نتائج أفضل بشكل أسرع، مع استخدام جزء بسيط فقط من الموارد المطلوبة للتدريب.
قم بإنشاء نموذج مُدرب مسبقًا باستخدام [`~TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
عندما تقوم بتحميل الأوزان المُدربة مسبقًا،يتم تحميل إعدادات النموذج الافتراضي تلقائيًا إذا كان النموذج من مكتبة 🤗 Transformers. ومع ذلك، يمكنك أيضًا استبدال - بعض أو كل - إعدادات النموذج الافتراضية بإعداداتك الخاصة:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</tf>
</frameworkcontent>
### رؤوس النموذج
في هذه المرحلة، لديك نموذج DistilBERT الأساسي الذي يخرج *حالات الكامنة*. تُمرَّر هذه الحالات الكامنة كمدخلات لرأس النموذج لإنتاج المخرجات النهائية. توفر مكتبة 🤗 Transformers رأس نموذج مختلف لكل مهمة طالما أن النموذج يدعم المهمة (أي لا يمكنك استخدام DistilBERT لمهمة تسلسل إلى تسلسل مثل الترجمة).
<frameworkcontent>
<pt>
على سبيل المثال، [`DistilBertForSequenceClassification`] هو نموذج DistilBERT الأساس مزودًا برأس تصنيف تسلسلي. يُشكّل رأس التصنيف التسلسلي طبقة خطية فوق المخرجات المجمعة.
```py
@ -153,25 +122,6 @@ DistilBertConfig {
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</pt>
<tf>
على سبيل المثال، [`TFDistilBertForSequenceClassification`] هو نموذج DistilBERT الأساسي برأس تصنيف تسلسل. رأس التصنيف التسلسلي هو طبقة خطية أعلى المخرجات المجمعة.
```py
>>> from transformers import TFDistilBertForSequenceClassification
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام هذا نقطة التحقق لمهمة أخرى عن طريق التبديل إلى رأس نموذج مختلف. لمهمة الإجابة على الأسئلة، ستستخدم رأس النموذج [`TFDistilBertForQuestionAnswering`]. رأس الإجابة على الأسئلة مشابه لرأس التصنيف التسلسلي باستثناء أنه طبقة خطية أعلى حالات الإخراج المخفية.
```py
>>> from transformers import TFDistilBertForQuestionAnswering
>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</tf>
</frameworkcontent>
## مجزئ النصوص
@ -182,11 +132,8 @@ DistilBertConfig {
يدعم كلا النوعين من المجزئات طرقًا شائعة مثل الترميز وفك الترميز، وإضافة رموز جديدة، وإدارة الرموز الخاصة.
<Tip warning={true}>
لا يدعم كل نموذج مجزئ النصوص سريع. الق نظرة على هذا [جدول](index#supported-frameworks) للتحقق مما إذا كان النموذج يحتوي على دعم مجزئ النصوص سريع.
</Tip>
> [!WARNING]
> لا يدعم كل نموذج مجزئ النصوص سريع. الق نظرة على هذا [جدول](index#supported-frameworks) للتحقق مما إذا كان النموذج يحتوي على دعم مجزئ النصوص سريع.
إذا دربت مجزئ النصوص خاص بك، فيمكنك إنشاء واحد من *قاموسك*:```
@ -212,9 +159,8 @@ DistilBertConfig {
>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```
<Tip>
افتراضيًا، سيحاول [`AutoTokenizer`] تحميل مجزئ نصوص سريع. يمكنك تعطيل هذا السلوك عن طريق تعيين `use_fast=False` في `from_pretrained`.
</Tip>
> [!TIP]
> افتراضيًا، سيحاول [`AutoTokenizer`] تحميل مجزئ نصوص سريع. يمكنك تعطيل هذا السلوك عن طريق تعيين `use_fast=False` في `from_pretrained`.
## معالج الصور
@ -246,11 +192,8 @@ ViTImageProcessor {
}
```
<Tip>
إذا كنت لا تبحث عن أي تخصيص، فما عليك سوى استخدام طريقة `from_pretrained` لتحميل معلمات معالج الصور الافتراضية للنموذج.
</Tip>
> [!TIP]
> إذا كنت لا تبحث عن أي تخصيص، فما عليك سوى استخدام طريقة `from_pretrained` لتحميل معلمات معالج الصور الافتراضية للنموذج.
عدل أيًا من معلمات [`ViTImageProcessor`] لإنشاء معالج الصور المخصص الخاص بك:
@ -383,9 +326,8 @@ Wav2Vec2FeatureExtractor {
}
```
<Tip>
إذا لم تكن بحاجة لأي تخصيص، فاستخدم فقط طريقة `from_pretrained` لتحميل معلمات مستخرج الميزات الافتراضية للنموذج.
</Tip>
> [!TIP]
> إذا لم تكن بحاجة لأي تخصيص، فاستخدم فقط طريقة `from_pretrained` لتحميل معلمات مستخرج الميزات الافتراضية للنموذج.
قم بتعديل أي من معلمات [`Wav2Vec2FeatureExtractor`] لإنشاء مستخرج ميزات مخصص:

View File

@ -11,11 +11,8 @@
لنبدأ بكتابة إعدادات النموذج. إعدادات النموذج هو كائنٌ يحتوي على جميع المعلومات اللازمة لبنائه. كما سنرى لاحقًا، يتطلب النموذج كائن `config` لتهيئته، لذا يجب أن يكون هذا الكائن كاملاً.
<Tip>
تتبع النماذج في مكتبة `transformers` اتفاقية قبول كائن `config` في دالة `__init__` الخاصة بها، ثم تمرر كائن `config` بالكامل إلى الطبقات الفرعية في النموذج، بدلاً من تقسيمه إلى معامﻻت متعددة. يؤدي كتابة نموذجك بهذا الأسلوب إلى كود أبسط مع "مصدر حقيقة" واضح لأي فرط معلمات، كما يسهل إعادة استخدام الكود من نماذج أخرى في `transformers`.
</Tip>
> [!TIP]
> تتبع النماذج في مكتبة `transformers` اتفاقية قبول كائن `config` في دالة `__init__` الخاصة بها، ثم تمرر كائن `config` بالكامل إلى الطبقات الفرعية في النموذج، بدلاً من تقسيمه إلى معامﻻت متعددة. يؤدي كتابة نموذجك بهذا الأسلوب إلى كود أبسط مع "مصدر حقيقة" واضح لأي فرط معلمات، كما يسهل إعادة استخدام الكود من نماذج أخرى في `transformers`.
في مثالنا، سنعدّل بعض الوسائط في فئة ResNet التي قد نرغب في ضبطها. ستعطينا التكوينات المختلفة أنواع ResNets المختلفة الممكنة. سنقوم بتخزين هذه الوسائط بعد التحقق من صحته.
@ -154,11 +151,8 @@ class ResnetModelForImageClassification(PreTrainedModel):
```
في كلتا الحالتين، لاحظ كيف نرث من `PreTrainedModel` ونستدعي مُهيئ الفئة الرئيسية باستخدام `config` (كما تفعل عند إنشاء وحدة `torch.nn.Module` عادية). ليس من الضروري تعريف `config_class` إلا إذا كنت ترغب في تسجيل نموذجك مع الفئات التلقائية (راجع القسم الأخير).
<Tip>
إذا كان نموذجك مشابهًا جدًا لنموذج داخل المكتبة، فيمكنك إعادة استخدام نفس التكوين مثل هذا النموذج.
</Tip>
> [!TIP]
> إذا كان نموذجك مشابهًا جدًا لنموذج داخل المكتبة، فيمكنك إعادة استخدام نفس التكوين مثل هذا النموذج.
يمكن لنموذجك أن يعيد أي شيء تريده، ولكن إعادة قاموس مثلما فعلنا لـ
`ResnetModelForImageClassification`، مع تضمين الخسارة عند تمرير العلامات، سيجعل نموذجك قابلًا للاستخدام مباشرة داخل فئة [`Trainer`]. يعد استخدام تنسيق إخراج آخر أمرًا جيدًا طالما أنك تخطط لاستخدام حلقة تدريب خاصة بك أو مكتبة أخرى للتدريب.
@ -204,11 +198,8 @@ AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassi
## إرسال الكود إلى Hub
<Tip warning={true}>
هذا API تجريبي وقد يكون له بعض التغييرات الطفيفة في الإصدارات القادمة.
</Tip>
> [!WARNING]
> هذا API تجريبي وقد يكون له بعض التغييرات الطفيفة في الإصدارات القادمة.
أولاً، تأكد من تعريف نموذجك بالكامل في ملف `.py`. يمكن أن يعتمد على الاستيراد النسبي لملفات أخرى طالما أن جميع الملفات موجودة في نفس الدليل (لا ندعم الوحدات الفرعية لهذه الميزة حتى الآن). في مثالنا، سنحدد ملف `modeling_resnet.py` وملف `configuration_resnet.py` في مجلد باسم "resnet_model" في دليل العمل الحالي. يحتوي ملف التكوين على كود لـ `ResnetConfig` ويحتوي ملف النمذجة على كود لـ `ResnetModel` و`ResnetModelForImageClassification`.
@ -222,12 +213,9 @@ AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassi
يمكن أن يكون ملف `__init__.py` فارغًا، فهو موجود فقط حتى يتمكن Python من اكتشاف أن `resnet_model` يمكن استخدامه كموديل.
<Tip warning={true}>
إذا كنت تقوم بنسخ ملفات النمذجة من المكتبة، فسوف تحتاج إلى استبدال جميع الواردات النسبية في أعلى الملف
لاستيرادها من حزمة `transformers`.
</Tip>
> [!WARNING]
> إذا كنت تقوم بنسخ ملفات النمذجة من المكتبة، فسوف تحتاج إلى استبدال جميع الواردات النسبية في أعلى الملف
> لاستيرادها من حزمة `transformers`.
لاحظ أنه يمكنك إعادة استخدام (أو توسيع) تكوين/نموذج موجود.
@ -251,21 +239,18 @@ ResnetModelForImageClassification.register_for_auto_class("AutoModelForImageClas
[`AutoConfig`]) ولكن الأمر يختلف بالنسبة للنماذج. قد يكون نموذجك المخصص مناسبًا للعديد من المهام المختلفة، لذلك يجب
تحديد أي من الفئات التلقائية هو الصحيح لنموذجك.
<Tip>
استخدم `register_for_auto_class()` إذا كنت تريد نسخ ملفات الكود. إذا كنت تفضل استخدام الكود على Hub من مستودع آخر،
فلا تحتاج إلى استدعائه. في الحالات التي يوجد فيها أكثر من فئة تلقائية واحدة، يمكنك تعديل ملف `config.json` مباشرة باستخدام
الهيكل التالي:
```json
"auto_map": {
"AutoConfig": "<your-repo-name>--<config-name>",
"AutoModel": "<your-repo-name>--<config-name>",
"AutoModelFor<Task>": "<your-repo-name>--<config-name>",
},
```
</Tip>
> [!TIP]
> استخدم `register_for_auto_class()` إذا كنت تريد نسخ ملفات الكود. إذا كنت تفضل استخدام الكود على Hub من مستودع آخر،
> فلا تحتاج إلى استدعائه. في الحالات التي يوجد فيها أكثر من فئة تلقائية واحدة، يمكنك تعديل ملف `config.json` مباشرة باستخدام
> الهيكل التالي:
>
> ```json
> "auto_map": {
> "AutoConfig": "<your-repo-name>--<config-name>",
> "AutoModel": "<your-repo-name>--<config-name>",
> "AutoModelFor<Task>": "<your-repo-name>--<config-name>",
> },
> ```
بعد ذلك، دعنا نقوم بإنشاء التكوين والنماذج كما فعلنا من قبل:
@ -280,7 +265,7 @@ resnet50d.model.load_state_dict(pretrained_model.state_dict())
الآن لإرسال النموذج إلى Hub، تأكد من تسجيل الدخول. إما تشغيل في المحطة الأوامر الطرفية الخاصة بك:
```bash
huggingface-cli login
hf auth login
```
أو من دفتر ملاحظات:

View File

@ -245,11 +245,8 @@
نماذج اكتشاف الأجسام: ([DetrForObjectDetection]) يتوقع النموذج قائمة من القواميس تحتوي على مفتاح class_labels و boxes حيث تتوافق كل قيمة من المجموعة مع الملصق المتوقع وعدد المربعات المحيطة بكل صورة فردية.
نماذج التعرف التلقائي على الكلام: ([Wav2Vec2ForCTC]) يتوقع النموذج مصفوفة ذات بعد (batch_size, target_length) حيث تتوافق كل قيمة مع الملصق المتوقع لكل رمز فردي.
<Tip>
قد تختلف تسميات كل نموذج، لذا تأكد دائمًا من مراجعة وثائق كل نموذج للحصول على معلومات حول التسميات الخاصة به.
</Tip>
> [!TIP]
> قد تختلف تسميات كل نموذج، لذا تأكد دائمًا من مراجعة وثائق كل نموذج للحصول على معلومات حول التسميات الخاصة به.
لا تقبل النماذج الأساسية ([`BertModel`]) الملصقات ، لأنها نماذج المحول الأساسية، والتي تقوم ببساطة بإخراج الميزات.
### نماذج اللغة الكبيرة large language models (LLM)

View File

@ -48,17 +48,14 @@ pip install 'transformers[torch]'
pip install 'transformers[tf-cpu]'
```
<Tip warning={true}>
لمستخدمي M1 / ARM
ستحتاج إلى تثبيت ما يلي قبل تثبيت TensorFLow 2.0
```bash
brew install cmake
brew install pkg-config
```
</Tip>
> [!WARNING]
> لمستخدمي M1 / ARM
>
> ستحتاج إلى تثبيت ما يلي قبل تثبيت TensorFLow 2.0
> ```bash
> brew install cmake
> brew install pkg-config
> ```
🤗 Transformers وFlax:
@ -117,11 +114,8 @@ pip install -e .
ستقوم هذه الأوامر بربط المجلد الذي قمت باستنساخ المستودع فيه بمسارات مكتبة Python. بمعنى آخر، سيبحث Python داخل المجلد الذي قمت باستنساخه بالإضافة إلى المسارات المعتادة للمكتبات. على سبيل المثال، إذا تم تثبيت حزم Python الخاصة بك عادةً في `~/anaconda3/envs/main/lib/python3.7/site-packages/`, فسيقوم Python أيضًا بالبحث في المجلد الذي قمت باستنساخه: `~/transformers/`.
<Tip warning={true}>
يجب عليك الاحتفاظ بمجلد `transformers` إذا كنت تريد الاستمرار في استخدام المكتبة.
</Tip>
> [!WARNING]
> يجب عليك الاحتفاظ بمجلد `transformers` إذا كنت تريد الاستمرار في استخدام المكتبة.
الآن يمكنك تحديث المستنسخ الخاص بك بسهولة إلى أحدث إصدار من 🤗 Transformers باستخدام الأمر التالي:
@ -148,21 +142,15 @@ conda install conda-forge::transformers
2. متغير البيئة: `HF_HOME`.
3. متغير البيئة: `XDG_CACHE_HOME` + `/huggingface`.
<Tip>
سيستخدم 🤗 Transformers متغيرات البيئة `PYTORCH_TRANSFORMERS_CACHE` أو `PYTORCH_PRETRAINED_BERT_CACHE` إذا كنت قادمًا من إصدار سابق من هذه المكتبة وقمت بتعيين متغيرات البيئة هذه، ما لم تحدد متغير البيئة `TRANSFORMERS_CACHE`.
</Tip>
> [!TIP]
> سيستخدم 🤗 Transformers متغيرات البيئة `PYTORCH_TRANSFORMERS_CACHE` أو `PYTORCH_PRETRAINED_BERT_CACHE` إذا كنت قادمًا من إصدار سابق من هذه المكتبة وقمت بتعيين متغيرات البيئة هذه، ما لم تحدد متغير البيئة `TRANSFORMERS_CACHE`.
## الوضع دون اتصال بالإنترنت
قم بتشغيل 🤗 Transformers في بيئة محمية بجدار حماية أو غير متصلة باستخدام الملفات المخزنة مؤقتًا محليًا عن طريق تعيين متغير البيئة `HF_HUB_OFFLINE=1`.
<Tip>
أضف [🤗 Datasets](https://huggingface.co/docs/datasets/) إلى سير عمل التدريب غير المتصل باستخدام متغير البيئة `HF_DATASETS_OFFLINE=1`.
</Tip>
> [!TIP]
> أضف [🤗 Datasets](https://huggingface.co/docs/datasets/) إلى سير عمل التدريب غير المتصل باستخدام متغير البيئة `HF_DATASETS_OFFLINE=1`.
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
@ -239,8 +227,5 @@ model = T5Model.from_pretrained("./path/to/local/directory", local_files_only=Tr
>>> config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```
<Tip>
راجع قسم [كيفية تنزيل الملفات من Hub](https://huggingface.co/docs/hub/how-to-downstream) لمزيد من التفاصيل حول تنزيل الملفات المخزنة على Hub.
</Tip>
> [!TIP]
> راجع قسم [كيفية تنزيل الملفات من Hub](https://huggingface.co/docs/hub/how-to-downstream) لمزيد من التفاصيل حول تنزيل الملفات المخزنة على Hub.

View File

@ -51,11 +51,8 @@ pip install transformers bitsandbytes>=0.39.0 -q
دعنا نتحدث عن الكود!
<Tip>
إذا كنت مهتمًا بالاستخدام الأساسي لـ LLM، فإن واجهة [`Pipeline`](pipeline_tutorial) عالية المستوى هي نقطة انطلاق رائعة. ومع ذلك، غالبًا ما تتطلب LLMs ميزات متقدمة مثل التكميم والتحكم الدقيق في خطوة اختيار الرمز، والتي يتم تنفيذها بشكل أفضل من خلال [`~generation.GenerationMixin.generate`]. التوليد التلقائي باستخدام LLMs يستهلك الكثير من المواردد ويجب تنفيذه على وحدة معالجة الرسومات للحصول على أداء كافٍ.
</Tip>
> [!TIP]
> إذا كنت مهتمًا بالاستخدام الأساسي لـ LLM، فإن واجهة [`Pipeline`](pipeline_tutorial) عالية المستوى هي نقطة انطلاق رائعة. ومع ذلك، غالبًا ما تتطلب LLMs ميزات متقدمة مثل التكميم والتحكم الدقيق في خطوة اختيار الرمز، والتي يتم تنفيذها بشكل أفضل من خلال [`~generation.GenerationMixin.generate`]. التوليد التلقائي باستخدام LLMs يستهلك الكثير من المواردد ويجب تنفيذه على وحدة معالجة الرسومات للحصول على أداء كافٍ.
أولاً، تحتاج إلى تحميل النموذج.

View File

@ -13,11 +13,11 @@
في هذا الدليل، سنستعرض التقنيات الفعالة لتُحسِّن من كفاءة نشر نماذج اللغة الكبيرة:
1. سنتناول تقنية "دقة أقل" التي أثبتت الأبحاث فعاليتها في تحقيق مزايا حسابية دون التأثير بشكل ملحوظ على أداء النموذج عن طريق العمل بدقة رقمية أقل [8 بت و4 بت](/main_classes/quantization.md).
1. سنتناول تقنية "دقة أقل" التي أثبتت الأبحاث فعاليتها في تحقيق مزايا حسابية دون التأثير بشكل ملحوظ على أداء النموذج عن طريق العمل بدقة رقمية أقل [8 بت و4 بت](/main_classes/quantization).
2. **اFlash Attention:** إن Flash Attention وهي نسخة مُعدَّلة من خوارزمية الانتباه التي لا توفر فقط نهجًا أكثر كفاءة في استخدام الذاكرة، ولكنها تحقق أيضًا كفاءة متزايدة بسبب الاستخدام الأمثل لذاكرة GPU.
3. **الابتكارات المعمارية:** حيث تم اقتراح هياكل متخصصة تسمح باستدلال أكثر فعالية نظرًا لأن نماذج اللغة الكبيرة يتم نشرها دائمًا بنفس الطريقة أثناء عملية الاستدلال، أي توليد النص التنبؤي التلقائي مع سياق الإدخال الطويل، فقد تم اقتراح بنيات نموذج متخصصة تسمح بالاستدلال الأكثر كفاءة. أهم تقدم في بنيات النماذج هنا هو [عذر](https://huggingface.co/papers/2108.12409)، [الترميز الدوار](https://huggingface.co/papers/2104.09864)، [الاهتمام متعدد الاستعلامات (MQA)](https://huggingface.co/papers/1911.02150) و [مجموعة الانتباه بالاستعلام (GQA)]((https://huggingface.co/papers/2305.13245)).
3. **الابتكارات المعمارية:** حيث تم اقتراح هياكل متخصصة تسمح باستدلال أكثر فعالية نظرًا لأن نماذج اللغة الكبيرة يتم نشرها دائمًا بنفس الطريقة أثناء عملية الاستدلال، أي توليد النص التنبؤي التلقائي مع سياق الإدخال الطويل، فقد تم اقتراح بنيات نموذج متخصصة تسمح بالاستدلال الأكثر كفاءة. أهم تقدم في بنيات النماذج هنا هو [عذر](https://huggingface.co/papers/2108.12409)، [الترميز الدوار](https://huggingface.co/papers/2104.09864)، [الاهتمام متعدد الاستعلامات (MQA)](https://huggingface.co/papers/1911.02150) و [مجموعة الانتباه بالاستعلام (GQA)](https://huggingface.co/papers/2305.13245).
على مدار هذا الدليل، سنقدم تحليلًا للتوليد التنبؤي التلقائي من منظور المُوتِّرات. نتعمق في مزايا وعيوب استخدام دقة أقل، ونقدم استكشافًا شاملاً لخوارزميات الانتباه الأحدث، ونناقش بنيات نماذج نماذج اللغة الكبيرة المحسنة. سندعم الشرح بأمثلة عملية تُبرِز كل تحسين على حدة.
@ -73,7 +73,7 @@ model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="aut
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", torch_dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
tokenizer = AutoTokenizer.from_pretrained("bigcode/octocoder")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
@ -114,7 +114,7 @@ bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
> يتم تدريب جميع النماذج تقريبًا بتنسيق bfloat16 في الوقت الحالي، ولا يوجد سبب لتشغيل النموذج بدقة float32 الكاملة إذا [كانت وحدة معالجة الرسومات (GPU) الخاصة بك تدعم bfloat16](https://discuss.pytorch.org/t/bfloat16-native-support/117155/5). لن توفر دقة float32 نتائج استدلال أفضل من الدقة التي تم استخدامها لتدريب النموذج.
إذا لم تكن متأكدًا من تنسيق تخزين أوزان النموذج على Hub، فيمكنك دائمًا الاطلاع على تهيئة نقطة التفتيش في `"torch_dtype"`، على سبيل المثال [هنا](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). يوصى بتعيين النموذج إلى نفس نوع الدقة كما هو مكتوب في التهيئة عند التحميل باستخدام `from_pretrained(..., torch_dtype=...)` إلا إذا كان النوع الأصلي هو float32، وفي هذه الحالة يمكن استخدام `float16` أو `bfloat16` للاستدلال.
إذا لم تكن متأكدًا من تنسيق تخزين أوزان النموذج على Hub، فيمكنك دائمًا الاطلاع على تهيئة نقطة التفتيش في `"dtype"`، على سبيل المثال [هنا](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). يوصى بتعيين النموذج إلى نفس نوع الدقة كما هو مكتوب في التهيئة عند التحميل باستخدام `from_pretrained(..., dtype=...)` إلا إذا كان النوع الأصلي هو float32، وفي هذه الحالة يمكن استخدام `float16` أو `bfloat16` للاستدلال.
دعونا نحدد وظيفة `flush(...)` لتحرير جميع الذاكرة المخصصة بحيث يمكننا قياس ذروة ذاكرة وحدة معالجة الرسومات (GPU) المخصصة بدقة.
@ -389,7 +389,7 @@ long_prompt = 10 * system_prompt + prompt
نقوم بتنفيذ نموذجنا مرة أخرى بدقة bfloat16.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", torch_dtype=torch.bfloat16, device_map="auto")
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("bigcode/octocoder")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
@ -673,11 +673,8 @@ length of key-value cache 24
> يجب *دائمًا* استخدام ذاكرة التخزين المؤقت للمفاتيح والقيم حيث يؤدي ذلك إلى نتائج متطابقة وزيادة كبيرة في السرعة لتسلسلات الإدخال الأطول. ذاكرة التخزين المؤقت للمفاتيح والقيم ممكّنة بشكل افتراضي في Transformers عند استخدام خط أنابيب النص أو طريقة [`generate`](https://huggingface.co/docs/transformers/main_classes/text_generation).
<Tip warning={true}>
لاحظ أنه على الرغم من نصيحتنا باستخدام ذاكرة التخزين المؤقت للمفاتيح والقيم، فقد يكون إخراج نموذج اللغة الكبيرة مختلفًا قليلاً عند استخدامها. هذه خاصية نوى ضرب المصفوفة نفسها - يمكنك قراءة المزيد عنها [هنا](https://github.com/huggingface/transformers/issues/25420#issuecomment-1775317535).
</Tip>
> [!WARNING]
> لاحظ أنه على الرغم من نصيحتنا باستخدام ذاكرة التخزين المؤقت للمفاتيح والقيم، فقد يكون إخراج نموذج اللغة الكبيرة مختلفًا قليلاً عند استخدامها. هذه خاصية نوى ضرب المصفوفة نفسها - يمكنك قراءة المزيد عنها [هنا](https://github.com/huggingface/transformers/issues/25420#issuecomment-1775317535).
#### 3.2.1 محادثة متعددة الجولات

View File

@ -116,11 +116,8 @@ default_args = {
}
```
<Tip>
إذا كنت تخطط لتشغيل عدة تجارب، من أجل مسح الذاكرة بشكل صحيح بين التجارب، قم بإعادة تشغيل نواة Python بين التجارب.
</Tip>
> [!TIP]
> إذا كنت تخطط لتشغيل عدة تجارب، من أجل مسح الذاكرة بشكل صحيح بين التجارب، قم بإعادة تشغيل نواة Python بين التجارب.
## استخدام الذاكرة في التدريب الأساسي

View File

@ -12,11 +12,8 @@
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
<Tip>
لمشاركة نموذج مع المجتمع، تحتاج إلى حساب على [huggingface.co](https://huggingface.co/join). يمكنك أيضًا الانضمام إلى منظمة موجودة أو إنشاء منظمة جديدة.
</Tip>
> [!TIP]
> لمشاركة نموذج مع المجتمع، تحتاج إلى حساب على [huggingface.co](https://huggingface.co/join). يمكنك أيضًا الانضمام إلى منظمة موجودة أو إنشاء منظمة جديدة.
## ميزات المستودع
@ -41,7 +38,7 @@ picture-in-picture" allowfullscreen></iframe>
قبل مشاركة نموذج على Hub، ستحتاج إلى بيانات اعتماد حساب Hugging Face الخاصة بك. إذا كنت تستخدم منصة الأوامر، فقم بتشغيل الأمر التالي في بيئة افتراضية حيث تم تثبيت 🤗 Transformers. سيقوم هذا الأمر بتخزين رمز الدخول الخاص بك في مجلد تخزين المؤقت لـ Hugging Face (`~/.cache/` بشكل افتراضي):
```bash
huggingface-cli login
hf auth login
```
إذا كنت تستخدم دفتر ملاحظات مثل Jupyter أو Colaboratory، فتأكد من تثبيت مكتبة [`huggingface_hub`](https://huggingface.co/docs/hub/adding-a-library). تسمح لك هذه المكتبة بالتفاعل برمجيًا مع Hub.
@ -65,43 +62,15 @@ pip install huggingface_hub
تحويل نقطة التحقق لإطار عمل آخر أمر سهل. تأكد من تثبيت PyTorch و TensorFlow (راجع [هنا](installation) لتعليمات التثبيت)، ثم ابحث عن النموذج الملائم لمهمتك في الإطار الآخر.
<frameworkcontent>
<pt>
حدد `from_tf=True` لتحويل نقطة تحقق من TensorFlow إلى PyTorch:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
حدد `from_pt=True` لتحويل نقطة تحقق من PyTorch إلى TensorFlow:
```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
بعد ذلك، يمكنك حفظ نموذج TensorFlow الجديد بنقطة التحقق الجديدة:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
إذا كان النموذج متاحًا في Flax، فيمكنك أيضًا تحويل نقطة تحقق من PyTorch إلى Flax:
```py
>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
... "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## دفع نموذج أثناء التدريب
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/>
مشاركة نموذجك على Hub مر بسيط للغاية كل ما عليك هو إضافة معلمة أو استدعاء رد إضافي. كما تذكر من درس [التدريب الدقيق](training)، فإن فئة [`TrainingArguments`] هي المكان الذي تحدد فيه المعلمات الفائقة وخيارات التدريب الإضافية. تشمل إحدى خيارات التدريب هذه القدرة على دفع النموذج مباشرة إلى المنصة Hub. قم بتعيين `push_to_hub=True` في [`TrainingArguments`]:
@ -127,29 +96,6 @@ pip install huggingface_hub
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
شارك نموذجًا على Hub باستخدام [`PushToHubCallback`]. في دالة [`PushToHubCallback`], أضف:
- دليل إخراج لنموذجك.
- مُجزّئ اللغوي.
- `hub_model_id`، والذي هو اسم مستخدم Hub واسم النموذج الخاص بك.
```py
>>> from transformers import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```
أضف الاستدعاء إلى [`fit`](https://keras.io/api/models/model_training_apis/)، وسيقوم 🤗 Transformers بدفع النموذج المدرب إلى Hub:
```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## استخدام دالة `push_to_hub`

View File

@ -39,7 +39,6 @@
| [كيفية ضبط نموذج بدقة على التلخيص](https://github.com/huggingface/notebooks/blob/main/examples/summarization.ipynb)| يوضح كيفية معالجة البيانات مسبقًا وضبط نموذج مُدرَّب مسبقًا بدقة على XSUM. | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)|
| [كيفية تدريب نموذج لغة من البداية](https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| تسليط الضوء على جميع الخطوات لتدريب نموذج Transformer بشكل فعال على بيانات مخصصة | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)|
| [كيفية إنشاء نص](https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| كيفية استخدام أساليب فك التشفير المختلفة لإنشاء اللغة باستخدام المحولات | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)|
| [كيفية إنشاء نص (مع قيود)](https://github.com/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| كيفية توجيه إنشاء اللغة باستخدام القيود التي يوفرها المستخدم | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)|
| [Reformer](https://github.com/huggingface/blog/blob/main/notebooks/03_reformer.ipynb)| كيف يدفع Reformer حدود النمذجة اللغوية | [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [![Open in AWS Studio](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)|
#### رؤية الكمبيوتر[[pytorch-cv]]

View File

@ -51,11 +51,8 @@ peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id)
```
<Tip>
يمكنك تحميل محول PEFT باستخدام فئة `AutoModelFor` أو فئة النموذج الأساسي مثل `OPTForCausalLM` أو `LlamaForCausalLM`.
</Tip>
> [!TIP]
> يمكنك تحميل محول PEFT باستخدام فئة `AutoModelFor` أو فئة النموذج الأساسي مثل `OPTForCausalLM` أو `LlamaForCausalLM`.
يمكنك أيضًا تحميل محول PEFT عن طريق استدعاء طريقة `load_adapter`:
@ -162,11 +159,8 @@ output = model.generate(**inputs)
يدعم محول PEFT فئة [`Trainer`] بحيث يمكنك تدريب محول لحالتك الاستخدام المحددة. فهو يتطلب فقط إضافة بضع سطور أخرى من التعليمات البرمجية. على سبيل المثال، لتدريب محول LoRA:
<Tip>
إذا لم تكن معتادًا على ضبط نموذج دقيق باستخدام [`Trainer`، فراجع البرنامج التعليمي](training) لضبط نموذج مُدرب مسبقًا.
</Tip>
> [!TIP]
> إذا لم تكن معتادًا على ضبط نموذج دقيق باستخدام [`Trainer`، فراجع البرنامج التعليمي](training) لضبط نموذج مُدرب مسبقًا.
1. حدد تكوين المحول باستخدام نوع المهمة والمعاملات الزائدة (راجع [`~peft.LoraConfig`] لمزيد من التفاصيل حول وظيفة هذه المعلمات).

View File

@ -6,11 +6,8 @@
* استخدم مُجزّئ أو نموذجًا محددًا.
* استخدم [`pipeline`] للمهام الصوتية والبصرية والمتعددة الوسائط.
<Tip>
اطلع على وثائق [`pipeline`] للحصول على القائمة كاملة بالمهام المدعومة والمعلمات المتاحة.
</Tip>
> [!TIP]
> اطلع على وثائق [`pipeline`] للحصول على القائمة كاملة بالمهام المدعومة والمعلمات المتاحة.
## استخدام الأنابيب
@ -90,7 +87,7 @@ out = transcriber(...) # سيتم الرجوع إلى استخدام `my_parame
transcriber = pipeline(model="openai/whisper-large-v2", device=0)
```
إذا كان النموذج كبيرًا جدًا بالنسبة لوحدة معالجة الرسومات (GPU) واحدة، وأنت تستخدم PyTorch، فيمكنك تعيين `torch_dtype='float16'` لتمكين الاستدلال بدقة FP16. عادةً ما لا يتسبب ذلك في حدوث انخفاضات كبيرة في الأداء، ولكن تأكد من تقييمه على نماذجك!
إذا كان النموذج كبيرًا جدًا بالنسبة لوحدة معالجة الرسومات (GPU) واحدة، وأنت تستخدم PyTorch، فيمكنك تعيين `dtype='float16'` لتمكين الاستدلال بدقة FP16. عادةً ما لا يتسبب ذلك في حدوث انخفاضات كبيرة في الأداء، ولكن تأكد من تقييمه على نماذجك!
بدلاً من ذلك، يمكنك تعيين `device_map="auto"` لتحديد كيفية تحميل مخزنات النموذج وتخزينها تلقائيًا. يتطلب استخدام معامل `device_map` مكتبه 🤗 [Accelerate](https://huggingface.co/docs/accelerate):
@ -189,9 +186,8 @@ for out in pipe(KeyDataset(dataset, "audio")):
## استخدام خطوط الأنابيب لخادم ويب
<Tip>
إن إنشاء محرك استدلال هو موضوع معقد يستحق صفحته الخاصة.
</Tip>
> [!TIP]
> إن إنشاء محرك استدلال هو موضوع معقد يستحق صفحته الخاصة.
[Link](./pipeline_webserver)
@ -251,16 +247,13 @@ for out in pipe(KeyDataset(dataset, "audio")):
[{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]
```
<Tip>
لتشغيل المثال أعلاه، تحتاج إلى تثبيت [`pytesseract`](https://pypi.org/project/pytesseract/) بالإضافة إلى 🤗 Transformers:
```bash
sudo apt install -y tesseract-ocr
pip install pytesseract
```
</Tip>
> [!TIP]
> لتشغيل المثال أعلاه، تحتاج إلى تثبيت [`pytesseract`](https://pypi.org/project/pytesseract/) بالإضافة إلى 🤗 Transformers:
>
> ```bash
> sudo apt install -y tesseract-ocr
> pip install pytesseract
> ```
## استخدام `pipeline` على نماذج كبيرة مع 🤗 `accelerate`:
@ -273,7 +266,7 @@ pip install pytesseract
import torch
from transformers import pipeline
pipe = pipeline(model="facebook/opt-1.3b", torch_dtype=torch.bfloat16, device_map="auto")
pipe = pipeline(model="facebook/opt-1.3b", dtype=torch.bfloat16, device_map="auto")
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
```

View File

@ -1,11 +1,8 @@
# استخدام قنوات المعالجة لخادم ويب
<Tip>
يُعدّ إنشاء محرك استدلال أمرًا معقدًا، ويعتمد الحل "الأفضل" على مساحة مشكلتك. هل تستخدم وحدة المعالجة المركزية أم وحدة معالجة الرسومات؟ هل تريد أقل زمن وصول، أم أعلى معدل نقل، أم دعمًا للعديد من النماذج، أم مجرد تحقيق أقصى تحسين نموذج محدد؟
توجد طرق عديدة لمعالجة هذا الموضوع، لذلك ما سنقدمه هو إعداد افتراضي جيد للبدء به قد لا يكون بالضرورة هو الحل الأمثل لك.```
</Tip>
> [!TIP]
> يُعدّ إنشاء محرك استدلال أمرًا معقدًا، ويعتمد الحل "الأفضل" على مساحة مشكلتك. هل تستخدم وحدة المعالجة المركزية أم وحدة معالجة الرسومات؟ هل تريد أقل زمن وصول، أم أعلى معدل نقل، أم دعمًا للعديد من النماذج، أم مجرد تحقيق أقصى تحسين نموذج محدد؟
> توجد طرق عديدة لمعالجة هذا الموضوع، لذلك ما سنقدمه هو إعداد افتراضي جيد للبدء به قد لا يكون بالضرورة هو الحل الأمثل لك.```
الشيء الرئيسي الذي يجب فهمه هو أننا يمكن أن نستخدم مؤشرًا، تمامًا كما تفعل [على مجموعة بيانات](pipeline_tutorial#using-pipelines-on-a-dataset)، نظرًا لأن خادم الويب هو أساسًا نظام ينتظر الطلبات ويعالجها عند استلامها.
@ -71,11 +68,8 @@ curl -X POST -d "test [MASK]" http://localhost:8000/
المهم حقًا هو أننا نقوم بتحميل النموذج **مرة واحدة** فقط، لذلك لا توجد نسخ من النموذج على خادم الويب. بهذه الطريقة، لا يتم استخدام ذاكرة الوصول العشوائي غير الضرورية. تسمح آلية وضع قائمة الانتظار بالقيام بأشياء متقدمة مثل تجميع بعض العناصر قبل الاستدلال لاستخدام معالجة الدفعات الديناميكية:
<Tip warning={true}>
تم كتابة نموذج الكود البرمجى أدناه بشكل مقصود مثل كود وهمي للقراءة. لا تقم بتشغيله دون التحقق مما إذا كان منطقيًا لموارد النظام الخاص بك!
</Tip>
> [!WARNING]
> تم كتابة نموذج الكود البرمجى أدناه بشكل مقصود مثل كود وهمي للقراءة. لا تقم بتشغيله دون التحقق مما إذا كان منطقيًا لموارد النظام الخاص بك!
```py
(string, rq) = await q.get()

View File

@ -9,11 +9,8 @@
* تستخدم مدخلات الصورة [ImageProcessor](./main_classes/image_processor) لتحويل الصور إلى موترات.
* تستخدم مدخلات متعددة الوسائط [معالجًا](./main_classes/processors) لدمج مُجزّئ الرموز ومستخرج الميزات أو معالج الصور.
<Tip>
`AutoProcessor` **يعمل دائمًا** ويختار تلقائيًا الفئة الصحيحة للنموذج الذي تستخدمه، سواء كنت تستخدم مُجزّئ رموز أو معالج صور أو مستخرج ميزات أو معالجًا.
</Tip>
> [!TIP]
> `AutoProcessor` **يعمل دائمًا** ويختار تلقائيًا الفئة الصحيحة للنموذج الذي تستخدمه، سواء كنت تستخدم مُجزّئ رموز أو معالج صور أو مستخرج ميزات أو معالجًا.
قبل البدء، قم بتثبيت 🤗 Datasets حتى تتمكن من تحميل بعض مجموعات البيانات لتجربتها:
@ -27,11 +24,8 @@ pip install datasets
أداة المعالجة المسبقة الرئيسية للبيانات النصية هي [مُجزّئ اللغوي](main_classes/tokenizer). يقوم مُجزّئ اللغوي بتقسيم النص إلى "أجزاء لغوية" (tokens) وفقًا لمجموعة من القواعد. يتم تحويل الأجزاء اللغوية إلى أرقام ثم إلى منسوجات، والتي تصبح مدخلات للنموذج. يقوم المجزئ اللغوي بإضافة أي مدخلات إضافية يحتاجها النموذج.
<Tip>
إذا كنت تخطط لاستخدام نموذج مُدرب مسبقًا، فمن المهم استخدامالمجزئ اللغوي المقترن بنفس ذلك النموذج. يضمن ذلك تقسيم النص بنفس الطريقة التي تم بها تقسيم النصوص ما قبل التدريب، واستخدام نفس القاموس الذي يربط بين الأجزاء اللغوية وأرقامها ( يُشار إليها عادةً باسم المفردات *vocab*) أثناء التدريب المسبق.
</Tip>
> [!TIP]
> إذا كنت تخطط لاستخدام نموذج مُدرب مسبقًا، فمن المهم استخدامالمجزئ اللغوي المقترن بنفس ذلك النموذج. يضمن ذلك تقسيم النص بنفس الطريقة التي تم بها تقسيم النصوص ما قبل التدريب، واستخدام نفس القاموس الذي يربط بين الأجزاء اللغوية وأرقامها ( يُشار إليها عادةً باسم المفردات *vocab*) أثناء التدريب المسبق.
ابدأ بتحميل المُجزّئ اللغوي مُدرب مسبقًا باستخدام طريقة [`AutoTokenizer.from_pretrained`]. يقوم هذا بتنزيل المفردات *vocab* الذي تم تدريب النموذج عليه:
@ -140,11 +134,8 @@ pip install datasets
[1، 1، 1، 1، 1، 1، 1، 0، 0، 0، 0، 0، 0، 0، 0، 0]]}
```
<Tip>
تحقق من دليل المفاهيم [Padding and truncation](./pad_truncation) لمعرفة المزيد حول معامﻻت الحشو و البتر المختلفة.
</Tip>
> [!TIP]
> تحقق من دليل المفاهيم [Padding and truncation](./pad_truncation) لمعرفة المزيد حول معامﻻت الحشو و البتر المختلفة.
### بناء الموترات Build tensors
@ -152,8 +143,6 @@ pip install datasets
قم بتعيين معلمة `return_tensors` إلى إما `pt` لـ PyTorch، أو `tf` لـ TensorFlow:
<frameworkcontent>
<pt>
```py
>>> batch_sentences = [
@ -173,42 +162,12 @@ pip install datasets
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
</pt>
<tf>
```py
>>> batch_sentences = [
... "But what about second breakfast?",
... "Don't think he knows about second breakfast, Pip.",
... "What about elevensies?",
... ]
>>> encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(encoded_input)
{'input_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0],
[101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
[101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
```
</tf>
</frameworkcontent>
<Tip>
تدعم خطوط الأنابيب المختلفة معامل مُجزِّئ الرموز(tokenizer) بشكل مختلف في طريقة `()__call__` الخاصة بها.
و خطوط الأنابيب `text-2-text-generation` تدعم فقط `truncation`.
و خطوط الأنابيب `text-generation` تدعم `max_length` و`truncation` و`padding` و`add_special_tokens`.
أما في خطوط الأنابيب `fill-mask`، يمكن تمرير معامل مُجزِّئ الرموز (tokenizer) في المتغير `tokenizer_kwargs` (قاموس).
</Tip>
> [!TIP]
> تدعم خطوط الأنابيب المختلفة معامل مُجزِّئ الرموز(tokenizer) بشكل مختلف في طريقة `()__call__` الخاصة بها.
> و خطوط الأنابيب `text-2-text-generation` تدعم فقط `truncation`.
> و خطوط الأنابيب `text-generation` تدعم `max_length` و`truncation` و`padding` و`add_special_tokens`.
> أما في خطوط الأنابيب `fill-mask`، يمكن تمرير معامل مُجزِّئ الرموز (tokenizer) في المتغير `tokenizer_kwargs` (قاموس).
## الصوت Audio
@ -320,24 +279,18 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
بالنسبة لمهام رؤية الحاسوبية، ستحتاج إلى معالج صور [image processor](main_classes/image_processor) لإعداد مجموعة البيانات الخاصة بك لتناسب النموذج. تتكون معالجة الصور المسبقة من عدة خطوات لتحويل الصور إلى الشكل الذي يتوقعه النموذج. وتشمل هذه الخطوات، على سبيل المثال لا الحصر، تغيير الحجم والتطبيع وتصحيح قناة الألوان وتحويل الصور إلى موترات(tensors).
<Tip>
عادة ما تتبع معالجة الصور المسبقة شكلاً من أشكال زيادة البيانات (التضخيم). كلا العمليتين، معالجة الصور المسبقة وزيادة الصور تغيران بيانات الصورة، ولكنها تخدم أغراضًا مختلفة:
*زيادة البيانات: تغيير الصور عن طريق زيادة الصور بطريقة يمكن أن تساعد في منع الإفراط في التعميم وزيادة متانة النموذج. يمكنك أن تكون مبدعًا في كيفية زيادة بياناتك - ضبط السطوع والألوان، واالقص، والدوران، تغيير الحجم، التكبير، إلخ. ومع ذلك، كن حذرًا من عدم تغيير معنى الصور بزياداتك.
*معالجة الصور المسبقة: تضمن معالجة الصور اتتطابق الصور مع تنسيق الإدخال المتوقع للنموذج. عند ضبط نموذج رؤية حاسوبية بدقة، يجب معالجة الصور بالضبط كما كانت عند تدريب النموذج في البداية.
يمكنك استخدام أي مكتبة تريدها لزيادة بيانات الصور. لمعالجة الصور المسبقة، استخدم `ImageProcessor` المرتبط بالنموذج.
</Tip>
> [!TIP]
> عادة ما تتبع معالجة الصور المسبقة شكلاً من أشكال زيادة البيانات (التضخيم). كلا العمليتين، معالجة الصور المسبقة وزيادة الصور تغيران بيانات الصورة، ولكنها تخدم أغراضًا مختلفة:
>
> *زيادة البيانات: تغيير الصور عن طريق زيادة الصور بطريقة يمكن أن تساعد في منع الإفراط في التعميم وزيادة متانة النموذج. يمكنك أن تكون مبدعًا في كيفية زيادة بياناتك - ضبط السطوع والألوان، واالقص، والدوران، تغيير الحجم، التكبير، إلخ. ومع ذلك، كن حذرًا من عدم تغيير معنى الصور بزياداتك.
> *معالجة الصور المسبقة: تضمن معالجة الصور اتتطابق الصور مع تنسيق الإدخال المتوقع للنموذج. عند ضبط نموذج رؤية حاسوبية بدقة، يجب معالجة الصور بالضبط كما كانت عند تدريب النموذج في البداية.
>
> يمكنك استخدام أي مكتبة تريدها لزيادة بيانات الصور. لمعالجة الصور المسبقة، استخدم `ImageProcessor` المرتبط بالنموذج.
قم بتحميل مجموعة بيانات [food101](https://huggingface.co/datasets/food101) (راجع دليل 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub) لمزيد من التفاصيل حول كيفية تحميل مجموعة بيانات) لمعرفة كيف يمكنك استخدام معالج الصور مع مجموعات بيانات رؤية الحاسب:
<Tip>
استخدم معامل `split` من 🤗 Datasets لتحميل عينة صغيرة فقط من مجموعة التدريب نظرًا لحجم البيانات كبيرة جدًا!
</Tip>
> [!TIP]
> استخدم معامل `split` من 🤗 Datasets لتحميل عينة صغيرة فقط من مجموعة التدريب نظرًا لحجم البيانات كبيرة جدًا!
```py
>>> from datasets import load_dataset
@ -391,15 +344,13 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
... return examples
```
<Tip>
في المثال أعلاه، قمنا بتعيين `do_resize=False` لأننا قمنا بالفعل بتغيير حجم الصور في تحويل زيادة الصور،
واستفدنا من خاصية `size` من `image_processor` المناسب. إذا لم تقم بتغيير حجم الصور أثناء زيادة الصور،
فاترك هذا المعلمة. بشكل افتراضي، ستتعامل `ImageProcessor` مع تغيير الحجم.
إذا كنت ترغب في تطبيع الصور كجزء من تحويل زيادة الصور، فاستخدم قيم `image_processor.image_mean`،
و `image_processor.image_std`.
</Tip>
> [!TIP]
> في المثال أعلاه، قمنا بتعيين `do_resize=False` لأننا قمنا بالفعل بتغيير حجم الصور في تحويل زيادة الصور،
> واستفدنا من خاصية `size` من `image_processor` المناسب. إذا لم تقم بتغيير حجم الصور أثناء زيادة الصور،
> فاترك هذا المعلمة. بشكل افتراضي، ستتعامل `ImageProcessor` مع تغيير الحجم.
>
> إذا كنت ترغب في تطبيع الصور كجزء من تحويل زيادة الصور، فاستخدم قيم `image_processor.image_mean`،
> و `image_processor.image_std`.
3. ثم استخدم 🤗 Datasets[`~datasets.Dataset.set_transform`] لتطبيق التحولات أثناء التنقل:
```py
@ -426,13 +377,10 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/preprocessed_image.png"/>
</div>
<Tip>
بالنسبة للمهام مثل الكشف عن الأشياء، والتجزئة الدلالية، والتجزئة المثالية، والتجزئة الشاملة، يوفر `ImageProcessor`
تقوم هذه الطرق بتحويل النواتج الأولية للنموذج إلى تنبؤات ذات معنى مثل مربعات الحدود،
أو خرائط التجزئة.
</Tip>
> [!TIP]
> بالنسبة للمهام مثل الكشف عن الأشياء، والتجزئة الدلالية، والتجزئة المثالية، والتجزئة الشاملة، يوفر `ImageProcessor`
> تقوم هذه الطرق بتحويل النواتج الأولية للنموذج إلى تنبؤات ذات معنى مثل مربعات الحدود،
> أو خرائط التجزئة.
### الحشو Pad

View File

@ -12,20 +12,10 @@
ستحتاج أيضًا إلى تثبيت إطار عمل التعلم الآلي المفضل لديك:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
## خط الأنابيب
@ -33,11 +23,8 @@ pip install tensorflow
يمثل [`pipeline`] أسهل وأسرع طريقة لاستخدام نموذج مُدرب مسبقًا للاستنتاج. يمكنك استخدام [`pipeline`] جاهزًا للعديد من المهام عبر طرق مختلفة، والتي يظهر بعضها في الجدول أدناه:
<Tip>
للاطلاع على القائمة الكاملة للمهام المتاحة، راجع [مرجع واجهة برمجة التطبيقات الخاصة بخط الأنابيب](./main_classes/pipelines).
</Tip>
> [!TIP]
> للاطلاع على القائمة الكاملة للمهام المتاحة، راجع [مرجع واجهة برمجة التطبيقات الخاصة بخط الأنابيب](./main_classes/pipelines).
<div dir="rtl">
@ -122,8 +109,6 @@ label: NEGATIVE, with score: 0.5309
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
استخدم [`AutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `AutoClass` في القسم التالي):
```py
@ -132,18 +117,6 @@ label: NEGATIVE, with score: 0.5309
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
استخدم [`TFAutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `TFAutoClass` في القسم التالي):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
حدد النموذج والمعالج في [`pipeline`]. الآن يمكنك تطبيق `classifier` على النص الفرنسي:
@ -192,8 +165,6 @@ label: NEGATIVE, with score: 0.5309
يمكن المجزئ أيضًا قبول قائمة من المدخلات، ويقوم بـ "حشو" و"تقصير" النص لإرجاع كدفعة بطول موحد:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
@ -204,31 +175,12 @@ label: NEGATIVE, with score: 0.5309
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
<Tip>
اطلع على [الدليل التمهيدي للمعالجة المسبقة](./preprocessing) للحصول على مزيد من التفاصيل حول المعالجة، وكيفية استخدام [`AutoImageProcessor`] و [`AutoFeatureExtractor`] و [`AutoProcessor`] لمعالجة الصور والصوت والإدخالات متعددة الوسائط.
</Tip>
> [!TIP]
> اطلع على [الدليل التمهيدي للمعالجة المسبقة](./preprocessing) للحصول على مزيد من التفاصيل حول المعالجة، وكيفية استخدام [`AutoImageProcessor`] و [`AutoFeatureExtractor`] و [`AutoProcessor`] لمعالجة الصور والصوت والإدخالات متعددة الوسائط.
### AutoModel
<frameworkcontent>
<pt>
تقدم مكتبة 🤗 Transformers طريقة بسيطة وموحدة لتحميل نماذج مدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`AutoModel`] كما لو كنت تقوم بتحميل [`AutoTokenizer`]. الفرق الوحيد هو اختيار فئة [`AutoModel`] المناسبة للمهمة. بالنسبة لتصنيف النص (أو التسلسل)، يجب عليك تحميل [`AutoModelForSequenceClassification`]:
```py
@ -238,11 +190,8 @@ label: NEGATIVE, with score: 0.5309
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
راجع [ملخص المهمة](./task_summary) للاطلاع على المهام التي تدعمها فئة [`AutoModel`].
</Tip>
> [!TIP]
> راجع [ملخص المهمة](./task_summary) للاطلاع على المهام التي تدعمها فئة [`AutoModel`].
الآن قم بتمرير دفعة المدخلات المُعالجة مسبقًا مباشرة إلى النموذج. عليك فقط فك تعبئة القاموس عن طريق إضافة `**`:
@ -264,50 +213,12 @@ label: NEGATIVE, with score: 0.5309
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
يوفر 🤗 Transformers طريقة بسيطة وموحدة لتحميل مثيلات مُدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`TFAutoModel`] مثل تحميل [`AutoTokenizer`]. والفرق الوحيد هو تحديد [`TFAutoModel`] الصحيح للمهمة. للتصنيف النصي (أو التسلسلي)، يجب تحميل [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
راجع [ملخص المهام](./task_summary) للمهام المدعومة بواسطة فئة [`AutoModel`].
</Tip>
الآن، مرر دفعة المدخلات المعالجة مسبقًا مباشرة إلى النموذج. يمكنك تمرير المصفوفات كما هي:
```py
>>> tf_outputs = tf_model(tf_batch)
```
يقوم النموذج بإخراج التنشيطات النهائية في سمة `logits`. طبق دالة softmax على `logits` لاسترداد الاحتمالات:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
تخرج جميع نماذج 🤗 Transformers (PyTorch أو TensorFlow) المصفوفات *قبل* دالة التنشيط النهائية (مثل softmax) لأن دالة التنشيط النهائية غالبًا ما تكون مدمجة مع دالة الخسارة. نواتج النموذج عبارة عن فئات بيانات خاصة، لذلك يتم استكمال سماتها تلقائيًا في IDE. وتتصرف مخرجات النموذج مثل زوج مرتب أو قاموس (يمكنك الفهرسة باستخدام عدد صحيح ، شريحة، أو سلسلة)، وفي هذه الحالة، يتم تجاهل السمات التي تساوي None.
</Tip>
> [!TIP]
> تخرج جميع نماذج 🤗 Transformers (PyTorch أو TensorFlow) المصفوفات *قبل* دالة التنشيط النهائية (مثل softmax) لأن دالة التنشيط النهائية غالبًا ما تكون مدمجة مع دالة الخسارة. نواتج النموذج عبارة عن فئات بيانات خاصة، لذلك يتم استكمال سماتها تلقائيًا في IDE. وتتصرف مخرجات النموذج مثل زوج مرتب أو قاموس (يمكنك الفهرسة باستخدام عدد صحيح ، شريحة، أو سلسلة)، وفي هذه الحالة، يتم تجاهل السمات التي تساوي None.
### حفظ النموذج
<frameworkcontent>
<pt>
بمجرد ضبط نموذجك، يمكنك حفظه مع برنامج الترميز الخاص به باستخدام [`PreTrainedModel.save_pretrained`]:
```py
@ -321,28 +232,9 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
بمجرد ضبط نموذجك، يمكنك حفظه مع برنامج الترميز الخاص به باستخدام [`TFPreTrainedModel.save_pretrained`]:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
عندما تكون مستعدًا لاستخدام النموذج مرة أخرى، أعد تحميله باستخدام [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
من الميزات الرائعة في 🤗 Transformers القدرة على حفظ نموذج وإعادة تحميله كنموذج PyTorch أو TensorFlow. يمكن أن يحول معامل `from_pt` أو `from_tf` النموذج من إطار عمل إلى آخر:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
@ -350,17 +242,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</tf>
</frameworkcontent>
## إنشاء نماذج مخصصة
@ -375,8 +256,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
قم بإنشاء نموذج من تكوينك المخصص باستخدام [`AutoModel.from_config`]:
```py
@ -384,17 +263,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
قم بإنشاء نموذج من تكوينك المخصص باستخدام [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
الق نظرة على دليل [إنشاء بنية مخصصة](./create_a_model) لمزيد من المعلومات حول بناء التكوينات المخصصة.
@ -483,11 +351,8 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> trainer.train() # doctest: +SKIP
```
<Tip>
بالنسبة للمهام - مثل الترجمة أو التلخيص - التي تستخدم نموذج تسلسل إلى تسلسل، استخدم فئات [`Seq2SeqTrainer`] و [`Seq2SeqTrainingArguments`] بدلاً من ذلك.
</Tip>
> [!TIP]
> بالنسبة للمهام - مثل الترجمة أو التلخيص - التي تستخدم نموذج تسلسل إلى تسلسل، استخدم فئات [`Seq2SeqTrainer`] و [`Seq2SeqTrainingArguments`] بدلاً من ذلك.
يمكنك تخصيص سلوك حلقة التدريب عن طريق إنشاء فئة فرعية من الطرق داخل [`Trainer`]. يسمح لك ذلك بتخصيص ميزات مثل دالة الخسارة، والمحسن، والمجدول. راجع مرجع [`Trainer`] للتعرف على الطرق التي يمكن إنشاء فئات فرعية منها.

View File

@ -76,8 +76,6 @@ pip install -r requirements.txt
## تشغيل نص برمجي
<frameworkcontent>
<pt>
- يقوم النص البرمجي التوضيحي بتنزيل مجموعة بيانات ومعالجتها مسبقًا من مكتبة 🤗 [Datasets](https://huggingface.co/docs/datasets).
- ثم يقوم النص البرمجي بضبط نموذج بيانات دقيق باستخدام [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) على بنية تدعم الملخص.
@ -98,28 +96,6 @@ python examples/pytorch/summarization/run_summarization.py \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
- يقوم النص البرمجي التوضيحي بتنزيل مجموعة بيانات ومعالجتها مسبقًا من مكتبة 🤗 [Datasets](https://huggingface.co/docs/datasets/).
- ثم يقوم النص البرمجي بضبط نموذج بيانات دقيق باستخدام Keras على بنية تدعم الملخص.
- يوضح المثال التالي كيفية ضبط نموذج [T5-small](https://huggingface.co/google-t5/t5-small) على مجموعة بيانات [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail).
- يتطلب نموذج T5 ماعمل `source_prefix` إضافية بسبب الطريقة التي تم تدريبه بها. يتيح هذا المطالبة لـ T5 معرفة أن هذه مهمة التلخيص.
```bash
python examples/tensorflow/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## التدريب الموزع والدقة المختلطة
@ -149,8 +125,6 @@ torchrun \
## تشغيل نص برمجي على وحدة معالجة الدقة الفائقة (TPU)
<frameworkcontent>
<pt>
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. يدعم PyTorch وحدات معالجة الدقة الفائقة (TPUs) مع [XLA](https://www.tensorflow.org/xla) مجمع الدقة الفائقة للتعلم العميق (راجع [هنا](https://github.com/pytorch/xla/blob/master/README.md) لمزيد من التفاصيل). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتشغيل نص `xla_spawn.py` البرمجي واستخدم معامل `num_cores` لتعيين عدد وحدات معالجة الدقة الفائقة (TPU) التي تريد استخدامها.
@ -169,25 +143,6 @@ python xla_spawn.py --num_cores 8 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. تستخدم نصوص TensorFlow البرمجية استراتيجية [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) للتدريب على وحدات معالجة الدقة الفائقة (TPUs). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتمرير اسم مورد وحدة معالجة الدقة الفائقة (TPU) إلى حجة `tpu`.
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## تشغيل نص برمجي باستخدام 🤗 Accelerate
@ -324,7 +279,7 @@ python examples/pytorch/summarization/run_summarization.py
يمكن لجميع النصوص البرمجية رفع نموذجك النهائي إلى [مركز النماذج](https://huggingface.co/models). تأكد من تسجيل الدخول إلى Hugging Face قبل البدء:
```bash
huggingface-cli login
hf auth login
```
ثم أضف المعلمة `push_to_hub` إلى النص البرمجي . ستقوم هذه المعلمة بإنشاء مستودع باستخدام اسم مستخدم Hugging Face واسم المجلد المحدد في `output_dir`.

View File

@ -114,11 +114,8 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s
### تصدير نموذج باستخدام `transformers.onnx`
<Tip warning={true}>
لم يعد يتم دعم `transformers.onnx` يُرجى تصدير النماذج باستخدام 🤗 Optimum كما هو موضح أعلاه. سيتم إزالة هذا القسم في الإصدارات القادمة.
</Tip>
> [!WARNING]
> لم يعد يتم دعم `transformers.onnx` يُرجى تصدير النماذج باستخدام 🤗 Optimum كما هو موضح أعلاه. سيتم إزالة هذا القسم في الإصدارات القادمة.
لتصدير نموذج 🤗 Transformers إلى ONNX باستخدام `transformers.onnx`، ثبّت التبعيات الإضافية:

Some files were not shown because too many files have changed in this diff Show More