Compare commits

379 Commits

Author SHA1 Message Date
c8121549b9 let's break all 2025-10-10 14:03:37 +02:00
792d14b503 how bad would it be anyway? 2025-10-10 13:58:38 +02:00
60f6ec438a Fix detectron2 import (#41510)
* fix

* fix

* typo
2025-10-10 13:33:47 +02:00
f9f8bf5a10 Revert local_rank deletion and some cleaning (#41504)
* forgot those

* clean

* Fix

* merge

* fix

* fix
2025-10-10 12:23:04 +02:00
b4067472ae Bump to hfh 1.0.0.rc5 to fix test (#41508) 2025-10-10 12:12:08 +02:00
bc529a3368 More trainer cleaning (#41489)
clean
2025-10-10 11:55:43 +02:00
b92fc0c6e1 [QoL] modular conversion shows LoC saved (#41500)
smol qol conversion
2025-10-10 11:55:23 +02:00
2eae7c7452 Set truncation to False in Qwen3Omni to avoid default truncation (#41473)
* Set `truncation` to `False` in Qwen3Omni to avoid default truncation

* move `padding` and `truncation` to audio default args

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-10 09:55:18 +00:00
c5094a4f97 [voxtral] language detection + skipping lang:xx (#41225)
* proc + doc update

* improve doc

* add lang:xx in decode

* update voxtral test

* nit

* nit

* update test value

* use regex
2025-10-10 09:18:30 +00:00
f4487ec521 fix gemma3n case failure (#41426)
* fix gemma3n case failure

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update dependency_versions_table.py

* change how the case passes its arguments so that the case passes;
the generation_config approach needs revisiting

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-10 09:15:27 +00:00
e8194fe84f Fix some tests (#41503)
* fix

* fix

* doc
2025-10-10 11:05:09 +02:00
9556b36b2f [causallm tester] automate pipeline mappings + bloom tests (#41318) 2025-10-10 10:02:00 +01:00
5aca530b34 [Parakeet] unnecessary warning & auto mapping (#41412)
* add parakeet to CONFIG_MAPPING_NAMES

* TOKENIZER_MAPPING_NAMES update

* fix auto tokenizer

* update

* fix
2025-10-10 11:00:15 +02:00
4f323369db Fixed tiny incorrect imports in glm4v (#41483)
Fixed tiny import issue in glm4v
2025-10-10 08:57:01 +00:00
f5f3457278 Try to remove pickle - BloomTokenizerFast (#41466)
* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 10:52:51 +02:00
3585737746 [kernels] rm yoso kernel (#41495)
* disable kernel mapping

* rm kernel

* delete files

* style

* typo
2025-10-10 10:50:12 +02:00
b543679d0e [kernels] Remove RWKV kernel finally ! (#41493)
* rm kernel

* fix style
2025-10-10 10:32:05 +02:00
ac7777be16 fix bnb model loading (#41499) 2025-10-10 08:27:29 +00:00
17c31a98ac Streaming should be handled at the request-level rather than at the instance level (#41444)
* Streaming should be handled at the request-level rather than at the instance level

* Add tests

* Require torch GPU
2025-10-10 10:24:55 +02:00
b28902c86b Remove DISABLE_KERNEL_MAPPING flag (#41475)
rm disable
2025-10-10 10:19:25 +02:00
d0271be18f Update philosophy (#41438)
* update philosophy

* Update docs/source/en/philosophy.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* emphasis

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-10 06:52:18 +00:00
0419ff881d Remove local_rank arg from TrainingArguments (#41382) 2025-10-09 18:54:12 +02:00
081391b20e deprecate jit_mode_eval (#41376) 2025-10-09 18:50:45 +02:00
1ddbbdef48 [Trainer] deprecate ray scope (#41403) 2025-10-09 18:50:00 +02:00
c20849bad1 [CI] Fix copies on main (#41486)
fix copies
2025-10-09 18:38:14 +02:00
776eea8612 deprecate overwrite_output_dir (#41323)
* dep

* style

* rm

* wut

* style
2025-10-09 18:36:19 +02:00
3839d51013 report_to default changed to "none" + cleaning deprecated env var (#41375)
* reporting

* fix

* fix
2025-10-09 18:28:48 +02:00
78f79ba5af Update GLM-4.6 doc (#41471)
Update glm4_moe.md
2025-10-09 09:18:05 -07:00
11c597b1b8 Remove deprecated args in Trainer for v5 (#41404)
remove deprecated code
2025-10-09 18:10:14 +02:00
b450d55a91 Remove past_index (#41384)
* remove-tpu-num-cores

* fix

* rm past index

* Revert "fix"

This reverts commit 7608a6c059210957d3a77812e66178c8b79a9313.

* Revert "remove-tpu-num-cores"

This reverts commit ef08a51d71389849851518d67d8ad6c9ea8f04fc.
2025-10-09 18:06:46 +02:00
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
823fab4860 Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
* fix

* fix

* get_param_name

* fix device name
2025-10-09 18:05:35 +02:00
42d4e13a0b RT-Detr correct 2d positional embeddings for non-square images (#41380)
* Correct 2d positional embeddings for non-square images

* Simplify bug fix, propagate changes to other models

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-09 17:58:22 +02:00
0eae41ad36 Add Code World Model (CWM) (#41199)
* [wip][cwm] Code World Model stubs and setup in HF Transformers

* [wip] Get other things working

* [wip] Working

* Tokenizer pad

* fix: cwm window attn

* temp remove test

* temp remove test

* Fixes

* Temporarily add auto config remapping option until VLLM 0.11 is out

* Fix model type and add layer validation

* Lint, remove CwmForSequenceClassification

* Lint, tests

* Remove CwmForSequenceClassification

* Lint

* Remove intermediary layer exports/doc errors, fix tests

* Lint

* run python utils/sort_auto_mappings.py --check_only

* Remove Cwm processor mapping, get check_repo passing

* Remove CwmTextConfig from test

* Add docstring for CwmConfig

* remove global_window and window_pattern params from config

* Fix docstrings

* Revert change to auto docstring util

* lint

* Fixes minus test improvements

* Alter tests to simply check logits

* lint

* Have slow tests use repo, make CwmPretrainedModel passthrough

* Remove decoder layer implementation, use Llama3Decoder + CwmAttention

* Use linear w/o bias for CwmAttention, add token-level integration test

* Don't ignore config attention bias

* Remove attention bias parameter entirely from config

---------

Co-authored-by: galco <galco@meta.com>
2025-10-09 17:57:45 +02:00
589fc29c9d enhance patched_tearDown to support python 3.11+ (#41429)
* enhance to support python 3.11+

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-09 21:19:29 +05:30
26b5b52676 [Fix] Fix test file error (#40973)
Fix test file error
2025-10-09 15:30:53 +00:00
34b861abd1 🚨 [Attention Masks] Bidirectional masks for encoder and encoder-decoder models (#41265)
* new masks

* fixes

* adjust comments

* fix unnecessary mask creation on sdpa

* simplify masks more

* propagate to other models

* style + repo consistency

* copies

* no comment

* fix attempt

* finally fix grounding dinos

* fix distilbert

* fix executorch

* move to own module

* address first few comments WIP

* revert device comments, simplify executorch further

* fix typo

* add a test for cuda graphs

* move cleanup...

* fix conflict with new main

* fix esm and evolla
2025-10-09 16:56:11 +02:00
b44d91570f [v5] remove load_in_4bit and load_in_8bit (#41287)
* [v5] remove load_in_4bit and load_in_8bit

* fix

* revert

* fix

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-09 16:34:04 +02:00
d99069195b Cleaning hub kernels (#41477)
* disable kernel mapping

* cleaning

* revert

* fix style
2025-10-09 16:32:18 +02:00
bf38b2d11d Change RT-Detr docs to reflect fixed 640x640 input size (#41364)
* Update rt_detr docs to mention 640x640 input size

The authors of RT-Detr mention that the model was trained on 640x640 images and was meant to be used for inference on 640x640 images.
Also, the current implementation has certain quirks that make training/inferring on images of different sizes problematic. For example,
the pixel masks used for batches of varying image sizes are discarded. I've added a few lines in the docs to notify the user about these issues.

* Batching not possible with variable image sizes

* Remove reference to batching

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
2025-10-09 14:29:16 +00:00
72a3fc275c Remove infer_device (#41088)
* Remove infer_device

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix docs using accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 14:05:39 +00:00
9ef804472b Pickle - part 2 (#41476)
* pickle 2

* pickle 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-09 13:46:53 +00:00
2b5e4c0d13 Import Callable from collections.abc (#41130)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 12:12:43 +00:00
add4df62ba Fix tests fsdp (#41422)
* Fix tests

* fix !

* fix
2025-10-09 14:09:52 +02:00
3e87072666 Fix auto model configuration for encoder of perceptionlm (#41464)
* fix auto model configuration for encoder of perceptionlm

* delete perception_encoder auto registrations
2025-10-09 14:08:03 +02:00
f0544d7e7c Remove KERAS_NLP_IMPORT_ERROR (#41468)
Remove unused variables of error messages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 11:58:30 +00:00
d1c6310d6a 🚨 [v5] Redundant code in nested configs (#41314)
* batch update models

* delete even more

* fix modular super init location

* fix

* fix copies

* fix again, these have force-set values in configs

* fix copies
2025-10-09 13:47:44 +02:00
927aa8bef2 [kernels] Cleanup deta kernel (#41470)
* cleanup deta kernel

* fix modeling
2025-10-09 13:17:42 +02:00
1951f3be8e Update GLM-4.1V MMRope implementation (#41182)
* update for 4D mask

* update

* Update modular_glm4v.py

* 1

* Revert "1"

This reverts commit d13a763e876fa049c5fb70a8b3447b335dbb6098.

* update as glm4v logic

* update

* 1

* update

* Create convert_glm4v_moe_mgt_weights_to_hf.py

* update

* update
2025-10-09 12:15:47 +02:00
f50fd7fb6b [v5] rm utils/tf_ops/ (#41402)
rm utils/tf_ops/
2025-10-09 10:27:47 +01:00
be3fa93b29 Subconfig is a class attribute (#41308)
* delete

* fix this test

* fix copies

* oke, more tests to fix

* fix last tests on DPT

* deleted accidentally
2025-10-09 10:46:44 +02:00
8137dbdbbd 🚨 [v5] Rename left traces of past_key_value in BERT-like models (#41448)
rename everything
2025-10-09 10:44:44 +02:00
7aa888b7fa Fix doc (#41457)
* dummy

* remove
2025-10-08 20:13:21 +02:00
bfe2b623ef Fix generate outputs and simplify cache tests (#41440)
* start refactoring

* simplify

* tests

* tests

* fix

* zamba

* final fix

* fix
2025-10-08 19:04:18 +02:00
b9be8a8775 enable some falcon-mamba uts on xpu (#41428)
* enable some falcon-mamba uts on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 18:48:04 +02:00
bef73bf8d7 Update hqq.md (#41452)
mistake in loading model
2025-10-08 07:44:56 -07:00
89a4115a6b Validate processing kwargs with @strict from huggingface_hub (#40793)
* initial design draft

* delete

* fix a few tests

* fix

* fix the rest of tests

* common-kwargs

* why the runner complains about typing with "|"?

* revert

* forgot to delete

* update

* fix last issues

* add more details in docs

* pin the latest hub release

* fix tests for new models

* also fast image processor

* fix copies

* image processing ast validated

* fix more tests

* typo and fix copies

* bump

* style

* fix some tests

* fix copies

* pin rc4 and mark all TypedDict as non-total

* delete typed dict adaptor

* address comments

* delete optionals
2025-10-08 16:14:09 +02:00
82ffeb28ad Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictionary is only removed during kwargs

* Test for supported sample

* Fix an involuntary slice

* Fixes for non-sliced inputs and small example improvements

* Slice inputs is more understandable

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* output_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unnecessary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnecessary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmark script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refactor

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processor

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expected output values) easier 🔥 (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revert one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067: add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix last tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessary protected import in modular (and modeling)

* fix wrongly removed protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from workflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a, ctrl + e, alt + b, alt + f,
  ctrl + k, alt + d, etc.
- navigate and search history: arrow up/down, ctrl + p, ctrl + n, ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, macOS, and with WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
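
A minimal sketch of the guarded import described above. The helper name
`read_user_message`, the flag `HAS_READLINE`, and the `<user>: ` prompt are
hypothetical and only illustrate the idea; they are not taken from the chat
command itself.

```python
# Importing readline is done purely for its side effect: once the module is
# loaded, the built-in input() gains line editing and history support.
try:
    import readline  # noqa: F401  (imported for its side effect only)

    HAS_READLINE = True  # hypothetical flag, kept here for illustration
except ImportError:
    # readline is missing on some platforms (for example plain Windows),
    # so fall back to the basic input() behaviour.
    HAS_READLINE = False


def read_user_message(prompt: str = "<user>: ") -> str:
    """Read one line from the user (hypothetical helper).

    If readline was imported above, input() offers history and key bindings
    such as ctrl + r; otherwise it behaves like the plain built-in.
    """
    return input(prompt).strip()
```

Guarding the import keeps the behaviour unchanged on platforms without
readline, at the cost of silently losing the extra key bindings there.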

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typos in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when reading video with torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appears before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

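* docs(text2text_generation): Update parameter comments to reflect modern generation practice

Update the max_length parameter comment to max_new_tokens, in line with the standard modern practice of specifying the number of new tokens to generate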

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
2025-10-08 13:37:51 +00:00
e064dc05c2 [testing] Fix JetMoeIntegrationTest (#41377)
* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-08 13:11:53 +00:00
20282f13fa [JetMoe] Fix KV head repetition and padding free (#41423)
fix jetmoe
2025-10-08 14:27:22 +02:00
c528f50663 Remove Python 3.9 classifier (#41410)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:20:36 +00:00
8dfc8e8cfc 🤦 CB nit! (#41413)
* 🤦

* updates

* update cb simple

* merge

* up

* update

* fix

* up

* nit

* rumble this is annoying

* update

* update

* up

* fix

* ....

* cleanup a bit

* nit

* typo

* typing and typo

* nit

* updates

* up

* final fix!

* update

* fix more import issues

* nuke is paged

* up
2025-10-08 13:36:27 +02:00
2166e26cb1 [torchao] Add regex support for ModuleFqnToConfig (#41242)
* Add regex support for ModuleFqnToConfig

Summary:
Similar to https://github.com/pytorch/ao/pull/3084 we added regex support
in transformers so people can use regex to quantize the models.

See https://github.com/pytorch/ao/pull/3084 for docs and precedence of different
configurations

Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev

Test Plan:
pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex

Reviewers:

Subscribers:

Tasks:

Tags:

* Apply style fixes

* add assert for

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-08 11:05:15 +00:00
b13ee63b5a enable new model uts to xpu and fix some failures on xpu (#41386)
* enable new model uts to xpu and fix some failures on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_internvl.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* Update test_modeling_llava_next_video.py

* Update test_modeling_qwen3.py

* Update test_modeling_whisper.py

* Update test_modeling_whisper.py

* Update test_modeling_llava.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 10:14:50 +00:00
1c5ac899e8 Use accelerator API to free device memory (#41195)
* Use accelerator API to free device memory

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use clear_device_cache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:11:18 +02:00
957b1f3696 Fixing comments in __init__ file (#41414)
nit
2025-10-08 12:07:26 +02:00
13791d8f48 [v5] Bump min version of bitsandbytes to 0.46.1 (#41283)
* bump bitsandbytes to 0.46.1

* huge cleanup

* style

* fix

* req

* fix

* importerror

* fix
2025-10-08 12:04:26 +02:00
7e475552be 🚨 [v5] Prune prune_heads (#41417)
* remove _prune_heads

* remove prune_heads

* finalize the purge

* remove other patterns
2025-10-08 10:25:13 +01:00
46db0edf3b 🚨🚨 Remove all traces of legacy cache format (#41378)
* remove

* more

* add back

* tests

* revert classes

* tests

* add exceptions

* reapply modular

* rename

* oupsi

* start with whisper

* fix tests

* fix

* fix

* fix

* typing
2025-10-08 11:14:44 +02:00
ee5488440b Tiny Cleanup - Removed duplicate class field definitions (#41293)
* Removed duplicate-class-field-definitions using Ruff PIE794

* Removed duplicate-class-field-definitions using Ruff PIE794

* Ruff format.

* Removed duplicate-class-field-definition

* Added New ruff rule to detect duplicate class field defs

* remove comment

* order

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-08 10:49:34 +02:00
34dcd73b57 v5 dev version (#41436) 2025-10-08 10:45:33 +02:00
3553f0bc23 Fix overriding common_kwargs defaults in processor calls (#41381)
* set common_kwargs defaults before updating with kwargs

* change order to override defaults common_kwargs
2025-10-07 23:13:56 -04:00
242eb9cbdc Remove deprecation warning (#41425)
* remove

* fix space
2025-10-07 19:21:14 +02:00
50090c3fc8 [v5] Delete leftover traces of feature extractor (#41321)
delete the leftover traces
2025-10-07 18:24:08 +02:00
ccbaa1670a Fix incorrect assignment in update_device_map for GPTQ quantizer (#41328)
Fix incorrect assignment in update_device_map for GPTQ quantizer

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-07 17:28:55 +02:00
c562c5d801 [v5] Bump accelerate to 1.1.0 (#41234)
* bump to 1.1.0 !

* bump accelerate

* fix

* None

* fixed !

* style
2025-10-07 17:18:32 +02:00
88e946e062 Fix early CUDA initialisation (#41409)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-07 14:37:17 +01:00
93464a0279 Prefer raising TypeError exception for invalid type (#41346)
* Fixed raising of TypeError exception for invalid type

* Fixed failing tests.
2025-10-07 13:11:42 +00:00
0c9a72e457 [Model] Lfm2Moe (#41401)
* [new-models] LFM2-MoE

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] add in template lfm2_moe doc files

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration] update configuration class

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm] minor: fix rotary_emb typo

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling] modular/modeling files for Lfm2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] fix Lfm2Moe modular/modeling

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] update configuration keys with latest config changes

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make fixup

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] address comments: dtype, mlp, buffers

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] add initializer_range

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] include init_weights to pass test_initialization

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests][causal_lm] include pos_emb as possible rope attribute

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] remove load_balancing_loss_func due to lack of support for hooking expert biases

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make style

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] MoE refactor PR update in LFM2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2_moe: unit tests

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] update LFM2-8B-A1B repo id

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2: update ModelTests for lfm2

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* Update LFM2 documentation

Updated the LFM2 documentation to reflect the addition of a new model size and clarified architectural details.

* Add Lfm2Moe documentation

Add Lfm2Moe model documentation with overview and example usage.

* [misc] fix ci

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] remove trust_remote_code

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] ci: fix modular

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* reapply modular

* simplify

* remove static address and inplace op

* simplify

* simplify a bit more the modular

* imports

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-07 15:09:58 +02:00
b4428d545f Fix test for model with dotted name and relative imports (#41343) 2025-10-07 13:55:54 +01:00
0464d9eb37 [Cache] lfm2 cache: allocate empty kv layers during init (#41396)
* [Cache] lfm2 cache: allocate empty kv layers during init

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [Cache] lfm2_cache: update modular file

Signed-off-by: Paul Pak <paulpak58@gmail.com>

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-07 14:01:31 +02:00
da7b8ce11f [kernels] Kernel Config (#41232)
* first config

* add kernel_config

* add import logic

* fixing style

* compare class name

* add comments

* rm import

* adding kernel md files

* add to toctree

* adding to main_classes

* simplify required config

* add to doc

* style

* store the mapping

* remove nested func

* add hub mixin

* fix

* imports

* fix
2025-10-07 13:58:20 +02:00
4763b8c5b8 Correct numerical regression in vision embeddings (#41374)
created modeling file
2025-10-07 13:43:24 +02:00
caa14e7dab fix resample in asr pipeline (#41298) 2025-10-06 17:31:10 +00:00
73f8c4b8ad fix asr ut failures (#41332)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-06 17:12:19 +00:00
57e82745f9 [v5] Sync Bert and Bart eager attention (#41248)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops

* sync attn impl

* fix style/copies

* fix distilbert

* remove dim check
2025-10-06 18:49:01 +02:00
505387c05b Update from pretrained error when loading (#33380)
* init commit

* style

* take comments into account

* merge with main and simplify

* nits

* final

* small fixes

* fix

* super small update!

* add another test

* up up

* update

* fixes

* sort them by default
2025-10-06 16:10:19 +00:00
e00f46f16e serve: add non-streaming mode to /v1/responses; stream event parity; remove placeholder logprobs (#41353) 2025-10-06 16:04:17 +00:00
0395ed52ae [CB] Refactors the way we access paged (#41370)
* up

* refactor the way we handle paged attention

* affect serve as well

* update

* fix

* cup
2025-10-06 17:55:31 +02:00
39b0c9491b Remove unused function parameters (#41358)
Remove unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 15:38:17 +00:00
11e4b5e5ee make some ut cases pass on xpu w/ latest torch (#41337)
* make some ut cases pass on xpu w/ latest torch

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_llava_onevision.py

* Apply style fixes

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-06 15:38:00 +00:00
fa36c973fc Remove unnecessary list comprehension (#41305)
Remove unnecessary comprehension

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 14:49:02 +00:00
7a1aeec36e Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel (#40811)
* misc fixes

* fix

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

* Apply suggestion from @IlyasMoutawwakil

* pickup use_cache from args input as well

* fix
2025-10-06 16:34:24 +02:00
297a41a6cf Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939 (#41284)
* Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939

* Fix import sorting/style

* Fix import order

* Refactor: use canonical get_size_with_aspect_ratio across image processors (except YOLOS)

This commit updates image processing utilities in multiple model processors to use the shared
transformers.image_transforms.get_size_with_aspect_ratio for consistent resizing logic and
aspect ratio handling.

YOLOS processors are intentionally left unchanged in this commit to preserve their current
behavior and avoid breaking model-specific padding/resizing assumptions. YOLOS will be updated
in a dedicated follow-up PR once compatibility is fully verified.

* ruff fixes

* Fix check_copies.py references for get_size_with_aspect_ratio to use canonical transformers.image_transforms version

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-06 10:15:56 -04:00
ae60c77689 Fix flash_attention.py: wrong argument passing for attn_implementation (#41347)
* Fix flash_attention.py: wrong argument passing for attn_implementation

The name of the attention-type argument for `_flash_attention_forward()` should be `implementation`, not `attn_implementation`, which the function call currently uses. This results in the wrong type being specified.

* modify the kwargs inside _flash_attention_forward

* fix the doc

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-06 15:36:40 +02:00
6bf6e36d3b [testing] update test_longcat_generation_cpu (#41368)
* fix

* Update tests/models/longcat_flash/test_modeling_longcat_flash.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-10-06 13:21:29 +00:00
4903cd4087 🚨 Remove BetterTransformer (#41367)
remove
2025-10-06 15:18:12 +02:00
a5700c497e Better typehints for apply_chat_template (#41355) 2025-10-06 13:14:03 +00:00
089d573aca Fix typo in model proposal template (#41352) 2025-10-06 13:06:50 +00:00
c27b67f0cd 🚨 [v5] Remove relative position embeddings (for bert like models) (#41170)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops
2025-10-06 14:21:41 +02:00
a89bdcf5f1 Fixing a typo for BLT model (#41325) 2025-10-06 12:16:45 +00:00
0452f28544 [ModularChecker] QOL for the modular checker (#41361)
* update

* fancy table fancy prints

* download to cache folder, never need it ever again

* style

* update based on review
2025-10-06 12:52:10 +02:00
9db58abd6e Check model inputs - hidden states (#40994)
* update all models

* fix copies

* skip aria tests

* update other models

* skip should be in test, not tester

* i think this is more descriptive as a name

* find and replace for new models
2025-10-06 11:48:52 +02:00
db711210d2 Fix trainer for py3.9 (#41359)
fix
2025-10-06 11:36:05 +02:00
163601c619 Standardize PretrainedConfig to PreTrainedConfig (#41300)
* replace

* add metaclass for full BC

* doc

* consistency

* update deprecation message

* revert
2025-10-06 11:34:02 +02:00
55b172b8eb 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence (#41268)
* cleanup

* add check

* fix

* remove all global variables

* fix

* add lru caches everywhere

* fix

* fix

* style

* improve

* reorder all functions

* fix order

* improve

* fix

* fix

* fix
2025-10-06 11:04:19 +02:00
1ec0b54414 Rope for Qwen2--5-vl (#41173)
qwen2--5-vl
2025-10-06 10:56:29 +02:00
0947b9042c Fixed tiny incorrect import in gemma3 (#41354)
Fixed tiny import issue in gemma3
2025-10-06 10:55:42 +02:00
e11a00a16f JetMoe Fix jetmoe after #40132 (#41324)
* update

* up
2025-10-04 11:02:13 +02:00
1bc75db9bd Fix lr_scheduler_parsing (#41322)
* fix

* fix
2025-10-03 17:51:17 +02:00
c2b3cc3e64 Fix jamba (#41309)
* reactivate tests

* first pass

* fix

* fix bias

* fix and simplify

* finally fix this stupid bug

* add skips

* remove bad stuff

* fix copies

* simplify
2025-10-03 16:54:19 +02:00
5abfa43f02 Security/fuyu (#41320)
remove reference to compromised repo
2025-10-03 14:13:41 +00:00
217ff1e4ef AutoAWQ tests (#41295)
* initial commit

* fix

* fix multi gpu

* fix expected output

* fix

* latest

* add comment

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-03 15:17:10 +02:00
5339f72b9b 🚨 [unbloating] unify TypedDict usage in processing (#40931)
* just squash commits into one

* fix style
2025-10-03 14:17:59 +02:00
42bcc81ba2 Minor security fix for ssh-runner.yml (#41317)
security issue

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 14:14:34 +02:00
cd4422922e Add modular detector (#41289)
* doc

* doc

* no remote code

* safe-ize the release + remove remote

* fixes

* add some documentation as well
2025-10-03 14:11:10 +02:00
59eba49237 download and use HF Hub Cache (#41181)
use hub cache

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 11:11:37 +02:00
de3ee737cf Fix README.md error when installing from source (#41303) 2025-10-02 16:08:27 -07:00
b914445f77 Italian translation for README.md (#41269)
chore: add Italian translation for README.md
2025-10-02 15:59:28 -07:00
41e5abac5c FIX: Bug in PEFT integration delete_adapter method (#41252)
The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fulfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally that the test passes with PEFT 0.18.0.)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
2025-10-02 18:36:57 +02:00
da3c7d1d36 🚨 [DistilBert] Refactor Attention (#41163)
* refactor

* allow pos ids for flattened sequences
2025-10-02 17:50:48 +02:00
e54defcfc2 [Flex Attn] Fix lse x attention sinks logic (#41249)
fix
2025-10-02 17:49:39 +02:00
b3bd815786 Fix mxfp4 dequantization (#41292)
fix
2025-10-02 16:47:42 +02:00
e4930d6bde 🚨 [V5] Remove deprecated resume_download (#41122)
Remove deprecated `resume_download`

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 16:44:34 +02:00
7adb43e60a Build doc in 2 jobs: en and other languages (#41290)
* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 14:33:57 +00:00
e1f1d32af0 Remove some previous team members from allow list of triggering Github Actions (#41263)
* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 16:32:28 +02:00
1d7ebff398 Fix - remove deprecated args checking in deepspeed integrations (#41282)
Remove deprecated args checking in deepspeed integrations

Signed-off-by: nguyen599 <pnvmanh2123@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 13:59:50 +00:00
9d02602f0f Remove test_initialization (#41261)
remove it
2025-10-02 15:23:43 +02:00
248e7ef8bc [docs] remove references to recently deleted classes in non-en docs (onnx, feature processors) (#41286)
remove references to old classes
2025-10-02 12:59:28 +00:00
bc33fd3fc2 Add processor and integration test for qwen3vl (#41277)
* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization

* pass check
2025-10-02 14:59:04 +02:00
639ad8ccd9 feat: use aws-highcpu-32-priv for amd docker img build (#41285)
* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI
2025-10-02 12:53:14 +00:00
894a2bdd8c Fix pylint generator warnings (#41258)
Fix pylint generator warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-10-02 12:35:42 +00:00
1cc9069551 Fix unnecessary single-item container checks (#41279)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:35:11 +00:00
4f286fbbf8 Biogptlogits (#41270)
added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <aviralkamaljain@gmail.com>
2025-10-02 12:33:48 +00:00
1d91a8a454 Use max/min (#41280)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:15:27 +00:00
f1b64c5b06 Unify is_torchvision_v2_available with is_torchvision_available (#41259)
Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 11:56:37 +00:00
2f3e266692 fix async client for transformers chat (#41255)
* fix-client

* fix
2025-10-02 13:23:37 +02:00
313504bcdd 🚨 [v5] remove deprecated generate classes (constraints and beam scorers) (#41223)
rm
2025-10-02 12:11:11 +01:00
8f14300663 Allow private Space id for Trackio (#40948)
* allow prive space id for trackio

* complete docstring
2025-10-02 12:38:25 +02:00
734732140a Deprecate Trackio environment variables and deploy to Spaces by default (#40950)
* allow prive space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments
2025-10-02 12:37:55 +02:00
7938e91faa MoE + vllm = 😻 (#40132)
* update modeling mixtral

* oups

* fix

* better naming?

* compute softmax and top_k inside the experts

* update minimax as well

* models that will need an update

* more models that need a fix

* stash

* fix mixtral

* update olmoe

* update

* update

* current changes

* nits

* molmoe is now fixed

* olmoe is good to go!

* refactor qwen2_moe

* fixes

* fixed moe

* fix qwen2 modular

* nit

* qwen2_moe test script works

* tricky rope !

* fix qwen3

* DeepSeek v3 MoE Standardization (#40538)

* DeepSeek-v3

Shared

Shared

* Dependents of DS3

* Standardize GLM4V MoE (#40539)

* up

* Standardize VitPose's MoE (#40549)

* VitPose

* outside

* outside

* outside

* fix

* update dbrx

* dbrx... the magic

* Refactor Ernie 4.5's MoE (#40547)

* Isolate Ernie fixes

* fix moe

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>

* fix style

* style

* fix copies

* style

* latest changes

* fixes

* had to stage

* current updaters

* up

* another modular

* modular graniteMoe

* some update

* draft another modular moe

* updaters

* up

* fix nit

* q3 nit

* fix phi moe

* we're going up up up up its our mooooment

* fix switch transformers this time around

* up

* gptsan japanese is deprecated forget about it

* fix mixtral to not be a linear (gives us more freedom)

* update

* fix copies gone wrong try catch nothing

* fix mixtral

* new refactor again

* update aria as well

* up dbrx and deepseekv3

* nit

* fix phimoe?

* fix deepseek v3

* nits

* don't bother with this one please

* up olmoe

* ??

* fix olmoe

* yups

* fixup

* ish

* hot patch

* new qwen3

* updates

* up

* nit

* fix copies

* fix

* nits

* we're going up up up

* nits

* switch_transformers edge case

* lol modular gptsan?

* fix deepseek

* finally all modeling match modular

* update

* up

* up

* dang

* up

* up aria

* fix dbrx

* nits here and there

* finish fixing dbrx

* fix deepseek

* upd

* up

* fix flex olmo

* updated

* update jamba

* JAMBA is still a bit todo

* forward forward

* fix dots11

* update

* fix hunyuan

* fix some other

* update phimoe

* fuck you phimoe you are now submitted

* submit granitemoe as well

* try to fix some other models, reduces some of the failures

* fix olmoe and qwen2moe

* up

* up

* fix qwen2_moe

* update modular make it again, simpler

* nits

* up

* up

* fix

* someswitch reductions

* up

* fix qwen3vl

* some fixes to jetmoe

* these should be shipped to the modular to fix jetmoe

* fix most of the nllb failures

* more nllb fixes

* fix the modular

* remove nllb modular as it sucks for now

* ?

* fix granitemoe

* granitemoehybrid don't have rope

* use rope when rope, no rope when no rope

* updates

* finish fixing dumbgrainite

* fix most of minimax

* fix

* update modular

* ?

* up

* up jetmoe still broken

* up

* fix, now align the moe

* fix jetmoe

* fix styling and qwen3 repo consistency

* update

* up up

* update ruff?

* nits

* modeling is good now for switch

* fix

* more fixes to switch!

* fix some switch test

* ?

* ?

* up

* fix switch modular!

* nit?

* uip

* subtest

* can't believe I wasted so much time on this...

* fix

* updates

* nits

* nit jamba is fucking annoying

* ?

* fix?

* oups

* good good

* styling

* up

* make sure qwen2 sliding works!

* fix dbrx small

* lol

* nits

* fix one test

* fix load balancing loss issue

* fix jamba

* fix nllbmoe

* fix jamba consistency and doc?

* up

* these are correct

* up

* up

* up

* some of the final cleanup

* update

* up

* fix some revert in granitemoe

* bring back attention multipliers for the granite family we'll see later on if they need removal

* small jamba fix docstring and typing

* fix phimoe

* yup

* fix unk returndict in granitemoes

* up

* fix qwen config

* fix phimoe check quality

* nits

* update based on caught non relative imports!

* fix dbrx

* Apply suggestions from code review

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix copies

* fixup

* fix dots1 regression!

* fix phimoe issue

* fix phi moe

* fix float() for some models

* fix jamba regression

* ui

* more dtype issues

* fix deepseek2 and 3?

* proper update

* fix modular deepseek!

* jamba jambaaaaaa

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-02 12:12:44 +02:00
e6a8e7debe Fix binding of video frames to video placeholder in InternVL model (#41237)
* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

---------

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
2025-10-02 09:43:35 +00:00
30b79effb5 Remove SageMakerTrainer (#41267)
* Remove SageMakerTrainer

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More removal

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 09:16:32 +00:00
aabf0a03cb Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)
* fix multi-video timestamp bug in qwen3vl,glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <daqin.luo@ubtrobot.com>
2025-10-02 11:15:57 +02:00
bcdd5532bf Use regex detailed flags (#41264)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 08:34:09 +00:00
55d63e86ea fix asr pipeline ut failures (#41275)
* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* make style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-02 10:32:03 +02:00
522b79a346 add more activation kernels, follow up (#40944)
* add more activation kernels

* fixing style

* fix version
2025-10-02 08:45:05 +02:00
9f2d5666f8 docs: update bitsandbytes platform support (#41266) 2025-10-01 14:27:19 -04:00
9d8f693c7e add peft team members to issue/pr template (#41262)
* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-10-01 17:26:59 +00:00
94bbf8e199 Resolve remote custom module path warnings (#41243) 2025-10-01 15:55:42 +00:00
c4b505d0f7 Don't convert to safetensors on the fly if the call is from testing (#41194)
* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-01 17:46:21 +02:00
01c9e1ba68 [t5gemma] fix get_text_config and related fixes (#40939)
* tmp commit

* t5gemma fixes
2025-10-01 15:55:26 +01:00
025531981c [FA3] Fix masking and loading logic in same process (#41217)
fix loading and fa3 masking
2025-10-01 16:36:12 +02:00
3256773974 FP-Quant NVFP4 and Python 3.9 support (#39876)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bump

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-01 13:58:22 +00:00
d848a3953a Remove all instances of is_safetensors_available (#41233)
* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-01 13:57:28 +00:00
e4913bdf50 🚨 [v5] Remove SinkCache (#41107)
Remove SinkCache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:46:55 +00:00
1c8f206ecc Fix pylint warnings (#41222)
* Remove unused variables

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:16:22 +00:00
3016717f0d Use removeprefix and removesuffix (#41240)
* Use removeprefix and removesuffix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:13:04 +00:00
ca975f1cb8 [V5] Remove deprecated transformers.onnx (#41214)
* Remove deprecated transformers.onnx

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove onnx docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-01 12:17:04 +00:00
1d1ac07893 [repo utils] Update models_to_deprecate.py (#41231)
* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup
2025-10-01 12:01:52 +00:00
bcec3e2175 fix TrainerIntegrationDeepSpeed UT failures (#41236)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-01 13:55:01 +02:00
ae879f67f8 🚨 [v5] Delete feature extractors used for vision (#41174)
* bye bye

* remove from docs

* do not use feature extractor here

* fix docs

* do not delete it

* forgot these
2025-10-01 13:20:58 +02:00
1c4d9982d3 Use math.log2 (#41241)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 09:52:31 +00:00
db1cc65c06 Video processor accepts single frames on cuda (#41218)
* fix

* why was it np if input is in torch
2025-10-01 10:55:11 +02:00
f22cb1e868 fix qwen text config (#41158)
* fix qwen text config

* fix tests

* fix one more test

* address comments
2025-09-30 17:23:44 +00:00
374ded5ea4 Fix white space in documentation (#41157)
* Fix white space

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 09:41:03 -07:00
16a141765c [docs] Fix tp_plan (#41205)
remove manual
2025-09-30 09:27:50 -07:00
5d1e853032 [Trainer] deprecate num_train_tokens (#41165)
* dep

* fix

* fix
2025-09-30 15:53:16 +00:00
cecd92849e [v5] Remove train kwargs (#41127)
* rm train kwargs

* fix
2025-09-30 17:43:25 +02:00
103fa6d235 [v5] Remove deprecated prediction loop (#41123)
* rem deprecated

* more

* rm all instances of legacy arg
2025-09-30 17:43:01 +02:00
aa3e8798ba [v5] Remove tokenizer from Trainer (#41128)
* tokenizer deprecated

* style

* forgot this

* style
2025-09-30 17:42:10 +02:00
e99dee6470 Remove old sagemaker api support (#41161)
* fix

* fix
2025-09-30 17:41:52 +02:00
dded9fd112 [v5] More Training Args cleaning (#41131)
clean
2025-09-30 17:38:07 +02:00
6fb6117abe Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)
* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix
2025-09-30 17:37:42 +02:00
5bdb70450d Fix sliding window attn mask (#41228)
* Fix sliding window attn mask

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 17:22:53 +02:00
a61fc6a0b9 Fix typing of train_args (#41142)
* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:28:02 +00:00
919a4845fb Unify is_torchvision_v2_available with is_torchvision_available (#41227)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 15:21:49 +01:00
8e7b0655f1 update code owners (#41221)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-30 16:21:19 +02:00
2dd175e6bb Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)
Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <frozen@Mac.local>
2025-09-30 14:19:57 +00:00
cf0887f62c Remove old Python code (#41226)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 14:15:59 +00:00
52f5eca7c9 🚨 [v5] Remove headmasking (#41076)
* first attempt at removing

* copies

* last bits in core

* quick fixes

* tests purge

* docs and examples

* some fixes

* more

* another round of cleanups

* fix

* fix a bunch of models

* fix dummy bert

* fix

* fix new model

* fix signature change

* fix

* fix style/copies

* new models

* fix copies didn't find that damn

* test

* this shouldn't have happened during model addition
2025-09-30 16:04:57 +02:00
a80f05dfcb [generate] cache missing custom generate file (#41216)
* cache missing custom generate file

* make fixup
2025-09-30 13:32:24 +00:00
1f1e93e095 Align pull request template to bug report template (#41220)
The only difference is that I don't direct users to https://discuss.huggingface.co/ for hub issues.
2025-09-30 14:25:41 +02:00
2a596f5b2f [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)
add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 12:06:47 +00:00
3edd8048b0 Trainer: Pass num_items_in_batch to compute_loss in prediction_step (#41183)
* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-30 09:45:17 +00:00
59035fd0e1 Avoid assumption that model has config attribute in deepspeed (#41207)
Avoid assumption that model has config in deepspeed
2025-09-30 11:42:50 +02:00
d97397787e Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)
* Update trainer.py

* fix

* fix format

* move barrier, delete redundant
2025-09-30 11:41:03 +02:00
06c04e0851 Deprecate half_precision_backend (#41134)
* deprecate

* remove

* rm apex

* fix

* fix

* fix doc
2025-09-30 11:36:44 +02:00
0e5a975608 Fix Qwen3-Omni audio_token_id serialization issue (#41192)
Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles
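
The inheritance problem described in the bullets above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyConfig`, `ParentConfig`, and `ChildConfig` are hypothetical stand-ins, not the real `transformers` classes, and the redirect logic is a simplified imitation of how an `attribute_map` renames attributes.

```python
class ToyConfig:
    """Hypothetical minimal config: attribute_map redirects one name to another."""

    attribute_map = {}

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)

    def __setattr__(self, name, value):
        # Store the value under the mapped name if a mapping exists.
        name = type(self).attribute_map.get(name, name)
        object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only reached when normal lookup fails; follow the mapping once.
        mapped = type(self).attribute_map.get(name)
        if mapped is None or mapped == name:
            raise AttributeError(name)
        return getattr(self, mapped)

    def to_dict(self):
        # Serialization sees only the names values were actually stored under.
        return dict(self.__dict__)


class ParentConfig(ToyConfig):
    # The parent renames audio_token_id to audio_token_index.
    attribute_map = {"audio_token_id": "audio_token_index"}


class ChildConfig(ParentConfig):
    # Overriding with an empty map stops the inherited rename.
    attribute_map = {}


parent_dict = ParentConfig(audio_token_id=7).to_dict()
child_dict = ChildConfig(audio_token_id=7).to_dict()
```

In this toy version the parent serializes the value under `audio_token_index`, while the child with the overridden map keeps it under `audio_token_id`, so code that reads `audio_token_id` directly from the saved dictionary finds it again after a save/load cycle.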

Fixes #41191
2025-09-30 11:15:56 +02:00
42c682514b docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)
* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review; remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-30 08:38:31 +00:00
aaf1269d83 Remove unnecessary Optional typing (#41198)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-30 08:38:05 +00:00
4a02bc7004 [docs] Fix links (#41110)
fix
2025-09-30 08:53:07 +02:00
def4a37e19 Embed interactive timeline in docs (#41015)
* embed timeline in docs (test web component and iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct style and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review
2025-09-30 01:36:08 +00:00
3e975acc8b Fix docker quantization (#41201)
* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert
2025-09-29 16:36:30 +00:00
8635d8e796 Fix 8bit bnb loading (#41200)
* Fix 8bit

* oups forgot the case where it is not prequantized
2025-09-29 18:34:46 +02:00
1f0e9a4778 Fix EXAONE-4.0 dummy id (#41089)
* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-29 16:30:55 +00:00
bd37c45354 Add EdgeTAM (#39800)
* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore:init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore:sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore:match prompt encoder with sam2 code

* chore:prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promptencoder

* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)

* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)

* SamModel is now available (Need more chore for name)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PositionEmbedding

* TO DO : fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TO DO : convert and inference the video pipeline

* TO DO : fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests and most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitly process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessary changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-29 11:54:54 -04:00
c1db38686a [Kernels Attention] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)
* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style
2025-09-29 17:10:59 +02:00
5426edecab Make quantizers good citizens loading-wise (#41138)
* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo
2025-09-29 17:04:45 +02:00
399c589dfa Separate docker images for Nvidia and AMD in benchmarking (#41119)
Separate docker images for Nvidia and AMD
2025-09-29 17:03:27 +02:00
52cbc7c868 Fix attention sink implementation in flex attention (#41083)
* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raise error when return_lse is False yet s_aux is provided

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-09-29 14:33:03 +00:00
de9a75f5b0 fix(trainer): Avoid moving model with device_map (#41032)
* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized with a model that has an `hf_device_map` that simulates offloading without raising an error.
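
The check described above can be sketched as follows. This is a minimal illustration, not the actual `Trainer` code: `DummyModel` and the free function `move_model_to_device` are hypothetical stand-ins, and devices are plain strings so the example runs without `torch` or `accelerate`.

```python
class DummyModel:
    """Hypothetical stand-in for a model; tracks its device as a string."""

    def __init__(self, hf_device_map=None):
        self.device = "cpu"
        if hf_device_map is not None:
            # Assumption: a dispatched model carries this attribute.
            self.hf_device_map = hf_device_map

    def to(self, device):
        self.device = device
        return self


def move_model_to_device(model, device):
    # If a device map is present, placement is assumed to be handled
    # elsewhere, so the model is returned without being moved.
    if getattr(model, "hf_device_map", None) is not None:
        return model
    return model.to(device)


offloaded = DummyModel(hf_device_map={"layer.0": 0, "layer.1": "cpu"})
plain = DummyModel()
move_model_to_device(offloaded, "cuda:0")
move_model_to_device(plain, "cuda:0")
```

Here `offloaded` keeps its original placement while `plain` is moved, which mirrors the behavior the commit message describes.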

* Added the logger warning for the move model

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
2025-09-29 14:31:42 +00:00
bcc0dae77c enable flex attention ut cases on XPU (#40989)
* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 14:30:49 +00:00
fcd483f0ff Bump hfh prerelease version (#41175) 2025-09-29 16:28:36 +02:00
a3fa1d3993 Fix inaccurate train_tokens_per_second when resuming from checkpoint (#41156)
* fix(trainer): Fix the issue of inaccurate token count in training sessions

During the training process, the initial token count was not saved, leading to inaccurate speed calculation. Now, the initial token count is saved and the increment during the session is calculated, ensuring that the speed metric accurately reflects the performance of the current training session.
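
The arithmetic behind this fix can be sketched as below. The function and parameter names are hypothetical, chosen for illustration only; the actual trainer code may compute and store these values differently.

```python
def session_tokens_per_second(total_tokens_seen, tokens_at_session_start, elapsed_seconds):
    """Throughput of the current session only (hypothetical helper).

    total_tokens_seen: cumulative count, including tokens from before a resume.
    tokens_at_session_start: count saved when this session began.
    elapsed_seconds: wall-clock duration of this session.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # Only tokens processed during this session count toward the speed.
    return (total_tokens_seen - tokens_at_session_start) / elapsed_seconds


# Resumed at 1,000,000 tokens, now at 1,500,000 after 100 seconds.
naive = 1_500_000 / 100.0
corrected = session_tokens_per_second(1_500_000, 1_000_000, 100.0)
```

Dividing the cumulative count by the session time (`naive`) overstates the speed after a resume, while subtracting the saved starting count (`corrected`) reflects only the work done in the current session.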

* Fix errors

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 16:22:35 +02:00
ad74fba085 [v5] Remove model_parallel deprecated feature (#41166)
* fix

* remove model parallel

* style

* removed a bit too much

* rm comments

* fix
2025-09-29 16:14:03 +02:00
38a08b6e8a More typing fixes (#41102)
* Fix noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* remove noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix chars

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-29 13:11:53 +00:00
4fade1148f [tests] CausalLMTester automatically infers other test classes from base_model_class 🐛 🔫 (#41066)
* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special
2025-09-29 15:05:08 +02:00
cdba28c344 [XPU] Add MXFP4 support for XPU (#41117)
* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-29 12:10:41 +02:00
2dcb20dcec CI Runners - move amd runners mi355 and 325 to runner group (#41193)
* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config
2025-09-29 11:14:19 +02:00
d0d574b1e4 Modernbert fix (#41056)
* Add FA to docker

* Fixed padding for modernbert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency
2025-09-29 10:52:44 +02:00
071eb5334f handle flash slow tests (#41072)
* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>
2025-09-26 16:24:31 +00:00
50d2448a1a Enable fa in amd docker (#41069)
* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailed the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 13:57:58 +02:00
10f6891fc5 Remove data from examples (#41168)
Remove telemetry
2025-09-26 13:52:45 +02:00
97ca0b4712 Fix flash-attn for paged_attention when no kernels (#41078)
* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-26 10:41:21 +02:00
53838edde7 Improve add_dates script (#41167)
* utils/add_dates.py

* put lfm2-vl in correct category
2025-09-25 16:00:05 -04:00
449533af73 Add language specifiers to code blocks of markdown files (#41114)
* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update nemotron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phimoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-25 10:29:57 -07:00
e691f84412 Force new vision models addition to include a fast image processor (#40802)
* add test

* fix test and change cutoff date

* Add documentation to test
2025-09-25 15:58:18 +00:00
e54bb62a73 Simplify and improve model loading logic (#41103)
* remove unexpected keys from inputs (they have nothing to do there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix
2025-09-25 17:28:27 +02:00
6dc9ed87a0 Fix format of compressed_tensors.md (#41155)
* Fix table format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 14:50:15 +00:00
a579de7f5e Add Parakeet (#39062)
* first commit

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update to handle masking for bs>1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tests and docs

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update model ids

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update docs and improve style

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update librosa location

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* import guard torch too

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff code checks fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff format check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* updated to parakeet names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tokenizer decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove other model dependency

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* clean tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* linting

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff lint warnings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to separate folders

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet ctc model code

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* simplify encoder structure

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update documentation

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet to toctree

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet doc

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Address comments

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix encoding format

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix minor ctc decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* test update

* diverse fixes

* 100% matching greedy batched

* Update conversion script.

* Refactor docs.

* Refactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formatting and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PreTrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* update tests feature integration

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling update

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* attention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_positions_embeddings handling

* nits

* change atol feature extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
2025-09-25 13:52:24 +00:00
1dd22a234c extend gemma3n integration ut cases on XPU (#41071)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-25 13:46:37 +00:00
05fb90c969 Fix single quotes in markdown (#41154)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-25 13:03:26 +00:00
44682e7131 Adapt and test huggingface_hub v1.0.0 (#40889)
* Adapt and test huggingface_hub v1.0.0.rc0

* forgot to bump hfh

* bump

* code quality

* code quality

* relax dependency table

* fix has_file

* install hfh 1.0.0.rc0 in circle ci jobs

* repository

* push to hub now returns a commit url

* catch HfHubHTTPError

* check commit on branch

* add it back

* fix ?

* remove deprecated test

* uncomment another test

* trigger

* no proxies

* many more small changes

* fix load PIL Image from httpx

* require 1.0.0.rc0

* fix mocked tests

* fix others

* unchange

* unchange

* args

* Update .circleci/config.yml

* Bump to 1.0.0.rc1

* bump kernels version

* fix deps
2025-09-25 11:13:50 +00:00
750dd2a401 Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)
Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts
2025-09-25 10:33:46 +00:00
7258ea44bc Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-09-24 16:44:42 +02:00
2c4caa19e7 dummy commit (#41133)
* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-24 16:31:46 +02:00
6d1875924c Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head
2025-09-24 13:13:18 +01:00
3ca43d34b1 Fixed MXFP4 model storage issue (#41118) 2025-09-24 12:11:51 +00:00
b33cb70097 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)
* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

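* docs(text2text_generation): Update parameter comments to reflect modern generation practice

Update the max_length parameter comment to max_new_tokens, in line with the standard modern practice of specifying the number of new tokens to generate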

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment
2025-09-24 11:54:55 +00:00
b0c7034d58 Remove self-assignment (#41062)
* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-24 12:43:17 +01:00
04a0bb569c Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:34:12 +00:00
071c7b1423 Fix the error where a keyword argument appears before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 11:27:37 +00:00
80f20e0ff8 [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-24 11:18:27 +00:00
1d81247b0c [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors

* enable torchao safetensors support

* add more version checking
2025-09-24 12:32:47 +02:00
b533cec74d Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-24 10:17:41 +00:00
65dcd66cc8 🚨 [V5] Remove deprecated training arguments (#41017)
* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-24 12:01:27 +02:00
43a613c8da Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1, target Python 3.10, and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-09-24 06:37:21 +00:00
f64354e89a Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 16:20:01 -07:00
99b0995138 Remove bad test skips (#41109)
* remove bad skips

* remove more

* fix inits
2025-09-23 20:39:28 +02:00
00f3d90720 Fix _get_test_info for inherited tests (#41106)
* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 19:35:24 +02:00
cfa022e719 [tests] gpt2 + CausalLMModelTester (#41003)
* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder
2025-09-23 18:07:06 +01:00
869735d37d 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)
* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma
2025-09-23 16:20:00 +00:00
71717ce91c docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
2025-09-23 09:18:49 -07:00
946e5f95ea Fix wrong height and width when reading video with torchvision (#41091) 2025-09-23 12:35:44 +00:00
870add3daf Remove tf and flax from Chinese documentation (#41057)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:43:17 +00:00
ae60692821 Remove unused arguments (#40916)
* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:40:51 +00:00
f682797866 Fix typing (#40788)
* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:36:02 +00:00
f4a6c65951 Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:27:04 +00:00
89e0f472f4 Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)
Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-23 11:14:11 +00:00
62ce6fcb60 Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script

* Adjust vars
2025-09-23 13:05:27 +02:00
257fe5eea8 Switch to python:3.10-slim for CircleCI docker images (#41067)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 12:48:48 +02:00
0ec0325781 Minor addition, no split modules for VideoMAE (#41051)
* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-09-23 11:53:51 +02:00
577fa6f167 fix crash when using chat to send 2+ requests to gptoss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
2025-09-23 09:50:23 +00:00
03c92884b5 Update team member list for some CI workflows (#41094)
* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-23 09:48:40 +00:00
cbb290ec23 Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files
2025-09-22 10:36:20 -07:00
8048c614bf [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions
2025-09-22 09:51:39 -07:00
aa30e0642e Update quantization CI (#41068)
* fix

* new everything

* fix
2025-09-22 18:10:16 +02:00
1bb69cce82 Fix CI jobs being all red 🔴 (false positive) (#41059)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 16:51:00 +02:00
f15258dec2 Remove <frameworkcontent> and <pt> tags from documentation (#41055)
* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 14:29:50 +00:00
2ec37649e2 Ci utils (#40978)
* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License
2025-09-22 16:16:19 +02:00
b9d337b6f3 Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload

* Address review comments

* Address review comments
2025-09-22 14:13:46 +00:00
646ff51d1a Simplify unnecessary Optional typing (#40839)
Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:50 +00:00
c9939b3ab6 Remove repeated import (#40937)
* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:57:13 +00:00
4f36011545 [testing] Fix seed_oss (#41052)
* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-09-22 14:54:30 +02:00
2b8a7e82b5 Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking
2025-09-22 13:42:34 +01:00
226667ec2f Remove doc of tf and flax (#41029)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 13:42:26 +01:00
6eff44bb8d Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:38:07 +00:00
9ff47a71e4 Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-09-22 12:21:38 +00:00
ae9ef2e151 docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-09-22 13:21:15 +01:00
f3c481ed87 Use torch.autocast (#40975)
* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 12:18:24 +00:00
37152f8446 Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:31:46 +00:00
8a52288dba Remove optax (#41030)
Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:30:39 +00:00
5f891b36cd Fix typing of tuples (#41028)
* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-22 11:29:07 +00:00
c05f9d2f0e [testing] Fix qwen2_audio (#41018)
* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-22 10:45:31 +00:00
55a1eaf6f0 Fix Qwen video tests (#41049)
fix test
2025-09-22 12:28:11 +02:00
db802aafa4 Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-09-22 10:06:59 +00:00
8a2f24a321 Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved

* redundant if condition fix
2025-09-22 09:47:34 +00:00
ebbcf00ad1 Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-09-21 23:46:27 +02:00
67097bf340 Fix benchmark runner argument name (#41012) 2025-09-20 10:53:56 +02:00
8076e755e5 Update after #41007 (#41014)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 21:55:46 +02:00
022c882e14 Fix Glm4v test (#41011)
fix
2025-09-19 18:54:26 +02:00
966b3dbcbe Fix PhimoeIntegrationTest (#41007)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:43:46 +00:00
04bf4112f2 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-09-19 16:41:22 +01:00
dfc230389c 🚨 [v5] remove deprecated entry point (#40997)
* remove old entry point

* update references to transformers-cli
2025-09-19 14:40:27 +00:00
8010f5d1d9 Patch more unittest.case.TestCase.assertXXX methods (#41008)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 16:38:12 +02:00
5bf633b32a [tests] update test_left_padding_compatibility (and minimize overwrites) (#40980)
* update test (and overwrites)

* better test comment

* 0 as a default for
2025-09-19 15:36:26 +01:00
df12617914 🚨 [v5] remove generate output retrocompatibility aliases (#40998)
remove old type aliases
2025-09-19 14:36:12 +00:00
2a538b2ed4 fix dict like init for ModelOutput (#41002)
* fix dict like init

* style
2025-09-19 16:14:44 +02:00
96a3e898cd RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:50:26 +00:00
98c8523434 Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree

* fix new models
2025-09-19 09:47:28 -04:00
767f8a4c75 Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 13:18:38 +00:00
9d9c4d24c5 Make EfficientLoFTRModelTest faster (#41000)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 12:51:05 +00:00
b4ba4e1da0 [RMSNorm] Fix rms norm init for models that center around 1 (#40796)
* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happening lol

* vaultgemma is new i forgot

* remove init check
2025-09-19 12:15:36 +00:00
fce746512b [docs] rm stray tf/flax autodocs references (#40999)
rm tf references
2025-09-19 12:04:12 +01:00
ddfa3d4402 blt wip (#38579)
* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
2025-09-19 11:55:55 +02:00
46ea7e613d [testing] test num_hidden_layers being small in model tester (#40992)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-19 11:45:07 +02:00
ebdc17b8e5 ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a, ctrl + e, alt + b, alt + f,
  ctrl + k, alt + d, etc.
- navigate and search history: arrow up/down, ctrl + p, ctrl + n, ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
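
As a rough illustration of the guarded import described above (a minimal
sketch, not the actual code from this change; `HAS_READLINE` and
`prompt_user` are hypothetical names used only here):

```python
# Guarded import: the readline module is not available on every platform,
# so a failed import must not break the program.
try:
    import readline  # noqa: F401  (imported only for its effect on input())

    HAS_READLINE = True  # hypothetical flag, for illustration only
except ImportError:
    HAS_READLINE = False


def prompt_user(prompt: str = "<user>: ") -> str:
    """Read one line from the user (hypothetical helper).

    When readline was imported successfully, the built-in input() uses it
    for line editing and history; otherwise input() still works, just
    without those features.
    """
    return input(prompt)
```

Nothing else has to be wired up: the built-in input() picks up readline on
its own once the module is loaded, which is why the import looks unused.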

Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows, though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
2025-09-19 10:39:21 +01:00
e2dbde280f Remove [[autodoc]] refs to TF/Flax objects (#40996)
* remove refs

* more
2025-09-19 11:28:34 +02:00
155f7e2e62 🔴[Attention] Bert-based Models Attention Refactor (#38301)
* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style
2025-09-19 11:23:58 +02:00
61eff450d3 Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from workflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description
2025-09-19 08:54:49 +00:00
5f6e278a51 Remove set_model_tester_for_less_flaky_tests (#40982)
remove
2025-09-18 18:56:10 +02:00
4df2529d79 🚨🚨🚨 Fully remove Tensorflow and Jax support library-wide (#40760)
* setup

* start the purge

* continue the purge

* more and more

* more

* continue the quest: remove loading tf/jax checkpoints

* style

* fix configs

* oups forgot conflict

* continue

* still grinding

* always more

* in the zone

* never stop

* should fix doc

* fix

* fix

* fix

* fix tests

* still tests

* fix non-deterministic

* style

* remove last rebase issues

* onnx configs

* still on the grind

* always more references

* nearly the end

* could it really be the end?

* small fix

* add converters back

* post rebase

* latest qwen

* add back all converters

* explicitly add functions in converters

* re-add
2025-09-18 18:27:39 +02:00
5ac3c5171a Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 18:27:27 +02:00
d9d7f6a6b9 Revert change in compile_friendly_resize (#40645)
fix
2025-09-18 16:25:45 +01:00
738b223f57 Add captured actual outputs to CI artifacts (#40965)
* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 15:40:53 +02:00
dd7ac4cd59 [tests] Really use small models in all fast tests (#40945)
* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency
2025-09-18 15:24:12 +02:00
2ce35a248f Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
2025-09-18 13:22:19 +00:00
6e51ac31ef [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling
2025-09-18 14:09:08 +01:00
9378f874c1 [Trainer] Fix DP loss (#40799)
* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-18 13:07:20 +00:00
7cf1f5ced0 Use skip_predictor=True in vjepa2 get_vision_features (#40966)
use skip_predictor in vjepa2 `get_vision_features`
2025-09-18 11:51:45 +00:00
f6104189fd Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 11:49:14 +00:00
c532575795 Add new model LFM2-VL (#40624)
* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
2025-09-18 11:01:58 +00:00
564fde14f1 FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-18 09:57:21 +00:00
5748352c27 Update expected values for one more test_speculative_generation after #40949 (#40967)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-18 11:47:14 +02:00
438343d93f Don't list dropout in eager_paged_attention_forward (#40924)
Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 09:05:50 +00:00
449da6bb30 Add FlexOlmo model (#40921)
* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`
2025-09-18 09:04:06 +00:00
3bb1b4867c Standardize audio embedding function name for audio multimodal models (#40919)
* Standardize audio embedding function name for audio multimodal models

* PR review
2025-09-18 08:45:04 +00:00
58e13b9f12 Update expected values for some test_speculative_generation (#40949)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 20:50:38 +02:00
529d3a2b06 Fix Glm4vModelTest::test_eager_matches_fa2_generate (#40947)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 19:53:59 +02:00
a2ac4de8b0 Remove nested import logic for torchvision (#40940)
* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessary protected import in modular (and modeling)

* fix wrongly removed protected imports
2025-09-17 13:34:30 -04:00
8e837f6ae2 Consistent naming for images kwargs (#40834)
* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fix copies

* another fix

* fix some tests

* fix more tests

* fix last tests

* fix copies

* better docstring

* delete print
2025-09-17 18:40:25 +02:00
eb04363a0d Raise error instead of warning when using meta device in from_pretrained (#40942)
* raise instead of warning

* add timm

* remove
2025-09-17 18:23:37 +02:00
ecc1d778ce Fix Glm4vMoeIntegrationTest (#40930)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 18:21:18 +02:00
c5553b4120 Fix trainer tests (#40823)
* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
2025-09-17 16:05:17 +00:00
14f01aee39 docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941) 2025-09-17 08:48:38 -07:00
26b65fb516 Intel CPU dockerfile (#40806)
* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-09-17 15:42:30 +00:00
66f97d3f64 [models] remove unused import torch.utils.checkpoint (#40934) 2025-09-17 16:37:56 +01:00
3853bfe4d5 [DOC] Add missing dates in model cards (#40922)
add missing dates
2025-09-17 11:17:06 -04:00
6cade29278 Add LongCat-Flash (#40730)
* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism
2025-09-17 14:48:10 +02:00
48a5565179 Add support for Florence-2 training (#40914)
* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-17 11:49:56 +00:00
89949c5d2d Minor fix for #40727 (#40929)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-17 11:42:13 +02:00
c830fc1207 Adding activation kernels (#40890)
* first commit

* add mode

* revert modeling

* add compile

* rm print
2025-09-17 11:36:09 +02:00
f6999b00c3 [torchao safetensors] renaming get_state_dict function (#40774)
renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 11:20:50 +02:00
8428c7b9c8 Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)
* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-09-17 09:15:55 +00:00
ddd4caf066 [Llama4] Remove image_sizes arg and deprecate vision_feature_layer (#40832)
* Remove unused arg

* deprecate

* revert one change

* get set go

* version correction

* fix

* make style

* comment
2025-09-17 09:14:13 +00:00
b82cd1c240 Processor load with multi-processing (#40786)
push
2025-09-17 09:46:49 +02:00
6e50a8afb2 [Docs] Adding documentation of MXFP4 Quantization (#40885)
* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-16 11:31:28 -07:00
cccef4be91 Fix dtype in Paligemma (#40912)
* fix dtypes

* fix copies

* delete unused attr
2025-09-16 16:07:56 +00:00
beb09cbd5a 🔴Make center_crop fast equivalent to slow (#40856)
make center_crop fast equivalent to slow
2025-09-16 16:01:38 +00:00
d4af0d9f03 [generate] misc fixes (#40906)
misc fixes
2025-09-16 15:18:06 +01:00
3b3f6cd0c1 [gemma3] Gemma3ForConditionalGeneration compatible with assisted generation (#40791)
* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase
2025-09-16 15:08:48 +01:00
88ba0f107e disable test_fast_is_faster_than_slow (#40909)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:34:04 +02:00
270da89708 Remove runner_map (#40880)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 15:18:07 +02:00
df03fc1f9c Improve module name handling for local custom code (#40809)
* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-09-16 13:11:48 +00:00
96bc19bcdf remove dummy EncodingFast (#40864)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-16 12:56:11 +00:00
d0af4269ec Add Olmo3 model (#40778)
* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test
2025-09-16 13:28:23 +02:00
65f9ede359 Set seed for Glm4vIntegrationTest (#40905)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 13:01:51 +02:00
0c1839d609 [cache] Only use scalars in get_mask_sizes (#40907)
* remove tensor ops

* style

* style
2025-09-16 12:48:58 +02:00
3688a977d0 Harmonize CacheLayer names (#40892)
* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert
2025-09-16 12:14:12 +02:00
087775d10e [cache] Merge static sliding and static chunked layer (#40893)
* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle
2025-09-16 11:41:20 +02:00
1aff033ec9 Fix flaky Gemma3nAudioFeatureExtractionTest::test_dither (#40902)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 11:00:07 +02:00
65adc3aaa3 Fix getter regression (#40824)
* test things

* style

* move tests to a sane place
2025-09-16 10:57:13 +02:00
8e1a12bbee Fixing the call to kernelize (#40628)
* fix

* style

* overload train and eval

* add getter and setter
2025-09-16 10:50:54 +02:00
21c8379fb0 Make debugging failing tests (check and update expect output values) easier 🔥 (#40727)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-09-16 10:21:48 +02:00
5af248b3e3 [generate] remove docs of a feature that no longer exists (#40895) 2025-09-15 19:22:31 +01:00
20ee3a73f0 🌐 [i18n-KO] Translated imageprocessor.md to Korean (#39557)
* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:07:16 -07:00
2141a5b764 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)
* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-09-15 10:06:57 -07:00
2a83792165 Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)
Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 17:38:13 +02:00
04d1c8f3d4 Fix deta loading & dataclass (#40878)
* fix

* fix 2
2025-09-15 17:23:13 +02:00
ff26fe8302 Add Fast PromptDepthAnything Processor (#40602)
* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmark script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refactor

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processor

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-09-15 15:03:43 +00:00
6254bb4a68 Use torch.expm1 and torch.log1p for better numerical results (#40860)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:54:14 +00:00
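The motivation for the `expm1`/`log1p` commit above can be illustrated with a small sketch. It uses Python's standard `math` module rather than torch so that it runs without extra dependencies; the assumption is that `torch.expm1` and `torch.log1p` give the same kind of benefit on tensors. The helper names `naive_expm1` and `naive_log1p` are hypothetical.

```python
import math


def naive_expm1(x: float) -> float:
    """exp(x) - 1 computed directly; loses digits when x is close to 0."""
    return math.exp(x) - 1.0


def naive_log1p(x: float) -> float:
    """log(1 + x) computed directly; 1 + x rounds to 1 for tiny x."""
    return math.log(1.0 + x)


x = 1e-10
# Second-order Taylor expansion, accurate enough as a reference for tiny x.
reference = x + x * x / 2

error_naive = abs(naive_expm1(x) - reference)
error_stable = abs(math.expm1(x) - reference)

# 1.0 + 1e-20 rounds to exactly 1.0, so the naive form returns 0.0,
# while log1p keeps the small value.
tiny = 1e-20
lost = naive_log1p(tiny)
kept = math.log1p(tiny)
```

For small inputs the direct forms suffer from cancellation or rounding before the subtraction or logarithm is applied, which is why the dedicated functions are preferred.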
e674e9dadb Clarify passing is_causal in sdpa_attention_paged_forward (#40838)
* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-15 11:51:22 +00:00
0957999f7f 🔴 Move variable output controls to _prepare_generation_config (#40715)
* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches
2025-09-15 11:08:00 +00:00
5e9ec59d0c Fix modular consistency (#40883)
* reapply modular

* add missing one
2025-09-15 13:07:08 +02:00
3442b2f300 [VaultGemma] Update expectations in integration tests (#40855)
* fix tests

* style
2025-09-15 12:46:30 +02:00
c0dbe095b0 Adding Support for Qwen3-VL Series (#40795)
* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unnecessary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnecessary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-09-15 12:46:18 +02:00
fc5f9105da [Qwen3 Next] Use numerically stable rsqrt (#40848)
use numerically stable inverse
2025-09-15 12:45:13 +02:00
96d3795cfc Update model tags and integration references in bug report (#40881) 2025-09-15 12:08:29 +02:00
f5e1641857 fix: XIELU act parameters not being cast to correct dtype (#40812) 2025-09-15 11:05:55 +02:00
ada64ce452 fix florence kwargs (#40826) 2025-09-15 11:05:47 +02:00
93f810e6fa [docstrings / type hints] Update outdated annotations for past_key_values (#40803)
* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes
2025-09-15 10:52:32 +02:00
c65fea0b92 [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-09-15 10:46:32 +02:00
9c804f7ec4 Redirect MI355 CI results to dummy dataset (#40862) 2025-09-14 18:42:49 +02:00
02ea2b3433 Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)
Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-09-14 15:35:42 +00:00
d42e96a2a7 Use checkpoint in auto_class_docstring (#40844)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-13 00:49:19 +00:00
6eb3255842 [generate] Always use decoder config to init cache (#40772)
* mega derp

* fix

* always use the decoder
2025-09-12 18:24:22 +02:00
e682f90f60 [tests] move generative tests away from test_modeling_common.py (#40854)
move tests
2025-09-12 16:12:27 +00:00
8d8459132a [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)
* output_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve
2025-09-12 18:07:48 +02:00
3202 changed files with 101237 additions and 277501 deletions

View File

@ -16,10 +16,9 @@
import argparse
import copy
import os
import random
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import glob
from typing import Any, Optional
import yaml
@ -30,6 +29,7 @@ COMMON_ENV_VARIABLES = {
"RUN_PIPELINE_TESTS": False,
# will be adjusted in `CircleCIJob.to_dict`.
"RUN_FLAKY": True,
"DISABLE_SAFETENSORS_CONVERSION": True,
}
# Disable the use of {"s": None} as the output is way too long, making navigation on CircleCI impractical
COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "vvv": None, "rsfE":None}
@ -82,15 +82,15 @@ class EmptyJob:
@dataclass
class CircleCIJob:
name: str
additional_env: Dict[str, Any] = None
docker_image: List[Dict[str, str]] = None
install_steps: List[str] = None
additional_env: dict[str, Any] = None
docker_image: list[dict[str, str]] = None
install_steps: list[str] = None
marker: Optional[str] = None
parallelism: Optional[int] = 0
pytest_num_workers: int = 8
pytest_options: Dict[str, Any] = None
pytest_options: dict[str, Any] = None
resource_class: Optional[str] = "xlarge"
tests_to_run: Optional[List[str]] = None
tests_to_run: Optional[list[str]] = None
num_test_files_per_worker: Optional[int] = 10
# This should be only used for doctest job!
command_timeout: Optional[int] = None
@ -130,6 +130,12 @@ class CircleCIJob:
def to_dict(self):
env = COMMON_ENV_VARIABLES.copy()
if self.job_name != "tests_hub":
# fmt: off
# not critical
env.update({"HF_TOKEN": "".join(["h", "f", "_", "H", "o", "d", "V", "u", "M", "q", "b", "R", "m", "t", "b", "z", "F", "Q", "O", "Q", "A", "J", "G", "D", "l", "V", "Q", "r", "R", "N", "w", "D", "M", "V", "C", "s", "d"])})
# fmt: on
# Do not run tests decorated by @is_flaky on pull requests
env['RUN_FLAKY'] = os.environ.get("CIRCLE_PULL_REQUEST", "") == ""
env.update(self.additional_env)
@ -149,7 +155,7 @@ class CircleCIJob:
# Examples special case: we need to download NLTK files in advance to avoid concurrency issues
timeout_cmd = f"timeout {self.command_timeout} " if self.command_timeout else ""
marker_cmd = f"-m '{self.marker}'" if self.marker is not None else ""
junit_flags = f" -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
junit_flags = " -p no:warning -o junit_family=xunit1 --junitxml=test-results/junit.xml"
joined_flaky_patterns = "|".join(FLAKY_TEST_FAILURE_PATTERNS)
repeat_on_failure_flags = f"--reruns 5 --reruns-delay 2 --only-rerun '({joined_flaky_patterns})'"
parallel = f' << pipeline.parameters.{self.job_name}_parallelism >> '
@ -180,6 +186,7 @@ class CircleCIJob:
# During the CircleCI docker images build time, we might already (or not) download the data.
# If it's done already, the files are inside the directory `/test_data/`.
{"run": {"name": "fetch hub objects before pytest", "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py"}},
{"run": {"name": "download and unzip hub cache", "command": 'curl -L -o huggingface-cache.tar.gz https://huggingface.co/datasets/hf-internal-testing/hf_hub_cache/resolve/main/huggingface-cache.tar.gz && apt-get install pigz && tar --use-compress-program="pigz -d -p 8" -xf huggingface-cache.tar.gz && mv -n hub/* /root/.cache/huggingface/hub/ && ls -la /root/.cache/huggingface/hub/'}},
{"run": {
"name": "Run tests",
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}
@ -200,9 +207,9 @@ class CircleCIJob:
fi"""
},
},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": f"python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"run": {"name": "Expand to show skipped tests", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --skip"}},
{"run": {"name": "Failed tests: show reasons", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --fail"}},
{"run": {"name": "Errors", "when": "always", "command": "python3 .circleci/parse_test_outputs.py --file tests_output.txt --errors"}},
{"store_test_results": {"path": "test-results"}},
{"store_artifacts": {"path": "test-results/junit.xml"}},
{"store_artifacts": {"path": "reports"}},
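The annotation change in the diff above (from `typing.Dict`/`typing.List` to the built-in `dict`/`list` generics) and the `RUN_FLAKY` handling in `to_dict` can be sketched in isolation as below. This is a simplified illustration that assumes Python 3.9 or later; `JobSketch` and `to_env` are hypothetical names and not part of the CircleCI config script. It also uses `field(default_factory=dict)` instead of a `None` default, which is a design choice of this sketch rather than what the original file does.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class JobSketch:
    """Hypothetical, trimmed-down stand-in for a CI job description."""

    name: str
    # Built-in generics (dict[...], list[...]) replace typing.Dict / typing.List.
    additional_env: dict[str, Any] = field(default_factory=dict)
    tests_to_run: Optional[list[str]] = None

    def to_env(self, is_pull_request: bool) -> dict[str, Any]:
        # Flaky tests are skipped on pull requests, mirroring the
        # CIRCLE_PULL_REQUEST check in the diff above.
        env: dict[str, Any] = {"RUN_FLAKY": not is_pull_request}
        # Job-specific variables are applied last, so they can override defaults.
        env.update(self.additional_env)
        return env
```

Because `additional_env` is merged after the defaults, a job can still opt back into flaky tests on a pull request by setting `RUN_FLAKY` itself.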

View File

@ -1,5 +1,6 @@
import re
import argparse
import re
def parse_pytest_output(file_path):
skipped_tests = {}

View File

@ -36,19 +36,23 @@ body:
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @gante and @Rocketknight1
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
@ -56,6 +60,8 @@ body:
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:
@ -69,19 +75,6 @@ body:
- for issues with a model, report at https://discuss.huggingface.co/ and tag the model's creator.
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @Rocketknight1
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
placeholder: "@Username ..."

View File

@ -39,20 +39,23 @@ members/contributors who may be interested in your PR.
Models:
- text models: @ArthurZucker
- vision models: @amyeroberts, @qubvel
- speech models: @eustlb
- text models: @ArthurZucker @Cyrilvallez
- vision models: @yonigozlan @molbap
- audio models: @eustlb @ebezzam @vasqu
- multimodal models: @zucchini-nlp
- graph models: @clefourrier
Library:
- flax: @gante and @Rocketknight1
- generate: @zucchini-nlp (visual-language models) or @gante (all others)
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tensorflow: @gante and @Rocketknight1
- tokenizers: @ArthurZucker
- trainer: @zach-huggingface, @SunMarc and @qgallouedec
- chat templates: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
@ -60,20 +63,17 @@ Integrations:
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:
- AMD ROCm: @ivarflakstad
- Intel XPU: @IlyasMoutawwakil
- Ascend NPU: @ivarflakstad
Documentation: @stevhliu
HF projects:
- accelerate: [different repo](https://github.com/huggingface/accelerate)
- datasets: [different repo](https://github.com/huggingface/datasets)
- diffusers: [different repo](https://github.com/huggingface/diffusers)
- rust tokenizers: [different repo](https://github.com/huggingface/tokenizers)
Maintained examples (not research project or legacy):
- Flax: @Rocketknight1
- PyTorch: See Models above and tag the person corresponding to the modality of the example.
- TensorFlow: @Rocketknight1
Research projects are not maintained and should be taken as is.
-->

@ -13,14 +13,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import github
import json
from github import Github
import os
import re
from collections import Counter
from pathlib import Path
import github
from github import Github
def pattern_to_regex(pattern):
if pattern.startswith("/"):
start_anchor = True
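The hunk above shows only the first lines of `pattern_to_regex`. As an illustration of the general technique, and not the repository's implementation, a CODEOWNERS-style glob could be translated into a regular expression roughly as follows. The helper name `glob_to_regex` and its simplified rules (a leading `/` anchors at the repository root, `*` stays within one path segment) are assumptions made for this sketch.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Hypothetical helper: translate a simplified CODEOWNERS glob to a regex."""
    # A leading "/" anchors the pattern at the repository root.
    anchored = pattern.startswith("/")
    body = pattern.lstrip("/")
    parts = []
    for ch in body:
        if ch == "*":
            # "*" matches any run of characters inside a single path segment.
            parts.append("[^/]*")
        else:
            parts.append(re.escape(ch))
    # Unanchored patterns may match at the root or below any directory.
    prefix = "^" if anchored else "(^|.*/)"
    return re.compile(prefix + "".join(parts))


rule = glob_to_regex("/src/transformers/models/*/processing*")
print(bool(rule.match("src/transformers/models/bert/processing_bert.py")))
```

Real CODEOWNERS matching has more cases (`**`, trailing `/`, last-match-wins ordering) that this sketch deliberately leaves out.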

@ -7,8 +7,8 @@ docs/ @stevhliu
/docker/ @ydshieh @ArthurZucker
# More high-level globs catch cases when specific rules later don't apply
/src/transformers/models/*/processing* @molbap @yonigozlan @qubvel
/src/transformers/models/*/image_processing* @qubvel
/src/transformers/models/*/processing* @molbap @yonigozlan
/src/transformers/models/*/image_processing* @yonigozlan
/src/transformers/models/*/image_processing_*_fast* @yonigozlan
# Owners of subsections of the library
@ -186,65 +186,65 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/zamba/mod*_zamba* @ArthurZucker
# Vision models
/src/transformers/models/beit/mod*_beit* @amyeroberts @qubvel
/src/transformers/models/bit/mod*_bit* @amyeroberts @qubvel
/src/transformers/models/conditional_detr/mod*_conditional_detr* @amyeroberts @qubvel
/src/transformers/models/convnext/mod*_convnext* @amyeroberts @qubvel
/src/transformers/models/convnextv2/mod*_convnextv2* @amyeroberts @qubvel
/src/transformers/models/cvt/mod*_cvt* @amyeroberts @qubvel
/src/transformers/models/deformable_detr/mod*_deformable_detr* @amyeroberts @qubvel
/src/transformers/models/deit/mod*_deit* @amyeroberts @qubvel
/src/transformers/models/depth_anything/mod*_depth_anything* @amyeroberts @qubvel
/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @amyeroberts @qubvel
/src/transformers/models/deta/mod*_deta* @amyeroberts @qubvel
/src/transformers/models/detr/mod*_detr* @amyeroberts @qubvel
/src/transformers/models/dinat/mod*_dinat* @amyeroberts @qubvel
/src/transformers/models/dinov2/mod*_dinov2* @amyeroberts @qubvel
/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @amyeroberts @qubvel
/src/transformers/models/dit/mod*_dit* @amyeroberts @qubvel
/src/transformers/models/dpt/mod*_dpt* @amyeroberts @qubvel
/src/transformers/models/efficientformer/mod*_efficientformer* @amyeroberts @qubvel
/src/transformers/models/efficientnet/mod*_efficientnet* @amyeroberts @qubvel
/src/transformers/models/focalnet/mod*_focalnet* @amyeroberts @qubvel
/src/transformers/models/glpn/mod*_glpn* @amyeroberts @qubvel
/src/transformers/models/hiera/mod*_hiera* @amyeroberts @qubvel
/src/transformers/models/ijepa/mod*_ijepa* @amyeroberts @qubvel
/src/transformers/models/imagegpt/mod*_imagegpt* @amyeroberts @qubvel
/src/transformers/models/levit/mod*_levit* @amyeroberts @qubvel
/src/transformers/models/mask2former/mod*_mask2former* @amyeroberts @qubvel
/src/transformers/models/maskformer/mod*_maskformer* @amyeroberts @qubvel
/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @amyeroberts @qubvel
/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @amyeroberts @qubvel
/src/transformers/models/mobilevit/mod*_mobilevit* @amyeroberts @qubvel
/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @amyeroberts @qubvel
/src/transformers/models/nat/mod*_nat* @amyeroberts @qubvel
/src/transformers/models/poolformer/mod*_poolformer* @amyeroberts @qubvel
/src/transformers/models/pvt/mod*_pvt* @amyeroberts @qubvel
/src/transformers/models/pvt_v2/mod*_pvt_v2* @amyeroberts @qubvel
/src/transformers/models/regnet/mod*_regnet* @amyeroberts @qubvel
/src/transformers/models/resnet/mod*_resnet* @amyeroberts @qubvel
/src/transformers/models/rt_detr/mod*_rt_detr* @amyeroberts @qubvel
/src/transformers/models/segformer/mod*_segformer* @amyeroberts @qubvel
/src/transformers/models/seggpt/mod*_seggpt* @amyeroberts @qubvel
/src/transformers/models/superpoint/mod*_superpoint* @amyeroberts @qubvel
/src/transformers/models/swiftformer/mod*_swiftformer* @amyeroberts @qubvel
/src/transformers/models/swin/mod*_swin* @amyeroberts @qubvel
/src/transformers/models/swinv2/mod*_swinv2* @amyeroberts @qubvel
/src/transformers/models/swin2sr/mod*_swin2sr* @amyeroberts @qubvel
/src/transformers/models/table_transformer/mod*_table_transformer* @amyeroberts @qubvel
/src/transformers/models/textnet/mod*_textnet* @amyeroberts @qubvel
/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @amyeroberts @qubvel
/src/transformers/models/upernet/mod*_upernet* @amyeroberts @qubvel
/src/transformers/models/van/mod*_van* @amyeroberts @qubvel
/src/transformers/models/vit/mod*_vit* @amyeroberts @qubvel
/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @amyeroberts @qubvel
/src/transformers/models/vitdet/mod*_vitdet* @amyeroberts @qubvel
/src/transformers/models/vit_mae/mod*_vit_mae* @amyeroberts @qubvel
/src/transformers/models/vitmatte/mod*_vitmatte* @amyeroberts @qubvel
/src/transformers/models/vit_msn/mod*_vit_msn* @amyeroberts @qubvel
/src/transformers/models/vitpose/mod*_vitpose* @amyeroberts @qubvel
/src/transformers/models/yolos/mod*_yolos* @amyeroberts @qubvel
/src/transformers/models/zoedepth/mod*_zoedepth* @amyeroberts @qubvel
/src/transformers/models/beit/mod*_beit* @yonigozlan @molbap
/src/transformers/models/bit/mod*_bit* @yonigozlan @molbap
/src/transformers/models/conditional_detr/mod*_conditional_detr* @yonigozlan @molbap
/src/transformers/models/convnext/mod*_convnext* @yonigozlan @molbap
/src/transformers/models/convnextv2/mod*_convnextv2* @yonigozlan @molbap
/src/transformers/models/cvt/mod*_cvt* @yonigozlan @molbap
/src/transformers/models/deformable_detr/mod*_deformable_detr* @yonigozlan @molbap
/src/transformers/models/deit/mod*_deit* @yonigozlan @molbap
/src/transformers/models/depth_anything/mod*_depth_anything* @yonigozlan @molbap
/src/transformers/models/depth_anything_v2/mod*_depth_anything_v2* @yonigozlan @molbap
/src/transformers/models/deta/mod*_deta* @yonigozlan @molbap
/src/transformers/models/detr/mod*_detr* @yonigozlan @molbap
/src/transformers/models/dinat/mod*_dinat* @yonigozlan @molbap
/src/transformers/models/dinov2/mod*_dinov2* @yonigozlan @molbap
/src/transformers/models/dinov2_with_registers/mod*_dinov2_with_registers* @yonigozlan @molbap
/src/transformers/models/dit/mod*_dit* @yonigozlan @molbap
/src/transformers/models/dpt/mod*_dpt* @yonigozlan @molbap
/src/transformers/models/efficientformer/mod*_efficientformer* @yonigozlan @molbap
/src/transformers/models/efficientnet/mod*_efficientnet* @yonigozlan @molbap
/src/transformers/models/focalnet/mod*_focalnet* @yonigozlan @molbap
/src/transformers/models/glpn/mod*_glpn* @yonigozlan @molbap
/src/transformers/models/hiera/mod*_hiera* @yonigozlan @molbap
/src/transformers/models/ijepa/mod*_ijepa* @yonigozlan @molbap
/src/transformers/models/imagegpt/mod*_imagegpt* @yonigozlan @molbap
/src/transformers/models/levit/mod*_levit* @yonigozlan @molbap
/src/transformers/models/mask2former/mod*_mask2former* @yonigozlan @molbap
/src/transformers/models/maskformer/mod*_maskformer* @yonigozlan @molbap
/src/transformers/models/mobilenet_v1/mod*_mobilenet_v1* @yonigozlan @molbap
/src/transformers/models/mobilenet_v2/mod*_mobilenet_v2* @yonigozlan @molbap
/src/transformers/models/mobilevit/mod*_mobilevit* @yonigozlan @molbap
/src/transformers/models/mobilevitv2/mod*_mobilevitv2* @yonigozlan @molbap
/src/transformers/models/nat/mod*_nat* @yonigozlan @molbap
/src/transformers/models/poolformer/mod*_poolformer* @yonigozlan @molbap
/src/transformers/models/pvt/mod*_pvt* @yonigozlan @molbap
/src/transformers/models/pvt_v2/mod*_pvt_v2* @yonigozlan @molbap
/src/transformers/models/regnet/mod*_regnet* @yonigozlan @molbap
/src/transformers/models/resnet/mod*_resnet* @yonigozlan @molbap
/src/transformers/models/rt_detr/mod*_rt_detr* @yonigozlan @molbap
/src/transformers/models/segformer/mod*_segformer* @yonigozlan @molbap
/src/transformers/models/seggpt/mod*_seggpt* @yonigozlan @molbap
/src/transformers/models/superpoint/mod*_superpoint* @yonigozlan @molbap
/src/transformers/models/swiftformer/mod*_swiftformer* @yonigozlan @molbap
/src/transformers/models/swin/mod*_swin* @yonigozlan @molbap
/src/transformers/models/swinv2/mod*_swinv2* @yonigozlan @molbap
/src/transformers/models/swin2sr/mod*_swin2sr* @yonigozlan @molbap
/src/transformers/models/table_transformer/mod*_table_transformer* @yonigozlan @molbap
/src/transformers/models/textnet/mod*_textnet* @yonigozlan @molbap
/src/transformers/models/timm_wrapper/mod*_timm_wrapper* @yonigozlan @molbap
/src/transformers/models/upernet/mod*_upernet* @yonigozlan @molbap
/src/transformers/models/van/mod*_van* @yonigozlan @molbap
/src/transformers/models/vit/mod*_vit* @yonigozlan @molbap
/src/transformers/models/vit_hybrid/mod*_vit_hybrid* @yonigozlan @molbap
/src/transformers/models/vitdet/mod*_vitdet* @yonigozlan @molbap
/src/transformers/models/vit_mae/mod*_vit_mae* @yonigozlan @molbap
/src/transformers/models/vitmatte/mod*_vitmatte* @yonigozlan @molbap
/src/transformers/models/vit_msn/mod*_vit_msn* @yonigozlan @molbap
/src/transformers/models/vitpose/mod*_vitpose* @yonigozlan @molbap
/src/transformers/models/yolos/mod*_yolos* @yonigozlan @molbap
/src/transformers/models/zoedepth/mod*_zoedepth* @yonigozlan @molbap
# Audio models
/src/transformers/models/audio_spectrogram_transformer/mod*_audio_spectrogram_transformer* @eustlb
@ -304,7 +304,7 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/donut/mod*_donut* @zucchini-nlp
/src/transformers/models/flava/mod*_flava* @zucchini-nlp
/src/transformers/models/git/mod*_git* @zucchini-nlp
/src/transformers/models/grounding_dino/mod*_grounding_dino* @qubvel
/src/transformers/models/grounding_dino/mod*_grounding_dino* @yonigozlan
/src/transformers/models/groupvit/mod*_groupvit* @zucchini-nlp
/src/transformers/models/idefics/mod*_idefics* @zucchini-nlp
/src/transformers/models/idefics2/mod*_idefics2* @zucchini-nlp
@ -326,10 +326,10 @@ trainer_utils.py @zach-huggingface @SunMarc
/src/transformers/models/mgp_str/mod*_mgp_str* @zucchini-nlp
/src/transformers/models/mllama/mod*_mllama* @zucchini-nlp
/src/transformers/models/nougat/mod*_nougat* @NielsRogge
/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @qubvel @yonigozlan
/src/transformers/models/omdet_turbo/mod*_omdet_turbo* @yonigozlan
/src/transformers/models/oneformer/mod*_oneformer* @zucchini-nlp
/src/transformers/models/owlvit/mod*_owlvit* @qubvel
/src/transformers/models/owlv2/mod*_owlv2* @qubvel
/src/transformers/models/owlvit/mod*_owlvit* @yonigozlan
/src/transformers/models/owlv2/mod*_owlv2* @yonigozlan
/src/transformers/models/paligemma/mod*_paligemma* @zucchini-nlp @molbap
/src/transformers/models/perceiver/mod*_perceiver* @zucchini-nlp
/src/transformers/models/pix2struct/mod*_pix2struct* @zucchini-nlp

.github/workflows/benchmark_v2.yml

@ -0,0 +1,85 @@
name: Benchmark v2 Framework
on:
workflow_call:
inputs:
runner:
description: 'GH Actions runner group to use'
required: true
type: string
container_image:
description: 'Docker image to use'
required: true
type: string
container_options:
description: 'Container options to use'
required: true
type: string
commit_sha:
description: 'Commit SHA to benchmark'
required: false
type: string
default: ''
run_id:
description: 'Custom run ID for organizing results (auto-generated if not provided)'
required: false
type: string
default: ''
benchmark_repo_id:
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
benchmark-v2:
name: Benchmark v2
runs-on: ${{ inputs.runner }}
if: |
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark')) ||
(github.event_name == 'schedule')
container:
image: ${{ inputs.container_image }}
options: ${{ inputs.container_options }}
steps:
- name: Get repo
uses: actions/checkout@v4
with:
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install benchmark dependencies
run: |
python3 -m pip install -r benchmark_v2/requirements.txt
- name: Reinstall transformers in edit mode
run: |
python3 -m pip uninstall -y transformers
python3 -m pip install -e ".[torch]"
- name: Show installed libraries and their versions
run: |
python3 -m pip list
python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
nvidia-smi || true
- name: Run benchmark v2
working-directory: benchmark_v2
run: |
echo "Running benchmarks"
python3 run_benchmarks.py \
--commit-id '${{ inputs.commit_sha || github.sha }}' \
--run-id '${{ inputs.run_id }}' \
--push-to-hub '${{ inputs.benchmark_repo_id}}' \
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
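The job-level `if:` in the workflow above gates the benchmark on either a scheduled run or a pull request carrying the `run-benchmark` label. A minimal Python sketch of that condition, assuming a hypothetical function `should_run_benchmark` that receives the event name and the list of PR label names:

```python
from typing import Sequence


def should_run_benchmark(event_name: str, pr_labels: Sequence[str]) -> bool:
    """Hypothetical mirror of the workflow's job-level `if:` expression."""
    # Scheduled runs always execute the benchmark.
    if event_name == "schedule":
        return True
    # Pull requests run it only when the `run-benchmark` label is present.
    return event_name == "pull_request" and "run-benchmark" in pr_labels


print(should_run_benchmark("pull_request", ["run-benchmark", "docs"]))
print(should_run_benchmark("pull_request", ["docs"]))
```

Any other event, such as a push, falls through both branches and skips the job.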

@ -0,0 +1,21 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
container_image: huggingface/transformers-pytorch-gpu
container_options: --gpus all --privileged --ipc host --shm-size "16gb"
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

@ -0,0 +1,21 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
name: Benchmark v2 - Default Models
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: amd-mi325-ci-1gpu
container_image: huggingface/transformers-pytorch-amd-gpu
container_options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit

@ -5,6 +5,7 @@ on:
branches:
- build_ci_docker_image*
repository_dispatch:
workflow_dispatch:
workflow_call:
inputs:
image_postfix:
@ -221,7 +222,7 @@ jobs:
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on:
group: aws-general-8-plus
group: aws-highcpu-32-priv
steps:
-
name: Set up Docker Buildx

@ -16,8 +16,20 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de en es fr hi it ko pt tr zh ja te
languages: en
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
build_other_lang:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de es fr hi it ja ko pt zh
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

@ -35,7 +35,6 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1

@ -16,7 +16,6 @@ env:
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:

@ -12,9 +12,6 @@ on:
slice_id:
required: true
type: number
runner_map:
required: false
type: string
docker:
required: true
type: string
@ -41,7 +38,6 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -54,10 +50,12 @@ jobs:
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: ${{ fromJson(inputs.runner_map)[matrix.folders][inputs.machine_type] }}
group: '${{ inputs.machine_type }}'
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
machine_type: ${{ steps.set_machine_type.outputs.machine_type }}
steps:
- name: Echo input and matrix info
shell: bash
@ -111,6 +109,7 @@ jobs:
run: pip freeze
- name: Set `machine_type` for report and artifact names
id: set_machine_type
working-directory: /transformers
shell: bash
run: |
@ -126,29 +125,49 @@ jobs:
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
echo "machine_type=$machine_type" >> $GITHUB_OUTPUT
- name: Create report directory if it doesn't exist
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
exit ${EXIT_CODE:-1}
- name: Failure short reports
if: ${{ failure() }}
# This step is only to show information on Github Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
run: |
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
collated_reports:
name: Collated Reports
@ -159,5 +178,5 @@ jobs:
job: run_models_gpu
report_repo_id: ${{ inputs.report_repo_id }}
gpu_name: ${{ inputs.runner_type }}
machine_type: ${{ inputs.machine_type }}
machine_type: ${{ needs.run_models_gpu.outputs.machine_type }}
secrets: inherit
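In the "Run all tests on GPU" step above, pytest runs under `script`, and the step then recovers the exit code from the last line of `test_outputs.txt`, falling back to `1` when no code is found. The same parsing can be sketched in Python. The function name `extract_exit_code` and the sample log text are assumptions for illustration, not part of the workflow.

```python
import re


def extract_exit_code(log_text: str, default: int = 1) -> int:
    """Hypothetical equivalent of the tail/grep/cut pipeline in the workflow."""
    lines = log_text.splitlines()
    if not lines:
        return default
    # Only the final line is inspected, as with `tail -1`.
    match = re.search(r'COMMAND_EXIT_CODE="([0-9]*)"', lines[-1])
    # A missing marker or an empty value falls back to the default,
    # like `${EXIT_CODE:-1}` in the shell step.
    if match is None or match.group(1) == "":
        return default
    return int(match.group(1))


sample = 'collected 3 items\nScript done [COMMAND_EXIT_CODE="0"]\n'
print(extract_exit_code(sample))
```

Defaulting to a non-zero code means a truncated or missing log fails the step instead of silently passing.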

@ -26,7 +26,6 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

@ -14,7 +14,7 @@ permissions: {}
jobs:
get-pr-number:
name: Get PR number
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:

@ -20,7 +20,6 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -29,7 +28,7 @@ jobs:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
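The `if:` condition above allows the slow-test run only when the issue is open, the commenter is on the allowlist, and the comment starts with one of the trigger phrases. A minimal sketch of that check, assuming a hypothetical function `should_run_slow` and a shortened allowlist excerpt; it lowercases both sides because GitHub Actions' `contains` and `startsWith` are documented as case-insensitive.

```python
# Shortened excerpt of the allowlist, for illustration only.
ALLOWED_ACTORS = {"ydshieh", "ArthurZucker", "remi-or", "itazap"}
TRIGGERS = ("run-slow", "run slow", "run_slow")


def should_run_slow(issue_state: str, actor: str, comment_body: str) -> bool:
    """Hypothetical mirror of the workflow's comment-trigger gate."""
    allowed = {name.lower() for name in ALLOWED_ACTORS}
    return (
        issue_state == "open"
        and actor.lower() in allowed
        # str.startswith accepts a tuple of candidate prefixes.
        and comment_body.lower().startswith(TRIGGERS)
    )


print(should_run_slow("open", "itazap", "run-slow: bert, gpt2"))
print(should_run_slow("open", "someone-else", "run-slow: bert"))
```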

@ -20,7 +20,7 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
@ -33,7 +33,7 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
@ -46,7 +46,7 @@ jobs:
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci
@ -59,7 +59,7 @@ jobs:
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi325-ci
runner_group: amd-mi325
docker: huggingface/transformers-pytorch-deepspeed-amd-gpu
ci_event: Scheduled CI (AMD) - mi325
report_repo_id: optimum-amd/transformers_daily_ci

@ -3,7 +3,7 @@ name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu : amd-mi355-ci-1gpu
# 2gpu : amd-mi355-ci-2gpu
on:
workflow_run:
workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
@ -20,10 +20,10 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: optimum-amd/transformers_daily_ci
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
torch-pipeline:
@ -32,10 +32,10 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: optimum-amd/transformers_daily_ci
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
example-ci:
@ -44,20 +44,20 @@ jobs:
with:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: optimum-amd/transformers_daily_ci
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml@main
with:
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#amd-hf-ci"
runner_scale_set: amd-mi355-ci
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: optimum-amd/transformers_daily_ci
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

@ -88,6 +88,7 @@ jobs:
job: run_trainer_and_fsdp_gpu
slack_report_channel: "#transformers-ci-daily-training"
docker: huggingface/transformers-all-latest-gpu
runner_type: "a10"
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}

@ -26,7 +26,6 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

@ -48,7 +48,6 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
NUM_SLICES: 2
@ -68,7 +67,6 @@ jobs:
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
runner_map: ${{ steps.set-matrix.outputs.runner_map }}
quantization_matrix: ${{ steps.set-matrix-quantization.outputs.quantization_matrix }}
steps:
- name: Update clone
@ -95,7 +93,6 @@ jobs:
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
echo "runner_map=$(python3 ../utils/get_runner_map.py)" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
@ -119,14 +116,13 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [single-gpu, multi-gpu]
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
@ -147,9 +143,10 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
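The setup job above emits `folder_slices` (a list of folder lists) and `slice_ids` (`list(range(NUM_SLICES))`); each matrix job then picks `folder_slices[slice_id]`. The diff does not show how `utils/split_model_tests.py` divides the folders, so the sketch below is only one plausible way to do it, with a hypothetical helper `split_into_slices` and made-up folder names.

```python
from typing import List


def split_into_slices(folders: List[str], num_splits: int) -> List[List[str]]:
    """Hypothetical splitter: contiguous, near-equal slices of the folder list."""
    base, extra = divmod(len(folders), num_splits)
    slices, start = [], 0
    for i in range(num_splits):
        # The first `extra` slices receive one additional folder each.
        end = start + base + (1 if i < extra else 0)
        slices.append(folders[start:end])
        start = end
    return slices


num_slices = 2  # corresponds to NUM_SLICES in the workflow env
folder_slices = split_into_slices(["models/bert", "models/gpt2", "models/vit"], num_slices)
slice_ids = list(range(num_slices))
for slice_id in slice_ids:
    print(slice_id, folder_slices[slice_id])
```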

@ -20,7 +20,6 @@ env:
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -33,14 +32,17 @@ jobs:
steps:
- name: Get runner to use
shell: bash
env:
NUM_GPUS: ${{ github.event.inputs.num_gpus }}
RUNNER_TYPE: ${{ github.event.inputs.runner_type }}
run: |
if [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
if [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "t4" ]]; then
echo "RUNNER=aws-g4dn-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "t4" ]]; then
echo "RUNNER=aws-g4dn-12xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
elif [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache" >> $GITHUB_ENV
else
echo "RUNNER=" >> $GITHUB_ENV
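The `if`/`elif` chain above maps the pair (`num_gpus`, `runner_type`) to a runner group and falls back to an empty string. The same mapping can be written as a lookup table. This is a sketch for illustration, and the names `RUNNERS` and `select_runner` are assumptions.

```python
# Runner groups taken from the shell step above, keyed by (num_gpus, runner_type).
RUNNERS = {
    ("single", "t4"): "aws-g4dn-4xlarge-cache",
    ("multi", "t4"): "aws-g4dn-12xlarge-cache",
    ("single", "a10"): "aws-g5-4xlarge-cache",
    ("multi", "a10"): "aws-g5-12xlarge-cache",
}


def select_runner(num_gpus: str, runner_type: str) -> str:
    """Hypothetical helper: return the runner group, or "" when the pair is unknown."""
    return RUNNERS.get((num_gpus, runner_type), "")


print(select_runner("single", "a10"))
```

Passing the inputs through `env:` as the new version of the step does, rather than interpolating `${{ ... }}` into the script body, keeps user-supplied values out of the shell source text.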
@ -85,9 +87,11 @@ jobs:
- name: Store Slack infos
#because the SSH can be enabled dynamically if the workflow failed, so we need to store slack infos to be able to retrieve them during the waitforssh step
shell: bash
env:
GITHUB_ACTOR: ${{ github.actor }}
run: |
echo "${{ github.actor }}"
github_actor=${{ github.actor }}
echo "$GITHUB_ACTOR"
github_actor=$GITHUB_ACTOR
github_actor=${github_actor/'-'/'_'}
echo "$github_actor"
echo "github_actor=$github_actor" >> $GITHUB_ENV

.gitignore vendored

@ -13,6 +13,7 @@ tests/fixtures/cached_*_text.txt
logs/
lightning_logs/
lang_code_data/
reports/
# Distribution / packaging
.Python


@ -278,13 +278,14 @@ are working on it).<br>
useful to avoid duplicated work, and to differentiate it from PRs ready to be merged.<br>
☐ Make sure existing tests pass.<br>
☐ If adding a new feature, also add tests for it.<br>
- If you are adding a new model, make sure you use
- If you are adding a new model, make sure you use
`ModelTester.all_model_classes = (MyModel, MyModelWithLMHead,...)` to trigger the common tests.
- If you are adding new `@slow` tests, make sure they pass using
- If you are adding new `@slow` tests, make sure they pass using
`RUN_SLOW=1 python -m pytest tests/models/my_new_model/test_my_new_model.py`.
- If you are adding a new tokenizer, write tests and make sure
- If you are adding a new tokenizer, write tests and make sure
`RUN_SLOW=1 python -m pytest tests/models/{your_model_name}/test_tokenization_{your_model_name}.py` passes.
- CircleCI does not run the slow tests, but GitHub Actions does every night!<br>
- CircleCI does not run the slow tests, but GitHub Actions does every night!<br>
☐ All public methods must have informative docstrings (see
[`modeling_bert.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/bert/modeling_bert.py)
@ -340,6 +341,7 @@ RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./examples/pytorch/t
```
Like the slow tests, there are other environment variables available which are not enabled by default during testing:
- `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers.
More environment variables and additional information can be found in the [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py).
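As a rough illustration of how such opt-in flags behave, the sketch below reads an environment variable and treats a few common spellings as true. The helper name `env_flag` and the exact set of accepted values are assumptions made for this example; the real parsing lives in `testing_utils.py` and may differ.

```python
import os

# Spellings treated as "true". This set is an assumption for illustration.
_TRUTHY = {"1", "y", "yes", "on", "true"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True when the environment variable `name` holds a truthy value."""
    value = os.environ.get(name)
    if value is None:
        # Unset variables fall back to the default, so gated tests stay disabled.
        return default
    return value.strip().lower() in _TRUTHY


os.environ["RUN_SLOW"] = "yes"
os.environ.pop("RUN_CUSTOM_TOKENIZERS", None)
print(env_flag("RUN_SLOW"))               # True
print(env_flag("RUN_CUSTOM_TOKENIZERS"))  # False
```

A test decorator can then skip its test whenever the corresponding flag evaluates to `False`.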


@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -154,7 +153,7 @@ You are not required to read the following guidelines before opening an issue. H
cd examples/seq2seq
torchrun --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--output_dir output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not to use italics and bold text too much, as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
1. https://github.com/huggingface/transformers/issues/9257
2. https://github.com/huggingface/transformers/issues/9257#issuecomment-749945162
13. If you are replying to the last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back, it's always a good practice to quote just the relevant lines you're replying to. The `>` is used for quoting, or you can always use the menu to do so. For example, your editor box will look like:


@ -48,9 +48,11 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_bn.md">বাংলা</a> |
</p>
</h4>
@ -62,12 +64,11 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
frameworks (Axolotl, Unsloth, DeepSpeed, FSDP, PyTorch-Lightning, ...), inference engines (vLLM, SGLang, TGI, ...),
and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
@ -80,7 +81,7 @@ Explore the [Hub](https://huggingface.com/) today to find a model and use Transf
## Installation
Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
@ -110,10 +111,10 @@ git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install .[torch]
pip install '.[torch]'
# uv
uv pip install .[torch]
uv pip install '.[torch]'
```
## Quickstart
@ -193,7 +194,6 @@ pipeline("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.pn
<details>
<summary>Visual question answering</summary>
<h3 align="center">
<a><img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/idefics-few-shot.jpg"></a>
</h3>


@ -6,7 +6,7 @@ developers, researchers, students, professors, engineers, and anyone else to bui
In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [gpt4all](https://github.com/nomic-ai/gpt4all)
@ -49,7 +49,7 @@ Keywords: LLMs, Large Language Models, Agents, Chains
[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLMs with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
## [ParlAI](https://github.com/facebookresearch/ParlAI)
@ -257,7 +257,7 @@ Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusi
Keywords: Text-to-3D, Stable Diffusion
## [txtai](https://github.com/neuml/txtai)
[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
Keywords: Semantic search, LLM
@ -309,8 +309,8 @@ Keywords: OCR, LaTeX, Math formula
OpenCLIP is an open source implementation of OpenAI's CLIP.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
@ -596,7 +596,7 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active
## [BentoML](https://github.com/bentoml/BentoML)
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
Keywords: BentoML, Framework, Deployment, AI Applications
@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training (fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
Keywords: PEFT, fine-tuning, LLaMA-2, ChatGLM, Qwen


@ -21,6 +21,46 @@ python run_benchmarks.py \
--num-tokens-to-generate 200
```
### Uploading Results to HuggingFace Dataset
You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:
```bash
# Upload to a public dataset with auto-generated run ID
python run_benchmarks.py --push-to-hub username/benchmark-results
# Upload with a custom run ID for easy identification
python run_benchmarks.py --push-to-hub username/benchmark-results --run-id experiment_v1
# Upload with custom HuggingFace token (if not set in environment)
python run_benchmarks.py --push-to-hub username/benchmark-results --token hf_your_token_here
```
**Dataset Directory Structure:**
```
dataset_name/
├── 2025-01-15/
│ ├── runs/ # Non-scheduled runs (manual, PR, etc.)
│ │ └── 123-1245151651/ # GitHub run number and ID
│ │ └── benchmark_results/
│ │ ├── benchmark_summary_20250115_143022.json
│ │ └── model-name/
│ │ └── model-name_benchmark_20250115_143022.json
│ └── benchmark_results_abc123de/ # Scheduled runs (daily CI)
│ ├── benchmark_summary_20250115_143022.json
│ └── model-name/
│ └── model-name_benchmark_20250115_143022.json
└── 2025-01-16/
└── ...
```
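The layout above can be sketched as a small path-building function. This is only an illustration: `build_repo_path` is a hypothetical helper name, and the scheduled-run branch follows the upload code in `run_benchmarks.py`, so its exact folder name may differ from the tree shown above.

```python
from datetime import datetime
from typing import Optional


def build_repo_path(run_id: str, event_name: Optional[str], now: Optional[datetime] = None) -> str:
    """Build the path inside the dataset repo for one benchmark run."""
    date_folder = (now or datetime.now()).strftime("%Y-%m-%d")
    if event_name != "schedule":
        # Manual, PR, and other non-scheduled runs go under a `runs/` subfolder.
        return f"{date_folder}/runs/{run_id}/benchmark_results"
    # Scheduled runs sit directly under the date folder (assumed layout).
    return f"{date_folder}/{run_id}/benchmark_results"


print(build_repo_path("123-1245151651", "pull_request", datetime(2025, 1, 15)))
# 2025-01-15/runs/123-1245151651/benchmark_results
```

Passing `now` explicitly keeps the function deterministic, which makes the layout easy to check in a unit test.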
**Authentication for Uploads:**
To upload results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in one of two ways (in order of precedence):
1. Command line: `--token hf_your_token_here`
2. Environment variable: `HF_TOKEN`
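The precedence can be sketched as follows. This is a minimal illustration, not the script's actual code: `resolve_token` is a hypothetical helper, and in practice `huggingface_hub` performs its own lookup when no token is passed.

```python
import os
from typing import Optional


def resolve_token(cli_token: Optional[str] = None) -> Optional[str]:
    """Pick the token to use: explicit CLI value first, then the HF_TOKEN variable."""
    if cli_token:
        return cli_token
    # Fall back to the environment; returns None when HF_TOKEN is unset.
    return os.environ.get("HF_TOKEN")


os.environ["HF_TOKEN"] = "hf_from_env"
print(resolve_token("hf_from_cli"))  # hf_from_cli
print(resolve_token())               # hf_from_env
```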
### Running Specific Benchmarks
```bash


@ -20,7 +20,6 @@ import torch
from benchmark_framework import ModelBenchmark
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")


@ -3,4 +3,5 @@ psutil>=5.8.0
gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
datasets>=2.10.0
huggingface_hub>=0.16.0


@ -24,6 +24,7 @@ import json
import logging
import os
import sys
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
@ -160,7 +161,12 @@ def run_single_benchmark(
return None
def generate_summary_report(output_dir: str, benchmark_results: dict[str, Any], logger: logging.Logger) -> str:
def generate_summary_report(
output_dir: str,
benchmark_results: dict[str, Any],
logger: logging.Logger,
benchmark_run_uuid: Optional[str] = None,
) -> str:
"""Generate a summary report of all benchmark runs."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.json")
@ -168,6 +174,7 @@ def generate_summary_report(output_dir: str, benchmark_results: dict[str, Any],
summary_data = {
"run_metadata": {
"timestamp": datetime.utcnow().isoformat(),
"benchmark_run_uuid": benchmark_run_uuid,
"total_benchmarks": len(benchmark_results),
"successful_benchmarks": len([r for r in benchmark_results.values() if r is not None]),
"failed_benchmarks": len([r for r in benchmark_results.values() if r is None]),
@ -183,9 +190,114 @@ def generate_summary_report(output_dir: str, benchmark_results: dict[str, Any],
return summary_file
def upload_results_to_hf_dataset(
output_dir: str,
summary_file: str,
dataset_name: str,
run_id: Optional[str] = None,
token: Optional[str] = None,
logger: Optional[logging.Logger] = None,
) -> Optional[str]:
"""
Upload benchmark results to a HuggingFace Dataset.
Based on upload_collated_report() from utils/collated_reports.py
Args:
output_dir: Local output directory containing results
summary_file: Path to the summary file
dataset_name: Name of the HuggingFace dataset to upload to
run_id: Unique run identifier (if None, will generate one)
token: HuggingFace token for authentication (if None, will use environment variables)
logger: Logger instance
Returns:
The run_id used for the upload, None if upload failed
"""
if logger is None:
logger = logging.getLogger(__name__)
import os
from huggingface_hub import HfApi
api = HfApi()
if run_id is None:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
run_id = f"{github_run_number}-{github_run_id}"
date_folder = datetime.now().strftime("%Y-%m-%d")
github_event_name = os.getenv("GITHUB_EVENT_NAME")
if github_event_name != "schedule":
# Non-scheduled runs go under a runs subfolder
repo_path = f"{date_folder}/runs/{run_id}/benchmark_results"
else:
# Scheduled runs go directly under the date
repo_path = f"{date_folder}/{run_id}/benchmark_results"
logger.info(f"Uploading benchmark results to dataset '{dataset_name}' at path '{repo_path}'")
try:
# Upload all files in the output directory
from pathlib import Path
output_path = Path(output_dir)
for file_path in output_path.rglob("*"):
if file_path.is_file():
# Calculate relative path from output_dir
relative_path = file_path.relative_to(output_path)
path_in_repo = f"{repo_path}/{relative_path}"
logger.debug(f"Uploading {file_path} to {path_in_repo}")
api.upload_file(
path_or_fileobj=str(file_path),
path_in_repo=path_in_repo,
repo_id=dataset_name,
repo_type="dataset",
token=token,
commit_message=f"Upload benchmark results for run {run_id}",
)
logger.info(
f"Successfully uploaded results to: https://huggingface.co/datasets/{dataset_name}/tree/main/{repo_path}"
)
return run_id
except Exception as upload_error:
logger.error(f"Failed to upload results: {upload_error}")
import traceback
logger.debug(traceback.format_exc())
return None
def main():
"""Main entry point for the benchmarking script."""
parser = argparse.ArgumentParser(description="Run all benchmarks in the ./benches directory")
# Generate a unique UUID for this benchmark run
benchmark_run_uuid = str(uuid.uuid4())[:8]
parser = argparse.ArgumentParser(
description="Run all benchmarks in the ./benches directory",
epilog="""
Examples:
# Run all available benchmarks
python3 run_benchmarks.py
# Run with specific model and upload to HuggingFace Dataset
python3 run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf --push-to-hub username/benchmark-results
# Run with custom run ID and upload to HuggingFace Dataset
python3 run_benchmarks.py --run-id experiment_v1 --push-to-hub org/benchmarks
# Run only specific benchmarks with file logging
python3 run_benchmarks.py --include llama --enable-file-logging
""", # noqa: W293
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--output-dir",
@ -228,20 +340,35 @@ def main():
parser.add_argument("--exclude", type=str, nargs="*", help="Exclude benchmarks matching these names")
parser.add_argument("--enable-mock", action="store_true", help="Enable mock benchmark (skipped by default)")
parser.add_argument("--enable-file-logging", action="store_true", help="Enable file logging (disabled by default)")
parser.add_argument(
"--commit-id", type=str, help="Git commit ID for metadata (if not provided, will auto-detect from git)"
)
parser.add_argument(
"--push-to-hub",
type=str,
help="Upload results to HuggingFace Dataset (provide dataset name, e.g., 'username/benchmark-results')",
)
parser.add_argument(
"--run-id", type=str, help="Custom run ID for organizing results (if not provided, will generate a unique ID)"
)
parser.add_argument(
"--token",
type=str,
help="HuggingFace token for dataset uploads (if not provided, will use HF_TOKEN environment variable)",
)
args = parser.parse_args()
# Setup logging
logger = setup_logging(args.log_level, args.enable_file_logging)
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Benches directory: {args.benches_dir}")
@ -286,9 +413,6 @@ def main():
if args.model_id:
benchmark_kwargs["model_id"] = args.model_id
# Add enable_mock flag for mock benchmark
benchmark_kwargs["enable_mock"] = args.enable_mock
# Add commit_id if provided
if args.commit_id:
benchmark_kwargs["commit_id"] = args.commit_id
@ -306,7 +430,28 @@ def main():
successful_count += 1
# Generate summary report
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger)
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger, benchmark_run_uuid)
# Upload results to HuggingFace Dataset if requested
upload_run_id = None
if args.push_to_hub:
logger.info("=" * 60)
logger.info("UPLOADING TO HUGGINGFACE DATASET")
logger.info("=" * 60)
# Use provided run_id or fallback to benchmark run UUID
effective_run_id = args.run_id or benchmark_run_uuid
upload_run_id = upload_results_to_hf_dataset(
output_dir=args.output_dir,
summary_file=summary_file,
dataset_name=args.push_to_hub,
run_id=effective_run_id,
token=args.token,
logger=logger,
)
if upload_run_id:
logger.info(f"Upload completed with run ID: {upload_run_id}")
else:
logger.warning("Upload failed - continuing with local results")
# Final summary
total_benchmarks = len(filtered_benchmarks)
@ -321,6 +466,16 @@ def main():
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Summary report: {summary_file}")
if args.push_to_hub:
if upload_run_id:
logger.info(f"HuggingFace Dataset: {args.push_to_hub}")
logger.info(f"Run ID: {upload_run_id}")
logger.info(
f"View results: https://huggingface.co/datasets/{args.push_to_hub}/tree/main/{datetime.now().strftime('%Y-%m-%d')}/runs/{upload_run_id}"
)
else:
logger.warning("Upload to HuggingFace Dataset failed")
if failed_count > 0:
logger.warning(f"{failed_count} benchmark(s) failed. Check logs for details.")
return 1


@ -16,6 +16,7 @@
# by pytest before any tests are run
import doctest
import os
import sys
import warnings
from os.path import abspath, dirname, join
@ -27,6 +28,7 @@ from transformers.testing_utils import (
HfDoctestModule,
HfDocTestParser,
is_torch_available,
patch_testing_methods_to_collect_info,
patch_torch_compile_force_graph,
)
@ -52,7 +54,6 @@ NOT_DEVICE_TESTS = {
"test_gradient_checkpointing_backward_compatibility",
"test_gradient_checkpointing_enable_disable",
"test_torch_save_load",
"test_initialization",
"test_forward_signature",
"test_model_get_set_embeddings",
"test_model_main_input_name",
@ -62,11 +63,8 @@ NOT_DEVICE_TESTS = {
"test_load_save_without_tied_weights",
"test_tied_weights_keys",
"test_model_weights_reload_no_missing_tied_weights",
"test_mismatched_shapes_have_properly_initialized_weights",
"test_matched_shapes_have_loaded_weights_when_some_mismatched_shapes_exist",
"test_can_load_ignoring_mismatched_shapes",
"test_model_is_small",
"test_tf_from_pt_safetensors",
"test_flax_from_pt_safetensors",
"ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device
"ModelTester::test_pipeline_",
"/repo_utils/",
@ -91,6 +89,8 @@ def pytest_configure(config):
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "true"
def pytest_collection_modifyitems(items):
for item in items:
@ -145,3 +145,7 @@ if is_torch_available():
# patch `torch.compile`: if `TORCH_COMPILE_FORCE_FULLGRAPH=1` (or values considered as true, e.g. yes, y, etc.),
# the patched version will always run with `fullgraph=True`.
patch_torch_compile_force_graph()
if os.environ.get("PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS", "").lower() in ("yes", "true", "on", "y", "1"):
patch_testing_methods_to_collect_info()


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
USER root
ARG REF=main
@ -6,10 +6,8 @@ RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN git lfs install
RUN uv pip uninstall transformers


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -1,4 +1,4 @@
FROM python:3.9-slim
FROM python:3.10-slim
ENV PYTHONDONTWRITEBYTECODE=1
ARG REF=main
USER root


@ -12,8 +12,6 @@ SHELL ["sh", "-lc"]
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# Disable kernel mapping for now until all tests pass
ENV DISABLE_KERNEL_MAPPING=1
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
@ -26,9 +24,7 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA && python3 -m pip uninstall -y tensorflow tensorflow_text tensorflow_probability
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
RUN python3 -m pip install --no-cache-dir -U timm


@ -15,7 +15,6 @@ RUN apt update && \
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels


@ -0,0 +1,71 @@
FROM intel/deep-learning-essentials:2025.1.3-0-devel-ubuntu24.04 AS base
LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VERSION=3.12
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
apt-get update
RUN apt-get update && \
apt-get -y install \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
vim \
numactl \
gnupg2 \
gpg-agent \
python3-dev \
python3-opencv \
unzip \
ffmpeg \
tesseract-ocr \
espeak-ng \
wget \
ncurses-term \
google-perftools \
libjemalloc-dev \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# Use a virtual env because Ubuntu 24.04 does not allow pip installs into the system Python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VERSION} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip wheel
RUN pip install torch torchvision torchaudio torchcodec --index-url https://download.pytorch.org/whl/cpu --no-cache-dir
RUN pip install av pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sentence_transformers sacremoses nltk rouge_score librosa soundfile mpi4py pytorch_msssim
RUN pip install onnx optimum onnxruntime
RUN pip install autoawq
RUN pip install gptqmodel --no-build-isolation
RUN pip install -U datasets timm transformers accelerate peft diffusers opencv-python kenlm evaluate
RUN pip install -U intel-openmp
# install bitsandbytes
RUN git clone https://github.com/bitsandbytes-foundation/bitsandbytes.git && cd bitsandbytes/ && \
cmake -DCOMPUTE_BACKEND=cpu -S . && make && pip install . && cd ../
# CPU doesn't need triton
RUN pip uninstall triton -y
ENV LD_PRELOAD=${LD_PRELOAD}:/opt/venv/lib/libiomp5.so:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
ENV KMP_AFFINITY=granularity=fine,compact,1,0
RUN touch /entrypoint.sh
RUN chmod +x /entrypoint.sh
RUN echo "#!/bin/bash" >> /entrypoint.sh
RUN echo "/bin/bash" >> /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]


@ -1,59 +0,0 @@
ARG BASE_DOCKER_IMAGE
FROM $BASE_DOCKER_IMAGE
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs libaio-dev
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
ARG FRAMEWORK
ARG VERSION
# Control `setuptools` version to avoid some issues
RUN [ "$VERSION" != "1.10" ] && python3 -m pip install -U setuptools || python3 -m pip install -U "setuptools<=59.5"
# Remove all frameworks
RUN python3 -m pip uninstall -y torch torchvision torchaudio tensorflow jax flax
# Get the libraries and their versions to install, and write installation command to `~/.profile`.
RUN python3 ./transformers/utils/past_ci_versions.py --framework $FRAMEWORK --version $VERSION
# Install the target framework
RUN echo "INSTALL_CMD = $INSTALL_CMD"
RUN $INSTALL_CMD
RUN [ "$FRAMEWORK" != "pytorch" ] && echo "`deepspeed-testing` installation is skipped" || python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Remove `accelerate`: it requires `torch`, and this causes import issues for TF-only testing
# We will install `accelerate@main` in Past CI workflow file
RUN python3 -m pip uninstall -y accelerate
# Uninstall `torch-tensorrt` and `apex` shipped with the base image
RUN python3 -m pip uninstall -y torch-tensorrt apex
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

@ -23,9 +23,6 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# Install transformers
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video,audio]
# Remove tensorflow and flax as they are no longer supported by transformers
RUN python3 -m pip uninstall -y tensorflow flax
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
@ -38,3 +35,10 @@ RUN python3 -m pip uninstall -y kernels
# On ROCm, torchcodec is required to decode audio files and 0.4 or 0.6 fails
RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"
# Install flash attention from source. Tested with commit 6387433156558135a998d5568a9d74c1778666d8
RUN git clone https://github.com/ROCm/flash-attention/ -b tridao && \
cd flash-attention && \
GPU_ARCHS="gfx942" python setup.py install
RUN python3 -m pip install --no-cache-dir einops

@ -25,8 +25,6 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y tensorflow flax
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
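
The version-pinning idiom in the context lines above (`[ ${#VAR} -gt 0 ] && VERSION='pkg=='$VAR'.*' || VERSION='pkg'`) can be sketched as a small shell function. `pin_spec` is a hypothetical helper written for illustration and is not part of the Dockerfile. The sketch assumes the `$`-prefixed variable form was intended: in the torchvision and torchaudio lines the name appears without a leading `$`, so the literal text would be concatenated instead of the version.

```shell
# Hypothetical helper mirroring the `[ ${#VAR} -gt 0 ] && ... || ...` idiom.
# pin_spec NAME VERSION prints "NAME==VERSION.*" when VERSION is non-empty,
# otherwise just "NAME" (so pip resolves the latest release).
pin_spec() {
    name="$1"
    version="$2"
    if [ "${#version}" -gt 0 ]; then
        printf '%s==%s.*\n' "$name" "$version"
    else
        printf '%s\n' "$name"
    fi
}

pin_spec torch "2.8.0"    # prints torch==2.8.0.*
pin_spec torchvision ""   # prints torchvision
```

The resulting string would then be passed to `python3 -m pip install -U`, as the `RUN` lines do with `$VERSION`.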

@ -1,4 +1,4 @@
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu22.04
FROM nvidia/cuda:12.6.0-cudnn-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -9,11 +9,9 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.6.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu121'
# Disable kernel mapping for quantization tests
ENV DISABLE_KERNEL_MAPPING=1
ARG CUDA='cu126'
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
@ -30,31 +28,20 @@ RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio tor
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
# needed in bnb and awq
RUN python3 -m pip install --no-cache-dir einops
# Add bitsandbytes for mixed int8 testing
RUN python3 -m pip install --no-cache-dir bitsandbytes
# Add gptqmodel for gtpq quantization testing, installed from source for pytorch==2.6.0 compatibility
RUN python3 -m pip install lm_eval
RUN git clone https://github.com/ModelCloud/GPTQModel.git && cd GPTQModel && pip install -v . --no-build-isolation
# Add optimum for gptq quantization testing
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/optimum@main#egg=optimum
# Add PEFT
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/peft@main#egg=peft
# Add aqlm for quantization testing
RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# needed in bnb and awq
RUN python3 -m pip install --no-cache-dir einops
# Add vptq for quantization testing
RUN pip install vptq
# Add bitsandbytes
RUN python3 -m pip install --no-cache-dir bitsandbytes
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# # Add gptqmodel
# RUN python3 -m pip install --no-cache-dir gptqmodel
# Add hqq for quantization testing
RUN python3 -m pip install --no-cache-dir hqq
@ -63,25 +50,11 @@ RUN python3 -m pip install --no-cache-dir hqq
RUN python3 -m pip install --no-cache-dir gguf
# Add autoawq for quantization testing
# New release v0.2.8
RUN python3 -m pip install --no-cache-dir autoawq[kernels]
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir optimum-quanto
# Add eetq for quantization testing
RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add fp-quant for quantization testing
# Requires py3.11 but our CI runs on 3.9
# RUN python3 -m pip install --no-cache-dir "fp-quant>=0.1.6"
# Add compressed-tensors for quantization testing
RUN python3 -m pip install --no-cache-dir compressed-tensors
@ -89,7 +62,10 @@ RUN python3 -m pip install --no-cache-dir compressed-tensors
RUN python3 -m pip install --no-cache-dir amd-quark
# Add AutoRound for quantization testing
RUN python3 -m pip install --no-cache-dir "auto-round>=0.5.0"
RUN python3 -m pip install --no-cache-dir auto-round
# Add torchao for quantization testing
RUN python3 -m pip install --no-cache-dir torchao
# Add transformers in editable mode
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch]
@ -103,3 +79,27 @@ RUN python3 -m pip uninstall -y flash-attn
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
# Add fp-quant for quantization testing
RUN python3 -m pip install --no-cache-dir "fp-quant>=0.2.0"
# Low usage or incompatible lib, will enable later on
# # Add aqlm for quantization testing
# RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# # Add vptq for quantization testing
# RUN pip install vptq
# Add spqr for quantization testing
# Commented for now as No matching distribution found we need to reach out to the authors
# RUN python3 -m pip install --no-cache-dir spqr_quant[gpu]
# # Add eetq for quantization testing
# RUN git clone https://github.com/NetEase-FuXi/EETQ.git && cd EETQ/ && git submodule update --init --recursive && pip install .
# # Add flute-kernel and fast_hadamard_transform for quantization testing
# # Commented for now as they cause issues with the build
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git

@ -1,25 +0,0 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-tensorflow,testing]
# If set to nothing, will install the latest version
ARG TENSORFLOW='2.13'
RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSION='tensorflow'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y torch flax
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir -U "tensorflow_probability<0.22"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

@ -50,7 +50,7 @@ Begin translating the text!
1. Start with the `_toctree.yml` file that corresponds to your documentation chapter. This file is essential for rendering the table of contents on the website.
- If the `_toctree.yml` file doesnt exist for your language, create one by copying the English version and removing unrelated sections.
- If the `_toctree.yml` file doesn't exist for your language, create one by copying the English version and removing unrelated sections.
- Ensure it is placed in the `docs/source/LANG-ID/` directory.
Heres an example structure for the `_toctree.yml` file:

@ -123,8 +123,6 @@
title: تشغيل التدريب على Amazon SageMaker
- local: serialization
title: التصدير إلى ONNX
- local: tflite
title: التصدير إلى TFLite
- local: torchscript
title: التصدير إلى TorchScript
- local: notebooks
@ -184,8 +182,6 @@
# title: التدريب الفعال على وحدة المعالجة المركزية (CPU)
# - local: perf_train_cpu_many
# title: التدريب الموزع لوحدة المعالجة المركزية (CPU)
# - local: perf_train_tpu_tf
# title: التدريب على (TPU) باستخدام TensorFlow
# - local: perf_train_special
# title: تدريب PyTorch على Apple silicon
# - local: perf_hardware
@ -203,8 +199,6 @@
# title: إنشاء نموذج كبير
# - local: debugging
# title: تصحيح الأخطاء البرمجية
# - local: tf_xla
# title: تكامل XLA لنماذج TensorFlow
# - local: perf_torch_compile
# title: تحسين الاستدلال باستخدام `torch.compile()`
# title: الأداء وقابلية التوسع
@ -260,8 +254,6 @@
# title: التكوين
# - local: main_classes/data_collator
# title: مجمع البيانات
# - local: main_classes/keras_callbacks
# title: استدعاءات Keras
# - local: main_classes/logging
# title: التسجيل
# - local: main_classes/model

@ -52,7 +52,7 @@
<figcaption class="mt-2 text-center text-sm text-gray-500">الصورة توضح مخطط مراحل نموذج Swin.</figcaption>
</div>
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PretrainedConfig.from_pretrained`]:
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PreTrainedConfig.from_pretrained`]:
* `out_indices` هو فهرس الطبقة التي تريد الحصول على خريطة الميزات منها
* `out_features` هو اسم الطبقة التي تريد الحصول على خريطة الميزات منها
@ -115,8 +115,6 @@
## النموذج التلقائي (AutoModel)
<frameworkcontent>
<pt>
تسمح لك فئات `AutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -143,25 +141,4 @@
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</pt>
<tf>
أخيرًا، تسمح لك فئات `TFAutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام نفس نقطة التفتيش لتحميل بنية لمهمة مختلفة:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `TFAutoModelFor` لتحميل نسخ لنماذج مُدربة مسبقًا. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، ستتعرف على كيفية استخدام المُجزّئ اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</tf>
</frameworkcontent>

@ -54,19 +54,19 @@ DistilBertConfig {
```
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PretrainedConfig.from_pretrained`] :
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PreTrainedConfig.from_pretrained`] :
```py
>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PretrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PreTrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
```py
>>> my_config.save_pretrained(save_directory="./your_model_save_path")
```
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PretrainedConfig.from_pretrained`]:
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PreTrainedConfig.from_pretrained`]:
```py
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
@ -81,8 +81,6 @@ DistilBertConfig {
الخطوة التالية هي إنشاء [نموذج](main_classes/models). النموذج - ويُشار إليه أحيانًا باسم البنية - يُحدد وظيفة كل طبقة والعمليات الحسابية المُنفذة. تُستخدم خصائص مثل `num_hidden_layers` من التكوين لتحديد هذه البنية. تشترك جميع النماذج في فئة أساسية واحدة هي [`PreTrainedModel`] وبعض الوظائف المُشتركة مثل غيير حجم مُدخلات الكلمات وتقليص رؤوس آلية الانتباه الذاتي. بالإضافة إلى ذلك، فإن جميع النماذج هي فئات فرعية إما من [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)، [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) أو [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html) . هذا يعني النماذج متوافقة مع كل استخدام لإطار العمل الخاص بها.
<frameworkcontent>
<pt>
قم بتحميل خصائص التكوين المخصصة الخاصة بك في النموذج:
```py
@ -105,39 +103,11 @@ DistilBertConfig {
```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</pt>
<tf>
قم بتحميل خصائص التكوين المُخصصة الخاصة بك في النموذج:
```py
>>> from transformers import TFDistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```
هذا ينشئ نموذجًا بقيم عشوائية بدلاً من الأوزان المُدربة مسبقًا. لن يكون هذا النموذج مفيدًا حتى يتم تدريبه. تُعد عملية التدريب مكلفة وتستغرق وقتًا طويلاً. من الأفضل بشكل عام استخدام نموذج مُدرب مسبقًا للحصول على نتائج أفضل بشكل أسرع، مع استخدام جزء بسيط فقط من الموارد المطلوبة للتدريب.
قم بإنشاء نموذج مُدرب مسبقًا باستخدام [`~TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
عندما تقوم بتحميل الأوزان المُدربة مسبقًا،يتم تحميل إعدادات النموذج الافتراضي تلقائيًا إذا كان النموذج من مكتبة 🤗 Transformers. ومع ذلك، يمكنك أيضًا استبدال - بعض أو كل - إعدادات النموذج الافتراضية بإعداداتك الخاصة:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</tf>
</frameworkcontent>
### رؤوس النموذج
في هذه المرحلة، لديك نموذج DistilBERT الأساسي الذي يخرج *حالات الكامنة*. تُمرَّر هذه الحالات الكامنة كمدخلات لرأس النموذج لإنتاج المخرجات النهائية. توفر مكتبة 🤗 Transformers رأس نموذج مختلف لكل مهمة طالما أن النموذج يدعم المهمة (أي لا يمكنك استخدام DistilBERT لمهمة تسلسل إلى تسلسل مثل الترجمة).
<frameworkcontent>
<pt>
على سبيل المثال، [`DistilBertForSequenceClassification`] هو نموذج DistilBERT الأساس مزودًا برأس تصنيف تسلسلي. يُشكّل رأس التصنيف التسلسلي طبقة خطية فوق المخرجات المجمعة.
```py
@ -153,25 +123,6 @@ DistilBertConfig {
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</pt>
<tf>
على سبيل المثال، [`TFDistilBertForSequenceClassification`] هو نموذج DistilBERT الأساسي برأس تصنيف تسلسل. رأس التصنيف التسلسلي هو طبقة خطية أعلى المخرجات المجمعة.
```py
>>> from transformers import TFDistilBertForSequenceClassification
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام هذا نقطة التحقق لمهمة أخرى عن طريق التبديل إلى رأس نموذج مختلف. لمهمة الإجابة على الأسئلة، ستستخدم رأس النموذج [`TFDistilBertForQuestionAnswering`]. رأس الإجابة على الأسئلة مشابه لرأس التصنيف التسلسلي باستثناء أنه طبقة خطية أعلى حالات الإخراج المخفية.
```py
>>> from transformers import TFDistilBertForQuestionAnswering
>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</tf>
</frameworkcontent>
## مجزئ النصوص

@ -20,11 +20,11 @@
في مثالنا، سنعدّل بعض الوسائط في فئة ResNet التي قد نرغب في ضبطها. ستعطينا التكوينات المختلفة أنواع ResNets المختلفة الممكنة. سنقوم بتخزين هذه الوسائط بعد التحقق من صحته.
```python
from transformers import PretrainedConfig
from transformers import PreTrainedConfig
from typing import List
class ResnetConfig(PretrainedConfig):
class ResnetConfig(PreTrainedConfig):
model_type = "resnet"
def __init__(
@ -58,11 +58,11 @@ class ResnetConfig(PretrainedConfig):
```
الأشياء الثلاثة المهمة التي يجب تذكرها عند كتابة تكوينك الخاص هي:
- يجب أن ترث من `PretrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PretrainedConfig` أي معامﻻت إضافية kwargs،
- يجب أن ترث من `PreTrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PreTrainedConfig` أي معامﻻت إضافية kwargs،
- يجب تمرير هذه المعامﻻت الإضافية إلى دالة `__init__` فى الفئة الأساسية الاعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين التانى والثالث يأتيان من حقيقة أن `PretrainedConfig` لديه المزيد من الحقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين التانى والثالث يأتيان من حقيقة أن `PreTrainedConfig` لديه المزيد من الحقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
تحديد `model_type` لتكوينك (هنا `model_type="resnet"`) ليس إلزاميًا، ما لم ترغب في
تسجيل نموذجك باستخدام الفئات التلقائية (راجع القسم الأخير).
@ -82,7 +82,7 @@ resnet50d_config.save_pretrained("custom-resnet")
resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
```
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PretrainedConfig`]، مثل [`~PretrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PreTrainedConfig`]، مثل [`~PreTrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
## كتابة نموذج مخصص

@ -60,10 +60,10 @@ pip install transformers bitsandbytes>=0.39.0 -q
أولاً، تحتاج إلى تحميل النموذج.
```py
>>> from transformers import AutoModelForCausalLM
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@ -113,12 +113,12 @@ pip install transformers bitsandbytes>=0.39.0 -q
هناك العديد من [استراتيجيات التوليد](generation_strategies)، وفي بعض الأحيان قد لا تكون القيم الافتراضية مناسبة لحالتك الاستخدام. إذا لم تكن الإخراج الخاصة بك متوافقة مع ما تتوقعه، فقد قمنا بإنشاء قائمة بأكثر الأخطاء الشائعة وكيفية تجنبها.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> tokenizer.pad_token = tokenizer.eos_token # Most LLMs don't have a pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@ -192,7 +192,7 @@ LLMs هي [معماريات فك التشفير فقط](https://huggingface.co/l
```python
>>> tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
>>> model = AutoModelForCausalLM.from_pretrained(
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
>>> set_seed(0)
>>> prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""

@ -231,7 +231,7 @@ flush()
دعنا نرى ما هو استهلاك ذاكرة GPU الذروة الذي يوفره تكميم 4 بت. يمكن تكميم النموذج إلى 4 بت باستخدام نفس واجهة برمجة التطبيقات كما في السابق - هذه المرة عن طريق تمرير `load_in_4bit=True` بدلاً من `load_in_8bit=True`.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", load_in_4bit=True, pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", quantization_config=BitsAndBytesConfig(load_in_4bit=True), pad_token_id=0)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
@ -329,174 +329,6 @@ $$ \textbf{O}_i \leftarrow s^a_{ij} * \textbf{O}_i + s^b_{ij} * \mathbf{V}_{j} \
لنلقِ نظرة على مثال عملي.
يحصل نموذج OctoCoder الخاص بنا الآن على موجه إدخال أطول بشكل كبير يتضمن ما يسمى *موجه النظام*. تُستخدم موجهات النظام لتوجيه LLM إلى مساعد أفضل مصمم لمهام المستخدمين.
فيما يلي، نستخدم موجه النظام الذي سيجعل OctoCoder مساعد ترميز أفضل.
```python
system_prompt = """Below are a series of dialogues between various people and an AI technical assistant.
The assistant tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble but knowledgeable.
The assistant is happy to help with code questions and will do their best to understand exactly what is needed.
It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer.
That said, the assistant is practical really does its best, and doesn't let caution get too much in the way of being useful.
The Starcoder models are a series of 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2) (excluding opt-out requests).
The model uses Multi Query Attention, was trained using the Fill-in-the-Middle objective, and with 8,192 tokens context window for a trillion tokens of heavily deduplicated data.
-----
Question: Write a function that takes two lists and returns a list that has alternating elements from each input list.
Answer: Sure. Here is a function that does that.
def alternating(list1, list2):
results = []
for i in range(len(list1)):
results.append(list1[i])
results.append(list2[i])
return results
Question: Can you write some test cases for this function?
Answer: Sure, here are some tests.
assert alternating([10, 20, 30], [1, 2, 3]) == [10, 1, 20, 2, 30, 3]
assert alternating([True, False], [4, 5]) == [True, 4, False, 5]
assert alternating([], []) == []
Question: Modify the function so that it returns all input elements when the lists have uneven length. The elements from the longer list should be at the end.
Answer: Here is the modified function.
def alternating(list1, list2):
results = []
for i in range(min(len(list1), len(list2))):
results.append(list1[i])
results.append(list2[i])
if len(list1) > len(list2):
results.extend(list1[i+1:])
else:
results.extend(list2[i+1:])
return results
-----
"""
```
لأغراض التوضيح، سنكرر موجه النظام عشر مرات بحيث يكون طول الإدخال طويلاً بما يكفي لملاحظة وفورات ذاكرة Flash Attention.
نضيف موجه النص الأصلي "سؤال: يرجى كتابة وظيفة في Python تقوم بتحويل البايتات إلى جيجا بايت.
```python
long_prompt = 10 * system_prompt + prompt
```
نقوم بتنفيذ نموذجنا مرة أخرى بدقة bfloat16.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("bigcode/octocoder")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```
دعنا الآن نقوم بتشغيل النموذج تمامًا مثلما كان من قبل *بدون اهتمام فلاشي* وقياس متطلبات ذاكرة GPU وقت الذروة ووقت الاستدلال.
```python
import time
start_time = time.time()
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
تم التوليد في 10.96854019165039 ثانية.
بالتأكيد. إليك وظيفة للقيام بذلك.
def bytes_to_giga(bytes):
return bytes / 1024 / 1024 / 1024
الإجابة: بالتأكيد. إليك وظيفة للقيام بذلك.
ديف
```
نحصل على نفس الإخراج كما كان من قبل، ولكن هذه المرة، يقوم النموذج بتكرار الإجابة عدة مرات حتى يتم قطعها عند 60 رمزًا. ليس من المستغرب أننا كررنا موجه النظام عشر مرات لأغراض التوضيح وبالتالي قمنا بتشغيل النموذج لتكرار نفسه.
**ملاحظة** لا ينبغي تكرار موجه النظام عشر مرات في التطبيقات الواقعية - مرة واحدة كافية!
دعنا نقيس متطلبات ذاكرة GPU وقت الذروة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
37.668193340301514
```
كما نرى، فإن متطلبات ذاكرة GPU وقت الذروة أعلى بكثير مما كانت عليه في البداية، وهو ما يرجع إلى حد كبير إلى تسلسل الإدخال الأطول. أيضًا، يستغرق التوليد أكثر من دقيقة بقليل الآن.
نستدعي `flush()` لتحرير ذاكرة GPU لتجربتنا التالية.
```python
flush()
```
لمقارنة، دعونا نقوم بتشغيل نفس الدالة، ولكن تمكين الاهتمام فلاش بدلا من ذلك.
للقيام بذلك، نقوم بتحويل النموذج إلى [BetterTransformer](Https://huggingface.co/docs/optimum/bettertransformer/overview) ومن خلال القيام بذلك تمكين PyTorch's [SDPA self-attention](Https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) والتي بدورها قادرة على استخدام الاهتمام فلاش.
```python
model.to_bettertransformer()
```
الآن نقوم بتشغيل نفس مقتطف التعليمات البرمجية بالضبط كما كان من قبل وتحت الغطاء سوف تستخدم المحولات الاهتمام فلاش.
```py
start_time = time.time()
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
تم التوليد في 3.0211617946624756 ثانية.
بالتأكيد. إليك وظيفة للقيام بذلك.
def bytes_to_giga(bytes):
return bytes / 1024 / 1024 / 1024
الإجابة: بالتأكيد. إليك وظيفة للقيام بذلك.
ديف
```
نحصل على نفس النتيجة بالضبط كما كان من قبل، ولكن يمكننا ملاحظة تسريع كبير بفضل الاهتمام فلاش.
دعنا نقيس استهلاك الذاكرة لآخر مرة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
32.617331981658936
```
ونحن تقريبا مرة أخرى إلى ذاكرة GPU الذروة الأصلية لدينا 29GB.
يمكننا أن نلاحظ أننا نستخدم فقط حوالي 100 ميجابايت إضافية من ذاكرة GPU عند تمرير تسلسل إدخال طويل جدًا مع الاهتمام فلاش مقارنة بتمرير تسلسل إدخال قصير كما فعلنا في البداية.
```py
flush()
```
لمزيد من المعلومات حول كيفية استخدام Flash Attention، يرجى الاطلاع على [صفحة doc هذه](Https://huggingface.co/docs/transformers/en/perf_infer_gpu_one#flashattention-2).
## 3. الابتكارات المعمارية
حتى الآن، نظرنا في تحسين الكفاءة الحسابية والذاكرة من خلال:
@ -640,7 +472,7 @@ for _ in range(5):
next_token_id = torch.argmax(next_logits, dim=-1)
print("shape of input_ids", next_token_id.shape)
print("length of key-value cache", len(past_key_values[0][0])) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
print("length of key-value cache", past_key_values.get_seq_length()) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
generated_tokens.append(next_token_id.item())
generated_text = tokenizer.batch_decode(generated_tokens)

@ -65,43 +65,15 @@ pip install huggingface_hub
تحويل نقطة التحقق لإطار عمل آخر أمر سهل. تأكد من تثبيت PyTorch و TensorFlow (راجع [هنا](installation) لتعليمات التثبيت)، ثم ابحث عن النموذج الملائم لمهمتك في الإطار الآخر.
<frameworkcontent>
<pt>
حدد `from_tf=True` لتحويل نقطة تحقق من TensorFlow إلى PyTorch:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
حدد `from_pt=True` لتحويل نقطة تحقق من PyTorch إلى TensorFlow:
```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
بعد ذلك، يمكنك حفظ نموذج TensorFlow الجديد بنقطة التحقق الجديدة:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
إذا كان النموذج متاحًا في Flax، فيمكنك أيضًا تحويل نقطة تحقق من PyTorch إلى Flax:
```py
>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
... "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## دفع نموذج أثناء التدريب
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/>
مشاركة نموذجك على Hub مر بسيط للغاية كل ما عليك هو إضافة معلمة أو استدعاء رد إضافي. كما تذكر من درس [التدريب الدقيق](training)، فإن فئة [`TrainingArguments`] هي المكان الذي تحدد فيه المعلمات الفائقة وخيارات التدريب الإضافية. تشمل إحدى خيارات التدريب هذه القدرة على دفع النموذج مباشرة إلى المنصة Hub. قم بتعيين `push_to_hub=True` في [`TrainingArguments`]:
@ -127,29 +99,6 @@ pip install huggingface_hub
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
شارك نموذجًا على Hub باستخدام [`PushToHubCallback`]. في دالة [`PushToHubCallback`], أضف:
- دليل إخراج لنموذجك.
- مُجزّئ اللغوي.
- `hub_model_id`، والذي هو اسم مستخدم Hub واسم النموذج الخاص بك.
```py
>>> from transformers import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```
أضف الاستدعاء إلى [`fit`](https://keras.io/api/models/model_training_apis/)، وسيقوم 🤗 Transformers بدفع النموذج المدرب إلى Hub:
```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## استخدام دالة `push_to_hub`
@ -220,4 +169,4 @@ pip install huggingface_hub
* قم بإنشاء ملف `README.md` وتحميله يدويًا.
* انقر فوق الزر **Edit model card** في مستودع نموذجك.
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).

@ -152,8 +152,6 @@ pip install datasets
قم بتعيين معلمة `return_tensors` إلى إما `pt` لـ PyTorch، أو `tf` لـ TensorFlow:
<frameworkcontent>
<pt>
```py
>>> batch_sentences = [
@ -173,33 +171,6 @@ pip install datasets
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
</pt>
<tf>
```py
>>> batch_sentences = [
... "But what about second breakfast?",
... "Don't think he knows about second breakfast, Pip.",
... "What about elevensies?",
... ]
>>> encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(encoded_input)
{'input_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0],
[101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
[101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
```
</tf>
</frameworkcontent>
<Tip>

@ -12,20 +12,10 @@
You'll also need to install your preferred machine learning framework:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
## Pipeline
@ -122,8 +112,6 @@ label: NEGATIVE, with score: 0.5309
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
Use [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on `AutoClass` in the next section):
```py
@ -132,18 +120,6 @@ label: NEGATIVE, with score: 0.5309
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on `TFAutoClass` in the next section):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
Specify the model and tokenizer in the [`pipeline`]. Now you can apply the `classifier` to French text:
@ -192,8 +168,6 @@ label: NEGATIVE, with score: 0.5309
The tokenizer can also accept a list of inputs, and pad and truncate the text to return a batch with uniform length:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
@ -204,20 +178,6 @@ label: NEGATIVE, with score: 0.5309
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
<Tip>
@ -227,8 +187,6 @@ label: NEGATIVE, with score: 0.5309
### AutoModel
<frameworkcontent>
<pt>
🤗 Transformers provides a simple and unified way to load pretrained models. This means you can load an [`AutoModel`] just as you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. For text (or sequence) classification, you should load [`AutoModelForSequenceClassification`]:
```py
@ -264,39 +222,6 @@ label: NEGATIVE, with score: 0.5309
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load a [`TFAutoModel`] just as you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`TFAutoModel`] for the task. For text (or sequence) classification, you should load [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
See the [task summary](./task_summary) for the tasks supported by an [`AutoModel`] class.
</Tip>
Now pass the preprocessed batch of inputs directly to the model. You can pass the tensors as-is:
```py
>>> tf_outputs = tf_model(tf_batch)
```
The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
@ -306,8 +231,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
### Save a model
<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:
```py
@ -321,28 +244,9 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
One particularly useful 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
@ -350,17 +254,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</tf>
</frameworkcontent>
## Build custom models
@ -375,8 +268,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:
```py
@ -384,17 +275,6 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
Take a look at the [Create a custom architecture](./create_a_model) guide for more information about building custom configurations.

@ -76,8 +76,6 @@ pip install -r requirements.txt
## Run a script
<frameworkcontent>
<pt>
- The example script downloads and preprocesses a dataset from the 🤗 [Datasets](https://huggingface.co/docs/datasets) library.
- The script then fine-tunes a model with the [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) on an architecture that supports summarization.
@ -95,31 +93,8 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
- The example script downloads and preprocesses a dataset from the 🤗 [Datasets](https://huggingface.co/docs/datasets/) library.
- The script then fine-tunes a model with Keras on an architecture that supports summarization.
- The following example shows how to fine-tune [T5-small](https://huggingface.co/google-t5/t5-small) on the [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) dataset.
- The T5 model requires an additional `source_prefix` argument because of how it was trained. This prompt lets T5 know that this is a summarization task.
```bash
python examples/tensorflow/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Distributed training and mixed precision
@ -141,7 +116,6 @@ torchrun \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -149,8 +123,6 @@ torchrun \
## Run a script on a Tensor Processing Unit (TPU)
<frameworkcontent>
<pt>
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. PyTorch supports TPUs with the [XLA](https://www.tensorflow.org/xla) deep learning compiler (see [here](https://github.com/pytorch/xla/blob/master/README.md) for more details). To use a TPU, launch the `xla_spawn.py` script and use the `num_cores` argument to set the number of TPU cores you want to use.
@ -166,28 +138,8 @@ python xla_spawn.py --num_cores 8 \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. TensorFlow scripts use a [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) for training on TPUs. To use a TPU, pass the name of the TPU resource to the `tpu` argument.
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Run a script with 🤗 Accelerate
@ -242,7 +194,6 @@ python examples/pytorch/summarization/run_summarization.py \
--summary_column summary_column_name \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--predict_with_generate
@ -270,7 +221,6 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -284,8 +234,6 @@ examples/pytorch/summarization/run_summarization.py -h
Another helpful option to enable is resuming training from a previous checkpoint. This ensures you can continue from where you left off, without starting over, if your training is interrupted. There are two ways to resume training from a checkpoint.
The first method uses the `output_dir previous_output_dir` argument to resume training from the latest checkpoint stored in `output_dir`. In this case, you should remove `overwrite_output_dir`:
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
@ -297,24 +245,6 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--output_dir previous_output_dir \
--predict_with_generate
```
The second method uses the `resume_from_checkpoint path_to_specific_checkpoint` argument to resume training from a specific checkpoint folder.
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--resume_from_checkpoint path_to_specific_checkpoint \
--predict_with_generate
```
@ -346,6 +276,5 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

@ -182,8 +182,6 @@ pip install transformers datasets evaluate
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It is more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
Use the end-of-sequence token as the padding token and set `mlm=False`:
```py
@ -193,23 +191,9 @@ pip install transformers datasets evaluate
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```
</pt>
<tf>
Use the end-of-sequence token as the padding token and set `mlm=False`:
```py
>>> from transformers import DataCollatorForLanguageModeling
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")
```
</tf>
</frameworkcontent>
## Train
<frameworkcontent>
<pt>
<Tip>
@ -267,75 +251,6 @@ Perplexity: 49.61
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with training a model with Keras, take a look at the [basic tutorial](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To train a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
Then you can load DistilGPT2 with [`TFAutoModelForCausalLM`]:
```py
>>> from transformers import TFAutoModelForCausalLM
>>> model = TFAutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... lm_dataset["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... lm_dataset["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_eli5_clm-model",
... tokenizer=tokenizer,
... )
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to train the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
```
Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -365,8 +280,6 @@ Perplexity: 49.61
[{'generated_text': "Somatic hypermutation allows the immune system to be able to effectively reverse the damage caused by an infection.\n\n\nThe damage caused by an infection is caused by the immune system's ability to perform its own self-correcting tasks."}]
```
<frameworkcontent>
<pt>
Tokenize the text and return the `input_ids` as PyTorch tensors:
```py
@ -392,31 +305,3 @@ Perplexity: 49.61
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["Somatic hypermutation allows the immune system to react to drugs with the ability to adapt to a different environmental situation. In other words, a system of 'hypermutation' can help the immune system to adapt to a different environmental situation or in some cases even a single life. In contrast, researchers at the University of Massachusetts-Boston have found that 'hypermutation' is much stronger in mice than in humans but can be found in humans, and that it's not completely unknown to the immune system. A study on how the immune system"]
```
</pt>
<tf>
Tokenize the text and return the `input_ids` as TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_eli5_clm-model")
>>> inputs = tokenizer(prompt, return_tensors="tf").input_ids
```
Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to generate text. For more details about the different text generation strategies and the parameters for controlling generation, see the [Text generation strategies](../generation_strategies) page.
```py
>>> from transformers import TFAutoModelForCausalLM
>>> model = TFAutoModelForCausalLM.from_pretrained("username/my_awesome_eli5_clm-model")
>>> outputs = model.generate(input_ids=inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
```
Decode the generated token ids back into text:
```py
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Somatic hypermutation allows the immune system to detect the presence of other viruses as they become more prevalent. Therefore, researchers have identified a high proportion of human viruses. The proportion of virus-associated viruses in our study increases with age. Therefore, we propose a simple algorithm to detect the presence of these new viruses in our samples as a sign of improved immunity. A first study based on this algorithm, which will be published in Science on Friday, aims to show that this finding could translate into the development of a better vaccine that is more effective for']
```
</tf>
</frameworkcontent>

@ -176,8 +176,6 @@ pip install transformers datasets evaluate
Now create a batch of examples using [`DataCollatorForLanguageModeling`]. It is more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
Use the end-of-sequence token as the padding token and specify `mlm_probability` to randomly mask tokens each time you iterate over the data:
@ -187,23 +185,9 @@ pip install transformers datasets evaluate
>>> tokenizer.pad_token = tokenizer.eos_token
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
```
</pt>
<tf>
Use the end-of-sequence token as the padding token and specify `mlm_probability` to randomly mask tokens each time you iterate over the data:
```py
>>> from transformers import DataCollatorForLanguageModeling
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15, return_tensors="tf")
```
</tf>
</frameworkcontent>
## Train
<frameworkcontent>
<pt>
<Tip>
@ -263,75 +247,6 @@ Perplexity: 8.76
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
Then you can load DistilRoBERTa with [`TFAutoModelForMaskedLM`]:
```py
>>> from transformers import TFAutoModelForMaskedLM
>>> model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... lm_dataset["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... lm_dataset["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_eli5_mlm_model",
... tokenizer=tokenizer,
... )
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
```
Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -372,8 +287,6 @@ Perplexity: 8.76
'sequence': 'The Milky Way is a small galaxy.'}]
```
<frameworkcontent>
<pt>
Tokenize the text and return the `input_ids` as PyTorch tensors. You'll also need to specify the position of the `<mask>` token:
```py
@ -405,38 +318,3 @@ The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.
```
</pt>
<tf>
Tokenize the text and return the `input_ids` as TensorFlow tensors. You'll also need to specify the position of the `<mask>` token:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_eli5_mlm_model")
>>> inputs = tokenizer(text, return_tensors="tf")
>>> mask_token_index = tf.where(inputs["input_ids"] == tokenizer.mask_token_id)[0, 1]
```
Pass your inputs to the model and return the `logits` of the masked token:
```py
>>> from transformers import TFAutoModelForMaskedLM
>>> model = TFAutoModelForMaskedLM.from_pretrained("username/my_awesome_eli5_mlm_model")
>>> logits = model(**inputs).logits
>>> mask_token_logits = logits[0, mask_token_index, :]
```
Then return the three masked tokens with the highest probability and print them out:
```py
>>> top_3_tokens = tf.math.top_k(mask_token_logits, 3).indices.numpy()
>>> for token in top_3_tokens:
... print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.
```
</tf>
</frameworkcontent>

@ -116,8 +116,6 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
`DataCollatorForMultipleChoice` flattens all the model inputs, applies padding, and then restores the results to their original shape:
<frameworkcontent>
<pt>
```py
>>> from dataclasses import dataclass
@ -158,50 +156,6 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
... batch["labels"] = torch.tensor(labels, dtype=torch.int64)
... return batch
```
</pt>
<tf>
```py
>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import tensorflow as tf
>>> @dataclass
... class DataCollatorForMultipleChoice:
... """
... Data collator that will dynamically pad the inputs for multiple choice received.
... """
... tokenizer: PreTrainedTokenizerBase
... padding: Union[bool, str, PaddingStrategy] = True
... max_length: Optional[int] = None
... pad_to_multiple_of: Optional[int] = None
... def __call__(self, features):
... label_name = "label" if "label" in features[0].keys() else "labels"
... labels = [feature.pop(label_name) for feature in features]
... batch_size = len(features)
... num_choices = len(features[0]["input_ids"])
... flattened_features = [
... [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
... ]
... flattened_features = sum(flattened_features, [])
... batch = self.tokenizer.pad(
... flattened_features,
... padding=self.padding,
... max_length=self.max_length,
... pad_to_multiple_of=self.pad_to_multiple_of,
... return_tensors="tf",
... )
... batch = {k: tf.reshape(v, (batch_size, num_choices, -1)) for k, v in batch.items()}
... batch["labels"] = tf.convert_to_tensor(labels, dtype=tf.int64)
... return batch
```
</tf>
</frameworkcontent>
## Evaluate
@ -228,8 +182,6 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
## Train
<frameworkcontent>
<pt>
<Tip>
@ -283,93 +235,6 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_train_epochs = 2
>>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
>>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```
Then you can load BERT with [`TFAutoModelForMultipleChoice`]:
```py
>>> from transformers import TFAutoModelForMultipleChoice
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_swag["train"],
... shuffle=True,
... batch_size=batch_size,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_swag["validation"],
... shuffle=False,
... batch_size=batch_size,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> model.compile(optimizer=optimizer) # No loss argument!
```
The last two things to set up before you start training are computing the accuracy of the predictions and providing a way to push your model to the Hub. Both are done by using [Keras callbacks](../main_classes/keras_callbacks).
Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_model",
... tokenizer=tokenizer,
... )
```
Then bundle your callbacks together:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2, callbacks=callbacks)
```
Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -390,8 +255,6 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
>>> candidate2 = "The law applies to baguettes."
```
<frameworkcontent>
<pt>
Tokenize each prompt and candidate answer pair and return PyTorch tensors. You should also create some `labels`:
```py
@ -419,34 +282,3 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
>>> predicted_class
0
```
</pt>
<tf>
Tokenize each prompt and candidate answer pair and return TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="tf", padding=True)
```
Pass your inputs to the model and return the `logits`:
```py
>>> from transformers import TFAutoModelForMultipleChoice
>>> model = TFAutoModelForMultipleChoice.from_pretrained("username/my_awesome_swag_model")
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
>>> outputs = model(inputs)
>>> logits = outputs.logits
```
Get the class with the highest probability:
```py
>>> predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
>>> predicted_class
0
```
</tf>
</frameworkcontent>

@ -167,29 +167,15 @@ pip install transformers datasets evaluate
Now create a batch of examples using [`DefaultDataCollator`]. Unlike other data collators in 🤗 Transformers, the [`DefaultDataCollator`] does not apply any additional preprocessing such as padding.
<frameworkcontent>
<pt>
```py
>>> from transformers import DefaultDataCollator
>>> data_collator = DefaultDataCollator()
```
</pt>
<tf>
```py
>>> from transformers import DefaultDataCollator
>>> data_collator = DefaultDataCollator(return_tensors="tf")
```
</tf>
</frameworkcontent>
## Train
<frameworkcontent>
<pt>
<Tip>
@ -240,82 +226,6 @@ pip install transformers datasets evaluate
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_epochs = 2
>>> total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
>>> optimizer, schedule = create_optimizer(
... init_lr=2e-5,
... num_warmup_steps=0,
... num_train_steps=total_train_steps,
... )
```
Then you can load DistilBERT with [`TFAutoModelForQuestionAnswering`]:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_squad["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_squad["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer)
```
The last thing to set up before you start training is a way to push your model to the Hub. This can be done by specifying where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_qa_model",
... tokenizer=tokenizer,
... )
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callback to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
```
Once training is completed, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -357,8 +267,6 @@ pip install transformers datasets evaluate
You can also manually replicate the results of the `pipeline` if you'd like:
<frameworkcontent>
<pt>
Tokenize the text and return PyTorch tensors:
@ -394,39 +302,3 @@ pip install transformers datasets evaluate
>>> tokenizer.decode(predict_answer_tokens)
'176 billion parameters and can generate text in 46 languages natural languages and 13'
```
</pt>
<tf>
Tokenize the text and return TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_qa_model")
>>> inputs = tokenizer(question, context, return_tensors="tf")
```
Pass your inputs to the model and return the `logits`:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("my_awesome_qa_model")
>>> outputs = model(**inputs)
```
Get the highest probability from the model output for the start and end positions:
```py
>>> answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
>>> answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
```
Decode the predicted tokens to get the answer:
```py
>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> tokenizer.decode(predict_answer_tokens)
'176 billion parameters and can generate text in 46 languages natural languages and 13'
```
</tf>
</frameworkcontent>

@ -92,24 +92,12 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
Now create a batch of examples using [`DataCollatorWithPadding`]. It is more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorWithPadding
>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorWithPadding
>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
```
</tf>
</frameworkcontent>
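To make the idea of dynamic padding concrete, here is a framework-free sketch. The `pad_batch` helper and the token ids are hypothetical and only illustrate the principle; the real collator also builds attention masks and returns tensors.

```py
def pad_batch(batch, pad_id=0):
    """Pad every sequence to the longest sequence in this batch only."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


# Two tokenized sentences of different lengths (made-up ids)
batch = [[101, 7592, 102], [101, 2023, 2003, 1037, 102]]
padded = pad_batch(batch)
```

Each batch is padded only up to its own longest sequence, so a batch of short sentences stays short.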
## Evaluate
@ -143,8 +131,6 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> label2id = {"NEGATIVE": 0, "POSITIVE": 1}
```
<frameworkcontent>
<pt>
<Tip>
If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
@ -205,98 +191,6 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
>>> import tensorflow as tf
>>> batch_size = 16
>>> num_epochs = 5
>>> batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
>>> total_train_steps = int(batches_per_epoch * num_epochs)
>>> optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```
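The step count passed to the optimizer schedule is plain integer arithmetic. The sketch below assumes a training split of 25,000 examples (the usual size of the IMDb training set); `total_training_steps` is a hypothetical helper, not part of the library.

```py
def total_training_steps(num_examples, batch_size, num_epochs):
    """Number of optimizer steps when the last partial batch is dropped."""
    batches_per_epoch = num_examples // batch_size
    return batches_per_epoch * num_epochs


steps = total_training_steps(num_examples=25_000, batch_size=16, num_epochs=5)
```

With these numbers there are 1,562 full batches per epoch, so the schedule spans 7,810 steps.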
Then you can load DistilBERT with [`TFAutoModelForSequenceClassification`] along with the number of expected labels and the label mappings:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_imdb["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_imdb["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
The last two things to set up before you start training are computing the accuracy from the predictions and providing a way to push your model to the Hub. Both are done with [Keras callbacks](../main_classes/keras_callbacks).
Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_model",
... tokenizer=tokenizer,
... )
```
Then bundle your callbacks together:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
```
Once training is complete, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -328,8 +222,6 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
You can also manually replicate the results of the `pipeline` if you'd like:
<frameworkcontent>
<pt>
Tokenize the text and return PyTorch tensors:
```py
@ -356,32 +248,3 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> model.config.id2label[predicted_class_id]
'POSITIVE'
```
</pt>
<tf>
Tokenize the text and return TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_model")
>>> inputs = tokenizer(text, return_tensors="tf")
```
Pass your inputs to the model and return the `logits`:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("stevhliu/my_awesome_model")
>>> logits = model(**inputs).logits
```
Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
```py
>>> import tensorflow as tf
>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
>>> model.config.id2label[predicted_class_id]
'POSITIVE'
```
</tf>
</frameworkcontent>
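The mapping from logits to a text label can be illustrated in plain Python. This is a sketch with made-up logits; the `softmax` helper is written out here only to show how logits relate to probabilities, and the argmax over probabilities equals the argmax over logits.

```py
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-1.2, 2.3]  # hypothetical model output for one review

probs = softmax(logits)
predicted_class_id = max(range(len(probs)), key=probs.__getitem__)
label = id2label[predicted_class_id]
```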

View File

@ -118,24 +118,12 @@ pip install transformers datasets evaluate rouge_score
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
```
</tf>
</frameworkcontent>
## Evaluate
@ -170,8 +158,6 @@ pip install transformers datasets evaluate rouge_score
## Train
<frameworkcontent>
<pt>
<Tip>
@ -226,91 +212,6 @@ pip install transformers datasets evaluate rouge_score
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_billsum["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... tokenized_billsum["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
The last two things to set up before you start training are computing the ROUGE score from the predictions and providing a way to push your model to the Hub. Both are done with [Keras callbacks](../main_classes/keras_callbacks).
Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_test_set)
```
Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_billsum_model",
... tokenizer=tokenizer,
... )
```
Then bundle your callbacks together:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
Once training is complete, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -341,8 +242,6 @@ pip install transformers datasets evaluate rouge_score
You can also manually replicate the results of the `pipeline` if you'd like:
<frameworkcontent>
<pt>
Tokenize the text and return the `input_ids` as PyTorch tensors:
```py
@ -367,31 +266,3 @@ pip install transformers datasets evaluate rouge_score
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'
```
</pt>
<tf>
Tokenize the text and return the `input_ids` as TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_billsum_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids
```
Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to create the summarization. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](../main_classes/text_generation) API.
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("username/my_awesome_billsum_model")
>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)
```
Decode the generated token ids back into text:
```py
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'
```
</tf>
</frameworkcontent>

View File

@ -151,22 +151,11 @@ pip install transformers datasets evaluate seqeval
Now create a batch of examples using [`DataCollatorForTokenClassification`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForTokenClassification
>>> data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForTokenClassification
>>> data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer, return_tensors="tf")
```
</tf>
</frameworkcontent>
## Evaluate
@ -246,8 +235,6 @@ pip install transformers datasets evaluate seqeval
... }
```
<frameworkcontent>
<pt>
<Tip>
If you aren't familiar with fine-tuning a model with the [`Trainer`], take a look at the basic tutorial [here](../training#train-with-pytorch-trainer)!
@ -302,101 +289,6 @@ pip install transformers datasets evaluate seqeval
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_train_epochs = 3
>>> num_train_steps = (len(tokenized_wnut["train"]) // batch_size) * num_train_epochs
>>> optimizer, lr_schedule = create_optimizer(
... init_lr=2e-5,
... num_train_steps=num_train_steps,
... weight_decay_rate=0.01,
... num_warmup_steps=0,
... )
```
Then you can load DistilBERT with [`TFAutoModelForTokenClassification`] along with the number of expected labels and the label mappings:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained(
... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_wnut["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_wnut["validation"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
The last two things to set up before you start training are computing the seqeval scores from the predictions and providing a way to push your model to the Hub. Both are done with [Keras callbacks](../main_classes/keras_callbacks).
Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_wnut_model",
... tokenizer=tokenizer,
... )
```
Then bundle your callbacks together:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
```
Once training is complete, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -457,8 +349,6 @@ pip install transformers datasets evaluate seqeval
You can also manually replicate the results of the `pipeline` if you'd like:
<frameworkcontent>
<pt>
Tokenize the text and return PyTorch tensors:
```py
@ -502,49 +392,3 @@ pip install transformers datasets evaluate seqeval
'O',
'O']
```
</pt>
<tf>
Tokenize the text and return TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_wnut_model")
>>> inputs = tokenizer(text, return_tensors="tf")
```
Pass your inputs to the model and return the `logits`:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("stevhliu/my_awesome_wnut_model")
>>> logits = model(**inputs).logits
```
Get the class with the highest probability, and use the model's `id2label` mapping to convert it to a text label:
```py
>>> predicted_token_class_ids = tf.math.argmax(logits, axis=-1)
>>> predicted_token_class = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]
>>> predicted_token_class
['O',
'O',
'B-location',
'I-location',
'B-group',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'B-location',
'B-location',
'O',
'O']
```
</tf>
</frameworkcontent>
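For token classification the same argmax is applied once per token rather than once per sequence. The sketch below assumes a reduced, hypothetical three-label mapping and made-up logits; the model in this guide uses 13 labels.

```py
# Hypothetical subset of the label mapping
id2label = {0: "O", 1: "B-location", 2: "I-location"}

# One row of scores per token, one column per label (made-up values)
logits = [
    [2.0, 0.1, 0.1],
    [0.2, 3.1, 0.3],
    [0.1, 0.4, 2.8],
]

predicted_token_class = [
    id2label[max(range(len(row)), key=row.__getitem__)] for row in logits
]
```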

View File

@ -113,24 +113,12 @@ pip install transformers datasets evaluate sacrebleu
Now create a batch of examples using [`DataCollatorForSeq2Seq`]. It's more efficient to *dynamically pad* the sentences to the longest length in a batch during collation, instead of padding the whole dataset to the maximum length.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
```
</tf>
</frameworkcontent>
## Evaluate
@ -177,8 +165,6 @@ pip install transformers datasets evaluate sacrebleu
## Train
<frameworkcontent>
<pt>
<Tip>
@ -233,91 +219,6 @@ pip install transformers datasets evaluate sacrebleu
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
If you aren't familiar with fine-tuning a model with Keras, take a look at the basic tutorial [here](../training#train-a-tensorflow-model-with-keras)!
</Tip>
To fine-tune a model in TensorFlow, start by setting up an optimizer function, a learning rate schedule, and some training hyperparameters:
```py
>>> from transformers import AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
Then you can load T5 with [`TFAutoModelForSeq2SeqLM`]:
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
```
Convert your datasets to the `tf.data.Dataset` format with [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_books["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... tokenized_books["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
Configure the model for training with [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
The last two things to set up before you start training are computing the SacreBLEU metric from the predictions and providing a way to push your model to the Hub. Both are done with [Keras callbacks](../main_classes/keras_callbacks).
Pass your `compute_metrics` function to [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_test_set)
```
Specify where to push your model and tokenizer in the [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_opus_books_model",
... tokenizer=tokenizer,
... )
```
Then bundle your callbacks together:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
Finally, you're ready to start training your model! Call [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) with your training and validation datasets, the number of epochs, and your callbacks to fine-tune the model:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
Once training is complete, your model is automatically uploaded to the Hub so everyone can use it!
</tf>
</frameworkcontent>
<Tip>
@ -351,8 +252,6 @@ pip install transformers datasets evaluate sacrebleu
You can also manually replicate the results of the `pipeline` if you'd like:
<frameworkcontent>
<pt>
Tokenize the text and return the `input_ids` as PyTorch tensors:
```py
@ -377,31 +276,3 @@ pip install transformers datasets evaluate sacrebleu
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lignées partagent des ressources avec des bactéries enfixant l'azote.'
```
</pt>
<tf>
Tokenize the text and return the `input_ids` as TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_opus_books_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids
```
Use the [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] method to create the translation. For more details about the different text generation strategies and parameters for controlling generation, check out the [Text Generation](../main_classes/text_generation) API.
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("username/my_awesome_opus_books_model")
>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
```
Decode the generated token ids back into text:
```py
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lugumes partagent les ressources avec des bactéries fixatrices d'azote.'
```
</tf>
</frameworkcontent>
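The `top_k` and `top_p` arguments used during sampling restrict which tokens can be drawn. The following is a simplified sketch over a made-up probability list, not the library's implementation (which works on logits over the full vocabulary); `top_k_top_p_filter` is a hypothetical helper.

```py
def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix of
    them whose renormalized probability mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]
    top_k_mass = sum(probs[i] for i in ranked)

    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / top_k_mass
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


probs = [0.5, 0.3, 0.1, 0.05, 0.05]  # hypothetical next-token distribution
filtered = top_k_top_p_filter(probs, top_k=3, top_p=0.75)
```

A token id can then be drawn from `filtered`, for example with `random.choices(list(filtered), weights=list(filtered.values()))`.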

View File

@ -1,40 +0,0 @@
# Export to TFLite
[TensorFlow Lite](https://www.tensorflow.org/lite/guide) is a lightweight framework for deploying machine learning models on resource-constrained devices, such as mobile phones, embedded systems, and Internet of Things (IoT) devices. TFLite is designed to optimize and run models efficiently on these devices with limited computational power, memory, and power consumption.
A TensorFlow Lite model is represented in a special efficient portable format identified by the `.tflite` file extension.
🤗 Optimum offers functionality to export 🤗 Transformers models to TFLite through the `exporters.tflite` module. For the list of supported model architectures, please refer to the [🤗 Optimum documentation](https://huggingface.co/docs/optimum/exporters/tflite/overview).
To export a model to TFLite, install the required dependencies:
```bash
pip install optimum[exporters-tf]
```
To check out all available arguments, refer to the [🤗 Optimum docs](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model), or view help in the command line:
```bash
optimum-cli export tflite --help
```
To export a model's checkpoint from the 🤗 Hub, for example, `google-bert/bert-base-uncased`, run the following command:
```bash
optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
You should see logs indicating the progress and showing where the resulting `model.tflite` is saved, like this:
```bash
Validating TFLite model...
-[✓] TFLite model output names match reference model (logits)
- Validating TFLite Model output "logits":
-[✓] (1, 128, 30522) matches (1, 128, 30522)
-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
The exported model was saved at: bert_tflite
```
The example above illustrates exporting a checkpoint from the 🤗 Hub. When exporting a local model, first make sure that you saved both the model's weights and tokenizer files in the same directory (`local_path`). When using the CLI, pass `local_path` to the `model` argument instead of the checkpoint name on the 🤗 Hub.
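The warning in the log above comes from comparing the largest absolute difference between the two models' outputs against an absolute tolerance (`atol`). A minimal sketch of that kind of check, with made-up values and hypothetical helper names (this is not the exporter's own code):

```py
def max_abs_diff(reference, exported):
    """Largest element-wise absolute difference between two outputs."""
    return max(abs(r - e) for r, e in zip(reference, exported))


def within_tolerance(reference, exported, atol=1e-5):
    return max_abs_diff(reference, exported) <= atol


reference = [0.12, -1.5, 3.25]      # hypothetical reference logits
exported = [0.12, -1.50005, 3.25]   # hypothetical exported-model logits

diff = max_abs_diff(reference, exported)
ok = within_tolerance(reference, exported)
```

Here the difference is about 5e-05, which exceeds the default `atol` of 1e-05, so the check fails in the same way as the logged warning even though the export itself completed.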

View File

@ -611,7 +611,6 @@ accelerate launch \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
You can also specify the parameters from the `config_file.yaml` file directly on the command line:
@ -634,7 +633,6 @@ accelerate launch --num_processes=2 \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
Check out the [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) tutorial to learn more about `accelerate launch` and custom configurations.

View File

@ -58,8 +58,6 @@
في شريط التنقل الأيمن للقفز إلى الإطار الذي تريده - وإذا كنت تريد إخفاء كل المحتوى لإطار معين،
فاستخدم الزر في الركن العلوي الأيمن من كتلة الإطار!
<frameworkcontent>
<pt>
<Youtube id="nvBXf7s7vTI"/>
## Train with PyTorch Trainer
@ -139,124 +137,10 @@
```py
>>> trainer.train()
```
</pt>
<tf>
<a id='keras'></a>
<Youtube id="rnTGBy2ax1c"/>
## Train a TensorFlow model with Keras
You can also train 🤗 Transformers models in TensorFlow with the Keras API!
### Loading data for Keras
When you want to train a 🤗 Transformers model with the Keras API, you need to convert your dataset to a format that
Keras understands. If your dataset is small, you can just convert the whole thing to NumPy arrays and pass it to Keras.
Let's try that first before we do anything more complicated.
First, load a dataset. We'll use the CoLA dataset from the [GLUE benchmark](https://huggingface.co/datasets/glue),
since it's a simple binary text classification task, and just take the training split for now.
```py
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now
```
Next, load a tokenizer and tokenize the data as NumPy arrays. Note that the labels are already a list of 0s and 1s,
so we can just convert that directly to a NumPy array without tokenization!
```py
from transformers import AutoTokenizer
import numpy as np
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"]) # Label is already an array of 0 and 1
```
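Finally, load, [`compile`](https://keras.io/api/models/model_training_apis/#compile-method), and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) the model. Note that Transformers models all have a default task-relevant loss function, so you don't need to specify one unless you want to: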
```py
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))  # No loss argument!
model.fit(tokenized_data, labels)
```
<Tip>
You don't have to pass a loss argument to your models when you `compile()` them! Hugging Face models automatically
choose a loss that is appropriate for their task and model architecture if this argument is left blank. You can always
override this by specifying a loss yourself if you want to!
</Tip>
This approach works great for smaller datasets, but for larger datasets, you might find it starts to become a problem. Why?
Because the tokenized array and labels would have to be fully loaded into memory, and because NumPy doesn't handle
"jagged" arrays, so every tokenized sample would have to be padded to the length of the longest sample in the whole dataset. That makes your array even bigger, and all those padding tokens slow down training too!
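The cost of padding everything to the longest sample can be estimated by counting padding tokens. The sketch below uses made-up sequence lengths and a hypothetical `count_padding` helper to compare padding the whole dataset once with padding each batch separately.

```py
def count_padding(lengths, batch_size, per_batch):
    """Count padding tokens needed to make sequences rectangular."""
    if not per_batch:
        longest = max(lengths)
        return sum(longest - n for n in lengths)
    total = 0
    for i in range(0, len(lengths), batch_size):
        chunk = lengths[i : i + batch_size]
        total += sum(max(chunk) - n for n in chunk)
    return total


lengths = [5, 7, 6, 120, 8, 9, 10, 11]  # one unusually long sample
whole_dataset = count_padding(lengths, batch_size=4, per_batch=False)
batch_by_batch = count_padding(lengths, batch_size=4, per_batch=True)
```

With these numbers a single long sample forces 784 padding tokens across the dataset, while per-batch padding needs 348, because only the batch that contains the long sample pays for it.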
### Loading data as a tf.data.Dataset
If you want to avoid slowing down training, you can load your data as a `tf.data.Dataset` instead. Although you can write your own `tf.data` pipeline if you want, we have two convenience methods for doing this:
- [`~TFPreTrainedModel.prepare_tf_dataset`]: This is the method we recommend in most cases. Because it is a method
on your model, it can inspect the model to automatically figure out which columns are usable as model inputs,
and discard the others to make a simpler, more performant dataset.
- [`~datasets.Dataset.to_tf_dataset`]: This method is more low-level, and is useful when you want to exactly control how
your dataset is created, by specifying exactly which `columns` and `label_cols` to include.
Before you can use [`~TFPreTrainedModel.prepare_tf_dataset`], you will need to add the tokenizer outputs to your dataset as columns, as shown in
the following code sample:
```py
def tokenize_dataset(data):
    # Keys of the returned dictionary will be added to the dataset as columns
    return tokenizer(data["text"])
dataset = dataset.map(tokenize_dataset)
```
Remember that Hugging Face datasets are stored on disk by default, so this will not inflate your memory usage! Once the columns have been added, you can stream batches from the dataset and add padding to each batch, which greatly reduces the number of padding tokens compared to padding the entire dataset.
```py
>>> tf_dataset = model.prepare_tf_dataset(dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer)
```
Note that in the code sample above, you need to pass the tokenizer to `prepare_tf_dataset` so it can correctly pad batches as they're loaded.
If all the samples in your dataset are the same length and no padding is necessary, you can skip this argument.
If you need to do something more complex than just padding samples (e.g. corrupting tokens for masked language modelling),
you can use the `collate_fn` argument instead to pass a function that will be called to transform the
list of samples into a batch and apply any preprocessing you want. See our [examples](https://github.com/huggingface/transformers/tree/main/examples) or
[notebooks](https://huggingface.co/docs/transformers/notebooks) to see this approach in action.
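As a rough illustration of what such a collate function does, here is a simplified sketch that corrupts tokens for masked language modelling. Everything in it is an assumption made for illustration: `collate_for_mlm` and `mask_tokens` are hypothetical names, the token ids are made up, `103` stands in for a mask token id, and `-100` follows the common convention for positions the loss should ignore. Unlike the library's data collators, it also masks special tokens and does no padding.

```py
import random


def mask_tokens(input_ids, mask_id, mask_prob, rng):
    """Replace a random subset of tokens with mask_id and record the originals."""
    masked, labels = [], []
    for token in input_ids:
        if rng.random() < mask_prob:
            masked.append(mask_id)
            labels.append(token)   # the model must predict the original token
        else:
            masked.append(token)
            labels.append(-100)    # ignored by the loss
    return masked, labels


def collate_for_mlm(samples, mask_id=103, mask_prob=0.15, seed=0):
    """Turn a list of samples into one batch with masked inputs and labels."""
    rng = random.Random(seed)
    batch = {"input_ids": [], "labels": []}
    for sample in samples:
        masked, labels = mask_tokens(sample["input_ids"], mask_id, mask_prob, rng)
        batch["input_ids"].append(masked)
        batch["labels"].append(labels)
    return batch


samples = [{"input_ids": [101, 7592, 2088, 102]}, {"input_ids": [101, 2023, 2003, 102]}]
batch = collate_for_mlm(samples, mask_prob=0.5)
```

The function receives a list of samples and returns one batch, which is the contract the `collate_fn` argument describes.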
Once you've created a `tf.data.Dataset`, you can compile and fit the model as before:
```py
model.compile(optimizer=Adam(3e-5)) # No loss argument!
model.fit(tf_dataset)
```
</tf>
</frameworkcontent>
<a id='pytorch_native'></a>
## Train in native PyTorch
<frameworkcontent>
<pt>
<Youtube id="Dh9CL8fyG80"/>
[`Trainer`] takes care of the training loop and allows you to fine-tune a model in a single line of code. For users who prefer to write their own training loop, you can also fine-tune a 🤗 Transformers model in native PyTorch.
@ -397,8 +281,6 @@ torch.cuda.empty_cache()
>>> metric.compute()
```
</pt>
</frameworkcontent>
<a id='additional-resources'></a>
@ -409,4 +291,4 @@ torch.cuda.empty_cache()
- [🤗 Transformers Examples](https://github.com/huggingface/transformers/tree/main/examples) includes
scripts to train common NLP tasks in PyTorch and TensorFlow.
- [🤗 Transformers Notebooks](notebooks) contains various notebooks on how to fine-tune a model for specific tasks in PyTorch and TensorFlow.
- [🤗 Transformers Notebooks](notebooks) contains various notebooks on how to fine-tune a model for specific tasks in PyTorch and TensorFlow.

View File

@ -53,7 +53,7 @@ Lassen Sie uns daher ein wenig tiefer in das allgemeine Design der Bibliothek ei
### Overview of models
To successfully add a model, it is important to understand the interaction between your model and its configuration,
[`PreTrainedModel`] and [`PretrainedConfig`]. As an example, we will call
[`PreTrainedModel`] and [`PreTrainedConfig`]. As an example, we will call
the model to be added to 🤗 Transformers `BrandNewBert`.
Let's take a look:
@ -81,10 +81,10 @@ model.config # model has access to its config
```
Similar to the model, the configuration inherits basic serialization and deserialization functionality from
[`PretrainedConfig`]. Note that the configuration and the model are always serialized into two
[`PreTrainedConfig`]. Note that the configuration and the model are always serialized into two
different formats - the model to a *pytorch_model.bin* file and the configuration to a *config.json* file. Calling
[`~PreTrainedModel.save_pretrained`] automatically calls
[`~PretrainedConfig.save_pretrained`], so that both the model and the configuration are saved.
[`~PreTrainedConfig.save_pretrained`], so that both the model and the configuration are saved.
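The configuration round trip through *config.json* can be pictured with a small stand-in class. This is only a sketch: `TinyConfig` is hypothetical and is not the library's configuration class, and its two attributes are chosen purely for illustration.

```py
import json
import os
import tempfile


class TinyConfig:
    """Hypothetical stand-in for a model configuration (not the library class)."""

    def __init__(self, hidden_size=768, num_layers=12):
        self.hidden_size = hidden_size
        self.num_layers = num_layers

    def save_pretrained(self, directory):
        # Write all attributes to config.json inside the target directory
        os.makedirs(directory, exist_ok=True)
        with open(os.path.join(directory, "config.json"), "w") as f:
            json.dump(vars(self), f, indent=2)

    @classmethod
    def from_pretrained(cls, directory):
        # Rebuild the configuration from the saved JSON file
        with open(os.path.join(directory, "config.json")) as f:
            return cls(**json.load(f))


with tempfile.TemporaryDirectory() as tmp:
    TinyConfig(hidden_size=256, num_layers=4).save_pretrained(tmp)
    restored = TinyConfig.from_pretrained(tmp)
```

Keeping the configuration in a plain JSON file, separate from the weights, means it can be inspected or edited without loading the model.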
### Code style

View File

@ -81,8 +81,6 @@ Laden Sie einen Prozessor mit [`AutoProcessor.from_pretrained`]:
## AutoModel
<frameworkcontent>
<pt>
Finally, the `AutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -108,24 +106,3 @@ TensorFlow- und Flax-Checkpoints sind nicht betroffen und können in PyTorch-Arc
</Tip>
Generally, we recommend using the `AutoTokenizer` class and the `AutoModelFor` class to load pretrained instances of models. This ensures you load the correct architecture every time. In the next [tutorial](preprocessing), learn how to use your newly loaded tokenizer, feature extractor, and processor to preprocess a dataset for fine-tuning.
</pt>
<tf>
Finally, the `TFAutoModelFor` classes let you load a pretrained model for a given task (see [here](model_doc/auto) for a complete list of available tasks). For example, load a model for sequence classification with [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Sie können denselben Prüfpunkt problemlos wiederverwenden, um eine Architektur für eine andere Aufgabe zu laden:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Im Allgemeinen empfehlen wir, die Klasse "AutoTokenizer" und die Klasse "TFAutoModelFor" zu verwenden, um vortrainierte Instanzen von Modellen zu laden. Dadurch wird sichergestellt, dass Sie jedes Mal die richtige Architektur laden. Im nächsten [Tutorial] (Vorverarbeitung) erfahren Sie, wie Sie Ihren neu geladenen Tokenizer, Feature Extractor und Prozessor verwenden, um einen Datensatz für die Feinabstimmung vorzuverarbeiten.
</tf>
</frameworkcontent>

@@ -78,10 +78,10 @@ Wenn Sie an der grundlegenden Verwendung von LLMs interessiert sind, ist unsere
Zunächst müssen Sie das Modell laden.
```py
>>> from transformers import AutoModelForCausalLM
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@@ -119,12 +119,12 @@ Und das war's! Mit ein paar Zeilen Code können Sie sich die Macht eines LLM zun
Es gibt viele [Generierungsstrategien](generation_strategies), und manchmal sind die Standardwerte für Ihren Anwendungsfall vielleicht nicht geeignet. Wenn Ihre Ausgaben nicht mit dem übereinstimmen, was Sie erwarten, haben wir eine Liste der häufigsten Fallstricke erstellt und wie Sie diese vermeiden können.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b")
>>> tokenizer.pad_token = tokenizer.eos_token # Llama has no pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
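The two hunks above replace the bare `load_in_4bit=True` keyword with `quantization_config=BitsAndBytesConfig(load_in_4bit=True)`. As a rough illustration of that migration, the sketch below regroups legacy flags under a nested `quantization_config` entry. The helper `migrate_quantization_kwargs` and the `LEGACY_FLAGS` tuple are hypothetical names introduced only for this example and are not part of transformers; the sketch uses a plain dict so that it runs without the library installed.

```python
# Hypothetical helper (NOT a transformers API): moves legacy quantization
# flags out of from_pretrained-style kwargs into a nested config mapping.
LEGACY_FLAGS = ("load_in_4bit", "load_in_8bit")


def migrate_quantization_kwargs(kwargs):
    """Return a copy of kwargs with legacy flags grouped under 'quantization_config'."""
    migrated = dict(kwargs)  # leave the caller's dict untouched
    flags = {name: migrated.pop(name) for name in LEGACY_FLAGS if name in migrated}
    if flags:
        if "quantization_config" in migrated:
            raise ValueError("pass either legacy flags or quantization_config, not both")
        migrated["quantization_config"] = flags
    return migrated


old_kwargs = {"device_map": "auto", "load_in_4bit": True}
new_kwargs = migrate_quantization_kwargs(old_kwargs)
print(new_kwargs)
```

In real code the nested mapping would presumably be turned into a config object, as the diff itself does with `BitsAndBytesConfig(load_in_4bit=True)`, before being passed to `from_pretrained`.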

@@ -79,43 +79,15 @@ Um sicherzustellen, dass Ihr Modell von jemandem verwendet werden kann, der mit
Die Konvertierung eines Checkpoints für ein anderes Framework ist einfach. Stellen Sie sicher, dass Sie PyTorch und TensorFlow installiert haben (siehe [hier](installation) für Installationsanweisungen), und finden Sie dann das spezifische Modell für Ihre Aufgabe in dem anderen Framework.
<frameworkcontent>
<pt>
Geben Sie `from_tf=True` an, um einen Prüfpunkt von TensorFlow nach PyTorch zu konvertieren:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
Geben Sie `from_pt=True` an, um einen Prüfpunkt von PyTorch nach TensorFlow zu konvertieren:
```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
Dann können Sie Ihr neues TensorFlow-Modell mit seinem neuen Checkpoint speichern:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
Wenn ein Modell in Flax verfügbar ist, können Sie auch einen Kontrollpunkt von PyTorch nach Flax konvertieren:
```py
>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
... "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## Ein Modell während des Trainings hochladen
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/>
Die Weitergabe eines Modells an den Hub ist so einfach wie das Hinzufügen eines zusätzlichen Parameters oder Rückrufs. Erinnern Sie sich an das [Feinabstimmungs-Tutorial](training), in der Klasse [`TrainingArguments`] geben Sie Hyperparameter und zusätzliche Trainingsoptionen an. Eine dieser Trainingsoptionen beinhaltet die Möglichkeit, ein Modell direkt an den Hub zu pushen. Setzen Sie `push_to_hub=True` in Ihrer [`TrainingArguments`]:
@@ -141,29 +113,6 @@ Nach der Feinabstimmung Ihres Modells rufen Sie [`~transformers.Trainer.push_to_
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
Geben Sie ein Modell mit [`PushToHubCallback`] an den Hub weiter. In der [`PushToHubCallback`] Funktion, fügen Sie hinzu:
- Ein Ausgabeverzeichnis für Ihr Modell.
- Einen Tokenizer.
- Die `hub_model_id`, die Ihr Hub-Benutzername und Modellname ist.
```py
>>> from transformers import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```
Fügen Sie den Callback zu [`fit`](https://keras.io/api/models/model_training_apis/) hinzu, und 🤗 Transformers wird das trainierte Modell an den Hub weiterleiten:
```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## Verwenden Sie die Funktion `push_to_hub`.
@@ -229,4 +178,4 @@ Um sicherzustellen, dass die Benutzer die Fähigkeiten, Grenzen, möglichen Verz
* Manuelles Erstellen und Hochladen einer "README.md"-Datei.
* Klicken Sie auf die Schaltfläche **Modellkarte bearbeiten** in Ihrem Modell-Repository.
Werfen Sie einen Blick auf die DistilBert [model card](https://huggingface.co/distilbert/distilbert-base-uncased) als gutes Beispiel für die Art von Informationen, die eine Modellkarte enthalten sollte. Weitere Details über andere Optionen, die Sie in der Datei "README.md" einstellen können, wie z.B. den Kohlenstoff-Fußabdruck eines Modells oder Beispiele für Widgets, finden Sie in der Dokumentation [hier](https://huggingface.co/docs/hub/models-cards).
Werfen Sie einen Blick auf die DistilBert [model card](https://huggingface.co/distilbert/distilbert-base-uncased) als gutes Beispiel für die Art von Informationen, die eine Modellkarte enthalten sollte. Weitere Details über andere Optionen, die Sie in der Datei "README.md" einstellen können, wie z.B. den Kohlenstoff-Fußabdruck eines Modells oder Beispiele für Widgets, finden Sie in der Dokumentation [hier](https://huggingface.co/docs/hub/models-cards).

@@ -153,8 +153,6 @@ Schließlich möchten Sie, dass der Tokenizer die tatsächlichen Tensoren zurüc
Setzen Sie den Parameter `return_tensors` entweder auf `pt` für PyTorch, oder `tf` für TensorFlow:
<frameworkcontent>
<pt>
```py
>>> batch_sentences = [
@@ -174,32 +172,6 @@ Setzen Sie den Parameter `return_tensors` entweder auf `pt` für PyTorch, oder `
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
</pt>
<tf>
```py
>>> batch_sentences = [
... "But what about second breakfast?",
... "Don't think he knows about second breakfast, Pip.",
... "What about elevensies?",
... ]
>>> encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(encoded_input)
{'input_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0],
[101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
[101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
```
</tf>
</frameworkcontent>
## Audio

@@ -66,20 +66,10 @@ Im folgenden Beispiel werden Sie die [`pipeline`] für die Stimmungsanalyse verw
Installieren Sie die folgenden Abhängigkeiten, falls Sie dies nicht bereits getan haben:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
Importieren sie die [`pipeline`] und spezifizieren sie die Aufgabe, welche sie lösen möchten:
@@ -154,8 +144,6 @@ Die [`pipeline`] kann jedes Modell aus dem [Model Hub](https://huggingface.co/mo
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `AutoClass` below):
```py
@@ -164,18 +152,6 @@ Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `TFAutoClass` below):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
Dann können Sie das Modell und den Tokenizer in der [`pipeline`] angeben und den `Klassifikator` auf Ihren Zieltext anwenden:
@@ -226,8 +202,6 @@ Der Tokenizer gibt ein Wörterbuch zurück, das Folgendes enthält:
Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Darüber hinaus kann der Tokenizer den Text auch auffüllen und kürzen, um einen Stapel mit einheitlicher Länge zurückzugeben:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
@@ -238,27 +212,11 @@ Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Dar
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
Lesen Sie das Tutorial [preprocessing](./preprocessing) für weitere Details zur Tokenisierung.
### AutoModel
<frameworkcontent>
<pt>
🤗 Transformers bietet eine einfache und einheitliche Möglichkeit, vortrainierte Instanzen zu laden. Das bedeutet, dass Sie ein [`AutoModel`] laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`AutoModel`] für die Aufgabe. Da Sie eine Text- oder Sequenzklassifizierung vornehmen, laden Sie [`AutoModelForSequenceClassification`]:
```py
@@ -290,39 +248,6 @@ Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wen
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers bietet eine einfache und einheitliche Methode zum Laden von vortrainierten Instanzen. Das bedeutet, dass Sie ein [`TFAutoModel`] genauso laden können, wie Sie einen [`AutoTokenizer`] laden würden. Der einzige Unterschied ist die Auswahl des richtigen [`TFAutoModel`] für die Aufgabe. Da Sie Text - oder Sequenz - Klassifizierung machen, laden Sie [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
In der [Aufgabenzusammenfassung](./task_summary) steht, welche [AutoModel]-Klasse für welche Aufgabe zu verwenden ist.
</Tip>
Jetzt können Sie Ihren vorverarbeiteten Stapel von Eingaben direkt an das Modell übergeben, indem Sie die Wörterbuchschlüssel direkt an die Tensoren übergeben:
```py
>>> tf_outputs = tf_model(tf_batch)
```
Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wenden Sie die Softmax-Funktion auf die "logits" an, um die Wahrscheinlichkeiten zu erhalten:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
@@ -342,8 +267,6 @@ Die Modellausgänge verhalten sich auch wie ein Tupel oder ein Wörterbuch (z.B.
### Modell speichern
<frameworkcontent>
<pt>
Sobald Ihr Modell feinabgestimmt ist, können Sie es mit seinem Tokenizer speichern, indem Sie [`PreTrainedModel.save_pretrained`] verwenden:
```py
@@ -357,28 +280,9 @@ Wenn Sie bereit sind, das Modell erneut zu verwenden, laden Sie es mit [`PreTrai
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Sobald Ihr Modell feinabgestimmt ist, können Sie es mit seinem Tokenizer unter Verwendung von [`TFPreTrainedModel.save_pretrained`] speichern:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
Wenn Sie bereit sind, das Modell wieder zu verwenden, laden Sie es mit [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
Ein besonders cooles 🤗 Transformers-Feature ist die Möglichkeit, ein Modell zu speichern und es entweder als PyTorch- oder TensorFlow-Modell wieder zu laden. Der Parameter "from_pt" oder "from_tf" kann das Modell von einem Framework in das andere konvertieren:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
@@ -386,17 +290,6 @@ Ein besonders cooles 🤗 Transformers-Feature ist die Möglichkeit, ein Modell
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</tf>
</frameworkcontent>
## Custom model builds
@@ -410,8 +303,6 @@ Beginnen Sie mit dem Import von [`AutoConfig`] und laden Sie dann das trainierte
>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:
```py
@@ -419,17 +310,6 @@ Create a model from your custom configuration with [`AutoModel.from_config`]:
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
Weitere Informationen zur Erstellung von benutzerdefinierten Konfigurationen finden Sie in der Anleitung [Erstellen einer benutzerdefinierten Architektur](./create_a_model).

@@ -85,8 +85,6 @@ pip install -r requirements.txt
## Ein Skript ausführen
<frameworkcontent>
<pt>
Das Beispielskript lädt einen Datensatz aus der 🤗 [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. Dann nimmt das Skript eine Feinabstimmung eines Datensatzes mit dem [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) auf einer Architektur vor, die eine Zusammenfassung unterstützt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/google-t5/t5-small) auf dem Datensatz [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) durchgeführt wird. Das T5-Modell benötigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusätzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiß T5, dass es sich um eine Zusammenfassungsaufgabe handelt.
```bash
@@ -100,27 +98,8 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
Das Beispielskript lädt einen Datensatz aus der 🤗 [Datasets](https://huggingface.co/docs/datasets/) Bibliothek herunter und verarbeitet ihn vor. Anschließend nimmt das Skript die Feinabstimmung eines Datensatzes mit Keras auf einer Architektur vor, die die Zusammenfassung unterstützt. Das folgende Beispiel zeigt, wie die Feinabstimmung von [T5-small](https://huggingface.co/google-t5/t5-small) auf dem [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) Datensatz durchgeführt wird. Das T5-Modell benötigt aufgrund der Art und Weise, wie es trainiert wurde, ein zusätzliches Argument `source_prefix`. Mit dieser Eingabeaufforderung weiß T5, dass es sich um eine Zusammenfassungsaufgabe handelt.
```bash
python examples/tensorflow/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Verteiltes Training und gemischte Präzision
@@ -142,7 +121,6 @@ torchrun \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@@ -150,8 +128,6 @@ TensorFlow-Skripte verwenden eine [`MirroredStrategy`](https://www.tensorflow.or
## Ein Skript auf einer TPU ausführen
<frameworkcontent>
<pt>
Tensor Processing Units (TPUs) sind speziell für die Beschleunigung der Leistung konzipiert. PyTorch unterstützt TPUs mit dem [XLA](https://www.tensorflow.org/xla) Deep Learning Compiler (siehe [hier](https://github.com/pytorch/xla/blob/master/README.md) für weitere Details). Um eine TPU zu verwenden, starten Sie das Skript `xla_spawn.py` und verwenden das Argument `num_cores`, um die Anzahl der TPU-Kerne festzulegen, die Sie verwenden möchten.
```bash
@@ -166,28 +142,8 @@ python xla_spawn.py --num_cores 8 \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
Tensor Processing Units (TPUs) sind speziell für die Beschleunigung der Leistung konzipiert. TensorFlow Skripte verwenden eine [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) für das Training auf TPUs. Um eine TPU zu verwenden, übergeben Sie den Namen der TPU-Ressource an das Argument `tpu`.
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Führen Sie ein Skript mit 🤗 Accelerate aus.
@@ -242,7 +198,6 @@ python examples/pytorch/summarization/run_summarization.py \
--summary_column summary_column_name \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--predict_with_generate
@@ -270,7 +225,6 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@@ -284,8 +238,6 @@ examples/pytorch/summarization/run_summarization.py -h
Eine weitere hilfreiche Option, die Sie aktivieren können, ist die Wiederaufnahme des Trainings von einem früheren Kontrollpunkt aus. Auf diese Weise können Sie im Falle einer Unterbrechung Ihres Trainings dort weitermachen, wo Sie aufgehört haben, ohne von vorne beginnen zu müssen. Es gibt zwei Methoden, um das Training von einem Kontrollpunkt aus wieder aufzunehmen.
Die erste Methode verwendet das Argument `output_dir previous_output_dir`, um das Training ab dem letzten in `output_dir` gespeicherten Kontrollpunkt wieder aufzunehmen. In diesem Fall sollten Sie `overwrite_output_dir` entfernen:
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
@@ -297,24 +249,6 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--output_dir previous_output_dir \
--predict_with_generate
```
Die zweite Methode verwendet das Argument `Resume_from_checkpoint path_to_specific_checkpoint`, um das Training ab einem bestimmten Checkpoint-Ordner wieder aufzunehmen.
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--resume_from_checkpoint path_to_specific_checkpoint \
--predict_with_generate
```
@@ -346,6 +280,5 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
```

@@ -73,8 +73,6 @@ An dieser Stelle sollten Sie dem Abschnitt folgen, der dem Rahmen entspricht, de
in der rechten Seitenleiste können Sie zu dem gewünschten Abschnitt springen - und wenn Sie den gesamten Inhalt eines bestimmten Frameworks ausblenden möchten,
klicken Sie einfach auf die Schaltfläche oben rechts im Block des jeweiligen Frameworks!
<frameworkcontent>
<pt>
<Youtube id="nvBXf7s7vTI"/>
## Trainieren mit PyTorch Trainer
@@ -155,128 +153,11 @@ Anschließend können Sie Ihr Modell durch den Aufruf von [`~transformers.Traine
```py
>>> trainer.train()
```
</pt>
<tf>
<a id='keras'></a>
<Youtube id="rnTGBy2ax1c"/>
## Trainieren Sie ein TensorFlow-Modell mit Keras
Sie können auch 🤗 Transformers Modelle in TensorFlow mit der Keras API trainieren!
### Laden von Daten für Keras
Wenn Sie ein 🤗 Transformers Modell mit der Keras API trainieren wollen, müssen Sie Ihren Datensatz in ein Format konvertieren, das
Keras versteht. Wenn Ihr Datensatz klein ist, können Sie das Ganze einfach in NumPy-Arrays konvertieren und an Keras übergeben.
Probieren wir das zuerst aus, bevor wir etwas Komplizierteres tun.
Laden Sie zunächst ein Dataset. Wir werden den CoLA-Datensatz aus dem [GLUE-Benchmark](https://huggingface.co/datasets/glue) verwenden,
da es sich um eine einfache Aufgabe zur Klassifizierung von binärem Text handelt, und nehmen vorerst nur den Trainingssplit.
```py
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"] # Just take the training split for now
```
Als nächstes laden Sie einen Tokenizer und tokenisieren die Daten als NumPy-Arrays. Beachten Sie, dass die Beschriftungen bereits eine Liste von 0 und 1en sind,
Wir können sie also ohne Tokenisierung direkt in ein NumPy-Array konvertieren!
```py
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["text"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"]) # Label is already an array of 0 and 1
```
Schließlich laden, [`compile`](https://keras.io/api/models/model_training_apis/#compile-method) und [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) Sie das Modell:
```py
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))
model.fit(tokenized_data, labels)
```
<Tip>
Sie müssen Ihren Modellen kein Verlustargument übergeben, wenn Sie sie `compile()`! Hugging-Face-Modelle wählen automatisch
einen Loss, der für ihre Aufgabe und Modellarchitektur geeignet ist, wenn dieses Argument leer gelassen wird. Sie können jederzeit außer Kraft setzen, indem Sie selbst einen Loss angeben, wenn Sie das möchten!
</Tip>
Dieser Ansatz eignet sich hervorragend für kleinere Datensätze, aber bei größeren Datensätzen kann er zu einem Problem werden. Warum?
Weil das tokenisierte Array und die Beschriftungen vollständig in den Speicher geladen werden müssten, und weil NumPy nicht mit
"gezackte" Arrays nicht verarbeiten kann, so dass jedes tokenisierte Sample auf die Länge des längsten Samples im gesamten Datensatz aufgefüllt werden müsste.
Datensatzes aufgefüllt werden. Dadurch wird das Array noch größer, und all die aufgefüllten Token verlangsamen auch das Training!
### Laden von Daten als tf.data.Dataset
Wenn Sie eine Verlangsamung des Trainings vermeiden wollen, können Sie Ihre Daten stattdessen als `tf.data.Dataset` laden. Sie können zwar Ihre eigene
tf.data"-Pipeline schreiben können, wenn Sie wollen, haben wir zwei bequeme Methoden, um dies zu tun:
- [`~TFPreTrainedModel.prepare_tf_dataset`]: Dies ist die Methode, die wir in den meisten Fällen empfehlen. Da es sich um eine Methode
Ihres Modells ist, kann sie das Modell inspizieren, um automatisch herauszufinden, welche Spalten als Modelleingaben verwendet werden können, und
verwirft die anderen, um einen einfacheren, leistungsfähigeren Datensatz zu erstellen.
- [`~datasets.Dataset.to_tf_dataset`]: Diese Methode ist eher auf niedriger Ebene angesiedelt und ist nützlich, wenn Sie genau kontrollieren wollen, wie
Dataset erstellt wird, indem man genau angibt, welche `columns` und `label_cols` einbezogen werden sollen.
Bevor Sie [`~TFPreTrainedModel.prepare_tf_dataset`] verwenden können, müssen Sie die Tokenizer-Ausgaben als Spalten zu Ihrem Datensatz hinzufügen, wie in
dem folgenden Codebeispiel:
```py
def tokenize_dataset(data):
# Keys of the returned dictionary will be added to the dataset as columns
return tokenizer(data["text"])
dataset = dataset.map(tokenize_dataset)
```
Denken Sie daran, dass Hugging Face-Datensätze standardmäßig auf der Festplatte gespeichert werden, so dass dies nicht zu einem erhöhten Arbeitsspeicherbedarf führen wird! Sobald die
Spalten hinzugefügt wurden, können Sie Batches aus dem Datensatz streamen und zu jedem Batch Auffüllungen hinzufügen, was die Anzahl der Auffüllungs-Token im Vergleich zum Auffüllen des gesamten Datensatzes reduziert.
```py
>>> tf_dataset = model.prepare_tf_dataset(dataset, batch_size=16, shuffle=True, tokenizer=tokenizer)
```
Beachten Sie, dass Sie im obigen Codebeispiel den Tokenizer an `prepare_tf_dataset` übergeben müssen, damit die Stapel beim Laden korrekt aufgefüllt werden können.
Wenn alle Stichproben in Ihrem Datensatz die gleiche Länge haben und kein Auffüllen erforderlich ist, können Sie dieses Argument weglassen.
Wenn Sie etwas Komplexeres als nur das Auffüllen von Stichproben benötigen (z. B. das Korrumpieren von Token für die maskierte Sprachmodellierung), können Sie das Argument
Modellierung), können Sie stattdessen das Argument `collate_fn` verwenden, um eine Funktion zu übergeben, die aufgerufen wird, um die
Liste von Stichproben in einen Stapel umwandelt und alle gewünschten Vorverarbeitungen vornimmt. Siehe unsere
[examples](https://github.com/huggingface/transformers/tree/main/examples) oder
[notebooks](https://huggingface.co/docs/transformers/notebooks), um diesen Ansatz in Aktion zu sehen.
Sobald Sie einen `tf.data.Dataset` erstellt haben, können Sie das Modell wie zuvor kompilieren und anpassen:
```py
model.compile(optimizer=Adam(3e-5))
model.fit(tf_dataset)
```
</tf>
</frameworkcontent>
<a id='pytorch_native'></a>
## Trainieren in nativem PyTorch
<frameworkcontent>
<pt>
<Youtube id="Dh9CL8fyG80"/>
[`Trainer`] kümmert sich um die Trainingsschleife und ermöglicht die Feinabstimmung eines Modells in einer einzigen Codezeile. Für Benutzer, die es vorziehen, ihre eigene Trainingsschleife zu schreiben, können Sie auch eine Feinabstimmung eines 🤗 Transformers-Modells in nativem PyTorch vornehmen.
@@ -418,8 +299,6 @@ Genauso wie Sie eine Bewertungsfunktion zu [`Trainer`] hinzugefügt haben, müss
>>> metric.compute()
```
</pt>
</frameworkcontent>
<a id='additional-resources'></a>
@@ -430,4 +309,4 @@ Weitere Beispiele für die Feinabstimmung finden Sie unter:
- [🤗 Transformers Examples](https://github.com/huggingface/transformers/tree/main/examples) enthält Skripte
um gängige NLP-Aufgaben in PyTorch und TensorFlow zu trainieren.
- [🤗 Transformers Notebooks](notebooks) enthält verschiedene Notebooks zur Feinabstimmung eines Modells für bestimmte Aufgaben in PyTorch und TensorFlow.
- [🤗 Transformers Notebooks](notebooks) enthält verschiedene Notebooks zur Feinabstimmung eines Modells für bestimmte Aufgaben in PyTorch und TensorFlow.

@@ -199,6 +199,8 @@
title: HIGGS
- local: quantization/hqq
title: HQQ
- local: quantization/mxfp4
title: MXFP4
- local: quantization/optimum
title: Optimum
- local: quantization/quanto
@@ -214,12 +216,15 @@
- local: quantization/contribute
title: Contribute
title: Quantization
- isExpanded: false
sections:
- local: kernel_doc/overview
title: Kernels in transformers
title: Kernels
- isExpanded: false
sections:
- local: serialization
title: ONNX
- local: tflite
title: LiteRT
- local: executorch
title: ExecuTorch
- local: torchscript
@@ -305,6 +310,8 @@
title: Glossary
- local: philosophy
title: Philosophy
- local: models_timeline
title: Models Timeline
- local: notebooks
title: Notebooks with examples
- local: community
@@ -334,16 +341,12 @@
title: Configuration
- local: main_classes/data_collator
title: Data Collator
- local: main_classes/keras_callbacks
title: Keras callbacks
- local: main_classes/logging
title: Logging
- local: main_classes/model
title: Models
- local: main_classes/text_generation
title: Text Generation
- local: main_classes/onnx
title: ONNX
- local: main_classes/optimizer_schedules
title: Optimization
- local: main_classes/output
@@ -370,6 +373,8 @@
title: Image Processor
- local: main_classes/video_processor
title: Video Processor
- local: main_classes/kernels
title: Kernels
title: Main Classes
- sections:
- sections:
@@ -409,6 +414,8 @@
title: Blenderbot Small
- local: model_doc/bloom
title: BLOOM
- local: model_doc/blt
title: BLT
- local: model_doc/bort
title: BORT
- local: model_doc/byt5
@@ -439,6 +446,8 @@
title: DeBERTa
- local: model_doc/deberta-v2
title: DeBERTa-v2
- local: model_doc/deepseek_v2
title: DeepSeek-V2
- local: model_doc/deepseek_v3
title: DeepSeek-V3
- local: model_doc/dialogpt
@@ -483,6 +492,8 @@
title: FLAN-UL2
- local: model_doc/flaubert
title: FlauBERT
- local: model_doc/flex_olmo
title: FlexOlmo
- local: model_doc/fnet
title: FNet
- local: model_doc/fsmt
@@ -551,12 +562,16 @@
title: LED
- local: model_doc/lfm2
title: LFM2
- local: model_doc/lfm2_moe
title: LFM2Moe
- local: model_doc/llama
title: LLaMA
- local: model_doc/llama2
title: Llama2
- local: model_doc/llama3
title: Llama3
- local: model_doc/longcat_flash
title: LongCatFlash
- local: model_doc/longformer
title: Longformer
- local: model_doc/longt5
@@ -625,6 +640,8 @@
title: OLMo
- local: model_doc/olmo2
title: OLMo2
- local: model_doc/olmo3
title: Olmo3
- local: model_doc/olmoe
title: OLMoE
- local: model_doc/open-llama
@@ -755,12 +772,6 @@
title: D-FINE
- local: model_doc/dab-detr
title: DAB-DETR
- local: model_doc/deepseek_v2
title: DeepSeek-V2
- local: model_doc/deepseek_vl
title: DeepseekVL
- local: model_doc/deepseek_vl_hybrid
title: DeepseekVLHybrid
- local: model_doc/deformable_detr
title: Deformable DETR
- local: model_doc/deit
@@ -843,10 +854,16 @@
title: RT-DETR
- local: model_doc/rt_detr_v2
title: RT-DETRv2
- local: model_doc/sam2
title: SAM2
- local: model_doc/segformer
title: SegFormer
- local: model_doc/seggpt
title: SegGpt
- local: model_doc/sam
title: Segment Anything
- local: model_doc/sam_hq
title: Segment Anything High Quality
- local: model_doc/superglue
title: SuperGlue
- local: model_doc/superpoint
@ -925,6 +942,8 @@
title: MusicGen
- local: model_doc/musicgen_melody
title: MusicGen Melody
- local: model_doc/parakeet
title: Parakeet
- local: model_doc/pop2piano
title: Pop2Piano
- local: model_doc/seamless_m4t
@ -969,6 +988,8 @@
title: XLSR-Wav2Vec2
title: Audio models
- sections:
- local: model_doc/sam2_video
title: SAM2 Video
- local: model_doc/timesformer
title: TimeSformer
- local: model_doc/vjepa2
@ -1005,6 +1026,8 @@
title: CLIPSeg
- local: model_doc/clvp
title: CLVP
- local: model_doc/cwm
title: Code World Model (CWM)
- local: model_doc/cohere2_vision
title: Cohere2Vision
- local: model_doc/colpali
@ -1013,10 +1036,18 @@
title: ColQwen2
- local: model_doc/data2vec
title: Data2Vec
- local: model_doc/deepseek_vl
title: DeepseekVL
- local: model_doc/deepseek_vl_hybrid
title: DeepseekVLHybrid
- local: model_doc/deplot
title: DePlot
- local: model_doc/donut
title: Donut
- local: model_doc/edgetam
title: EdgeTAM
- local: model_doc/edgetam_video
title: EdgeTamVideo
- local: model_doc/emu3
title: Emu3
- local: model_doc/evolla
@ -1069,6 +1100,8 @@
title: LayoutLMV3
- local: model_doc/layoutxlm
title: LayoutXLM
- local: model_doc/lfm2_vl
title: LFM2-VL
- local: model_doc/lilt
title: LiLT
- local: model_doc/llama4
@ -1127,14 +1160,12 @@
title: Qwen2Audio
- local: model_doc/qwen2_vl
title: Qwen2VL
- local: model_doc/sam2
title: SAM2
- local: model_doc/sam2_video
title: SAM2 Video
- local: model_doc/sam
title: Segment Anything
- local: model_doc/sam_hq
title: Segment Anything High Quality
- local: model_doc/qwen3_omni_moe
title: Qwen3-Omni-MoE
- local: model_doc/qwen3_vl
title: Qwen3VL
- local: model_doc/qwen3_vl_moe
title: Qwen3VLMoe
- local: model_doc/shieldgemma2
title: ShieldGemma2
- local: model_doc/siglip

View File

@ -69,7 +69,6 @@ CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
Only GPUs 0 and 2 are "visible" to PyTorch and are mapped to `cuda:0` and `cuda:1` respectively.
To reverse the order (use GPU 2 as `cuda:0` and GPU 0 as `cuda:1`):
```bash
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```
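The renumbering described above can be sketched in a few lines of plain Python. This is only an illustration of the index mapping, assuming a comma-separated list of integer GPU indices; the helper name `logical_device_map` is hypothetical and is not part of PyTorch or Transformers.

```python
def logical_device_map(visible_devices: str, prefix: str = "cuda") -> dict[str, int]:
    """Map logical device names (as seen by the framework) to physical GPU indices.

    Hypothetical helper for illustration only. It assumes `visible_devices` is a
    comma-separated list of integer indices, e.g. the value of CUDA_VISIBLE_DEVICES.
    """
    physical_ids = [int(item) for item in visible_devices.split(",") if item.strip()]
    # The position in the list becomes the logical index.
    return {f"{prefix}:{logical}": physical for logical, physical in enumerate(physical_ids)}


print(logical_device_map("0,2"))  # {'cuda:0': 0, 'cuda:1': 2}
print(logical_device_map("2,0"))  # {'cuda:0': 2, 'cuda:1': 0}
```

The order of the list is what decides which physical GPU becomes `cuda:0`, which is why swapping the two entries reverses the mapping.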
@ -108,7 +107,6 @@ To reverse the order (use XPU 2 as `xpu:0` and XPU 0 as `xpu:1`):
ZE_AFFINITY_MASK=2,0 torchrun trainer-program.py ...
```
You can also control the order of Intel XPUs with:
```bash
@ -120,7 +118,5 @@ For more information about device enumeration and sorting on Intel XPU, please r
</hfoption>
</hfoptions>
> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended because it can be confusing if you forget how the environment variable was set up and you end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.

View File

@ -51,7 +51,7 @@ This section describes how the model and configuration classes interact and the
### Model and configuration
All Transformers' models inherit from a base [`PreTrainedModel`] and [`PretrainedConfig`] class. The configuration is the model's blueprint.
All Transformers' models inherit from a base [`PreTrainedModel`] and [`PreTrainedConfig`] class. The configuration is the model's blueprint.
There are never more than two levels of abstraction for any model, which keeps the code readable. The example model here, BrandNewLlama, inherits from `BrandNewLlamaPreTrainedModel` and [`PreTrainedModel`]. It is important that a new model only depends on [`PreTrainedModel`] so that it can use the [`~PreTrainedModel.from_pretrained`] and [`~PreTrainedModel.save_pretrained`] methods.
@ -66,9 +66,9 @@ model = BrandNewLlamaModel.from_pretrained("username/brand_new_llama")
model.config
```
[`PretrainedConfig`] provides the [`~PretrainedConfig.from_pretrained`] and [`~PretrainedConfig.save_pretrained`] methods.
[`PreTrainedConfig`] provides the [`~PreTrainedConfig.from_pretrained`] and [`~PreTrainedConfig.save_pretrained`] methods.
When you use [`PreTrainedModel.save_pretrained`], it automatically calls [`PretrainedConfig.save_pretrained`] so that both the model and configuration are saved together.
When you use [`PreTrainedModel.save_pretrained`], it automatically calls [`PreTrainedConfig.save_pretrained`] so that both the model and configuration are saved together.
A model is saved to a `model.safetensors` file and a configuration is saved to a `config.json` file.
@ -278,7 +278,7 @@ Every Transformers model output should have a precision or error tolerance of *1
Here are some tips for an efficient debugging environment.
- How you debug intermediate results depends on the machine learning framework the original model repository uses. For PyTorch, you should write a script to decompose the original model into smaller sub-components to retrieve the intermediate values. For TensorFlow, you may need to use [tf.print](https://www.tensorflow.org/api_docs/python/tf/print). For Flax, make sure the model is *not jitted* during the forward pass (refer to this GitHub [Issue](https://github.com/google/jax/issues/196) for more details).
- How you debug intermediate results depends on the machine learning framework the original model repository uses. For PyTorch, you should write a script to decompose the original model into smaller sub-components to retrieve the intermediate values.
- It is faster to debug with a smaller pretrained checkpoint versus a larger checkpoint where the forward pass takes more than 10 seconds. If only large checkpoints are available, create a dummy model with randomly initialized weights and save those weights to compare against the Transformers implementation.

View File

@ -193,4 +193,4 @@ def custom_attention_mask(
It mostly works thanks to the `mask_function`, which is a `Callable` in the form of [torch's mask_mod functions](https://pytorch.org/blog/flexattention/), taking 4 indices as input and returning a boolean to indicate if this position should take part in the attention computation.
If you cannot use the `mask_function` to create your mask for some reason, you can try to work around it by doing something similar to our [torch export workaround](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py).
If you cannot use the `mask_function` to create your mask for some reason, you can try to work around it by doing something similar to our [torch export workaround](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py).

View File

@ -145,7 +145,6 @@ Arguments can also be passed directly to `@auto_docstring` for more control. Use
The `Returns` and `Examples` parts of the docstring can also be manually specified.
```python
MODEL_COMMON_CUSTOM_ARGS = r"""
common_arg_1 (`torch.Tensor`, *optional*, defaults to `default_value`):
@ -202,7 +201,6 @@ There are some rules for documenting different types of arguments and they're li
If a standard argument behaves differently in your model, then you can override it locally in a `r""" """` block. This local definition has a higher priority. For example, the `labels` argument is often customized per model and typically requires overriding.
- New or custom arguments should be documented within an `r""" """` block after the signature if it is a function or in the `__init__` method's docstring if it is a class.
```py
@ -212,9 +210,9 @@ There are some rules for documenting different types of arguments and they're li
This can span multiple lines.
```
* Include `type` in backticks.
* Add *optional* if the argument is not required or has a default value.
* Add "defaults to X" if it has a default value. You don't need to add "defaults to `None`" if the default value is `None`.
* Include `type` in backticks.
* Add *optional* if the argument is not required or has a default value.
* Add "defaults to X" if it has a default value. You don't need to add "defaults to `None`" if the default value is `None`.
These arguments can also be passed to `@auto_docstring` as a `custom_args` argument. It is used to define the docstring block for new arguments once if they are repeated in multiple places in the modeling file.
@ -294,7 +292,7 @@ The `@auto_docstring` decorator automatically generates docstrings by:
8. Unrolling kwargs typed with the unpack operator. For specific methods (defined in `UNROLL_KWARGS_METHODS`) or classes (defined in `UNROLL_KWARGS_CLASSES`), the decorator processes `**kwargs` parameters that are typed with `Unpack[KwargsTypedDict]`. It extracts the documentation from the `TypedDict` and adds each parameter to the function's docstring.
Currently only supported for [`FastImageProcessorKwargs`].
Currently only supported for [`ImagesKwargs`].
## Best practices

View File

@ -22,7 +22,7 @@ Higher-level computer visions tasks, such as object detection or image segmentat
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Backbone.png"/>
</div>
Load a backbone with [`~PretrainedConfig.from_pretrained`] and use the `out_indices` parameter to determine which layer, given by the index, to extract a feature map from.
Load a backbone with [`~PreTrainedConfig.from_pretrained`] and use the `out_indices` parameter to determine which layer, given by the index, to extract a feature map from.
```py
from transformers import AutoBackbone
@ -46,7 +46,7 @@ There are two ways to load a Transformers backbone, [`AutoBackbone`] and a model
<hfoptions id="backbone-classes">
<hfoption id="AutoBackbone">
The [AutoClass](./model_doc/auto) API automatically loads a pretrained vision model with [`~PretrainedConfig.from_pretrained`] as a backbone if it's supported.
The [AutoClass](./model_doc/auto) API automatically loads a pretrained vision model with [`~PreTrainedConfig.from_pretrained`] as a backbone if it's supported.
Set the `out_indices` parameter to the layer you'd like to get the feature map from. If you know the name of the layer, you could also use `out_features`. These parameters can be used interchangeably, but if you use both, make sure they refer to the same layer.

View File

@ -59,11 +59,9 @@ Refer to the table below to compare how caching improves efficiency.
| without caching | with caching |
|---|---|
| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V`
| for each step, recompute all previous `K` and `V` | for each step, only compute current `K` and `V`
| attention cost per step is **quadratic** with sequence length | attention cost per step is **linear** with sequence length (memory grows linearly, but compute/token remains low) |
## Cache class
A basic KV cache interface takes a key and value tensor for the current token and returns the updated `K` and `V` tensors. This is internally managed by a model's `forward` method.
@ -85,7 +83,7 @@ When you use Transformers' [`Cache`] class, the self-attention module performs s
Caches are structured as a list of layers, where each layer contains a key and value cache. The key and value caches are tensors with the shape `[batch_size, num_heads, seq_len, head_dim]`.
Layers can be of different types (e.g. `DynamicLayer`, `StaticLayer`, `SlidingWindowLayer`), which mostly changes how sequence length is handled and how the cache is updated.
Layers can be of different types (e.g. `DynamicLayer`, `StaticLayer`, `StaticSlidingWindowLayer`), which mostly changes how sequence length is handled and how the cache is updated.
The simplest is a `DynamicLayer` that grows as more tokens are processed. The sequence length dimension (`seq_len`) increases with each new token:
@ -94,15 +92,16 @@ cache.layers[idx].keys = torch.cat([cache.layers[idx].keys, key_states], dim=-2)
cache.layers[idx].values = torch.cat([cache.layers[idx].values, value_states], dim=-2)
```
Other layer types like `StaticLayer` and `SlidingWindowLayer` have a fixed sequence length that is set when the cache is created. This makes them compatible with `torch.compile`. In the case of `SlidingWindowLayer`, existing tokens are shifted out of the cache when a new token is added.
Other layer types like `StaticLayer` and `StaticSlidingWindowLayer` have a fixed sequence length that is set when the cache is created. This makes them compatible with `torch.compile`. In the case of `StaticSlidingWindowLayer`, existing tokens are shifted out of the cache when a new token is added.
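To make the "shifted out" behavior concrete, here is a toy sketch of a fixed-length sliding window. It is not the Transformers implementation: the real layers preallocate tensors, while this sketch uses `collections.deque` with placeholder strings, and the class name `ToySlidingWindowCache` is hypothetical.

```python
from collections import deque


class ToySlidingWindowCache:
    """Toy stand-in for a sliding-window cache layer (illustration only)."""

    def __init__(self, window: int):
        # A deque with maxlen drops the oldest entry once the window is full.
        self.keys = deque(maxlen=window)
        self.values = deque(maxlen=window)

    def update(self, key, value):
        """Add the newest key/value pair and return the current window contents."""
        self.keys.append(key)
        self.values.append(value)
        return list(self.keys), list(self.values)


cache = ToySlidingWindowCache(window=3)
for step in range(5):
    keys, values = cache.update(f"k{step}", f"v{step}")

print(keys)    # ['k2', 'k3', 'k4']
print(values)  # ['v2', 'v3', 'v4']
```

After five updates with a window of three, the two oldest entries are gone, so the cache length stays constant no matter how many tokens are generated.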
The example below demonstrates how to create a generation loop with [`DynamicCache`]. As discussed, the attention mask is a concatenation of past and current token values and `1` is added to the cache position for the next token.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache, infer_device
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache
from accelerate import Accelerator
device = f"{infer_device()}:0"
device = Accelerator().device
model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map=device)
@ -138,17 +137,17 @@ The cache position tracks where to insert new tokens in the attention cache. It
Cache position is used internally for two purposes:
1. Selecting new tokens to process in the input sequence and ensuring only tokens that havent been cached yet are passed to the model's `forward`.
1. Selecting new tokens to process in the input sequence and ensuring only tokens that haven't been cached yet are passed to the model's `forward`.
2. Storing key/value pairs at the correct positions in the cache. This is especially important for fixed-size caches, which pre-allocate a specific cache length.
The generation loop usually takes care of the cache position, but if you're writing a custom generation method, it is important that cache positions are accurate since they are used to write and read key/value states into fixed slots.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache, infer_device
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache
from accelerate import Accelerator
device = f"{infer_device()}:0"
device = Accelerator().device
model_id = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16, device_map=device)
@ -159,31 +158,3 @@ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, ret
generated_ids = model.generate(**inputs, use_cache=True, max_new_tokens=10)
```
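If you write your own generation loop, the bookkeeping looks roughly like the sketch below. It only tracks the indices in plain Python and assumes one new token per decoding step; the function name `cache_positions` is hypothetical and no model or cache object is involved.

```python
def cache_positions(prompt_length: int, decode_steps: int) -> list[list[int]]:
    """Return the cache positions written by each forward pass (illustration only).

    The first pass (prefill) writes every prompt position. Each later pass
    processes exactly one new token and writes the next free slot.
    """
    passes = [list(range(prompt_length))]  # prefill: positions 0 .. prompt_length - 1
    past_length = prompt_length
    for _ in range(decode_steps):
        passes.append([past_length])  # one new token goes in the next free slot
        past_length += 1
    return passes


print(cache_positions(prompt_length=4, decode_steps=3))
# [[0, 1, 2, 3], [4], [5], [6]]
```

An off-by-one error here would write a key/value pair into the wrong slot of a fixed-size cache, which is why the positions need to be exact.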
## Legacy cache format
Before the [`Cache`] class, the cache used to be stored as a tuple of tuples of tensors. This format is dynamic because it grows as text is generated, similar to [`DynamicCache`].
The legacy format is essentially the same data structure but organized differently.
- It's a tuple of tuples, where each inner tuple contains the key and value tensors for a layer.
- The tensors have the same shape `[batch_size, num_heads, seq_len, head_dim]`.
- The format is less flexible and doesn't support features like quantization or offloading.
If your project depends on this legacy format, we recommend to convert to [`DynamicCache`] with [`~DynamicCache.from_legacy_cache`]. Note that legacy cache format is deprecated and not used anymore in `Transformers`. You can convert back to tuple format with [`DynamicCache.to_legacy_cache`] functions, which is helpful if you have custom logic for manipulating a cache in a specific format.
```py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, DynamicCache
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf", dtype=torch.float16, device_map="auto")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
# `return_dict_in_generate=True` is required to return the cache and `return_legacy_cache` forces the returned cache
# in the legacy format
generation_outputs = model.generate(**inputs, return_dict_in_generate=True, return_legacy_cache=True, max_new_tokens=5)
cache = DynamicCache.from_legacy_cache(generation_outputs.past_key_values)
legacy_format_cache = cache.to_legacy_cache()
```

View File

@ -16,7 +16,7 @@ rendered properly in your Markdown viewer.
# Tool use
Chat models are commonly trained with support for "function-calling" or "tool-use". Tools are functions supplied by the user, which the model can choose to call as part of its response. For example, models could have access to a calculator tool to perform arithmetic without having to it internally.
Chat models are commonly trained with support for "function-calling" or "tool-use". Tools are functions supplied by the user, which the model can choose to call as part of its response. For example, models could have access to a calculator tool to perform arithmetic without having to perform the computation internally.
This guide will demonstrate how to define tools, how to pass them to a chat model, and how to handle the model's output when it calls a tool.
@ -29,12 +29,11 @@ the arguments, argument types, and function docstring are parsed in order to gen
Although passing Python functions is very convenient, the parser can only handle [Google-style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings)
docstrings. Refer to the examples below for how to format a tool-ready function.
```py
def get_current_temperature(location: str, unit: str):
"""
Get the current temperature at a location.
Args:
location: The location to get the temperature for, in the format "City, Country"
unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
@ -44,7 +43,7 @@ def get_current_temperature(location: str, unit: str):
def get_current_wind_speed(location: str):
"""
Get the current wind speed in km/h at a given location.
Args:
location: The location to get the wind speed for, in the format "City, Country"
"""
@ -103,7 +102,6 @@ Hold the call in the `tool_calls` key of an `assistant` message. This is the rec
> [!WARNING]
> Although `tool_calls` is similar to the OpenAI API, the OpenAI API uses a JSON string as its `tool_calls` format. This may cause errors or strange model behavior if used in Transformers, which expects a dict.
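
If your tool calls come from an OpenAI-style response, one way to avoid this mismatch is to decode the `arguments` string before adding the call to the chat. The sketch below is a minimal illustration; the helper name `normalize_tool_call` is hypothetical and it assumes the call has a `function` entry with `name` and `arguments` keys.

```python
import json


def normalize_tool_call(tool_call: dict) -> dict:
    """Return a copy of `tool_call` whose `arguments` is a dict, not a JSON string.

    Hypothetical helper for illustration only.
    """
    function = dict(tool_call["function"])  # shallow copy, leave the input untouched
    arguments = function.get("arguments")
    if isinstance(arguments, str):
        # OpenAI-style payloads carry the arguments as a JSON-encoded string.
        function["arguments"] = json.loads(arguments)
    return {"type": "function", "function": function}


openai_style = {
    "type": "function",
    "function": {
        "name": "get_current_temperature",
        "arguments": '{"location": "Paris, France", "unit": "celsius"}',
    },
}
print(normalize_tool_call(openai_style)["function"]["arguments"])
# {'location': 'Paris, France', 'unit': 'celsius'}
```

Calls that already hold a dict pass through unchanged, so the helper can be applied to every tool call without checking its origin first.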
```py
tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
messages.append({"role": "assistant", "tool_calls": [{"type": "function", "function": tool_call}]})
@ -131,7 +129,6 @@ The temperature in Paris, France right now is 22°C.<|im_end|>
> Although the key in the assistant message is called `tool_calls`, in most cases, models only emit a single tool call at a time. Some older models emit multiple tool calls at the same time, but this is a
> significantly more complex process, as you need to handle multiple tool responses at once and disambiguate them, often using tool call IDs. Please refer to the model card to see exactly what format a model expects for tool calls.
## JSON schemas
Another way to define tools is by passing a [JSON schema](https://json-schema.org/learn/getting-started-step-by-step).
@ -147,7 +144,7 @@ from transformers.utils import get_json_schema
def multiply(a: float, b: float):
"""
A function that multiplies two numbers
Args:
a: The first number to multiply
b: The second number to multiply
@ -160,22 +157,22 @@ print(schema)
```json
{
"type": "function",
"type": "function",
"function": {
"name": "multiply",
"description": "A function that multiplies two numbers",
"name": "multiply",
"description": "A function that multiplies two numbers",
"parameters": {
"type": "object",
"type": "object",
"properties": {
"a": {
"type": "number",
"type": "number",
"description": "The first number to multiply"
},
},
"b": {
"type": "number",
"description": "The second number to multiply"
}
},
},
"required": ["a", "b"]
}
}
@ -187,7 +184,7 @@ We won't go into the details of JSON schema itself here, since it's already [ver
```py
# A simple function that takes no arguments
current_time = {
"type": "function",
"type": "function",
"function": {
"name": "current_time",
"description": "Get the current local time as a string.",
@ -203,18 +200,18 @@ multiply = {
'type': 'function',
'function': {
'name': 'multiply',
'description': 'A function that multiplies two numbers',
'description': 'A function that multiplies two numbers',
'parameters': {
'type': 'object',
'type': 'object',
'properties': {
'a': {
'type': 'number',
'description': 'The first number to multiply'
},
},
'b': {
'type': 'number', 'description': 'The second number to multiply'
}
},
},
'required': ['a', 'b']
}
}
@ -224,4 +221,4 @@ model_input = tokenizer.apply_chat_template(
messages,
tools = [current_time, multiply]
)
```
```

View File

@ -16,13 +16,13 @@ rendered properly in your Markdown viewer.
# Chat templates
The [chat basics](./conversations) guide covers how to store chat histories and generate text from chat models using [`TextGenerationPipeline`].
The [chat basics](./conversations) guide covers how to store chat histories and generate text from chat models using [`TextGenerationPipeline`].
This guide is intended for more advanced users, and covers the underlying classes and methods, as well as the key concepts for understanding what's actually going on when you chat with a model.
The critical insight needed to understand chat models is this: All causal LMs, whether chat-trained or not, continue a sequence of tokens. When causal LMs are trained, the training usually begins with "pre-training" on a huge corpus of text, which creates a "base" model.
These base models are then often "fine-tuned" for chat, which means training them on data that is formatted as a sequence of messages. The chat is still just a sequence of tokens, though! The list of `role` and `content` dictionaries that you pass
to a chat model gets converted to a token sequence, often with control tokens like `<|user|>` or `<|assistant|>` or `<|end_of_message|>`, which allow the model to see the chat structure.
to a chat model gets converted to a token sequence, often with control tokens like `<|user|>` or `<|assistant|>` or `<|end_of_message|>`, which allow the model to see the chat structure.
There are many possible chat formats, and different models may use different formats or control tokens, even if they were fine-tuned from the same base model!
Don't panic, though - you don't need to memorize every possible chat format in order to use chat models. Chat models come with **chat templates**, which indicate how they expect chats to be formatted.
@ -43,6 +43,7 @@ chat = [
tokenizer.apply_chat_template(chat, tokenize=False)
```
```md
<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]
```
@ -62,6 +63,7 @@ chat = [
tokenizer.apply_chat_template(chat, tokenize=False)
```
```md
<|user|>\nHello, how are you?</s>\n<|assistant|>\nI'm doing great. How can I help you today?</s>\n<|user|>\nI'd like to show off how chat templating works!</s>\n
```
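To see that a chat template is nothing more than string formatting, the sketch below rebuilds the output above by hand. It is an illustration only: real templates are stored with the tokenizer and applied by `apply_chat_template`, and the function name `render_zephyr_style` and the `eos_token` default are assumptions made for this example.

```python
def render_zephyr_style(messages: list[dict], eos_token: str = "</s>") -> str:
    """Format a chat in the `<|role|>` style shown above (illustration only)."""
    return "".join(
        f"<|{message['role']}|>\n{message['content']}{eos_token}\n"
        for message in messages
    )


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]
print(repr(render_zephyr_style(chat)))
```

Hand-written formatting like this is easy to get subtly wrong (a missing newline or end-of-sequence token), which is the main reason to rely on the template shipped with the model instead.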
@ -75,9 +77,9 @@ Mistral-7B-Instruct uses `[INST]` and `[/INST]` tokens to indicate the start and
The input to `apply_chat_template` should be structured as a list of dictionaries with `role` and `content` keys. The `role` key specifies the speaker, and the `content` key contains the message. The common roles are:
- `user` for messages from the user
- `assistant` for messages from the model
- `system` for directives on how the model should act (usually placed at the beginning of the chat)
- `user` for messages from the user
- `assistant` for messages from the model
- `system` for directives on how the model should act (usually placed at the beginning of the chat)
[`apply_chat_template`] takes this list and returns a formatted sequence. Set `tokenize=True` if you want to tokenize the sequence.
@ -110,6 +112,7 @@ Pass the tokenized chat to [`~GenerationMixin.generate`] to generate a response.
outputs = model.generate(tokenized_chat, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))
```
```md
<|system|>
You are a friendly chatbot who always responds in the style of a pirate</s>
@ -121,13 +124,13 @@ Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopte
> [!WARNING]
> Some tokenizers add special `<bos>` and `<eos>` tokens. Chat templates should already include all the necessary special tokens, and adding additional special tokens is often incorrect or duplicated, hurting model performance. When you format text with `apply_chat_template(tokenize=False)`, make sure you set `add_special_tokens=False` if you tokenize later to avoid duplicating these tokens.
> This isnt an issue if you use `apply_chat_template(tokenize=True)`, which means it's usually the safer option!
> This isn't an issue if you use `apply_chat_template(tokenize=True)`, which means it's usually the safer option!
### add_generation_prompt
You may have noticed the [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) argument in the above examples.
You may have noticed the [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) argument in the above examples.
This argument adds tokens to the end of the chat that indicate the start of an `assistant` response. Remember: Beneath all the chat abstractions, chat models are still just language models that continue a sequence of tokens!
If you include tokens that tell it that it's now in an `assistant` response, it will correctly write a response, but if you don't include these tokens, the model may get confused and do something strange, like **continuing** the user's message instead of replying to it!
If you include tokens that tell it that it's now in an `assistant` response, it will correctly write a response, but if you don't include these tokens, the model may get confused and do something strange, like **continuing** the user's message instead of replying to it!
Let's see an example to understand what `add_generation_prompt` is actually doing. First, let's format a chat without `add_generation_prompt`:
@ -135,6 +138,7 @@ Let's see an example to understand what `add_generation_prompt` is actually doin
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
tokenized_chat
```
```md
<|im_start|>user
Hi there!<|im_end|>
@ -150,6 +154,7 @@ Now, let's format the same chat with `add_generation_prompt=True`:
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
tokenized_chat
```
```md
<|im_start|>user
Hi there!<|im_end|>
@ -163,7 +168,7 @@ Can I ask a question?<|im_end|>
When `add_generation_prompt=True`, `<|im_start|>assistant` is added at the end to indicate the start of an `assistant` message. This lets the model know an `assistant` response is next.
Not all models require generation prompts, and some models, like [Llama](./model_doc/llama), dont have any special tokens before the `assistant` response. In these cases, [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) has no effect.
Not all models require generation prompts, and some models, like [Llama](./model_doc/llama), don't have any special tokens before the `assistant` response. In these cases, [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) has no effect.
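The effect of the flag can be imitated with a small hand-written formatter. This is a sketch under the assumption of a ChatML-style layout like the one in the outputs above; the function name `render_chatml` and the example messages are hypothetical, and real templates should be applied with `apply_chat_template`.

```python
def render_chatml(messages: list[dict], add_generation_prompt: bool = False) -> str:
    """Format a chat in a ChatML-like layout (illustration only)."""
    text = "".join(
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with a reply
        # instead of extending the last user message.
        text += "<|im_start|>assistant\n"
    return text


messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"},
]
without_prompt = render_chatml(messages)
with_prompt = render_chatml(messages, add_generation_prompt=True)
print(with_prompt[len(without_prompt):])  # the only difference is the opened assistant turn
```

The two strings are identical except for the trailing `<|im_start|>assistant` header, which is all the generation prompt amounts to in this layout.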
### continue_final_message
@ -182,14 +187,13 @@ model.generate(**formatted_chat)
```
> [!WARNING]
> You shouldnt use [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) and [continue_final_message](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.continue_final_message) together. The former adds tokens that start a new message, while the latter removes end of sequence tokens. Using them together returns an error.
[`TextGenerationPipeline`] sets [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) to `True` by default to start a new message. However, if the final message in the chat has the `assistant` role, it assumes the message is a prefill and switches to `continue_final_message=True`. This is because most models dont support multiple consecutive assistant messages. To override this behavior, explicitly pass the [continue_final_message](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.continue_final_message) argument to the pipeline.
> You shouldn't use [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) and [continue_final_message](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.continue_final_message) together. The former adds tokens that start a new message, while the latter removes end of sequence tokens. Using them together returns an error.
[`TextGenerationPipeline`] sets [add_generation_prompt](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.add_generation_prompt) to `True` by default to start a new message. However, if the final message in the chat has the `assistant` role, it assumes the message is a prefill and switches to `continue_final_message=True`. This is because most models don't support multiple consecutive assistant messages. To override this behavior, explicitly pass the [continue_final_message](https://huggingface.co/docs/transformers/internal/tokenization_utils#transformers.PreTrainedTokenizerBase.apply_chat_template.continue_final_message) argument to the pipeline.
## Model training
Training a model with a chat template is a good way to ensure the template matches the tokens the model was trained on. Apply the chat template as a preprocessing step to your dataset. Set `add_generation_prompt=False` because the additional tokens to prompt an assistant response arent helpful during training.
Training a model with a chat template is a good way to ensure the template matches the tokens the model was trained on. Apply the chat template as a preprocessing step to your dataset. Set `add_generation_prompt=False` because the additional tokens to prompt an assistant response aren't helpful during training.
An example of preprocessing a dataset with a chat template is shown below.
@ -212,6 +216,7 @@ dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])
```
```md
<|user|>
Which is bigger, the moon or the sun?</s>

View File

@ -18,8 +18,7 @@ rendered properly in your Markdown viewer.
Multimodal chat models accept inputs like images, audio or video, in addition to text. The `content` key in a multimodal chat history is a list containing multiple items of different types. This is unlike text-only chat models whose `content` key is a single string.
In the same way the [Tokenizer](./fast_tokenizer) class handles chat templates and tokenization for text-only models,
In the same way the [Tokenizer](./fast_tokenizer) class handles chat templates and tokenization for text-only models,
the [Processor](./processors) class handles preprocessing, tokenization and chat templates for multimodal models. Their [`~ProcessorMixin.apply_chat_template`] methods are almost identical.
This guide will show you how to chat with multimodal models with the high-level [`ImageTextToTextPipeline`] and at a lower level using the [`~ProcessorMixin.apply_chat_template`] and [`~GenerationMixin.generate`] methods.
@ -46,7 +45,7 @@ messages = [
]
```
Create an [`ImageTextToTextPipeline`] and pass the chat to it. For large models, setting [device_map=auto](./models#big-model-inference) helps load the model quicker and automatically places it on the fastest device available. Setting the data type to [auto](./models#model-data-type) also helps save memory and improve speed.
Create an [`ImageTextToTextPipeline`] and pass the chat to it. For large models, setting [device_map="auto"](./models#big-model-inference) helps load the model quicker and automatically places it on the fastest device available. Setting the data type to [auto](./models#model-data-type) also helps save memory and improve speed.
```python
import torch
@ -57,8 +56,7 @@ out = pipe(text=messages, max_new_tokens=128)
print(out[0]['generated_text'][-1]['content'])
```
```
```text
Ahoy, me hearty! These be two feline friends, likely some tabby cats, taking a siesta on a cozy pink blanket. They're resting near remote controls, perhaps after watching some TV or just enjoying some quiet time together. Cats sure know how to find comfort and relaxation, don't they?
```
@ -66,10 +64,9 @@ Aside from the gradual descent from pirate-speak into modern American English (i
## Using `apply_chat_template`
Like [text-only models](./chat_templating), use the [`~ProcessorMixin.apply_chat_template`] method to prepare the chat messages for multimodal models.
Like [text-only models](./chat_templating), use the [`~ProcessorMixin.apply_chat_template`] method to prepare the chat messages for multimodal models.
This method handles the tokenization and formatting of the chat messages, including images and other media types. The resulting inputs are passed to the model for generation.
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
@ -99,8 +96,7 @@ processed_chat = processor.apply_chat_template(messages, add_generation_prompt=T
print(list(processed_chat.keys()))
```
```
```text
['input_ids', 'attention_mask', 'pixel_values', 'image_grid_thw']
```
@ -113,14 +109,13 @@ print(processor.decode(out[0]))
The decoded output contains the full conversation so far, including the user message and the placeholder tokens that contain the image information. You may need to trim the previous conversation from the output before displaying it to the user.
## Video inputs
Some vision models also support video inputs. The message format is very similar to the format for [image inputs](#image-inputs).
- The content `"type"` should be `"video"` to indicate the content is a video.
- For videos, it can be a link to the video (`"url"`) or it could be a file path (`"path"`). Videos loaded from a URL can only be decoded with [PyAV](https://pyav.basswood-io.com/docs/stable/) or [Decord](https://github.com/dmlc/decord).
- In addition to loading videos from a URL or file path, you can also pass decoded video data directly. This is useful if youve already preprocessed or decoded video frames elsewhere in memory (e.g., using OpenCV, decord, or torchvision). You don't need to save to files or store it in an URL.
- In addition to loading videos from a URL or file path, you can also pass decoded video data directly. This is useful if you've already preprocessed or decoded video frames elsewhere in memory (e.g., using OpenCV, decord, or torchvision). You don't need to save it to a file or host it at a URL.
> [!WARNING]
> Loading a video from `"url"` is only supported by the PyAV or Decord backends.
@ -148,6 +143,7 @@ messages = [
```
### Example: Passing decoded video objects
```python
import numpy as np
@ -167,7 +163,9 @@ messages = [
},
]
```
You can also use the existing `load_video()` function to load a video, edit it in memory, and pass it in the messages.
```python
# Make sure a video backend library (pyav, decord, or torchvision) is available.
@ -200,7 +198,6 @@ Pass `messages` to [`~ProcessorMixin.apply_chat_template`] to tokenize the input
The `num_frames` parameter controls how many frames to uniformly sample from the video. Each checkpoint has a maximum frame count it was pretrained with and exceeding this count can significantly lower generation quality. It's important to choose a frame count that fits both the model capacity and your hardware resources. If `num_frames` isn't specified, the entire video is loaded without any frame sampling.
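To make "uniformly sample" concrete, here is a minimal sketch of one common way to pick evenly spaced frame indices. The helper `uniform_frame_indices` is hypothetical, and the library's exact sampling scheme may differ.

```python
def uniform_frame_indices(total_frames, num_frames=None):
    """Hypothetical helper: pick evenly spaced frame indices from a video."""
    if num_frames is None or num_frames >= total_frames:
        # No sampling requested (or not enough frames): keep every frame.
        return list(range(total_frames))
    step = total_frames / num_frames
    # Truncate each fractional position to a valid integer frame index.
    return [int(i * step) for i in range(num_frames)]


indices = uniform_frame_indices(total_frames=100, num_frames=8)
print(indices)  # [0, 12, 25, 37, 50, 62, 75, 87]
```

A smaller `num_frames` shortens the visual input the model has to process, at the cost of temporal detail.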
```python
processed_chat = processor.apply_chat_template(
messages,
@ -265,4 +262,3 @@ print(processed_chat.keys())
</hfoption>
</hfoptions>


@ -18,7 +18,6 @@ rendered properly in your Markdown viewer.
A chat template is a [Jinja](https://jinja.palletsprojects.com/en/stable/templates/) template stored in the tokenizer's [chat_template](https://huggingface.co/docs/transformers/main_classes/tokenizer#transformers.PreTrainedTokenizer.chat_template) attribute. Jinja is a templating language that allows you to write Python-like code and syntax.
```jinja
{%- for message in messages %}
{{- '<|' + message['role'] + '|>\n' }}
@ -30,8 +29,8 @@ A chat template is a [Jinja](https://jinja.palletsprojects.com/en/stable/templat
```
If you stare at this for a while, you should realize that this is actually very like Python, albeit with some strange
`{%-` syntax. The template iterates over a list of messages, and for each message, it prints the role and content of
the message, followed by an end-of-sequence token. If `add_generation_prompt=True`, it adds
`{%-` syntax. The template iterates over a list of messages, and for each message, it prints the role and content of
the message, followed by an end-of-sequence token. If `add_generation_prompt=True`, it adds
the starting header for an assistant message to the end of the conversation.
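For readers more at home in Python, the logic described above corresponds roughly to the function below. It is an approximation for illustration only: the `<|role|>` header format and the default `eos_token` value are assumptions, and `render_chat` is a hypothetical name.

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Plain-Python approximation of the template logic described above."""
    parts = []
    for message in messages:
        # Role header, then the content followed by the end-of-sequence token.
        parts.append("<|" + message["role"] + "|>\n")
        parts.append(message["content"] + eos_token)
    if add_generation_prompt:
        # Open an assistant turn so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "Hi there!"}],
    add_generation_prompt=True,
)
print(prompt)
```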
Load the written template as a string and assign it to the tokenizer's `chat_template` attribute. Once set, the template is used whenever you call [`~PreTrainedTokenizerBase.apply_chat_template`]. It is also saved
@ -42,7 +41,7 @@ edit this file directly to change the template, which is often easier than manip
The easiest way to start writing Jinja templates is to refer to existing templates. Use `print(tokenizer.chat_template)` on any chat model to see the template it's using. Try starting with simple models that don't call any tools or support RAG because tool-use models can have very complex templates. Finally, take a look at the [Jinja documentation](https://jinja.palletsprojects.com/en/stable/templates/#synopsis) for more details about formatting and syntax.
There are some specific tips and pitfalls you may encounter while writing chat templates specifically, though, and this section will cover some of them in more detail.
There are some specific tips and pitfalls you may encounter while writing chat templates specifically, though, and this section will cover some of them in more detail.
### Writing multimodal chat templates
@ -108,7 +107,6 @@ We strongly recommend using `-` to ensure only the intended content is printed.
### Special variables and callables
The only constants in a template are the `messages` variable and the `add_generation_prompt` boolean. However, you have
access to **any other keyword arguments that are passed** to the [`~PreTrainedTokenizerBase.apply_chat_template`] method.
@ -133,7 +131,7 @@ Make the changes below to ensure compatibility across all Jinja implementations.
### Big templates
Newer models or models with features like [tool-calling](./chat_extras#tools) and [RAG](./chat_extras#retrieval-augmented-generation-rag) require larger templates that can be longer than 100 lines. It may be easier to write larger templates in a separate file. The line numbers in the separate file corresponds exactly to the line numbers in template parsing or execution errors, making it easier to debug any potential issues.
Newer models or models with features like [tool-calling](./chat_extras) and RAG require larger templates that can be longer than 100 lines. It may be easier to write larger templates in a separate file. The line numbers in the separate file correspond exactly to the line numbers in template parsing or execution errors, making it easier to debug any potential issues.
Write the template in a separate file and extract it to the chat template.
@ -166,22 +164,22 @@ The example below shows how a tool is defined in JSON schema format.
```json
{
"type": "function",
"type": "function",
"function": {
"name": "multiply",
"description": "A function that multiplies two numbers",
"name": "multiply",
"description": "A function that multiplies two numbers",
"parameters": {
"type": "object",
"type": "object",
"properties": {
"a": {
"type": "number",
"type": "number",
"description": "The first number to multiply"
},
},
"b": {
"type": "number",
"description": "The second number to multiply"
}
},
},
"required": ["a", "b"]
}
}
@ -190,7 +188,7 @@ The example below shows how a tool is defined in JSON schema format.
An example of handling tool definitions in a chat template is shown below. The specific tokens and layouts should be changed to match the ones the model was trained with.
```
```jinja
{%- if tools %}
{%- for tool in tools %}
{{- '<tool>' + tool['function']['name'] + '\n' }}
@ -228,7 +226,7 @@ Tool calls are generally passed in the `tool_calls` key of an `"assistant”` me
A common pattern for handling tool calls is shown below. You can use this as a starting point, but make sure your template actually matches the format the model was trained with!
```
```jinja
{%- if message['role'] == 'assistant' and 'tool_calls' in message %}
{%- for tool_call in message['tool_calls'] %}
{{- '<tool_call>' + tool_call['function']['name'] + '\n' + tool_call['function']['arguments']|tojson + '\n</tool_call>' }}
@ -251,7 +249,7 @@ Tool responses are message dicts with the `tool` role. They are much simpler tha
Some templates may not even need the `name` key, in which case, you can write your template to only read the `content` key.
```
```jinja
{%- if message['role'] == 'tool' %}
{{- "<tool_result>" + message['content'] + "</tool_result>" }}
{%- endif %}


@ -48,7 +48,6 @@ transformers chat -h
The chat is implemented on top of the [AutoClass](./model_doc/auto), using tooling from [text generation](./llm_tutorial) and [chat](./chat_templating). It uses the `transformers serve` CLI under the hood ([docs](./serving.md#serve-cli)).
## TextGenerationPipeline
[`TextGenerationPipeline`] is a high-level text generation class with a "chat mode". Chat mode is enabled when a conversational model is detected and the chat prompt is [properly formatted](./llm_tutorial#wrong-prompt-format).
@ -109,7 +108,7 @@ quantization_config = BitsAndBytesConfig(load_in_8bit=True)
pipeline = pipeline(task="text-generation", model="meta-llama/Meta-Llama-3-8B-Instruct", device_map="auto", model_kwargs={"quantization_config": quantization_config})
```
In general, model size and performance are directly correlated. Larger models are slower in addition to requiring more memory because each active parameter must be read from memory for every generated token.
In general, model size and performance are directly correlated. Larger models are slower in addition to requiring more memory because each active parameter must be read from memory for every generated token.
This is a bottleneck for LLM text generation and the main options for improving generation speed are to either quantize a model or use hardware with higher memory bandwidth. Adding more compute power doesn't meaningfully help.
You can also try techniques like [speculative decoding](./generation_strategies#speculative-decoding), where a smaller model generates candidate tokens that are verified by the larger model. If the candidate tokens are correct, the larger model can generate more than one token at a time. This significantly alleviates the bandwidth bottleneck and improves generation speed.
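The verification idea can be illustrated with a toy, greedy version over plain token lists. This is a simplified sketch under stated assumptions: `draft_next` and `target_next` are hypothetical stand-ins for the small and large models, and real implementations verify all candidates in one batched forward pass.

```python
def speculative_step(prefix, draft_next, target_next, num_candidates=4):
    """One greedy speculative-decoding step over token lists (toy version)."""
    # 1. The small draft model proposes a few tokens autoregressively.
    candidates = []
    for _ in range(num_candidates):
        candidates.append(draft_next(prefix + candidates))
    # 2. The large model checks each proposed position in order.
    accepted = []
    for token in candidates:
        expected = target_next(prefix + accepted)
        if token != expected:
            # First mismatch: keep the large model's token and stop.
            accepted.append(expected)
            break
        accepted.append(token)
    return accepted


def target_next(sequence):
    # Toy "large model": the next token is the sequence length modulo 5.
    return len(sequence) % 5


def draft_next(sequence):
    # Toy "small model": agrees with the target except at one position.
    return 9 if len(sequence) == 3 else len(sequence) % 5


accepted = speculative_step([0], draft_next, target_next)
print(accepted)  # [1, 2, 3]
```

Here a single verification step yields three tokens instead of one, which is where the speedup comes from when the draft model is usually right.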


@ -21,9 +21,10 @@ where `port` is the port used by `transformers serve` (`8000` by default). On th
</h3>
You're now ready to set things up on the app side! In Cursor, while you can't set a new provider, you can change the endpoint for OpenAI requests in the model selection settings. First, navigate to "Settings" > "Cursor Settings", "Models" tab, and expand the "API Keys" collapsible. To set your `transformers serve` endpoint, follow this order:
1. Unselect ALL models in the list above (e.g. `gpt4`, ...);
2. Add and select the model you want to use (e.g. `Qwen/Qwen3-4B`)
3. Add some random text to OpenAI API Key. This field won't be used, but it cant be empty;
3. Add some random text to OpenAI API Key. This field won't be used, but it can't be empty;
4. Add the https address from `ngrok` to the "Override OpenAI Base URL" field, appending `/v1` to the address (i.e. `https://(...).ngrok-free.app/v1`);
5. Hit "Verify".
@ -38,5 +39,3 @@ You are now ready to use your local model in Cursor! For instance, if you toggle
<h3 align="center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_serve_cursor_chat.png"/>
</h3>


@ -25,12 +25,12 @@ This guide will show you how to customize a ResNet model, enable [AutoClass](./m
## Configuration
A configuration, given by the base [`PretrainedConfig`] class, contains all the necessary information to build a model. This is where you'll configure the attributes of the custom ResNet model. Different attributes gives different ResNet model types.
A configuration, given by the base [`PreTrainedConfig`] class, contains all the necessary information to build a model. This is where you'll configure the attributes of the custom ResNet model. Different attributes give different ResNet model types.
The main rules for customizing a configuration are:
1. A custom configuration must subclass [`PretrainedConfig`]. This ensures a custom model has all the functionality of a Transformers' model such as [`~PretrainedConfig.from_pretrained`], [`~PretrainedConfig.save_pretrained`], and [`~PretrainedConfig.push_to_hub`].
2. The [`PretrainedConfig`] `__init__` must accept any `kwargs` and they must be passed to the superclass `__init__`. [`PretrainedConfig`] has more fields than the ones set in your custom configuration, so when you load a configuration with [`~PretrainedConfig.from_pretrained`], those fields need to be accepted by your configuration and passed to the superclass.
1. A custom configuration must subclass [`PreTrainedConfig`]. This ensures a custom model has all the functionality of a Transformers' model such as [`~PreTrainedConfig.from_pretrained`], [`~PreTrainedConfig.save_pretrained`], and [`~PreTrainedConfig.push_to_hub`].
2. The [`PreTrainedConfig`] `__init__` must accept any `kwargs` and they must be passed to the superclass `__init__`. [`PreTrainedConfig`] has more fields than the ones set in your custom configuration, so when you load a configuration with [`~PreTrainedConfig.from_pretrained`], those fields need to be accepted by your configuration and passed to the superclass.
> [!TIP]
> It is useful to check the validity of some of the parameters. In the example below, a check is implemented to ensure `block_type` and `stem_type` belong to one of the predefined values.
@ -38,10 +38,10 @@ The main rules for customizing a configuration are:
> Add `model_type` to the configuration class to enable [AutoClass](./models#autoclass) support.
```py
from transformers import PretrainedConfig
from transformers import PreTrainedConfig
from typing import List
class ResnetConfig(PretrainedConfig):
class ResnetConfig(PreTrainedConfig):
model_type = "resnet"
def __init__(
@ -74,7 +74,7 @@ class ResnetConfig(PretrainedConfig):
super().__init__(**kwargs)
```
Save the configuration to a JSON file in your custom model folder, `custom-resnet`, with [`~PretrainedConfig.save_pretrained`].
Save the configuration to a JSON file in your custom model folder, `custom-resnet`, with [`~PreTrainedConfig.save_pretrained`].
```py
resnet50d_config = ResnetConfig(block_type="bottleneck", stem_width=32, stem_type="deep", avg_down=True)
@ -83,7 +83,7 @@ resnet50d_config.save_pretrained("custom-resnet")
## Model
With the custom ResNet configuration, you can now create and customize the model. The model subclasses the base [`PreTrainedModel`] class. Like [`PretrainedConfig`], inheriting from [`PreTrainedModel`] and initializing the superclass with the configuration extends Transformers' functionalities such as saving and loading to the custom model.
With the custom ResNet configuration, you can now create and customize the model. The model subclasses the base [`PreTrainedModel`] class. Like [`PreTrainedConfig`], inheriting from [`PreTrainedModel`] and initializing the superclass with the configuration extends Transformers' functionalities such as saving and loading to the custom model.
Transformers' models follow the convention of accepting a `config` object in the `__init__` method. This passes the entire `config` to the model sublayers, instead of breaking the `config` object into multiple arguments that are individually passed to the sublayers.
@ -235,7 +235,7 @@ from resnet_model.configuration_resnet import ResnetConfig
from resnet_model.modeling_resnet import ResnetModel, ResnetModelForImageClassification
```
Copy the code from the model and configuration files. To make sure the AutoClass objects are saved with [`~PreTrainedModel.save_pretrained`], call the [`~PretrainedConfig.register_for_auto_class`] method. This modifies the configuration JSON file to include the AutoClass objects and mapping.
Copy the code from the model and configuration files. To make sure the AutoClass objects are saved with [`~PreTrainedModel.save_pretrained`], call the [`~PreTrainedConfig.register_for_auto_class`] method. This modifies the configuration JSON file to include the AutoClass objects and mapping.
For a model, pick the appropriate `AutoModelFor` class based on the task.


@ -35,7 +35,7 @@ pip install deepspeed
PyTorch comes with its own CUDA toolkit, but to use DeepSpeed with PyTorch, you need to have an identical version of CUDA installed system-wide. For example, if you installed PyTorch with `cudatoolkit==10.2` in your Python environment, then you'll also need to have CUDA 10.2 installed everywhere.
The exact location can vary from system to system, but `usr/local/cuda-10.2` is the most common location on many Unix systems. When CUDA is correctly set up and added to your `PATH` environment variable, you can find the installation location with the following command.
The exact location can vary from system to system, but `/usr/local/cuda-10.2` is the most common location on many Unix systems. When CUDA is correctly set up and added to your `PATH` environment variable, you can find the installation location with the following command.
```bash
which nvcc
@ -45,7 +45,7 @@ which nvcc
You may also have more than one CUDA toolkit installed on your system.
```bash
```text
/usr/local/cuda-10.2
/usr/local/cuda-11.0
```


@ -294,7 +294,7 @@ Consider running a [benchmark](https://github.com/microsoft/DeepSpeed/issues/998
The example ZeRO-3 and ZeRO-Infinity config below sets most of the parameter values to `auto`, but you can also manually configure these values.
```yaml
```json
{
"fp16": {
"enabled": "auto",
@ -383,7 +383,7 @@ Gradient checkpointing saves memory by only storing *some* of the intermediate a
The batch size can be automatically configured or manually set. When you choose the `"auto"` option, [`Trainer`] sets `train_micro_batch_size_per_gpu` to the value of `per_device_train_batch_size` and `train_batch_size` to the value of `world_size * per_device_train_batch_size * gradient_accumulation_steps`.
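As a quick arithmetic check of the `world_size * per_device_train_batch_size * gradient_accumulation_steps` product, the sketch below uses assumed values for an 8-GPU run. The helper name `effective_train_batch_size` is hypothetical.

```python
def effective_train_batch_size(world_size, per_device_train_batch_size, gradient_accumulation_steps):
    """Hypothetical helper: global batch size consumed per optimizer step."""
    return world_size * per_device_train_batch_size * gradient_accumulation_steps


# Assumed example: 8 GPUs, 4 samples per GPU, 2 accumulation steps.
batch = effective_train_batch_size(
    world_size=8,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
)
print(batch)  # 64
```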
```yaml
```json
{
"train_micro_batch_size_per_gpu": "auto",
"train_batch_size": "auto"
@ -400,7 +400,7 @@ Reduce operations are lossy, for example, when gradients are averaged across mul
Choose the communication data type by setting the `communication_data_type` parameter in the config file. For example, choosing fp32 adds a small amount of overhead but ensures the reduction operation is accumulated in fp32 and when it is ready, it's downcasted to whichever half-precision data type you're training in.
```yaml
```json
{
"communication_data_type": "fp32"
}
@ -412,7 +412,7 @@ Gradient accumulation accumulates gradients over several mini-batches of data be
Gradient accumulation can be automatically configured or manually set. When you choose the `"auto"` option, [`Trainer`] sets it to the value of `gradient_accumulation_steps`.
```yaml
```json
{
"gradient_accumulation_steps": "auto"
}
@ -424,7 +424,7 @@ Gradient clipping is useful for preventing exploding gradients which can lead to
Gradient clipping can be automatically configured or manually set. When you choose the `"auto"` option, [`Trainer`] sets it to the value of `max_grad_norm`.
```yaml
```json
{
"gradient_clipping": "auto"
}
@ -439,7 +439,7 @@ Mixed precision accelerates training speed by performing some calculations in ha
Train in fp32 if a model wasn't pretrained in mixed precision because it may cause underflow or overflow errors. Disable fp16, the default, in this case.
```yaml
```json
{
"fp16": {
"enabled": false
@ -452,9 +452,9 @@ For Ampere GPUs and PyTorch 1.7+, the more efficient [tf32](https://pytorch.org/
</hfoption>
<hfoption id="fp16">
To configure AMP-like fp16 mixed precision, set up the config as shown below with `"auto"` or your own values. [`Trainer`] automatically enables or disables fp16 based on the value of `fp16_backend`, and the rest of the config can be set by you. fp16 is enabled from the command line when the following arguments are passed: `--fp16`, `--fp16_backend amp` or `--fp16_full_eval`.
To configure fp16 mixed precision, set up the config as shown below with `"auto"` or your own values. [`Trainer`] automatically enables or disables fp16 based on the value of `fp16` or `fp16_full_eval`, and the rest of the config can be set by you. fp16 is enabled from the command line when either of the following arguments is passed: `--fp16` or `--fp16_full_eval`.
```yaml
```json
{
"fp16": {
"enabled": "auto",
@ -469,28 +469,17 @@ To configure AMP-like fp16 mixed precision, set up the config as shown below wit
For additional DeepSpeed fp16 training options, take a look at the [FP16 Training Options](https://www.deepspeed.ai/docs/config-json/#fp16-training-options) reference.
To configure Apex-like fp16 mixed precision, set up the config as shown below with `"auto"` or your own values. [`Trainer`] automatically configures `amp` based on the values of `fp16_backend` and `fp16_opt_level`. It can also be enabled from the command line when the following arguments are passed: `--fp16`, `--fp16_backend apex` or `--fp16_opt_level 01`.
```yaml
{
"amp": {
"enabled": "auto",
"opt_level": "auto"
}
}
```
</hfoption>
<hfoption id="bf16">
> [!TIP]
> bf16 requires DeepSpeed 0.6.0.
bf16 has the same dynamic range as fp32, and doesnt require loss scaling unlike fp16. However, if you use [gradient accumulation](#gradient-accumulation) with bf16, gradients are accumulated in bf16 which may not be desirable because the lower precision can lead to lossy accumulation.
bf16 has the same dynamic range as fp32, and doesn't require loss scaling unlike fp16. However, if you use [gradient accumulation](#gradient-accumulation) with bf16, gradients are accumulated in bf16 which may not be desirable because the lower precision can lead to lossy accumulation.
bf16 can be set up in the config file or enabled from the command line when the following arguments are passed: `--bf16` or `--bf16_full_eval`.
```yaml
```json
{
"bf16": {
"enabled": "auto"
@ -514,7 +503,7 @@ DeepSpeed offers several [optimizers](https://www.deepspeed.ai/docs/config-json/
You can set the parameters to `"auto"` or manually input your own values.
```yaml
```json
{
"optimizer": {
"type": "AdamW",
@ -530,7 +519,7 @@ You can set the parameters to `"auto"` or manually input your own values.
Use an unsupported optimizer by adding the following to the top level configuration.
```yaml
```json
{
"zero_allow_untested_optimizer": true
}
@ -538,7 +527,7 @@ Use an unsupported optimizer by adding the following to the top level configurat
From DeepSpeed 0.8.3+, if you want to use offload, you'll also need to add the following to the top level configuration because offload works best with DeepSpeed's CPU Adam optimizer.
```yaml
```json
{
"zero_force_ds_cpu_optimizer": false
}
@ -558,7 +547,7 @@ If you don't configure the scheduler in the config file, [`Trainer`] automatical
You can set the parameters to `"auto"` or manually input your own values.
```yaml
```json
{
"scheduler": {
"type": "WarmupDecayLR",
@ -581,7 +570,7 @@ You can set the parameters to `"auto"` or manually input your own values.
Resume training with a Universal checkpoint by setting `load_universal` to `true` in the config file.
```yaml
```json
{
"checkpoint": {
"load_universal": true
@ -604,7 +593,7 @@ To deploy DeepSpeed on multiple GPUs, add `--num_gpus`. You don't need to add `-
deepspeed --num_gpus=2 examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero3.json \
--model_name_or_path google-t5/t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
--output_dir output_dir --fp16 \
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
--source_lang en --target_lang ro
@ -627,7 +616,7 @@ To deploy DeepSpeed on a single GPU, add `--num_gpus`. You don't need to add `--
deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
--deepspeed tests/deepspeed/ds_config_zero2.json \
--model_name_or_path google-t5/t5-small --per_device_train_batch_size 1 \
--output_dir output_dir --overwrite_output_dir --fp16 \
--output_dir output_dir --fp16 \
--do_train --max_train_samples 500 --num_train_epochs 1 \
--dataset_name wmt16 --dataset_config "ro-en" \
--source_lang en --target_lang ro
@ -640,7 +629,7 @@ deepspeed --num_gpus=1 examples/pytorch/translation/run_translation.py \
A multi-node setup consists of multiple nodes, where each node has one or more GPUs running a workload. DeepSpeed expects a shared storage system, but if this is not the case, you need to adjust the config file to include a [checkpoint](https://www.deepspeed.ai/docs/config-json/#checkpoint-options) to allow loading without access to a shared filesystem.
```yaml
```json
{
"checkpoint": {
"use_node_local_storage": true
@ -824,7 +813,7 @@ ZeRO-2 saves the model weights in fp16. To save the weights in fp16 for ZeRO-3,
If you don't, [`Trainer`] won't save the weights in fp16 and won't create a `pytorch_model.bin` file. This is because DeepSpeed's state_dict contains a placeholder instead of the real weights, so you won't be able to load it.
```yaml
```json
{
"zero_optimization": {
"stage": 3,
@ -986,7 +975,7 @@ NaN loss often occurs when a model is pretrained in bf16 and you try to use it w
It is also possible that fp16 is causing overflow. For example, if your config file looks like the one below, you may see the following overflow errors in the logs.
```yaml
```json
{
"fp16": {
"enabled": "auto",


@ -226,7 +226,7 @@ tokenizer = PreTrainedTokenizerFast.from_pretrained("config/save/dir")
<Youtube id="Yffk5aydLzg"/>
A Transformers model expects the input to be a PyTorch or NumPy tensor. A tokenizers job is to preprocess text into those tensors. Specify the framework tensor type to return with the `return_tensors` parameter.
A Transformers model expects the input to be a PyTorch or NumPy tensor. A tokenizer's job is to preprocess text into those tensors. Specify the framework tensor type to return with the `return_tensors` parameter.
```py
from transformers import AutoTokenizer


@ -32,9 +32,10 @@ Greedy search works well for tasks with relatively short outputs where creativit
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
device = infer_device()
device = Accelerator().device
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt").to(device)
@ -54,9 +55,10 @@ Enable multinomial sampling with `do_sample=True` and `num_beams=1`.
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
device = infer_device()
device = Accelerator().device
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt").to(device)
@ -79,9 +81,10 @@ Enable beam search with the `num_beams` parameter (should be greater than 1 othe
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
device = infer_device()
device = Accelerator().device
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
inputs = tokenizer("Hugging Face is an open-source company", return_tensors="pt").to(device)
@ -166,9 +169,10 @@ Enable prompt lookup decoding with the `prompt_lookup_num_tokens` parameter.
```py
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, infer_device
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator
device = infer_device()
device = Accelerator().device
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM-1.7B")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM-1.7B", dtype=torch.float16).to(device)
@ -229,6 +233,7 @@ tokenizer.batch_decode(outputs, skip_special_tokens=True)
## Custom generation methods
Custom generation methods enable specialized behavior such as:
- have the model continue thinking if it is uncertain;
- roll back generation if the model gets stuck;
- handle special tokens with custom logic;
@ -289,7 +294,7 @@ print(tokenizer.batch_decode(gen_out)[0])
If the custom method has pinned Python requirements that your environment doesn't meet, you'll get an exception about missing requirements. For instance, [transformers-community/custom_generate_bad_requirements](https://huggingface.co/transformers-community/custom_generate_bad_requirements) has an impossible set of requirements defined in its `custom_generate/requirements.txt` file, and you'll see the error message below if you try to run it.
```
```text
ImportError: Missing requirements in your local environment for `transformers-community/custom_generate_bad_requirements`:
foo (installed: None)
bar==0.0.0 (installed: None)
@ -301,6 +306,7 @@ Updating your Python requirements accordingly will remove this error message.
### Creating a custom generation method
To create a new generation method, you need to create a new [**Model**](https://huggingface.co/new) repository and push a few files into it.
1. The model you've designed your generation method with.
2. `custom_generate/generate.py`, which contains all the logic for your custom generation method.
3. `custom_generate/requirements.txt`, used to optionally add new Python requirements and/or lock specific versions to correctly use your method.
@ -308,7 +314,7 @@ To create a new generation method, you need to create a new [**Model**](https://
After you've added all required files, your repository should look like this
```
```text
your_repo/
├── README.md # include the 'custom_generate' tag
├── config.json
@ -377,6 +383,7 @@ def generate(model, input_ids, generation_config=None, left_padding=None, **kwar
```
Follow the recommended practices below to ensure your custom generation method works as expected.
- Feel free to reuse the logic for validation and input preparation in the original [`~GenerationMixin.generate`].
- Pin the `transformers` version in the requirements if you use any private method/attribute in `model`.
- Consider adding model validation, input validation, or even a separate test file to help users sanity-check your code in their environment.
@ -389,7 +396,6 @@ from .utils import some_function
Only relative imports from the same-level `custom_generate` folder are supported. Parent/sibling folder imports are not valid. The `custom_generate` argument also works locally with any directory that contains a `custom_generate` structure. This is the recommended workflow for developing your custom generation method.
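As a rough illustration of the local workflow described above, the sketch below scaffolds a directory containing a `custom_generate` folder with a placeholder `generate.py`. The helper name `scaffold_custom_generate` and the placeholder decoding logic are hypothetical and exist only for this example. The `generate` signature mirrors the one shown earlier in this guide.

```py
from pathlib import Path

# Placeholder contents for custom_generate/generate.py. The body is a stand-in:
# it returns the prompt unchanged instead of running a real decoding loop.
GENERATE_PY = '''\
def generate(model, input_ids, generation_config=None, left_padding=None, **kwargs):
    # Hypothetical placeholder logic: echo the prompt tokens back.
    return input_ids
'''


def scaffold_custom_generate(root):
    """Create `<root>/custom_generate/generate.py` and return the folder path.

    `scaffold_custom_generate` is a hypothetical helper, not part of Transformers.
    """
    folder = Path(root) / "custom_generate"
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "generate.py").write_text(GENERATE_PY)
    # requirements.txt is optional, so it is not created here.
    return folder
```

Once the folder exists, the root directory (not the `custom_generate` folder itself) is what you would pass as the `custom_generate` argument of `model.generate()` while iterating on your method.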
#### requirements.txt
You can optionally specify additional Python requirements in a `requirements.txt` file inside the `custom_generate` folder. These are checked at runtime and an exception will be thrown if they're missing, nudging users to update their environment accordingly.
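To make the runtime check more concrete, here is a minimal sketch of how unmet requirements could be detected. It is an illustration under simplifying assumptions, not the check Transformers actually performs: the function name `find_unmet_requirements` is hypothetical, and only bare package names and exact `==` pins are handled.

```py
from importlib.metadata import PackageNotFoundError, version


def find_unmet_requirements(requirement_lines):
    """Return a description of each requirement the local environment doesn't meet.

    Hypothetical helper: supports only `name` and `name==version` lines.
    """
    unmet = []
    for line in requirement_lines:
        # Drop comments and surrounding whitespace, then skip empty lines.
        requirement = line.split("#", 1)[0].strip()
        if not requirement:
            continue
        name, _, pinned = requirement.partition("==")
        name, pinned = name.strip(), pinned.strip()
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Unmet if the package is missing or the exact pin doesn't match.
        if installed is None or (pinned and installed != pinned):
            unmet.append(f"{requirement} (installed: {installed})")
    return unmet
```

A caller could raise an `ImportError` listing the returned entries whenever the list is non-empty, which is the kind of message shown earlier for the `custom_generate_bad_requirements` repository.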
@ -400,7 +406,7 @@ The root level `README.md` in the model repository usually describes the model t
For discoverability, we highly recommend adding the `custom_generate` tag to your repository. To do so, the top of your `README.md` file should look like the example below. After you push the file, you should see the tag in your repository!
```
```text
---
library_name: transformers
tags:
@ -411,13 +417,14 @@ tags:
```
Recommended practices:
- Document input and output differences in [`~GenerationMixin.generate`].
- Add self-contained examples to enable quick experimentation.
- Describe soft requirements, such as whether the method only works well with a certain family of models.
### Reusing `generate`s input preparation
### Reusing `generate`'s input preparation
If you're adding a new decoding loop, you might want to preserve the input preparation present in `generate` (batch expansion, attention masks, logits processors, stopping criteria, etc.). You can also pass a **callable** to `custom_generate` to reuse [`~GenerationMixin.generate`]s full preparation pipeline while overriding only the decoding loop.
If you're adding a new decoding loop, you might want to preserve the input preparation present in `generate` (batch expansion, attention masks, logits processors, stopping criteria, etc.). You can also pass a **callable** to `custom_generate` to reuse [`~GenerationMixin.generate`]'s full preparation pipeline while overriding only the decoding loop.
```py
def custom_loop(model, input_ids, attention_mask, logits_processor, stopping_criteria, generation_config, **model_kwargs):
@ -438,11 +445,12 @@ output = model.generate(
```
> [!TIP]
> If you publish a `custom_generate` repository, your `generate` implementation can itself define a callable and pass it to `model.generate()`. This lets you customize the decoding loop while still benefiting from Transformers built-in input preparation logic.
> If you publish a `custom_generate` repository, your `generate` implementation can itself define a callable and pass it to `model.generate()`. This lets you customize the decoding loop while still benefiting from Transformers' built-in input preparation logic.
### Finding custom generation methods
You can find all custom generation methods by [searching for their custom tag](https://huggingface.co/models?other=custom_generate), `custom_generate`. In addition to the tag, we curate two collections of `custom_generate` methods:
- [Custom generation methods - Community](https://huggingface.co/collections/transformers-community/custom-generation-methods-community-6888fb1da0efbc592d3a8ab6) -- a collection of powerful methods contributed by the community;
- [Custom generation methods - Tutorials](https://huggingface.co/collections/transformers-community/custom-generation-methods-tutorials-6823589657a94940ea02cfec) -- a collection of reference implementations for methods that previously were part of `transformers`, as well as tutorials for `custom_generate`.


@ -185,9 +185,9 @@ See the [Fine-tune a pretrained model](https://huggingface.co/docs/transformers/
The model head refers to the last layer of a neural network that accepts the raw hidden states and projects them onto a different dimension. There is a different model head for each task. For example:
* [`GPT2ForSequenceClassification`] is a sequence classification head - a linear layer - on top of the base [`GPT2Model`].
* [`ViTForImageClassification`] is an image classification head - a linear layer on top of the final hidden state of the `CLS` token - on top of the base [`ViTModel`].
* [`Wav2Vec2ForCTC`] is a language modeling head with [CTC](#connectionist-temporal-classification-ctc) on top of the base [`Wav2Vec2Model`].
* [`GPT2ForSequenceClassification`] is a sequence classification head - a linear layer - on top of the base [`GPT2Model`].
* [`ViTForImageClassification`] is an image classification head - a linear layer on top of the final hidden state of the `CLS` token - on top of the base [`ViTModel`].
* [`Wav2Vec2ForCTC`] is a language modeling head with [CTC](#connectionist-temporal-classification-ctc) on top of the base [`Wav2Vec2Model`].
## I


@ -149,4 +149,4 @@ Call [print_trainable_parameters](https://huggingface.co/docs/peft/package_refer
```py
model.print_trainable_parameters()
"trainable params: 589,824 || all params: 94,274,096 || trainable%: 0.6256"
```
```


@ -15,15 +15,12 @@ rendered properly in your Markdown viewer.
# Hyperparameter search
Hyperparameter search discovers an optimal set of hyperparameters that produces the best model performance. [`Trainer`] supports several hyperparameter search backends - [Optuna](https://optuna.readthedocs.io/en/stable/index.html), [SigOpt](https://docs.sigopt.com/), [Weights & Biases](https://docs.wandb.ai/), [Ray Tune](https://docs.ray.io/en/latest/tune/index.html) - through [`~Trainer.hyperparameter_search`] to optimize an objective or even multiple objectives.
Hyperparameter search discovers an optimal set of hyperparameters that produces the best model performance. [`Trainer`] supports several hyperparameter search backends - [Optuna](https://optuna.readthedocs.io/en/stable/index.html), [Weights & Biases](https://docs.wandb.ai/), [Ray Tune](https://docs.ray.io/en/latest/tune/index.html) - through [`~Trainer.hyperparameter_search`] to optimize an objective or even multiple objectives.
This guide will go over how to set up a hyperparameter search for each of the backends.
> [!WARNING]
> [SigOpt](https://github.com/sigopt/sigopt-server) is in public archive mode and is no longer actively maintained. Try using Optuna, Weights & Biases or Ray Tune instead.
```bash
pip install optuna/sigopt/wandb/ray[tune]
pip install optuna/wandb/ray[tune]
```
To use [`~Trainer.hyperparameter_search`], you need to create a `model_init` function. This function includes basic model information (arguments and configuration) because it needs to be reinitialized for each search trial in the run.
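A minimal sketch of such a function is shown below. The checkpoint name and `num_labels` value are placeholders chosen for illustration, so swap in your own model and task settings.

```py
def model_init(trial):
    """Return a freshly initialized model for each search trial."""
    # Imported inside the function so this sketch can be imported without
    # loading Transformers until a trial actually starts.
    from transformers import AutoModelForSequenceClassification

    # Assumption: placeholder checkpoint and label count, not values from this guide.
    return AutoModelForSequenceClassification.from_pretrained(
        "distilbert/distilbert-base-uncased",
        num_labels=2,
    )
```

Pass the function to [`Trainer`] through its `model_init` argument (leaving `model` unset) so every trial starts from the same initial weights rather than from a model a previous trial already trained.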
@ -109,31 +106,7 @@ best_trials = trainer.hyperparameter_search(
n_trials=20,
compute_objective=compute_objective,
)
```
</hfoption>
<hfoption id="SigOpt">
[SigOpt](https://docs.sigopt.com/ai-module-api-references/api_reference/objects/object_parameter) optimizes double, integer, and categorical parameters.
```py
def sigopt_hp_space(trial):
return [
{"bounds": {"min": 1e-6, "max": 1e-4}, "name": "learning_rate", "type": "double"},
{
"categorical_values": ["16", "32", "64", "128"],
"name": "per_device_train_batch_size",
"type": "categorical",
},
]
best_trials = trainer.hyperparameter_search(
direction=["minimize", "maximize"],
backend="sigopt",
hp_space=sigopt_hp_space,
n_trials=20,
compute_objective=compute_objective,
)
```
</hfoption>
@ -166,4 +139,4 @@ best_trials = trainer.hyperparameter_search(
## Distributed Data Parallel
[`Trainer`] only supports hyperparameter search for distributed data parallel (DDP) on the Optuna and SigOpt backends. Only the rank-zero process is used to generate the search trial, and the resulting parameters are passed along to the other ranks.
[`Trainer`] only supports hyperparameter search for distributed data parallel (DDP) on the Optuna backend. Only the rank-zero process is used to generate the search trial, and the resulting parameters are passed along to the other ranks.


@ -19,7 +19,6 @@ rendered properly in your Markdown viewer.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal models, for both inference and training.
@ -35,6 +34,10 @@ There are over 1M+ Transformers [model checkpoints](https://huggingface.co/model
Explore the [Hub](https://huggingface.co/) today to find a model and use Transformers to help you get started right away.
Explore the [Models Timeline](./models_timeline) to discover the latest text, vision, audio and multimodal model architectures in Transformers.
## Features
Transformers provides everything you need for inference or training with state-of-the-art pretrained models. Some of the main features include:
@ -61,4 +64,4 @@ Transformers is designed for developers and machine learning engineers and resea
## Learn
If you're new to Transformers or want to learn more about transformer models, we recommend starting with the [LLM course](https://huggingface.co/learn/llm-course/chapter1/1?fw=pt). This comprehensive course covers everything from the fundamentals of how transformer models work to practical applications across various tasks. You'll learn the complete workflow, from curating high-quality datasets to fine-tuning large language models and implementing reasoning capabilities. The course contains both theoretical and hands-on exercises to build a solid foundational knowledge of transformer models as you learn.
If you're new to Transformers or want to learn more about transformer models, we recommend starting with the [LLM course](https://huggingface.co/learn/llm-course/chapter1/1?fw=pt). This comprehensive course covers everything from the fundamentals of how transformer models work to practical applications across various tasks. You'll learn the complete workflow, from curating high-quality datasets to fine-tuning large language models and implementing reasoning capabilities. The course contains both theoretical and hands-on exercises to build a solid foundational knowledge of transformer models as you learn.

Some files were not shown because too many files have changed in this diff.