Compare commits

2 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
|  | 8ac2b916b0 | Release: v4.57.0 | 2025-10-03 18:32:49 +02:00 |
|  | 2ccc6cae21 | v4.57.0 Branch (#41310) |  |
* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unshuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it makes me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from workflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline3 (https://pypi.org/project/pyreadline3/).
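
A minimal sketch of the guarded import described above (illustrative only, not the exact transformers code):

```python
try:
    import readline  # noqa: F401  # importing is enough to enable line editing/history for input()
except ImportError:
    # readline is unavailable on some platforms (e.g. Windows without pyreadline3);
    # chat still works, just without advanced editing and history search.
    pass
```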

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoint with from_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* separate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamaTextMLP

* clean up some args

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, separated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although it won't be equal to input_ids since ids are needed for the patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_causal check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attention_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happening lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE function docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when reading video using torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1, target it to Python 3.10, and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appears before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)

Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts

* Fix single quotes in markdown (#41154)

Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* extend gemma3n integration ut cases on XPU (#41071)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Add Parakeet (#39062)

* first commit

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update to handle masking for bs>1

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tests and docs

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update model ids

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update docs and improve style

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update librosa location

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* import guard torch too

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff code checks fix

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* ruff format check

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* updated to parakeet names

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update script

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Add tokenizer decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Remove other model dependency

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* clean tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* linting

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff lint warnings

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* move to separate folders

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet ctc model code

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* simplify encoder structure

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* update documentation

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet to toctree

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* add parakeet doc

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Address comments

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* Update featurizer to compute lens directly

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix ruff tests

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix encoding format

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* fix minor ctc decoding

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* revert modular_model_converter.py changes

* revert check_config_attributes.py changes

* refactor: fastconformer & parakeet_ctc -> parakeet

* modeling update

* test update

* propagate feature extractor updates

* propagate doc changes

* propagate doc changes

* propagate tokenization changes

* propagate conversion changes

* remove fastconformer tests

* remove modular

* update processor

* update processor

* test update

* various fixes

* 100% matching greedy batched

* Update conversion script.

* Refactor docs.

* Refactor auto loading.

* Refactor and fix tokenization and processing.

* Update integration test.

* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss

* Format and repo consistency.

* Update model doc.

* Fix feature extraction tests.

* Fix (most) tokenizer tests.

* Add pipeline example.

* Fixes

* Use eager_attention_forward from Llama.

* Small tweaks.

* Replace Sequential with ModuleList

* Add check if not all layers copied

* Clean tokenizer.

* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.

* Switch to modular for modeling and processing.

* Add processor tests.

* Fix modeling tests.

* Formatting and docstrings.

* Add `return_attention_mask` like other feature extractors.

* clean up after merging main.

* nits on modeling

* configuration update

* nit

* simplification: use PreTrainedTokenizerFast, simplify processor

* add dtype arg to mel_filter_bank

* feature extraction: simplify!

* modeling update

* change to ParakeetTokenizerFast

* correct attention mask handling

* auto update

* proc update

* test update

* feature extraction fixes

* modeling update

* conversion script update

* update tests feature integration

* update tokenization and tests

* processor tests

* revert audio_utils

* config docstring update

* blank_token -> pad_token

* modeling update

* doc update

* fix tests

* fix test

* fix tests

* address review comments

* add comment

* add comment

* explicitly not support flash

* attention straightforward masking

* fix

* tokenizer update: skipping blank tokens by default

* doc update

* fix max_position_embeddings handling

* nits

* change atol for feature extraction integration tests

* doc update + fix loss

* doc update

* nit

* update integration test for A10

* repo id name

* nit

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>

* Fix format of compressed_tensors.md (#41155)

* Fix table format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify and improve model loading logic (#41103)

* remove unexpected keys from inputs (they don't belong there)

* remove input

* simplify a lot init

* fix

* fix check for non-persistent buffer

* revert because too many old and bad models...

* remove comment

* type hint

* make it a real test

* remove model_to_load -> always use the same model

* typo

* remove legacy offload_folder (we never waste that memory anymore)

* do not change prefix anymore

* change very bad function name

* create adjust method

* remove useless method

* restrict

* BC

* remove unused method

* CI

* remove unused args

* small fix

* fix

* CI

* CI

* avoid too many loops

* fix regex

* cleaner

* typo

* fix

* fix

* Force new vision models addition to include a fast image processor (#40802)

* add test

* fix test and change cutoff date

* Add documentation to test

* Add language specifiers to code blocks of markdown files (#41114)

* Add language specifiers to code blocks of markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/qwen3_omni_moe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/chat_templating_writing.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update nemotron.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update phimoe.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update README.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fix syntax error

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Improve `add_dates` script (#41167)

* utils/add_dates.py

* put lfm2-vl in correct category

* Fix flash-attn for paged_attention when no kernels (#41078)

* Fix non-kernels flash attention paged implementation

* Cover all cases

* Style

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Apply style fixes

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Remove data from examples (#41168)

Remove telemetry

* Enable fa in amd docker (#41069)

* Add FA to docker

* Use caching mechanism for qwen2_5

* Fix a typo in important models list

* Partial fixes for gemma3

* Added a commit ID for FA repo

* Detailed the expectation storage format

* Rebase fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* handle flash slow tests (#41072)

* handle flash slow tests

* update patch mask to 1/0 for flash

* don't skip flash

* flash

* raise tols

* rm flash support :(

* nits

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>

* Modernbert fix (#41056)

* Add FA to docker

* Fixed padding for modernbert

* Fixed logits and hidden states extraction in ModernBertForMultipleChoice

* Added a test for ModernBertForMultipleChoice

* fixes

* More fixes and GREEN CI

* consistency

* moar consistency

* CI Runners - move amd runners mi355 and 325 to runner group (#41193)

* Update CI workflows to use devmi355 branch

* Add workflow trigger for AMD scheduled CI caller

* Remove unnecessary blank line in workflow YAML

* Add trigger for workflow_run on main branch

* Update workflow references from devmi355 to main

* Change runner_scale_set to runner_group in CI config

* [XPU] Add MXFP4 support for XPU (#41117)

* XPU supports gpt-oss MXFP4

* Complete MXFP4 UT file and comment information

* Complete MXFP4 UT file and comment information

* Fix code style

* Fix code style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [tests] `CausalLMTester` automatically infers other test classes from `base_model_class` 🐛 🔫  (#41066)

* halfway through the models

* update test checks

* refactor all

* another one

* use tuples

* more deletions

* solve bad inheritance patterns

* type

* PR ready?

* automatic model class inference from the base class

* vaultgemma

* make fixup

* make fixup

* rebase with gpt2

* make fixup :'(

* gpt2 is special

* More typing fixes (#41102)

* Fix noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* remove noqa

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix chars

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* enable flex attention ut cases on XPU (#40989)

* enable flex attention ut cases on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix(trainer): Avoid moving model with device_map (#41032)

* fix(trainer): Avoid moving model with device_map

When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.

This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped.

A regression test is added to ensure the `Trainer` can be initialized, without raising an error, with a model whose `hf_device_map` simulates offloading.
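
A minimal sketch of the guard described above, with hypothetical names mirroring this description rather than the exact Trainer source:

```python
import logging

logger = logging.getLogger(__name__)

def move_model_to_device(model, device):
    # If accelerate already dispatched the model (possibly offloading layers to
    # CPU/disk), it cannot be moved as a whole; skip the .to() call.
    if getattr(model, "hf_device_map", None) is not None:
        logger.warning("Model has an `hf_device_map`; skipping the move to %s.", device)
        return model
    return model.to(device)
```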

* Added a logger warning for the skipped model move

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

* Fix attention sink implementation in flex attention (#41083)

* Fix attention sink implementation in flex attention

* fix dim

* fix

* Remove print

* raise error when return_lse is False yet s_aux is provided

* Clean test files for merge

* Update src/transformers/integrations/flex_attention.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* force return lse

* Add to doc

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Separate docker images for Nvidia and AMD in benchmarking (#41119)

Separate docker images for Nvidia and AMD

* Make quantizers good citizens loading-wise (#41138)

* fix param_needs_quantization

* rewrite most hqq

* clean

* fix

* comment

* remove it from exception of safetensors

* start on bnb 4bits

* post-rebase fix

* make bnb4 bit a good citizen

* remove forgotten print

* make bnb 8bits a good citizen

* better hqq

* fix

* clean

* remove state dict from signature

* switch method

* make torchao a good citizen

* fixes

* fix torchao

* add check

* typo

* [`Kernels Attention`] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)

* fix

* be more strict

* change logic to include fa3

* fix the case where nothing is requested

* modify old tests + add kernels related tests

* style

* Add EdgeTAM (#39800)

* initial comment

* test

* initial conversion for outline

* intermediate commit for configuration

* chore: init files for sam2

* adding arbitrary undefined config

* check

* add vision

* make style

* init sam2 base model

* Fix imports

* Linting

* chore: sam to sam2 classes

* Linting

* Add sam2 to models.__init__

* chore: match prompt encoder with sam2 code

* chore: prepare kwargs for mask decoder

* Add image/video predictors

* Add CUDA kernel

* Add output classes

* linting

* Add logging info

* tmp commit

* docs for sam2

* enable image processing

* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize

* enable promptencoder of sam2

* fix promprencoder

* Confirmed that PromptEncoder is exactly the same (be aware of the bfloat16 and float32 difference)

* Confirmed that ImageEncoder is exactly the same (be aware of the linting of init)

* Confirmed that MaskDecoder is exactly the same (TODO: lint variable name)

* SamModel is now available (more naming chores needed)

* make fix-copies

* make style

* make CI happy

* Refactor VisionEncoder and PostioinEmbedding

* TODO: fix the image_embeddings and sparse_embeddings part

* pure image inference done

* reusable features fix and make style

* styling

* refactor memoryattention

* tmp

* tmp

* refactor memoryencoder
TODO: convert and run inference on the video pipeline

* TODO: fix the image_encoder shape

* conversion finish
TO DO: need to check video inference

* make style

* remove video model

* lint

* change

* python utils/check_docstrings.py --check_all

* python utils/check_config_attributes.py

* remove copies for sam2promptencoder due to configuration

* change __init__.py

* remove tensorflow version

* fix that to not use direct comparison

* make style

* add missing import

* fix image_embedding_size

* refactor Sam2 Attention

* add fully working video inference (refactoring todo)

* clarify _prepare_memory_conditioned_features

* simplify modeling code, remove unused paths

* use one model

* use auto_docstring

* refactor rope embeddings

* nit

* not using multimask when several points given

* add all sam2.1

* add video tmp

* add Sam2VideoSessionState + fast image proc + video proc

* remove init_states from model

* fix batch inference

* add image integration tests

* uniformize modeling code with other sam models and use modular

* pass vision tests an most model tests

* All tests passing

* add offloading inference state and video to cpu

* fix inference from image embedding and existing mask

* fix multi_boxes mask inference

* Fix batch images + batch boxes inference

* improve processing for image inference

* add support for mask generation pipeline

* add support for get_connected_components post processing in mask generation

* add fast image processor sam, image processor tests and use modular for sam2 image processor

* fix mistake in sam after #39120

* fix init weights

* refactor convert

* add integration tests for video + other improvements

* add needed missing docstrings

* Improve docstrings and

* improve inference speed by avoiding cuda sync

* add test

* skip test for vision_model

* minor fix for vision_model

* fix vision_model by adding sam2model and change the torch dependencies

* remove patch_size

* remove image_embedding_size

* fix patch_size

* fix test

* make style

* Separate hieradet and vision encoder in sam2

* fixup

* review changes part 1

* remove MemoryEncoderConfig and MemoryAttentionConfig

* pass q_stride instead of q_pool module

* add inference on streamed videos

* explicitly process streamed frames

* nit

* Improve docstrings in Sam2Model

* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel

* improve video inference api

* change inference_state to inference_session

* use modular for Sam2Model

* fix convert sam2 hf

* modular

* Update src/transformers/models/sam2/video_processing_sam2.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix minor config

* fix attention loading error

* update modeling tests to use hub checkpoints

* Use CI A10 runner for integration tests values + higher tolerance for video integration tests

* PR review part 1

* fix doc

* nit improvements

* enforce one input format for points, labels and boxes

* nit

* last few nits from PR review

* fix style

* fix the input type

* fix docs

* add sam2 model as conversion script

* improve sam2 doc

* add rough necessary changes

* first working edgetam

* fix issue with object pointers

* Use modular as much as possible

* nit fixes + optimization

* refactor spatial perceiver

* cleanup after merge

* add working edgetam

* improve perceiver resampler code

* simplify/unify rope attention logic

* Improve comments in apply_rotary_pos_emb_2d

* add working tests

* fix test timmwrapper

* add docs

* make fixup

* nits

* fix modular

* fix modular

* PR review part 1

* split apply_rotary_pos_emb_2d

* add granularity to _prepare_memory_conditioned_features

* add dates to doc

* add separate mlp for memory attention

* Fix memory on wrong device

* store processed frames in dict

* update checkpoints in tests

* update dates

---------

Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix EXAONE-4.0 dummy id (#41089)

* Fix EXAONE-4.0 dummy id

* Fix exaone4 dummy (#1)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix 8bit bnb loading (#41200)

* Fix 8bit

* oups forgot the case where it is not prequantized

* Fix docker quantization (#41201)

* launch docker

* remove gptq for now

* run tests

* Revert "run tests"

This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.

* revert

* Embed interactive timeline in docs (#41015)

* embed timeline in docs (test web component and iframe)

* test scaling

* test multiple scales

* compensate scale in width

* set correct style and scale

* remove bottom space created by scale

* add timeline as a separate page

* reformulate docs after review

* [docs] Fix links (#41110)

fix

* Remove unnecessary Optional typing (#41198)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)

* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)

* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes

* make style

* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes

* docs/examples(speech): address review: remove Hub subsection & Whisper tip; align dataset help text

* style: apply ruff/black/usort/codespell on examples/speech-recognition

* Apply style fixes

* Update examples/pytorch/speech-recognition/README.md

* update doc to match load_dataset

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix Qwen3-Omni audio_token_id serialization issue (#41192)

Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map

- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles

Fixes #41191
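
A minimal sketch of the override pattern described above, with simplified class names and a placeholder default (not the actual Qwen3OmniMoeThinkerConfig definition):

```python
from transformers import PretrainedConfig

class ParentConfig(PretrainedConfig):
    # Parent maps the public name onto a differently named internal attribute.
    attribute_map = {"audio_token_id": "audio_token_index"}

class ThinkerConfig(ParentConfig):
    # Override with an empty map so `audio_token_id` is stored under its own
    # name and round-trips through save_pretrained/from_pretrained.
    attribute_map = {}

    def __init__(self, audio_token_id=None, **kwargs):
        self.audio_token_id = audio_token_id
        super().__init__(**kwargs)
```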

* Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)

* Update trainer.py

* fix

* fix format

* move barrier, delete redundant

* Avoid assumption that model has config attribute in deepspeed (#41207)

Avoid assumption that model has config in deepspeed

* Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_step` (#41183)

* Add num_items_in_batch computation to predict_step.

* address comments.

* Fix test cases.

* fixup

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)

add accepts_loss_kwargs=False to EsmPreTrainedModel

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Align pull request template to bug report template (#41220)

The only difference is that I don't point users to https://discuss.huggingface.co/ for hub issues.

* [generate] cache missing custom generate file (#41216)

* cache missing custom generate file

* make fixup

* Remove old Python code (#41226)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)

Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.

Co-authored-by: frozenleaves <frozen@Mac.local>

* update code owners (#41221)

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Unify is_torchvision_v2_available with is_torchvision_available (#41227)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing of train_args (#41142)

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix fsdp typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix sliding window attn mask (#41228)

* Fix sliding window attn mask

* Clearer test

* Apply style fixes

* If Picasso made ascii drawings he would have made this

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)

* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"

This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.

* fix

* [docs] Fix tp_plan (#41205)

remove manual

* Fix white space in documentation (#41157)

* Fix white space

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix autodoc

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix qwen text config (#41158)

* fix qwen text config

* fix tests

* fix one more test

* address comments

* Video processor accepts single frames on cuda (#41218)

* fix

* why was it np if input is in torch

* Use math.log2 (#41241)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix TrainerIntegrationDeepSpeed UT failures (#41236)

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* [repo utils] Update `models_to_deprecate.py` (#41231)

* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup

* Use removeprefix and removesuffix (#41240)

* Use removeprefix and removesuffix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings (#41222)

* Remove unused variables

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove all instances of `is_safetensors_available` (#41233)

* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* FP-Quant NVFP4 and Python 3.9 support (#39876)

* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bump

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [`FA3`] Fix masking and loading logic in same process (#41217)

fix loading and fa3 masking

* [t5gemma] fix `get_text_config` and related fixes (#40939)

* tmp commit

* t5gemma fixes

* Don't convert to `safetensors` on the fly if the call is from testing (#41194)

* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* Resolve remote custom module path warnings (#41243)

* add peft team members to issue/pr template (#41262)

* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* docs: update bitsandbytes platform support (#41266)

* add more activation kernels, follow up  (#40944)

* add more activation kernels

* fixing style

* fix version

* fix asr pipeline ut failures (#41275)

* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* make style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Use regex defailed flags (#41264)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)

* fix multi-video timestamp bug in qwen3vl, glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <daqin.luo@ubtrobot.com>

* Fix binding of video frames to video placeholder in `InternVL` model (#41237)

* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

---------

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Deprecate Trackio environment variables and deploy to Spaces by default (#40950)

* allow prive space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments

* Allow private Space id for Trackio (#40948)

* allow prive space id for trackio

* complete docstring

* fix async client for transformers chat (#41255)

* fix-client

* fix

* Unify is_torchvision_v2_available with is_torchvision_available (#41259)

Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use max/min (#41280)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Biogptlogits (#41270)

added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <aviralkamaljain@gmail.com>

* Fix unnecessary single-item container checks (#41279)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint generator warnings (#41258)

Fix pylint generator warnings

Signed-off-by: cyy <cyyever@outlook.com>

* feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)

* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI

* Add processor and integration test for qwen3vl (#41277)

* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization

* pass check
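
On the "do not inherit from nn.Linear" point above: quantization backends typically swap out every `nn.Linear` module, so subclassing it can either get the custom logic silently replaced or be missed by `isinstance`-based checks. A hedged, illustrative sketch of the composition pattern (not the actual Qwen3-VL code):

```python
import torch
import torch.nn as nn

class GatedProjection(nn.Module):
    """Wraps a plain nn.Linear instead of subclassing it."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.proj = nn.Linear(in_features, out_features)  # stays a vanilla nn.Linear, easy to quantize
        self.gate = nn.Parameter(torch.ones(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x) * self.gate
```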

* Remove `test_initialization` (#41261)

remove it

* Remove some previous team members from allow list of triggering Github Actions (#41263)

* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Build doc in 2 jobs: `en` and `other languages` (#41290)

* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix mxfp4 dequantization (#41292)

fix

* [`Flex Attn`] Fix lse x attention sinks logic   (#41249)

fix

* FIX: Bug in PEFT integration delete_adapter method (#41252)

The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fulfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally that the test passes with PEFT 0.18.0.)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
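
A hedged sketch of the version gating described above (helper name is illustrative, not the actual implementation):

```python
import importlib.metadata

from packaging import version

def peft_supports_integration_delete() -> bool:
    """The integration-level delete_adapter handling needs PEFT >= 0.18.0."""
    try:
        peft_version = importlib.metadata.version("peft")
    except importlib.metadata.PackageNotFoundError:
        return False
    return version.parse(peft_version) >= version.parse("0.18.0")

# e.g. in the new test:
# @pytest.mark.skipif(not peft_supports_integration_delete(), reason="requires PEFT >= 0.18.0")
```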

* Italian translation for README.md (#41269)

chore: add Italian translation for README.md

* Fix README.md error when installing from source (#41303)

* download and use HF Hub Cache (#41181)

use hub cache

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
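
As a hedged illustration of leaning on the Hub cache instead of re-downloading in CI (the repo id is just an example, not necessarily what this PR touches):

```python
from huggingface_hub import snapshot_download

# snapshot_download reuses the local Hub cache (under HF_HOME) when the files
# are already present, so repeated test runs skip the network download.
local_path = snapshot_download(repo_id="hf-internal-testing/tiny-random-bert")
print(local_path)
```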

* fix some merge issues

* [test_all]

* [test-all]

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
Signed-off-by: Aviral <aviralkamaljain@gmail.com>
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Qile Xu <87457840+Xqle@users.noreply.github.com>
Co-authored-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Pk Patel <46714886+The5cheduler@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: OMOTAYO OMOYEMI <58476114+tayo4christ@users.noreply.github.com>
Co-authored-by: eun2ce <joeun2ce@gmail.com>
Co-authored-by: Sam Sharpe <ssharpe42y@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: 魅影 <46097299+frozenleaves@users.noreply.github.com>
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: tim120526 <43242086+tim120526@users.noreply.github.com>
Co-authored-by: UBT <daqin.luo@ubtrobot.com>
Co-authored-by: Daniel Bershatsky <daskol@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: 0xAvi <aviralkamaljain@gmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Federico Moretti <hello@federicomoretti.it>
Co-authored-by: YangshenDeng <yangshen.d@outlook.com>
2025-10-03 18:29:51 +02:00
3443 changed files with 312044 additions and 168825 deletions

View File

@ -48,17 +48,18 @@ body:
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @SunMarc
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @SunMarc @zach-huggingface
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization: @SunMarc @MekkCyber
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo

View File

@ -51,17 +51,18 @@ Library:
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @SunMarc
- trainer: @zach-huggingface @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker
- distributed: @3outeille @ArthurZucker @S1ro1
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @SunMarc @zach-huggingface
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization: @SunMarc @MekkCyber
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo

View File

@ -22,6 +22,7 @@ tests/generation/ @gante
/src/transformers/models/auto/ @ArthurZucker
/src/transformers/utils/ @ArthurZucker @Rocketknight1
/src/transformers/loss/ @ArthurZucker
/src/transformers/onnx/ @michaelbenayoun
# Specific files come after the sections/globs, so they take priority
/.circleci/config.yml @ArthurZucker @ydshieh

View File

@ -12,8 +12,6 @@ concurrency:
env:
HF_HOME: /mnt/cache
DATASET_ID: hf-benchmarks/transformers
MODEL_ID: meta-llama/Llama-3.1-8B-Instruct
jobs:
benchmark:
@ -28,20 +26,35 @@ jobs:
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark') )||
(github.event_name == 'push' && github.ref == 'refs/heads/main')
container:
image: huggingface/transformers-all-latest-gpu
image: huggingface/transformers-pytorch-gpu
options: --gpus all --privileged --ipc host
steps:
- name: Get repo
uses: actions/checkout@v5
uses: actions/checkout@v4
with:
fetch-depth: 1
ref: ${{ github.event.pull_request.head.sha || github.sha }}
- name: Install libpq-dev & psql
run: |
apt update
apt install -y libpq-dev postgresql-client
- name: Install benchmark script dependencies
run: python3 -m pip install -r benchmark_v2/requirements.txt kernels
run: python3 -m pip install -r benchmark/requirements.txt
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]"
- name: Run database init script
run: |
psql -f benchmark/utils/init_db.sql
env:
PGDATABASE: metrics
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
- name: Run benchmark
run: |
git config --global --add safe.directory /__w/transformers/transformers
@ -51,11 +64,13 @@ jobs:
commit_id=$GITHUB_SHA
fi
commit_msg=$(git show -s --format=%s | cut -c1-70)
python3 benchmark_v2/run_benchmarks.py -b 32 -s 128 -n 256 --level 2 --branch-name "$BRANCH_NAME" --commit-id "$commit_id" --commit-message "$commit_msg" --model-id "$MODEL_ID" --log-level INFO --push-result-to-dataset "$DATASET_ID"
python3 benchmark/benchmarks_entrypoint.py "huggingface/transformers" "$BRANCH_NAME" "$commit_id" "$commit_msg"
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
PUSH_TO_HUB_TOKEN: ${{ secrets.PUSH_TO_HUB_TOKEN }}
# Enable this to see debug logs
# HF_HUB_VERBOSITY: debug
# TRANSFORMERS_VERBOSITY: debug
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}

View File

@ -1,7 +1,35 @@
name: Benchmark v2 Framework
on:
workflow_dispatch:
workflow_call:
inputs:
runner:
description: 'GH Actions runner group to use'
required: true
type: string
container_image:
description: 'Docker image to use'
required: true
type: string
container_options:
description: 'Container options to use'
required: true
type: string
commit_sha:
description: 'Commit SHA to benchmark'
required: false
type: string
default: ''
run_id:
description: 'Custom run ID for organizing results (auto-generated if not provided)'
required: false
type: string
default: ''
benchmark_repo_id:
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
env:
HF_HOME: /mnt/cache
@ -54,4 +82,4 @@ jobs:
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}

View File

@ -1,7 +1,11 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
workflow_dispatch:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
@ -9,9 +13,9 @@ jobs:
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
container_image: huggingface/transformers-all-latest-gpu
container_image: huggingface/transformers-pytorch-gpu
container_options: --gpus all --privileged --ipc host --shm-size "16gb"
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit
secrets: inherit

View File

@ -1,7 +1,11 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
workflow_dispatch:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
jobs:
benchmark-v2-default:
@ -14,4 +18,4 @@ jobs:
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit
secrets: inherit

View File

@ -45,59 +45,33 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
flash-attn-ci-image:
name: "PyTorch with Flash Attn [dev]"
runs-on:
group: aws-general-8-plus
steps:
# Push CI images still need to be re-built daily
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
PYTORCH=2.8.0
TORCHCODEC=0.7.0
FLASH_ATTN=yes
push: true
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}:flash-attn
tags: huggingface/transformers-all-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu docker build
title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
runs-on:
group: aws-general-8-plus
group: aws-g4dn-2xlarge-cache
steps:
-
name: Set up Docker Buildx
@ -130,8 +104,51 @@ jobs:
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
latest-torch-deepspeed-docker-for-push-ci-daily-build:
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
doc-builder:
name: "Doc builder"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
@ -164,6 +181,44 @@ jobs:
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch:
name: "Latest PyTorch [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-gpudocker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on:
@ -190,47 +245,29 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu build
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
cache-latest-pytorch-amd:
name: "Cache Latest Pytorch (AMD) Image"
needs: latest-pytorch-amd
runs-on:
group: amd-mi325-1gpu
steps:
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Pull and save docker image to cache
run: |
image="huggingface/transformers-pytorch-amd-gpu"
final_path="/mnt/image-cache/transformers-pytorch-amd-gpu.tar"
tmp_path="${final_path}.tmp"
echo "Pulling image: ${image}"
docker pull "${image}"
echo "Saving to temp file: ${tmp_path}"
docker save "${image}" -o "${tmp_path}"
echo "Moving to final path: ${final_path}"
mv -f "${tmp_path}" "${final_path}"
echo "Cache populated successfully at ${final_path}"
latest-pytorch-deepspeed-amd:
name: "PyTorch + DeepSpeed (AMD) [dev]"
runs-on:
@ -257,6 +294,19 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
- name: Post to Slack
if: always()
@ -269,6 +319,8 @@ jobs:
latest-quantization-torch-docker:
name: "Latest Pytorch + Quantization [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:

View File

@ -1,23 +0,0 @@
---
name: Check Permissions Advisor
on:
workflow_dispatch:
inputs:
workflow_name:
description: 'Workflow file name'
type: string
run_count:
description: 'Number of runs to analyze'
type: string
default: "10"
jobs:
advisor:
uses: huggingface/security-workflows/.github/workflows/permissions-advisor-reusable.yml@main
permissions:
actions: read
contents: read
with:
workflow_name: ${{ inputs.workflow_name }}
run_count: ${{ fromJSON(inputs.run_count) }}

View File

@ -6,6 +6,9 @@ on:
docker:
required: true
type: string
start_sha:
required: true
type: string
job:
required: true
type: string
@ -21,13 +24,7 @@ on:
commit_sha:
required: false
type: string
pr_number:
required: false
type: string
outputs:
report:
description: "Content of the report of new failures"
value: ${{ jobs.process_new_failures_with_commit_info.outputs.report }}
env:
HF_HOME: /mnt/cache
@ -38,20 +35,16 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
check_new_failures:
name: "Find commits for new failing tests"
strategy:
matrix:
run_idx: [1]
name: " "
runs-on:
group: aws-g5-4xlarge-cache
outputs:
process: ${{ steps.check_file.outputs.process }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -62,19 +55,14 @@ jobs:
path: /transformers/ci_results_${{ inputs.job }}
- name: Check file
id: check_file
working-directory: /transformers
env:
job: ${{ inputs.job }}
run: |
if [ -f "ci_results_${job}/new_failures.json" ]; then
echo "\`ci_results_${job}/new_failures.json\` exists, continue ..."
if [ -f ci_results_${{ inputs.job }}/new_failures.json ]; then
echo "`ci_results_${{ inputs.job }}/new_failures.json` exists, continue ..."
echo "process=true" >> $GITHUB_ENV
echo "process=true" >> $GITHUB_OUTPUT
else
echo "\`ci_results_${job}/new_failures.json\` doesn't exist, abort."
echo "`ci_results_${{ inputs.job }}/new_failures.json` doesn't exist, abort."
echo "process=false" >> $GITHUB_ENV
echo "process=false" >> $GITHUB_OUTPUT
fi
- uses: actions/download-artifact@v4
@ -93,62 +81,27 @@ jobs:
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Update clone
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Get `START_SHA`
- name: Get target commit
working-directory: /transformers/utils
if: ${{ env.process == 'true' }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
echo "START_SHA=$commit_sha" >> $GITHUB_ENV
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
# This is used if the CI is triggered from a pull request `self-comment-ci.yml` (after security check is verified)
- name: Extract the base commit on `main` (of the merge commit created by Github) if it is a PR
id: pr_info
if: ${{ env.process == 'true' && inputs.pr_number != '' }}
uses: actions/github-script@v6
with:
script: |
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
const { data: merge_commit } = await github.rest.repos.getCommit({
owner: pr.base.repo.owner.login,
repo: pr.base.repo.name,
ref: '${{ inputs.commit_sha }}',
});
core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
# Usually, `END_SHA` should be the commit of the last previous workflow run of the **SAME** (scheduled) workflow.
# (This is why we don't need to specify `workflow_id` which would be fetched automatically in the python script.)
- name: Get `END_SHA` from previous CI runs of the same workflow
working-directory: /transformers/utils
if: ${{ env.process == 'true' && inputs.pr_number == '' }}
env:
ACCESS_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
run: |
echo "END_SHA=$(TOKEN="$ACCESS_TOKEN" python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
# However, for workflow runs triggered by `issue_comment` (for pull requests), we want to check against the
# parent commit (on `main`) of the `merge_commit` (dynamically created by GitHub). In this case, the goal is to
# see if a reported failing test is actually ONLY failing on the `merge_commit`.
- name: Set `END_SHA`
if: ${{ env.process == 'true' && inputs.pr_number != '' }}
env:
merge_commit_base_sha: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
run: |
echo "END_SHA=$merge_commit_base_sha" >> $GITHUB_ENV
- name: Checkout to `start_sha`
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.start_sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -166,10 +119,6 @@ jobs:
run: |
python3 utils/print_env.py
- name: Install pytest-flakefinder
if: ${{ env.process == 'true' }}
run: python3 -m pip install pytest-flakefinder
- name: Show installed libraries and their versions
working-directory: /transformers
if: ${{ env.process == 'true' }}
@ -178,78 +127,37 @@ jobs:
- name: Check failed tests
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
job: ${{ inputs.job }}
run_idx: ${{ matrix.run_idx }}
run: python3 utils/check_bad_commit.py --start_commit "$START_SHA" --end_commit "$END_SHA" --file "ci_results_${job}/new_failures.json" --output_file "new_failures_with_bad_commit_${job}_${run_idx}.json"
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_${{ inputs.job }}/new_failures.json --output_file new_failures_with_bad_commit.json
- name: Show results
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
job: ${{ inputs.job }}
run_idx: ${{ matrix.run_idx }}
run: |
ls -l "new_failures_with_bad_commit_${job}_${run_idx}.json"
cat "new_failures_with_bad_commit_${job}_${run_idx}.json"
ls -l new_failures_with_bad_commit.json
cat new_failures_with_bad_commit.json
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
process_new_failures_with_commit_info:
name: "process bad commit reports"
needs: check_new_failures
if: needs.check_new_failures.outputs.process == 'true'
runs-on:
group: aws-g5-4xlarge-cache
outputs:
report: ${{ steps.set_output.outputs.report }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: /transformers/ci_results_${{ inputs.job }}
- uses: actions/download-artifact@v4
with:
pattern: new_failures_with_bad_commit_${{ inputs.job }}*
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}
merge-multiple: true
- name: Check files
- name: Checkout back
working-directory: /transformers
env:
job: ${{ inputs.job }}
if: ${{ env.process == 'true' }}
run: |
ls -la /transformers
ls -la "/transformers/new_failures_with_bad_commit_${job}"
# Currently, we only run with a single runner by using `run_idx: [1]`. We might try to run with multiple runners
# to further reduce the false positive caused by flaky tests, which requires further processing to merge reports.
- name: Merge files
shell: bash
working-directory: /transformers
env:
job: ${{ inputs.job }}
run: |
cp "/transformers/new_failures_with_bad_commit_${job}/new_failures_with_bad_commit_${job}_1.json" new_failures_with_bad_commit.json
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
git checkout ${{ inputs.start_sha }}
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
run: |
python3 utils/process_bad_commit_report.py
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
@ -262,40 +170,15 @@ jobs:
echo EOF
} >> "$GITHUB_ENV"
# The output is useful if a caller needs more processing, for example, we have a chain
# self-comment-ci.yml -> self-scheduled.yml -> this one (check_failed_tests.yml),
# and `self-comment-ci.yml` needs further processing before sending a GitHub comment to the pull request page.
- name: Show results & Set outputs
id: set_output
working-directory: /transformers
run: |
ls -l new_failures_with_bad_commit.json
cat new_failures_with_bad_commit.json
{
echo 'report<<EOF'
cat new_failures_with_bad_commit.json
echo '' # Force a newline
echo EOF
} >> "$GITHUB_OUTPUT"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: new_failures_with_bad_commit_${{ inputs.job }}
path: /transformers/new_failures_with_bad_commit.json
- name: Prepare Slack report title
working-directory: /transformers
env:
ci_event: ${{ inputs.ci_event }}
job: ${{ inputs.job }}
if: ${{ env.process == 'true' }}
run: |
pip install slack_sdk
echo "title=$(python3 -c 'import sys; import os; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = os.environ["ci_event"]; job = os.environ["job"]; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
echo "title=$(python3 -c 'import sys; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = "${{ inputs.ci_event }}"; job = "${{ inputs.job }}"; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
- name: Send processed report
if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
if: ${{ env.process == 'true' && !endsWith(env.REPORT_TEXT, '{}') }}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.

View File

@ -1,22 +0,0 @@
---
name: CodeQL Security Analysis
on:
push:
branches: ["main", "fix_security_issue_*"]
# pull_request:
# branches: ["main"]
workflow_dispatch:
jobs:
codeql:
name: CodeQL Analysis
uses: huggingface/security-workflows/.github/workflows/codeql-reusable.yml@main
permissions:
security-events: write
packages: read
actions: read
contents: read
with:
languages: '["actions"]'
queries: 'security-extended,security-and-quality'

View File

@ -16,6 +16,7 @@ env:
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:

View File

@ -39,9 +39,6 @@ on:
PR_MERGE_COMMIT_SHA:
description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_MERGE_COMMIT_BASE_SHA:
description: "The sha of the parent commit of the the merge commit on the target branch in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_BASE_SHA }}
PR_HEAD_COMMIT_DATE:
description: "The date of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
@ -77,7 +74,6 @@ jobs:
PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
PR_MERGE_COMMIT_BASE_SHA: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
@ -126,7 +122,6 @@ jobs:
core.setOutput('base_ref', pr.base.ref);
core.setOutput('head_sha', pr.head.sha);
core.setOutput('base_sha', pr.base.sha);
core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
core.setOutput('merge_commit_sha', pr.merge_commit_sha);
core.setOutput('pr', pr);
@ -147,21 +142,16 @@ jobs:
date: merge_commit.commit.committer.date
});
console.log('PR Info:', {
pr_info: pr
});
- name: Convert dates to timestamps
id: get_timestamps
env:
head_commit_date: ${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date: ${{ steps.pr_info.outputs.merge_commit_date }}
run: |
echo "$head_commit_date"
echo "$merge_commit_date"
head_commit_date=${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date=${{ steps.pr_info.outputs.merge_commit_date }}
echo $head_commit_date
echo $merge_commit_date
head_commit_timestamp=$(date -d "$head_commit_date" +%s)
merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
echo "$head_commit_timestamp"
echo "$merge_commit_timestamp"
echo $head_commit_timestamp
echo $merge_commit_timestamp
echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT

View File

@ -15,19 +15,13 @@ jobs:
steps:
- name: Get PR number
shell: bash
env:
issue_number: ${{ github.event.issue.number }}
is_pull_request_issue: ${{ github.event.issue.pull_request != null }}
pr_number: ${{ github.event.pull_request.number }}
is_pull_request: ${{ github.event.pull_request != null }}
event_number: ${{ github.event.number }}
run: |
if [[ "$issue_number" != "" && "$is_pull_request_issue" == "true" ]]; then
echo "PR_NUMBER=$issue_number" >> $GITHUB_ENV
elif [[ "$pr_number" != "" ]]; then
echo "PR_NUMBER=$pr_number" >> $GITHUB_ENV
elif [[ "$is_pull_request" == "true" ]]; then
echo "PR_NUMBER=$event_number" >> $GITHUB_ENV
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request.number }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
@ -35,8 +29,8 @@ jobs:
- name: Check PR number
shell: bash
run: |
echo "$PR_NUMBER"
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=$PR_NUMBER" >> "$GITHUB_OUTPUT"
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"

View File

@ -28,9 +28,6 @@ on:
report_repo_id:
required: false
type: string
pytest_marker:
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -41,6 +38,7 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -62,33 +60,25 @@ jobs:
steps:
- name: Echo input and matrix info
shell: bash
env:
folder_slices: ${{ inputs.folder_slices }}
matrix_folders: ${{ matrix.folders }}
slice_data: ${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}
run: |
echo "$folder_slices"
echo "$matrix_folders"
echo "$slice_data"
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
env:
matrix_folders_raw: ${{ matrix.folders }}
run: |
echo "$matrix_folders_raw"
matrix_folders="${matrix_folders_raw/'models/'/'models_'}"
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -123,17 +113,15 @@ jobs:
id: set_machine_type
working-directory: /transformers
shell: bash
env:
input_machine_type: ${{ inputs.machine_type }}
run: |
echo "$input_machine_type"
echo "${{ inputs.machine_type }}"
if [ "$input_machine_type" = "aws-g5-4xlarge-cache" ]; then
if [ "${{ inputs.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$input_machine_type" = "aws-g5-12xlarge-cache" ]; then
elif [ "${{ inputs.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$input_machine_type"
machine_type=${{ inputs.machine_type }}
fi
echo "$machine_type"
@ -142,21 +130,15 @@ jobs:
- name: Create report directory if it doesn't exist
shell: bash
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
mkdir -p "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
echo "dummy" > "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/dummy.txt"
ls -la "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: Run all tests on GPU
working-directory: /transformers
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
pytest_marker: ${{ inputs.pytest_marker }}
model: ${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports python3 -m pytest -rsfE -v -m '${pytest_marker}' --make-reports=${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports tests/${model}" test_outputs.txt
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
@ -167,25 +149,19 @@ jobs:
# This step is only to show information on Github Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/failures_short.txt"
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/captured_info.txt"
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
cp /transformers/test_outputs.txt "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
@ -196,7 +172,7 @@ jobs:
collated_reports:
name: Collated Reports
if: ${{ always() && inputs.runner_type != '' }}
if: ${{ always() }}
needs: run_models_gpu
uses: huggingface/transformers/.github/workflows/collated-reports.yml@main
with:

View File

@ -26,6 +26,7 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

View File

@ -98,7 +98,7 @@ jobs:
commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
package: transformers
languages: ar de en es fr hi it ja ko pt zh
languages: ar de en es fr hi it ko pt tr zh ja te
update_run_status:
name: Update Check Run Status

View File

@ -1,4 +1,4 @@
name: PR slow CI - Suggestion
name: PR slow CI
on:
pull_request_target:
types: [opened, synchronize, reopened]
@ -23,28 +23,11 @@ jobs:
outputs:
jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
steps:
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
# We need to use `${{ ... }}` here to avoid `Argument list too long` error when a PR changes a lot of files.
# (We could also try to use artifact approach, but it's more involved).
# `CodeQL` doesn't identify any security issue here. Also `PR_FILES` is from `get-pr-info.yml` by using an api
# `github.rest.pulls.listFiles`, which is fine.
- name: Write pr_files file
run: |
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Get repository content
id: repo_content
uses: actions/github-script@v6
with:
script: |
const fs = require('node:fs');
const { data: tests_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
@ -66,10 +49,38 @@ jobs:
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
// Write to files instead of outputs
fs.writeFileSync('tests_dir.txt', JSON.stringify(tests_dir, null, 2));
fs.writeFileSync('tests_models_dir.txt', JSON.stringify(tests_models_dir, null, 2));
fs.writeFileSync('tests_quantization_dir.txt', JSON.stringify(tests_quantization_dir, null, 2));
core.setOutput('tests_dir', tests_dir);
core.setOutput('tests_models_dir', tests_models_dir);
core.setOutput('tests_quantization_dir', tests_quantization_dir);
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
run: |
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Write tests_dir file
run: |
cat > tests_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_dir }}
EOF
- name: Write tests_models_dir file
run: |
cat > tests_models_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_models_dir }}
EOF
- name: Write tests_quantization_dir file
run: |
cat > tests_quantization_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_quantization_dir }}
EOF
- name: Run script to get jobs to run
id: get_jobs

View File

@ -149,9 +149,9 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-push"
docker: huggingface/transformers-all-latest-gpu:flash-attn
docker: huggingface/transformers-all-latest-gpu
ci_event: push
report_repo_id: hf-internal-testing/transformers_ci_push
commit_sha: ${{ github.sha }}
subdirs: ${{ needs.get_modified_models.outputs.matrix }}
models: ${{ needs.get_modified_models.outputs.matrix }}
secrets: inherit

View File

@ -20,37 +20,66 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
uses: ./.github/workflows/get-pr-number.yml
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
get-pr-info:
name: Get PR commit SHA
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"
get-sha:
runs-on: ubuntu-22.04
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
check-timestamps:
name: Check timestamps (security check)
runs-on: ubuntu-22.04
needs: get-pr-info
outputs:
PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
PR_MERGE_SHA: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_HEAD_SHA: ${{ steps.get_sha.outputs.PR_HEAD_SHA }}
PR_MERGE_SHA: ${{ steps.get_sha.outputs.PR_MERGE_SHA }}
steps:
- name: Verify `merge_commit` timestamp is older than the issue comment timestamp
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
- name: Get SHA (and verify timestamps against the issue comment date)
id: get_sha
env:
PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
run: |
git fetch origin refs/pull/$PR_NUMBER/head:refs/remotes/pull/$PR_NUMBER/head
git checkout refs/remotes/pull/$PR_NUMBER/head
echo "PR_HEAD_SHA: $(git log -1 --format=%H)"
echo "PR_HEAD_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
git fetch origin refs/pull/$PR_NUMBER/merge:refs/remotes/pull/$PR_NUMBER/merge
git checkout refs/remotes/pull/$PR_NUMBER/merge
echo "PR_MERGE_SHA: $(git log -1 --format=%H)"
echo "PR_MERGE_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
PR_MERGE_COMMIT_TIMESTAMP=$(git log -1 --date=unix --format=%cd)
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
@ -59,10 +88,13 @@ jobs:
exit -1;
fi
# use a python script to handle this complex logic.
# use a python script to handle this complex logic
# case 1: `run-slow` (auto. infer with limited number of models, but in particular, new model)
# case 2: `run-slow model_1, model_2`
get-tests:
runs-on: ubuntu-22.04
needs: [get-pr-number, check-timestamps]
needs: [get-pr-number, get-sha]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
outputs:
models: ${{ steps.models_to_run.outputs.models }}
quantizations: ${{ steps.models_to_run.outputs.quantizations }}
@ -70,11 +102,11 @@ jobs:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
@ -95,33 +127,11 @@ jobs:
- name: Show models to test
id: models_to_run
run: |
echo "$models"
echo "models=$models" >> $GITHUB_OUTPUT
echo "$quantizations"
echo "quantizations=$quantizations" >> $GITHUB_OUTPUT
# Report back if we are not able to get the tests (for example, security check is failing)
report_error_earlier:
name: Report error earlier
if: ${{ always() && needs.get-pr-info.result == 'success' && needs.get-tests.result != 'success' }}
needs: [get-pr-number, get-pr-info, get-tests]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/issues/${pr_number}/comments" \
-f body="💔 This comment contains \`run-slow\`, but unknown error occurred and [the workflow run]($GITHUB_RUN_URL) aborted!"
echo "${{ env.models }}"
echo "models=${{ env.models }}" >> $GITHUB_ENV
echo "models=${{ env.models }}" >> $GITHUB_OUTPUT
echo "${{ env.quantizations }}"
echo "quantizations=${{ env.quantizations }}" >> $GITHUB_OUTPUT
reply_to_comment:
name: Reply to the comment
@ -134,20 +144,20 @@ jobs:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
BODY: '\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}'
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
MODELS: ${{ needs.get-tests.outputs.models }}
BODY: "\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}"
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/issues/${pr_number}/comments" \
-f body="This comment contains \`run-slow\`, running the specified jobs: $(echo -e "$BODY")"
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=This comment contains run-slow, running the specified jobs: ${{ env.BODY }} ..."
create_run:
name: Create run
needs: [check-timestamps, reply_to_comment]
if: ${{ needs.get-tests.outputs.models != '[]' || needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-sha, get-tests, reply_to_comment]
permissions:
statuses: write
runs-on: ubuntu-22.04
@@ -159,196 +169,248 @@ jobs:
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/statuses/${pr_head_sha}" \
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Slow CI job" -f "context=pytest/custom-tests"
model-ci:
name: Model CI
run_models_gpu:
name: Run all tests for the model
if: ${{ needs.get-tests.outputs.models != '[]' }}
uses: ./.github/workflows/self-scheduled.yml
needs: [get-pr-number, check-timestamps, get-tests, create_run]
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-pr"
docker: huggingface/transformers-all-latest-gpu
ci_event: PR Comment CI
report_repo_id: hf-internal-testing/transformers_pr_ci
commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
subdirs: ${{ needs.get-tests.outputs.models }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
secrets: inherit
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.models) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ matrix.folders }}"
quantization-ci:
name: Quantization CI
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: |
export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})"
echo $CUDA_VISIBLE_DEVICES
python3 -m pytest -v -rsfE --make-reports=${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
run_quantization_torch_gpu:
name: Run all tests for a quantization
if: ${{ needs.get-tests.outputs.quantizations != '[]' }}
uses: ./.github/workflows/self-scheduled.yml
needs: [get-pr-number, check-timestamps, get-tests, create_run]
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-pr"
docker: huggingface/transformers-quantization-latest-gpu
ci_event: PR Comment CI
report_repo_id: hf-internal-testing/transformers_pr_ci
commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
subdirs: ${{ needs.get-tests.outputs.quantizations }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
secrets: inherit
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.quantizations) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-quantization-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'quantization/'/'quantization_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
report:
name: Check & Report
needs: [get-pr-number, check-timestamps, create_run, model-ci, quantization-ci]
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run quantization tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
update_run_status:
name: Update Check Run Status
needs: [get-sha, create_run, run_models_gpu, run_quantization_torch_gpu]
permissions:
pull-requests: write
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.run_models_gpu.result) && contains(fromJSON('["skipped", "success"]'), needs.run_quantization_torch_gpu.result) }}
steps:
- name: Show reports from jobs
env:
MODEL_REPORT: ${{ needs.model-ci.outputs.report }}
QUANT_REPORT: ${{ needs.quantization-ci.outputs.report }}
- name: Get `run_models_gpu` job status
run: |
echo "$MODEL_REPORT"
echo "$QUANT_REPORT"
- name: Process and filter reports
env:
MODEL_REPORT: ${{ needs.model-ci.outputs.report }}
QUANT_REPORT: ${{ needs.quantization-ci.outputs.report }}
run: |
# Preprocess with Python
python3 << 'PYTHON_SCRIPT'
import json
import os
def filter_and_format_report(data):
"""
Filter out entries where commit is `None` (failing tests whose status is not certain) and format as text
"""
lines = []
for model, model_result in data.items():
model_lines = []
for device, failures in model_result.items():
# Filter out None commits and extract just the test names
test_names = [
failure['test']
for failure in failures
if isinstance(failure, dict) and failure.get('commit') is not None
]
# Add tests to model lines
for idx, test_name in enumerate(test_names):
if idx == 0:
job_link = failures[idx]['job_link']
model_lines.append(f"- [{model}]({job_link}):")
model_lines.append(f" {test_name}")
# Only add model section if it has tests
if len(model_lines) > 0:
lines.extend(model_lines)
lines.append("") # Empty line between models
return "\n".join(lines).strip()
# Load and filter reports
model_report_str = os.environ.get('MODEL_REPORT', '{}')
quant_report_str = os.environ.get('QUANT_REPORT', '{}')
model_report = json.loads(model_report_str) if model_report_str else {}
quant_report = json.loads(quant_report_str) if quant_report_str else {}
formatted_model = filter_and_format_report(model_report)
formatted_quant = filter_and_format_report(quant_report)
# Write to files
with open('model_ci.txt', 'w') as f:
f.write(formatted_model)
if formatted_model:
f.write('\n')
with open('quantization_ci.txt', 'w') as f:
f.write(formatted_quant)
if formatted_quant:
f.write('\n')
PYTHON_SCRIPT
- name: Post results as PR comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
model_ci_result: ${{ needs.model-ci.result }}
quantization_ci_result: ${{ needs.quantization-ci.result }}
run: |
{
echo '## CI Results'
echo "[Workflow Run ⚙️]($GITHUB_RUN_URL)"
echo ''
# Check if both jobs were skipped or cancelled
if [[ "$model_ci_result" == "skipped" || "$model_ci_result" == "cancelled" ]] && \
[[ "$quantization_ci_result" == "skipped" || "$quantization_ci_result" == "cancelled" ]]; then
echo '⚠️ No test being reported (jobs are skipped or cancelled)!'
echo "STATUS=error" >> $GITHUB_ENV
# Check if either file has content
elif [ -s model_ci.txt ] || [ -s quantization_ci.txt ]; then
echo "STATUS=failure" >> $GITHUB_ENV
# Check if model_ci.txt has content
if [ -s model_ci.txt ]; then
echo '### Model CI Report'
echo ''
echo '#### ❌ Failed tests'
echo ''
cat model_ci.txt
echo ''
fi
# Check if quantization_ci.txt has content
if [ -s quantization_ci.txt ]; then
echo '### Quantization CI Report'
echo ''
echo '#### ❌ Failed tests'
echo ''
cat quantization_ci.txt
echo ''
fi
else
echo "STATUS=success" >> $GITHUB_ENV
echo '✅ No failing test specific to this PR 🎉 !'
fi
} > comment_body.txt
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/issues/${pr_number}/comments" \
-F body=@comment_body.txt
echo "${{ needs.run_models_gpu.result }}"
echo "${{ needs.run_quantization_torch_gpu.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
- name: Update PR commit statuses
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
# The env. variable `STATUS` used here is set in the previous step
run: |
echo "${{ needs.run_models_gpu.result }}"
echo "${{ env.STATUS }}"
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/statuses/${pr_head_sha}" \
-f "target_url=$GITHUB_RUN_URL" -f "state=$STATUS" -f "description=Slow CI job" -f "context=pytest/custom-tests"
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Slow CI job" -f "context=pytest/custom-tests"

View File

@@ -51,7 +51,6 @@ jobs:
slack_report_channel: "#transformers-ci-past-future"
docker: huggingface/transformers-all-latest-torch-nightly-gpu
ci_event: Nightly CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
secrets: inherit

View File

@@ -0,0 +1,25 @@
name: Self-hosted runner (AMD mi210 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi210
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi210
secrets: inherit

View File

@@ -0,0 +1,25 @@
name: Self-hosted runner (AMD mi250 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi250
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi250
secrets: inherit

.github/workflows/self-push-amd.yml vendored Normal file (334 lines)
View File

@@ -0,0 +1,334 @@
name: Self-hosted runner AMD GPU (push)
on:
workflow_call:
inputs:
gpu_flavor:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
check_runner_status:
name: Check Runner Status
runs-on: ubuntu-22.04
steps:
- name: Checkout transformers
uses: actions/checkout@v4
with:
fetch-depth: 2
- name: Check Runner Status
run: python utils/check_self_hosted_runner.py --target_runners amd-mi210-single-gpu-ci-runner-docker --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
check_runners:
name: Check Runners
needs: check_runner_status
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
setup_gpu:
name: Setup
needs: check_runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
env:
# `CI_BRANCH_PUSH`: The branch name from the push event
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
# `CI_SHA_PUSH`: The commit SHA from the push event
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Fetch the tests to run
working-directory: /transformers
# TODO: add `git-python` in the docker images
run: |
pip install --upgrade git-python
python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v4
with:
name: test_fetched
path: /transformers/test_preparation.txt
- id: set-matrix
name: Organize tests into models
working-directory: /transformers
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "matrix=$keys" >> $GITHUB_OUTPUT
echo "test_map=$test_map" >> $GITHUB_OUTPUT
run_models_gpu:
name: Model tests
needs: setup_gpu
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup_gpu.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup_gpu.outputs.matrix) }}
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports ${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }} -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
needs: [
check_runner_status,
check_runners,
setup_gpu,
run_models_gpu,
# run_tests_torch_cuda_extensions_single_gpu,
# run_tests_torch_cuda_extensions_multi_gpu
]
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Runner availability: ${{ needs.check_runner_status.result }}"
echo "Setup status: ${{ needs.setup_gpu.result }}"
echo "Runner status: ${{ needs.check_runners.result }}"
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- uses: actions/checkout@v4
# To avoid failure when multiple commits are merged into `main` in a short period of time.
# Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ...
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit)
with:
fetch-depth: 20
- name: Update clone using environment variables
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- uses: actions/download-artifact@v4
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_ID_AMD: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: Push CI (AMD) - ${{ inputs.gpu_flavor }}
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
RUNNER_STATUS: ${{ needs.check_runner_status.result }}
RUNNER_ENV_STATUS: ${{ needs.check_runners.result }}
SETUP_STATUS: ${{ needs.setup_gpu.result }}
# We pass `needs.setup_gpu.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ needs.setup_gpu.outputs.matrix }}"
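The renaming mentioned in the comment above happens in `utils/notification_service.py`, which is not part of this diff; a minimal sketch of the transformation it has to perform (illustrative only):
def to_artifact_key(matrix_key: str) -> str:
    # Artifact names cannot contain `/`, so `models/bert` becomes `models_bert`.
    return matrix_key.replace("models/", "models_", 1)

assert to_artifact_key("models/bert") == "models_bert"
assert to_artifact_key("pipelines") == "pipelines"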

.github/workflows/self-push-caller.yml vendored Normal file (54 lines)
View File

@@ -0,0 +1,54 @@
# Used to trigger self-push CI
name: Self-hosted runner (push-caller)
on:
push:
branches:
- main
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
check-for-setup:
runs-on: ubuntu-22.04
name: Check if setup was changed
outputs:
changed: ${{ steps.was_changed.outputs.changed }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "2"
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c
- name: Was setup changed
id: was_changed
run: |
for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
if [ `basename "${file}"` = "setup.py" ]; then
echo "changed=1" >> $GITHUB_OUTPUT
fi
done
build-docker-containers:
needs: check-for-setup
if: (github.event_name == 'push') && (needs.check-for-setup.outputs.changed == '1')
uses: ./.github/workflows/build-docker-images.yml
with:
image_postfix: "-push-ci"
secrets: inherit
run_push_ci:
name: Trigger Push CI
runs-on: ubuntu-22.04
if: ${{ always() }}
needs: build-docker-containers
steps:
- name: Trigger push CI via workflow_run
run: echo "Trigger push CI via workflow_run"

.github/workflows/self-push.yml vendored Normal file (652 lines)
View File

@@ -0,0 +1,652 @@
name: Self-hosted runner (push)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- ci_*
- ci-*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
repository_dispatch:
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
name: Setup
strategy:
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
env:
# `CI_BRANCH_PUSH`: The branch name from the push event
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
# `CI_SHA_PUSH`: The commit SHA from the push event
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Fetch the tests to run
working-directory: /transformers
# TODO: add `git-python` in the docker images
run: |
pip install --upgrade git-python
python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v4
with:
name: test_fetched
path: /transformers/test_preparation.txt
- id: set-matrix
name: Organize tests into models
working-directory: /transformers
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "matrix=$keys" >> $GITHUB_OUTPUT
echo "test_map=$test_map" >> $GITHUB_OUTPUT
run_tests_single_gpu:
name: Model tests
needs: setup
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_multi_gpu:
name: Model tests
needs: setup
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
env:
MKL_SERVICE_FORCE_INTEL: 1
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_torch_cuda_extensions_single_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
run_tests_torch_cuda_extensions_multi_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
needs: [
setup,
run_tests_single_gpu,
run_tests_multi_gpu,
run_tests_torch_cuda_extensions_single_gpu,
run_tests_torch_cuda_extensions_multi_gpu
]
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Setup status: ${{ needs.setup.result }}"
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- uses: actions/checkout@v4
# To avoid failure when multiple commits are merged into `main` in a short period of time.
# Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ...
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit)
with:
fetch-depth: 20
- name: Update clone using environment variables
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- uses: actions/download-artifact@v4
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: push
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
SETUP_STATUS: ${{ needs.setup.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

View File

@@ -2,7 +2,7 @@ name: Self-hosted runner (AMD scheduled CI caller)
on:
schedule:
- cron: "17 5 * * *"
- cron: "17 2 * * *"
jobs:
run_scheduled_amd_ci:

View File

@@ -21,7 +21,7 @@ jobs:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/transformers-pytorch-amd-gpu
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
@@ -33,7 +33,7 @@ jobs:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/transformers-pytorch-amd-gpu
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
@@ -45,7 +45,7 @@ jobs:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/transformers-pytorch-amd-gpu
docker: huggingface/testing-rocm7.0-preview
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

View File

@@ -33,13 +33,10 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Setup
env:
prev_workflow_run_id: ${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}
other_workflow_run_id: ${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}
run: |
mkdir "setup_values"
echo "$prev_workflow_run_id" > "setup_values/prev_workflow_run_id.txt"
echo "$other_workflow_run_id" > "setup_values/other_workflow_run_id.txt"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
@ -66,7 +63,7 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-torch"
docker: huggingface/transformers-all-latest-gpu
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
@ -121,15 +118,3 @@ jobs:
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
kernels-ci:
name: Kernels CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_kernels_gpu
slack_report_channel: "#transformers-ci-daily-kernels"
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit

View File

@ -1,60 +0,0 @@
name: Nvidia CI - Flash Attn
on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
push:
branches:
- run_nvidia_ci_flash_attn*
workflow_dispatch:
inputs:
prev_workflow_run_id:
description: 'previous workflow run id to compare'
type: string
required: false
default: ""
other_workflow_run_id:
description: 'other workflow run id to compare'
type: string
required: false
default: ""
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-flash-attn"
docker: huggingface/transformers-all-latest-gpu:flash-attn
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_flash_attn_ci
commit_sha: ${{ github.sha }}
pytest_marker: "flash_attn_test or flash_attn_3_test"
secrets: inherit

View File

@ -26,6 +26,7 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

View File

@ -34,20 +34,10 @@ on:
runner_type:
required: false
type: string
subdirs:
models:
default: ""
required: false
type: string
pytest_marker:
required: false
type: string
pr_number:
required: false
type: string
outputs:
report:
description: "Content of the report of new failures"
value: ${{ jobs.check_new_failures.outputs.report }}
env:
HF_HOME: /mnt/cache
@ -58,8 +48,10 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
NUM_SLICES: 2
jobs:
setup:
@ -80,11 +72,8 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin $commit_sha
git fetch && git checkout $commit_sha
git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Cleanup
working-directory: /transformers
@ -101,17 +90,11 @@ jobs:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: /transformers/tests
env:
job: ${{ inputs.job }}
subdirs: ${{ inputs.subdirs }}
NUM_SLICES: 2
run: |
if [ "$job" = "run_models_gpu" ]; then
python3 ../utils/split_model_tests.py --subdirs "$subdirs" --num_splits "$NUM_SLICES" > folder_slices.txt
echo "folder_slices=$(cat folder_slices.txt)" >> $GITHUB_OUTPUT
python3 -c "import ast; folder_slices = ast.literal_eval(open('folder_slices.txt').read()); open('slice_ids.txt', 'w').write(str(list(range(len(folder_slices)))))"
echo "slice_ids=$(cat slice_ids.txt)" >> $GITHUB_OUTPUT
elif [ "$job" = "run_trainer_and_fsdp_gpu" ]; then
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
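For readers skimming the diff: both versions of this step split the model test folders into `NUM_SLICES` chunks and derive matching slice ids. A minimal illustration of that idea (the real logic lives in `utils/split_model_tests.py`; this sketch only shows the shape of the output):

```python
# Illustrative sketch of the folder-slicing step above, not the real
# utils/split_model_tests.py implementation.
def split_into_slices(folders: list[str], num_slices: int) -> list[list[str]]:
    per_slice = -(-len(folders) // num_slices)  # ceiling division
    return [folders[i : i + per_slice] for i in range(0, len(folders), per_slice)]


folder_slices = split_into_slices(["models/bert", "models/gpt2", "models/llama"], num_slices=2)
slice_ids = list(range(len(folder_slices)))
print(folder_slices)  # [['models/bert', 'models/gpt2'], ['models/llama']]
print(slice_ids)      # [0, 1]
```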
@ -120,10 +103,8 @@ jobs:
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
name: Identify quantization method to test
working-directory: /transformers/tests
env:
subdirs: ${{ inputs.subdirs || 'None' }}
run: |
echo "quantization_matrix=$(python3 -c 'import ast; import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); subdirs = ast.literal_eval(os.environ["subdirs"]); quantization_tests = [x.removeprefix("quantization/") for x in subdirs] if subdirs is not None else quantization_tests; d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))); print(d)')" >> $GITHUB_OUTPUT
echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT
- name: NVIDIA-SMI
run: |
@ -147,7 +128,6 @@ jobs:
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
pytest_marker: ${{ inputs.pytest_marker }}
secrets: inherit
run_trainer_and_fsdp_gpu:
@ -181,14 +161,12 @@ jobs:
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
image: huggingface/transformers-pytorch-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -210,17 +188,15 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
echo "${{ matrix.machine_type }}"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
@ -229,12 +205,12 @@ jobs:
- name: Run all pipeline tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 1 -v --dist=loadfile --make-reports="${machine_type}_run_pipelines_torch_gpu_test_reports" tests/pipelines
python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat "/transformers/reports/${machine_type}_run_pipelines_torch_gpu_test_reports/failures_short.txt"
run: cat /transformers/reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
@ -258,9 +234,7 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -282,17 +256,15 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
echo "${{ matrix.machine_type }}"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
@ -302,12 +274,12 @@ jobs:
working-directory: /transformers
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports="${machine_type}_run_examples_gpu_test_reports" examples/pytorch
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_examples_gpu_test_reports examples/pytorch
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat "/transformers/reports/${machine_type}_run_examples_gpu_test_reports/failures_short.txt"
run: cat /transformers/reports/${{ env.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
@ -331,9 +303,7 @@ jobs:
steps:
- name: Update clone
working-directory: ${{ inputs.working-directory-prefix }}/transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: ${{ inputs.working-directory-prefix }}/transformers
@ -355,7 +325,7 @@ jobs:
working-directory: ${{ inputs.working-directory-prefix }}/
run: |
python3 -m pip uninstall -y deepspeed
DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
# To avoid unknown test failures
- name: Pre build DeepSpeed *again* (for nightly & Past CI)
@ -365,7 +335,7 @@ jobs:
python3 -m pip uninstall -y deepspeed
rm -rf DeepSpeed
git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
@ -383,17 +353,15 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: ${{ inputs.working-directory-prefix }}/transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
echo "${{ matrix.machine_type }}"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
@ -402,14 +370,12 @@ jobs:
- name: Run all tests on GPU
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: |
python3 -m pytest -v --make-reports="${machine_type}_run_torch_cuda_extensions_gpu_test_reports" tests/deepspeed tests/extended
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
env:
working_directory_prefix: ${{ inputs.working-directory-prefix }}
run: cat "${working_directory_prefix}/transformers/reports/${machine_type}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt"
run: cat ${{ inputs.working-directory-prefix }}/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
@ -436,19 +402,16 @@ jobs:
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
env:
matrix_folders_raw: ${{ matrix.folders }}
run: |
echo "$matrix_folders_raw"
matrix_folders="${matrix_folders_raw/'quantization/'/'quantization_'}"
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'quantization/'/'quantization_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -470,17 +433,15 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
echo "${{ matrix.machine_type }}"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
@ -488,96 +449,20 @@ jobs:
- name: Run quantization tests on GPU
working-directory: /transformers
env:
folders: ${{ matrix.folders }}
run: |
python3 -m pytest -v --make-reports="${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports" tests/${folders}
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat "/transformers/reports/${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports/failures_short.txt"
run: cat /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
run_kernels_gpu:
if: ${{ inputs.job == 'run_kernels_gpu' }}
name: Kernel tests
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .[testing]
- name: Install kernels
working-directory: /transformers
run: python3 -m pip install -U kernels
- name: NVIDIA-SMI
run: nvidia-smi
- name: Environment
working-directory: /transformers
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run kernel tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -v --make-reports="${machine_type}_run_kernels_gpu_test_reports" tests/kernels/test_kernels.py
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat "/transformers/reports/${machine_type}_run_kernels_gpu_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_kernels_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_kernels_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_kernels_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
run_extract_warnings:
# Let's only do this for the job `run_models_gpu` to simplify the (already complex) logic.
@ -586,10 +471,11 @@ jobs:
runs-on: ubuntu-22.04
needs: [setup, run_models_gpu]
steps:
# Checkout in order to run `utils/extract_warnings.py`. Avoid **explicit** checkout (i.e. don't specify `ref`) for
# security reasons.
- name: Checkout transformers
uses: actions/checkout@v4
with:
fetch-depth: 2
ref: ${{ inputs.commit_sha || github.sha }}
- name: Install transformers
run: pip install transformers
@ -609,12 +495,9 @@ jobs:
working-directory: warnings_in_ci
- name: Extract warnings in CI artifacts
env:
github_run_id: ${{ github.run_id }}
access_token: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
run: |
python3 utils/extract_warnings.py --workflow_run_id "$github_run_id" --output_dir warnings_in_ci --token "$access_token" --from_gh
echo "$(python3 -c 'import os; import json; fp = open("warnings_in_ci/selected_warnings.json"); d = json.load(fp); d = "\n".join(d); print(d)')"
python3 utils/extract_warnings.py --workflow_run_id ${{ github.run_id }} --output_dir warnings_in_ci --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }} --from_gh
echo "$(python3 -c 'import os; import json; fp = open("warnings_in_ci/selected_warnings.json"); d = json.load(fp); d = "\n".join(d) ;print(d)')"
- name: Upload artifact
if: ${{ always() }}
@ -633,7 +516,6 @@ jobs:
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_quantization_torch_gpu,
run_kernels_gpu,
run_extract_warnings
]
if: always() && !cancelled()
@ -653,17 +535,16 @@ jobs:
secrets: inherit
check_new_failures:
if: ${{ always() && needs.send_results.result == 'success' }}
if: ${{ always() && inputs.ci_event == 'Daily CI' && needs.send_results.result == 'success' }}
name: Check new failures
needs: send_results
uses: ./.github/workflows/check_failed_tests.yml
with:
docker: ${{ inputs.docker }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
start_sha: ${{ inputs.commit_sha || github.sha }}
job: ${{ inputs.job }}
slack_report_channel: ${{ inputs.slack_report_channel }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
pr_number: ${{ inputs.pr_number }}
secrets: inherit

View File

@ -41,16 +41,13 @@ jobs:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
env:
setup_status: ${{ inputs.setup_status }}
run: |
echo "Setup status: $setup_status"
echo "Setup status: ${{ inputs.setup_status }}"
- uses: actions/checkout@v4
with:
fetch-depth: 2
# Security: checkout to the `main` branch for untrusted triggers (issue_comment, pull_request_target), otherwise use the specified ref
ref: ${{ (github.event_name == 'issue_comment' || github.event_name == 'pull_request_target') && 'main' || (inputs.commit_sha || github.sha) }}
ref: ${{ inputs.commit_sha || github.sha }}
- uses: actions/download-artifact@v4
@ -84,8 +81,6 @@ jobs:
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
quantization_matrix: ${{ inputs.quantization_matrix }}
folder_slices: ${{ inputs.folder_slices }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
# For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
@ -94,10 +89,10 @@ jobs:
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
if [ "$quantization_matrix" != "" ]; then
python utils/notification_service.py "$quantization_matrix"
if [ "${{ inputs.quantization_matrix }}" != "" ]; then
python utils/notification_service.py "${{ inputs.quantization_matrix }}"
else
python utils/notification_service.py "$folder_slices"
python utils/notification_service.py "${{ inputs.folder_slices }}"
fi
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.

View File

@ -4,7 +4,7 @@ on:
workflow_dispatch:
inputs:
runner_type:
description: 'Type of runner to test (a10)'
description: 'Type of runner to test (a10 or t4)'
required: true
docker_image:
description: 'Name of the Docker image'
@ -20,6 +20,7 @@ env:
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -32,14 +33,15 @@ jobs:
steps:
- name: Get runner to use
shell: bash
env:
NUM_GPUS: ${{ github.event.inputs.num_gpus }}
RUNNER_TYPE: ${{ github.event.inputs.runner_type }}
run: |
if [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache-ssh" >> $GITHUB_ENV
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache-ssh" >> $GITHUB_ENV
if [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
echo "RUNNER=aws-g4dn-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
echo "RUNNER=aws-g4dn-12xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache" >> $GITHUB_ENV
else
echo "RUNNER=" >> $GITHUB_ENV
fi
@ -47,8 +49,8 @@ jobs:
- name: Set runner to use
id: set_runner
run: |
echo "$RUNNER"
echo "RUNNER=$RUNNER" >> $GITHUB_OUTPUT
echo ${{ env.RUNNER }}
echo "RUNNER=${{ env.RUNNER }}" >> $GITHUB_OUTPUT
ssh_runner:
name: "SSH"
@ -57,13 +59,13 @@ jobs:
group: ${{ needs.get_runner.outputs.RUNNER }}
container:
image: ${{ github.event.inputs.docker_image }}
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ github.sha }}
run: |
git fetch && git checkout "$commit_sha"
git fetch && git checkout ${{ github.sha }}
- name: Cleanup
working-directory: /transformers
@ -83,11 +85,9 @@ jobs:
- name: Store Slack infos
# Because SSH can be enabled dynamically when the workflow fails, we store the Slack info here so it can be retrieved during the wait-for-SSH step
shell: bash
env:
GITHUB_ACTOR: ${{ github.actor }}
run: |
echo "$GITHUB_ACTOR"
github_actor=$GITHUB_ACTOR
echo "${{ github.actor }}"
github_actor=${{ github.actor }}
github_actor=${github_actor/'-'/'_'}
echo "$github_actor"
echo "github_actor=$github_actor" >> $GITHUB_ENV
@ -95,17 +95,14 @@ jobs:
- name: Store Slack infos
# Because SSH can be enabled dynamically when the workflow fails, we store the Slack info here so it can be retrieved during the wait-for-SSH step
shell: bash
env:
user_slack_id: ${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}
default_slack_channel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
run: |
echo "$github_actor"
if [ "$user_slack_id" != "" ]; then
echo "SLACKCHANNEL=$user_slack_id" >> $GITHUB_ENV
echo "${{ env.github_actor }}"
if [ "${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}" != "" ]; then
echo "SLACKCHANNEL=${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}" >> $GITHUB_ENV
else
echo "SLACKCHANNEL=$default_slack_channel" >> $GITHUB_ENV
echo "SLACKCHANNEL=${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}" >> $GITHUB_ENV
fi
- name: Tailscale # In order to be able to SSH when a test fails
uses: huggingface/tailscale-action@main
with:

.gitignore vendored
View File

@ -98,7 +98,6 @@ celerybeat-schedule
# Environments
.env
.venv
.venv*
env/
venv/
ENV/
@ -172,6 +171,3 @@ tags
# modular conversion
*.modular_backup
# Cursor IDE files
.cursor/

View File

@ -14,7 +14,7 @@ This AGENTS.md file provides guidance for code agents working with this codebase
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e ".[quality]"`. You can then run `make fixup` to apply style and consistency fixes to your code.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
@ -36,4 +36,4 @@ After making changes, you should usually run `make fixup` to ensure any copies a
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e ".[testing]"`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

View File

@ -112,125 +112,7 @@ New models are constantly released and if you want to implement a new model, ple
If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/modular_transformers).
### Vision-Language Model Contribution Checklist
If you're contributing a **vision-language model** (or any multimodal model that processes images/videos), please follow this checklist. Maintainers will use this to review your PR, and completing these steps will significantly increase the likelihood of your PR being merged quickly.
**Required checklist for all vision-language model contributions:**
**1. Implement a modular file**
All new models should use the modular architecture pattern. Create a `modular_<model_name>.py` file using the modular model converter:
- Use the CLI, [`transformers add-new-model-like`](https://github.com/huggingface/transformers/blob/main/src/transformers/cli/add_new_model_like.py) to generate a modular skeleton and get started
- All code should be in the modular file if possible. Modeling must be in it; it is better if the configuration is in it as well.
- Reuse existing patterns from similar models as much as possible
To verify your modular file is correct, run:
```bash
python utils/modular_model_converter.py <model_name>
```
This will generate the separate files (`modeling_*.py`, `configuration_*.py`, etc.) from your modular file. The CI will enforce that these generated files match your modular file.
**2. Add a fast image processor (for image models)**
If your model processes images, implement a fast image processor that uses `torch` and `torchvision` instead of PIL/numpy for better inference performance:
- See the detailed guide in [#36978](https://github.com/huggingface/transformers/issues/36978)
- Fast processors inherit from `BaseImageProcessorFast`
- Examples: `LlavaOnevisionImageProcessorFast`, `Idefics2ImageProcessorFast`
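As a rough illustration of the pattern (the class name and default values below are assumptions for a hypothetical model, not a drop-in implementation; follow the guide in #36978 and a recent model for specifics):

```python
# Hypothetical sketch of a fast image processor for a new model.
# Defaults are placeholders; real processors set values matching the original model.
from transformers.image_processing_utils_fast import BaseImageProcessorFast
from transformers.image_utils import IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD, PILImageResampling


class MyModelImageProcessorFast(BaseImageProcessorFast):
    resample = PILImageResampling.BICUBIC
    image_mean = IMAGENET_STANDARD_MEAN
    image_std = IMAGENET_STANDARD_STD
    size = {"height": 384, "width": 384}
    do_resize = True
    do_rescale = True
    do_normalize = True
```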
**3. Create a weight conversion script**
Add a `convert_<model_name>_to_hf.py` script that converts the original model weights to the HuggingFace format:
- Script should handle checkpoint loading, key mapping, and saving in HF format
- Include usage examples and documentation in the script
- Examples: [`convert_llava_onevision_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py), [`convert_idefics2_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/idefics2/convert_idefics2_weights_to_hf.py)
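A bare-bones sketch of what such a script looks like (the key names and mapping below are placeholders; real scripts also build the config, save the processor, and run sanity checks):

```python
# Hypothetical skeleton for convert_mymodel_to_hf.py: load the original
# checkpoint, rename keys to the HF naming scheme, and save the result.
import argparse

import torch

# Example renames only; a real script maps every original key to its HF name.
KEY_MAPPING = {
    "visual_encoder.": "model.vision_tower.",
    "llm.": "model.language_model.",
}


def convert_state_dict(original: dict) -> dict:
    converted = {}
    for key, value in original.items():
        for old_prefix, new_prefix in KEY_MAPPING.items():
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
        converted[key] = value
    return converted


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--checkpoint_path", required=True)
    parser.add_argument("--output_path", required=True)
    args = parser.parse_args()

    state_dict = torch.load(args.checkpoint_path, map_location="cpu")
    torch.save(convert_state_dict(state_dict), args.output_path)
    # A real script would instead instantiate the HF model and config,
    # call model.load_state_dict(...), and finish with model.save_pretrained(...).
```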
**4. Add integration tests with exact output matching**
At minimum, add an `IntegrationTest` class that tests end-to-end generation (processing and modelling) with **exact** output matching:
- For generative models: test that generated text matches expected output exactly
- For non-generative models: test that output logits match expected values
- Tests should use real checkpoints (load in 4-bit or half precision if the checkpoint is too big to fit in our CI runners) and real inputs
- Example pattern:
```python
class MyModelIntegrationTest(unittest.TestCase):
@slow
def test_model_integration(self):
model = MyModelForConditionalGeneration.from_pretrained("org/model-name")
processor = AutoProcessor.from_pretrained("org/model-name")
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
EXPECTED_TEXT = "exact expected output"
self.assertEqual(processor.decode(output[0]), EXPECTED_TEXT)
```
See `tests/models/llava_onevision/test_modeling_llava_onevision.py` for complete examples.
**5. Update documentation**
Add or update model documentation:
- If the CLI has not already created it, add `docs/source/en/model_doc/<model_name>.md` with usage examples
- Include model description, paper link, and basic usage with `Pipeline` and `AutoModel`
- Add the model to the appropriate TOC files
**6. Look for reusable patterns**
The library has 400+ models with many established patterns:
- Search for similar models (e.g., other vision-language models)
- Reuse attention mechanisms, layer implementations, and processing patterns
- Check models like LLaVA, Idefics2, Fuyu for vision-language patterns
- Use the provided decorators (`auto_docstring`, `can_return_tuple`, `check_model_inputs` and `_can_record_outputs`) where relevant.
- Don't reinvent the wheel
**7. Run quality checks and read the output**
Before submitting your PR, install quality dependencies and run the full check suite:
```bash
pip install -e ".[quality]"
make fixup
```
**Important**: Take time to read the output of `make fixup`. It will:
- Lint and format your code automatically
- Run consistency checks (imports, docstrings, etc.)
- Show any remaining issues that need manual fixes
All checks must pass before your PR can be merged.
**If this checklist is complete, your PR has a very high likelihood of being merged!** Following these steps makes the maintainers' work much easier and will reduce the number of review iterations, getting your important work out there faster.
#### Copy-pastable checklist for maintainers
Here's a condensed version maintainers can copy into PRs:
```markdown
## Multimodal Model Addition Checklist
Please ensure your PR completes all following items. See the [full checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#vision-language-model-contribution-checklist) for details.
- [ ] **Modular file**: `modular_<model_name>.py` implemented and verified with `python utils/modular_model_converter.py <model_name>`
- [ ] **Fast image processor**: Implemented using `BaseImageProcessorFast` (see [#36978](https://github.com/huggingface/transformers/issues/36978))
- [ ] **Conversion script**: `convert_<model_name>_to_hf.py` added with usage examples
- [ ] **Integration tests**: End-to-end tests with exact output matching (text or logits)
- [ ] **Documentation**: Model docs added/updated in `docs/source/en/model_doc/`
- [ ] **Pattern reuse**: Verified against similar models (LLaVA, Idefics2, etc.)
- [ ] **Quality checks**: `make fixup` passes with no errors
```
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/add_new_model).
## Do you want to add documentation?

View File

@ -153,7 +153,7 @@ You are not required to read the following guidelines before opening an issue. H
cd examples/seq2seq
torchrun --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir \
--output_dir output_dir --overwrite_output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \

View File

@ -64,8 +64,8 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer
vision, audio, video, and multimodal models, for both inference and training.
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal model, for both inference and training.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
@ -81,7 +81,7 @@ Explore the [Hub](https://huggingface.com/) today to find a model and use Transf
## Installation
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.

View File

@ -9,12 +9,6 @@ In this list, we showcase incredibly impactful and novel projects that have push
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [◉ Universal Intelligence](https://github.com/blueraai/universal-intelligence)
[Universal Intelligence](https://github.com/blueraai/universal-intelligence) aims to standardize models, tools, and agents, transforming them into simple, composable, portable, interoperable, framework-agnostic, hardware-agnostic interfaces (through auto-negotiation and resource sharing) for fast and accessible development of AI applications.
Keywords: Protocol, Open-source, LLMs, Large Language Models, Agents, Low-code
## [gpt4all](https://github.com/nomic-ai/gpt4all)
[gpt4all](https://github.com/nomic-ai/gpt4all) is an ecosystem of open-source chatbots trained on massive collections of clean assistant data including code, stories and dialogue. It offers open-source, large language models such as LLaMA and GPT-J trained in an assistant-style.

View File

@ -16,6 +16,7 @@ import sys
from logging import Logger
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
# Add the parent directory to Python path to import benchmarks_entrypoint
@ -41,7 +42,7 @@ except ImportError:
GenerationConfig = None
StaticCache = None
os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
# Only set torch precision if torch is available
@ -144,7 +145,7 @@ def run_benchmark(
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: int | None = None):
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
@ -154,7 +155,7 @@ def run_benchmark(
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: int | None = None):
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
probs = logits_to_probs(logits[0, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs

View File

@ -1,5 +1,6 @@
gpustat==1.1.1
psutil==6.0.0
psycopg2==2.9.9
hf_xet
pandas>=1.5.0
torch>=2.4.0
hf_transfer
pandas>=1.5.0

View File

@ -1,2 +1 @@
benchmark_results/
benchmark_results_profiles/
benchmark_results/

View File

@ -0,0 +1 @@
# Benchmark implementations directory

View File

@ -0,0 +1,165 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
from typing import Any
import torch
from benchmark_framework import ModelBenchmark
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
class LLaMABenchmark(ModelBenchmark):
"""Simplified LLaMA model benchmark implementation using the ModelBenchmark base class."""
def __init__(self, logger: logging.Logger):
super().__init__(logger)
self._default_prompt = "Why dogs are so cute?" # Custom prompt for LLaMA
def get_scenario_configs(self) -> list[dict[str, Any]]:
"""
Get LLaMA-specific scenario configurations.
Returns:
List of scenario configuration dictionaries
"""
return [
# Eager variants
{"variant": "eager", "compile_mode": None, "use_cache": True, "description": "Eager execution with cache"},
# Compiled variants
{
"variant": "compiled",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Compiled with max autotune",
},
# Kernelized variant (if available)
{
"variant": "kernelized",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Kernelized execution",
},
]
def _is_kernelization_available(self) -> bool:
"""Check if kernelization is available for LLaMA."""
try:
from kernels import Mode, kernelize # noqa: F401
return True
except ImportError:
self.logger.debug("Kernelization not available: kernels module not found")
return False
def get_default_generation_config(self) -> dict[str, Any]:
"""Get LLaMA-specific generation configuration."""
return {
"do_sample": False,
"top_p": 1.0,
"temperature": 1.0,
"repetition_penalty": 1.0,
"max_new_tokens": None, # Will be set per scenario
}
def get_model_init_kwargs(self, config) -> dict[str, Any]:
"""Get LLaMA-specific model initialization kwargs."""
return {
"torch_dtype": getattr(torch, config.torch_dtype),
"attn_implementation": config.attn_implementation,
"use_cache": True,
}
def get_default_torch_dtype(self) -> str:
"""Get default torch dtype for LLaMA."""
return "float16" # LLaMA works well with float16
def get_default_device(self) -> str:
"""Get default device for LLaMA."""
return "cuda" # LLaMA prefers CUDA
def run_llama(logger, output_dir, **kwargs):
"""
Run LLaMA benchmark with the given configuration.
Args:
logger: Logger instance
output_dir: Output directory for results
**kwargs: Additional configuration options
Returns:
Path to output file if successful
"""
from benchmark_framework import BenchmarkRunner
# Extract parameters with defaults
model_id = kwargs.get("model_id", "meta-llama/Llama-2-7b-hf")
warmup_iterations = kwargs.get("warmup_iterations", 3)
measurement_iterations = kwargs.get("measurement_iterations", 5)
num_tokens_to_generate = kwargs.get("num_tokens_to_generate", 100)
include_sdpa_variants = kwargs.get("include_sdpa_variants", True)
device = kwargs.get("device", "cuda")
torch_dtype = kwargs.get("torch_dtype", "float16")
batch_size = kwargs.get("batch_size", 1)
commit_id = kwargs.get("commit_id")
logger.info(f"Starting LLaMA benchmark for model: {model_id}")
logger.info(
f"Configuration: warmup={warmup_iterations}, measurement={measurement_iterations}, tokens={num_tokens_to_generate}"
)
try:
# Create benchmark instance
benchmark = LLaMABenchmark(logger)
# Create scenarios
scenarios = benchmark.create_scenarios(
model_id=model_id,
warmup_iterations=warmup_iterations,
measurement_iterations=measurement_iterations,
num_tokens_to_generate=num_tokens_to_generate,
include_sdpa_variants=include_sdpa_variants,
device=device,
torch_dtype=torch_dtype,
batch_size=batch_size,
)
logger.info(f"Created {len(scenarios)} benchmark scenarios")
# Create runner and execute benchmarks
runner = BenchmarkRunner(logger, output_dir)
results = runner.run_benchmark(benchmark, scenarios, commit_id=commit_id)
if not results:
logger.warning("No successful benchmark results")
return None
# Save results
model_name = model_id.split("/")[-1] # Extract model name from ID
output_file = runner.save_results(model_name, results)
logger.info(f"LLaMA benchmark completed successfully. Results saved to: {output_file}")
return output_file
except Exception as e:
logger.error(f"LLaMA benchmark failed: {e}")
import traceback
logger.debug(traceback.format_exc())
raise
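For context, a hypothetical invocation of `run_llama` (the logger name and output directory are placeholders, assuming this module is importable):

```python
# Hypothetical usage of run_llama; values mirror the defaults read above.
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llama_benchmark")

output_file = run_llama(
    logger,
    output_dir="benchmark_results",
    model_id="meta-llama/Llama-2-7b-hf",
    warmup_iterations=3,
    measurement_iterations=5,
    num_tokens_to_generate=100,
)
```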

File diff suppressed because it is too large.

View File

@ -1,242 +0,0 @@
import hashlib
import itertools
import json
import logging
from typing import Any
from transformers.utils.import_utils import is_flash_attn_2_available
KERNELIZATION_AVAILABLE = False
try:
from kernels import Mode, kernelize # noqa: F401
KERNELIZATION_AVAILABLE = True
except ImportError:
pass
logger = logging.getLogger(__name__)
class BenchmarkConfig:
"""Configuration for a single benchmark scenario."""
all_attn_implementations = [
("flash_attention_2", None),
("eager", None),
("sdpa", "math"),
("sdpa", "flash_attention"),
("flex_attention", None),
]
all_compiled_modes = [None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"]
def __init__(
self,
warmup_iterations: int = 5,
measurement_iterations: int = 20,
gpu_monitoring: bool = True, # NOTE: you may want to disable this at times as we have observed it can heavily slow down benchmarks on AMD
continuous_batching: bool = False,
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
attn_implementation: str = "eager",
sdpa_backend: str | None = None,
compile_mode: str | None = None,
compile_options: dict[str, Any] | None = None,
kernelize: bool = False,
name: str | None = None,
skip_validity_check: bool = False,
) -> None:
# Benchmark parameters
self.warmup_iterations = warmup_iterations
self.measurement_iterations = measurement_iterations
self.gpu_monitoring = gpu_monitoring
self.continuous_batching = continuous_batching
# Input parameters
self.batch_size = batch_size
self.sequence_length = sequence_length
self.num_tokens_to_generate = num_tokens_to_generate
# Generation parameters
self.attn_implementation = attn_implementation
self.sdpa_backend = sdpa_backend
# Optimization parameters
self.compile_mode = compile_mode
self.compile_options = compile_options if compile_options is not None else {}
self.kernelize = kernelize
# Constant parameters
self.dtype = "torch.bfloat16"
self.device = "cuda"
self.check_validity(skip_validity_check)
self.name = name if name is not None else self.infer_name()
def check_validity(self, skip_validity_check: bool = False) -> None:
if skip_validity_check:
return
# Check FA is installed
if self.attn_implementation == "flash_attention_2" and not is_flash_attn_2_available():
logger.warning(
"Flash attention does not support compile mode. Defaulting to SDPA w/ flash attention backend."
)
self.attn_implementation = "sdpa"
self.sdpa_backend = "flash_attention"
# Flash attention does not support compile mode, so we turn it off # FIXME: it would be better to support it
is_fa = self.attn_implementation == "flash_attention_2"
is_fa |= self.attn_implementation == "sdpa" and self.sdpa_backend == "flash_attention"
if is_fa:
logger.warning("Flash attention does not support compile mode. Turning off compile mode.")
self.compile_mode = None
# Handle SDPA backend if not determined by the config (needs to be done before skipping duplicates)
if self.attn_implementation == "sdpa" and self.sdpa_backend is None:
default_backend = "flash_attention" # FIXME: torch has a _cur_sdpa_kernel_backends but it fails
logger.warning(f"No SDPA backend provided, using {default_backend} instead.")
self.sdpa_backend = default_backend
if self.continuous_batching:
if self.attn_implementation == "flex_attention":
logger.error(
"disabling continuous batching because of invalid configuration: flex attention is not supported"
)
self.continuous_batching = False
elif self.attn_implementation == "sdpa" and self.sdpa_backend is not None:
logger.warning(
"when continuous batching is enabled, sdpa_backend must be None because of the attention mask, setting it to None"
)
self.sdpa_backend = "math"
@property
def hash(self) -> str:
return hashlib.sha256(json.dumps(self.to_dict()).encode()).hexdigest()
def infer_name(self, compact: bool = True) -> str:
"""Infer a human-readable name for the benchmark config, either compact or verbose."""
if compact:
iter_str = f"w{self.warmup_iterations}_i{self.measurement_iterations}"
gpu_monitor_str = "monitored" if self.gpu_monitoring else "unmonitored"
dimensions_str = f"b{self.batch_size}_s{self.sequence_length}_n{self.num_tokens_to_generate}"
attn_code = self.attn_implementation
attn_code += f"_{self.sdpa_backend}" if self.attn_implementation == "sdpa" else ""
compile_str = f"compiled_{self.compile_mode}" if self.compile_mode is not None else "uncompiled"
kernelize_str = "kernelized" if self.kernelize else "unkernelized"
continuous_batching_str = "cb" if self.continuous_batching else "generate"
sep = "-"
else:
iter_str = f"{self.warmup_iterations} warmup, {self.measurement_iterations} iterations"
gpu_monitor_str = ("with" if self.gpu_monitoring else "no") + " GPU monitoring"
dimensions_str = f"batch size {self.batch_size}, sequence length {self.sequence_length}, {self.num_tokens_to_generate} generated tokens"
attn_code = f"{self.attn_implementation} attention"
attn_code += f" with {self.sdpa_backend} backend" if self.attn_implementation == "sdpa" else ""
compile_str = "compiled" if self.compile_mode is not None else "not compiled"
kernelize_str = "kernelized" if self.kernelize else "not kernelized"
continuous_batching_str = "continuous batching" if self.continuous_batching else "regular generate"
sep = ", "
return sep.join(
[iter_str, gpu_monitor_str, dimensions_str, attn_code, compile_str, kernelize_str, continuous_batching_str]
)
def to_dict(self) -> dict[str, Any]:
return {
"name": self.name,
"warmup_iterations": self.warmup_iterations,
"measurement_iterations": self.measurement_iterations,
"gpu_monitoring": self.gpu_monitoring,
"continuous_batching": self.continuous_batching,
"batch_size": self.batch_size,
"sequence_length": self.sequence_length,
"num_tokens_to_generate": self.num_tokens_to_generate,
"attn_implementation": self.attn_implementation,
"sdpa_backend": self.sdpa_backend,
"compile_mode": self.compile_mode,
"compile_options": self.compile_options | {}, # to avoid inplace modification of the original dict
"kernelize": self.kernelize,
}
@classmethod
def from_dict(cls, data: dict[str, Any], skip_validity_check: bool = False) -> "BenchmarkConfig":
return cls(
warmup_iterations=data.get("warmup_iterations", 5),
measurement_iterations=data.get("measurement_iterations", 20),
gpu_monitoring=data.get("gpu_monitoring", False),
continuous_batching=data.get("continuous_batching", False),
batch_size=data.get("batch_size", 1),
sequence_length=data.get("sequence_length", 128),
num_tokens_to_generate=data.get("num_tokens_to_generate", 128),
attn_implementation=data.get("attn_implementation", "eager"),
sdpa_backend=data.get("sdpa_backend"),
compile_mode=data.get("compile_mode"),
compile_options=data.get("compile_options"),
kernelize=data.get("kernelize", False),
name=data.get("name"),
skip_validity_check=skip_validity_check,
)
def adapt_configs(
configs: list[BenchmarkConfig],
warmup_iterations: int | list[int] = 5,
measurement_iterations: int | list[int] = 20,
batch_size: int | list[int] = 1,
sequence_length: int | list[int] = 128,
num_tokens_to_generate: int | list[int] = 128,
gpu_monitoring: bool | list[bool] = True,
) -> list[BenchmarkConfig]:
parameters = (
x if isinstance(x, list) else [x]
for x in [
warmup_iterations,
measurement_iterations,
batch_size,
sequence_length,
num_tokens_to_generate,
gpu_monitoring,
]
)
iterator = itertools.product(*parameters)
adapted_configs = []
for warmup_iters, measurement_iters, bs, seqlen, ntok, monitor in iterator:
for config in configs:
config = config.to_dict()
config["warmup_iterations"] = warmup_iters
config["measurement_iterations"] = measurement_iters
config["batch_size"] = bs
config["sequence_length"] = seqlen
config["num_tokens_to_generate"] = ntok
config["gpu_monitoring"] = monitor
adapted_configs.append(BenchmarkConfig.from_dict(config))
return adapted_configs
def get_config_by_level(level: int) -> list[BenchmarkConfig]:
configs = []
# Early return if level is 3 or above: we generate all combinations of configs, maybe even w/ all compile modes
if level >= 3:
for attn_implementation, sdpa_backend in BenchmarkConfig.all_attn_implementations:
# Usually there is not much to gain by compiling with other modes, but we allow it for level 4
compile_modes = BenchmarkConfig.all_compiled_modes if level >= 4 else [None, "default"]
for cm in compile_modes:
for kernelize_on in {False, KERNELIZATION_AVAILABLE}:
for cb_on in [False, True]:
configs.append(
BenchmarkConfig(
attn_implementation=attn_implementation,
sdpa_backend=sdpa_backend,
compile_mode=cm,
kernelize=kernelize_on,
continuous_batching=cb_on,
)
)
return configs
# Otherwise, we add the configs for the given level
if level >= 0:
configs.append(BenchmarkConfig(attn_implementation="flex_attention", compile_mode="default"))
if level >= 1:
configs.append(BenchmarkConfig(attn_implementation="flash_attention_2"))
configs.append(BenchmarkConfig(attn_implementation="eager", compile_mode="default"))
configs.append(BenchmarkConfig(attn_implementation="flash_attention_2", continuous_batching=True))
if level >= 2:
configs.append(BenchmarkConfig(attn_implementation="sdpa", compile_mode="default"))
configs.append(BenchmarkConfig(attn_implementation="flex_attention", compile_mode="default", kernelize=True))
configs.append(BenchmarkConfig(attn_implementation="flash_attention_2", kernelize=True))
configs.append(BenchmarkConfig(attn_implementation="paged|sdpa", continuous_batching=True))
return configs
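To make the level/adaptation flow concrete, a hypothetical way the two helpers above could be combined (the surrounding runner code is omitted):

```python
# Hypothetical combination of get_config_by_level and adapt_configs:
# pick the level-1 configs, then sweep two batch sizes with monitoring off.
base_configs = get_config_by_level(level=1)
configs = adapt_configs(
    base_configs,
    warmup_iterations=5,
    measurement_iterations=20,
    batch_size=[1, 4],
    sequence_length=128,
    num_tokens_to_generate=128,
    gpu_monitoring=False,
)
for cfg in configs:
    print(cfg.name)
```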

View File

@ -1,507 +0,0 @@
import gc
import json
import logging
import os
import pathlib
import re
import tempfile
import time
from contextlib import nullcontext
from datetime import datetime
from queue import Queue
from typing import Any
import torch
from datasets import Dataset
from huggingface_hub import HfApi
from tqdm import trange
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
CompileConfig,
GenerationConfig,
GenerationMixin,
)
from transformers.generation.streamers import BaseStreamer
from .benchmark_config import BenchmarkConfig
from .data_classes import BenchmarkMetadata, BenchmarkResult, GPURawMetrics, pretty_print_dict
from .hardware_metrics import GPUMonitor
try:
from kernels import Mode, kernelize # noqa: F401
except ImportError:
kernelize = None
Mode = None
DEFAULT_PROMPT = "\n".join([
"The French Revolution was a period of political and societal change in France that began with the Estates General of 1789 and ended with the Coup of 18 Brumaire on 9 November 1799.",
"Many of the revolution's ideas are considered fundamental principles of liberal democracy, and its values remain central to modern French political discourse.",
"It was caused by a combination of social, political, and economic factors which the existing regime proved unable to manage.",
"Financial crisis and widespread social distress led to the convocation of the Estates General in May 1789, its first meeting since 1614.",
"The representatives of the Third Estate broke away and re-constituted themselves as a National Assembly in June.",
"The Storming of the Bastille in Paris on 14 July led to a series of radical measures by the Assembly, including the abolition of feudalism, state control over the Catholic Church in France, and issuing the Declaration of the Rights of Man and of the Citizen.",
"The next three years were dominated by a struggle for political control.",
"King Louis XVI's attempted flight to Varennes in June 1791 further discredited the monarchy, and military defeats after the outbreak of the French Revolutionary Wars in April 1792 led to the insurrection of 10 August 1792.",
"As a result, the monarchy was replaced by the French First Republic in September, followed by the execution of Louis XVI himself in January 1793.",
"After another revolt in June 1793, the constitution was suspended, and political power passed from the National Convention to the Committee of Public Safety, dominated by radical Jacobins led by Maximilien Robespierre.",
"About 16,000 people were sentenced by the Revolutionary Tribunal and executed in the Reign of Terror, which ended in July 1794 with the Thermidorian Reaction.",
"Weakened by external threats and internal opposition, the Committee of Public Safety was replaced in November 1795 by the Directory.",
"Its instability ended in the coup of 18 Brumaire and the establishment of the Consulate, with Napoleon Bonaparte as First Consul.",
]) # fmt: skip
PUSH_TO_HUB_TOKEN = os.getenv("PUSH_TO_HUB_TOKEN", None)
def compact_json_numeric_arrays(data: dict):
# Match arrays that contain only numbers (ints/floats), whitespace, commas, and newlines
pattern = r"\[\s*\n\s*((?:\d+(?:\.\d+)?\s*,\s*)*\d+(?:\.\d+)?)\s*\n\s*\]"
def replace_numeric_array(match):
# Get the array content
content = match.group(1)
# Remove extra whitespace but keep commas
compact_content = re.sub(r"\s+", " ", content).strip()
return f"[{compact_content}]"
return re.sub(pattern, replace_numeric_array, json.dumps(data, indent=4, default=str), flags=re.DOTALL)
def get_git_revision() -> str:
base_path = pathlib.Path(__file__).parent.parent.parent
git_dir = base_path / ".git"
with (git_dir / "HEAD").open("r") as head:
ref = head.readline().split(" ")[-1].strip()
with (git_dir / ref).open("r") as git_hash:
return git_hash.readline().strip()
def get_sdpa_backend(backend_name: str | None) -> torch.nn.attention.SDPBackend | None:
"""Get the SDPA backend enum from string name."""
if backend_name is None:
return None
try:
backend_map = {
"math": torch.nn.attention.SDPBackend.MATH,
"flash_attention": torch.nn.attention.SDPBackend.FLASH_ATTENTION,
"efficient_attention": torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION,
"cudnn_attention": torch.nn.attention.SDPBackend.CUDNN_ATTENTION,
}
return backend_map.get(backend_name.lower())
except AttributeError:
# torch.nn.attention.SDPBackend not available in older torch versions
return None
def flush_memory():
"""Flush GPU memory and run garbage collection."""
gc.collect()
# Dynamo resets
torch._dynamo.reset()
torch._dynamo.reset_code_caches()
if hasattr(torch._inductor, "codecache"):
# Clear FX graph cache
if hasattr(torch._inductor.codecache, "FxGraphCache"):
torch._inductor.codecache.FxGraphCache.clear()
# Clear PyCodeCache
if hasattr(torch._inductor.codecache, "PyCodeCache"):
torch._inductor.codecache.PyCodeCache.cache_clear()
# Clear TritonFuture cache (for async compilation)
if hasattr(torch._inductor.codecache, "TritonFuture"):
if hasattr(torch._inductor.codecache.TritonFuture, "_compile_cache"):
torch._inductor.codecache.TritonFuture._compile_cache.clear()
# Clear CUDA cache
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.synchronize()
gc.collect()
class BenchmarkStreamer(BaseStreamer):
def __init__(self, **kwargs) -> None:
self.timeout = kwargs.pop("timeout", 10)
self.timestamps = []
self.text_queue = Queue()
self.stop_signal = None
def put(self, value):
"""Receives tokens and logs the timestamp of the generation."""
self.timestamps.append(time.perf_counter())
self.text_queue.put(value)
def end(self):
self.timestamps.append(time.perf_counter())
self.text_queue.put(self.stop_signal)
def __iter__(self):
return self
def __next__(self):
value = self.text_queue.get(timeout=self.timeout)
if value == self.stop_signal:
raise StopIteration()
else:
return value
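# Illustrative sketch (not part of the original file): the timestamps collected by the streamer are
# what the per-token latency metrics are derived from later, roughly:
#     streamer = BenchmarkStreamer()
#     t0 = time.perf_counter()
#     model.generate(**inputs, streamer=streamer, max_new_tokens=8)
#     token_times = [t - t0 for t in streamer.timestamps[1:]]  # skip the prompt's put() call
#     ttft = token_times[0]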
class BenchmarkRunner:
"""Main benchmark runner that coordinates benchmark execution."""
def __init__(
self,
logger: logging.Logger,
output_dir: str | None = None,
branch_name: str | None = None,
commit_id: str | None = None,
commit_message: str | None = None,
) -> None:
# Those stay constant for the whole run
self.logger = logger
if output_dir is None:
output_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "benchmark_results")
self.output_dir = output_dir
self.branch_name = branch_name
self.commit_id = get_git_revision() if commit_id is None else commit_id
self.commit_message = commit_message
os.makedirs(self.output_dir, exist_ok=True)
self.profile_dir = None
# Attributes that are reset for each model
self._setup_for = ""
# Attributes that are reset for each run
self.model: GenerationMixin | None = None
def cleanup(self) -> None:
del self.model
self.model = None
flush_memory()
def setup_benchmark(self, model_id: str, config: BenchmarkConfig) -> None:
# Some attributes only need to be set once per model
if self._setup_for != model_id:
self.tokenizer = AutoTokenizer.from_pretrained(model_id)
# We set the EOS token to the padding token for open-ended generation
self.tokenizer.eos_token = self.tokenizer.pad_token
self._setup_for = model_id
# Prepare inputs
self.inputs = self.tokenizer(
[DEFAULT_PROMPT for _ in range(config.batch_size)],
return_tensors="pt",
max_length=config.sequence_length,
truncation=True,
return_attention_mask=True,
).to(config.device)
self.inputs["use_cache"] = True
# Prepare generation config
gen_config = GenerationConfig(
do_sample=False, top_p=1.0, temperature=1.0, max_new_tokens=config.num_tokens_to_generate
)
# Prepare compile config
if config.compile_mode is not None:
gen_config.compile_config = CompileConfig(mode=config.compile_mode, options=config.compile_options)
gen_config.cache_implementation = "static"
# Load model
self.logger.debug(f"Loading model {model_id} on device {config.device}...")
dtype = getattr(torch, config.dtype.removeprefix("torch."))
self.model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=dtype, attn_implementation=config.attn_implementation, generation_config=gen_config
)
self.model = self.model.eval().to(config.device)
# Kernelize the model if needed
if config.kernelize and kernelize is not None and Mode is not None:
self.model = kernelize(self.model, mode=Mode.INFERENCE)
def run_benchmark(
self, model_id: str, config: BenchmarkConfig, num_tokens_to_profile: int = 0
) -> dict[str, Any] | None:
"""Run a single benchmark with the given model ID and config."""
sdpa_ctx = nullcontext()
if config.attn_implementation == "sdpa":
sdpa_backend = get_sdpa_backend(config.sdpa_backend)
sdpa_ctx = torch.nn.attention.sdpa_kernel(sdpa_backend)
with sdpa_ctx, torch.no_grad():
self.logger.info(f"Running benchmark scenario: {config.name}")
# Quick validation: try one measurement first to see if this scenario works
generate_fn = self.time_generate_batch if config.continuous_batching else self.time_generate
flush_memory()
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = generate_fn(
max_new_tokens=1, gpu_monitor=None
)
if e2e_latency < 0:
self.logger.warning(f"Skipping config {config.name}: {e2e_latency = } (no GPU monitoring)")
return None
# Warmup runs
self.logger.info(f"Warming up with {config.warmup_iterations} iterations...")
for _ in trange(config.warmup_iterations):
_ = generate_fn(max_new_tokens=config.num_tokens_to_generate)
self.logger.info("Warmup over.")
# Measurement runs
result = BenchmarkResult()
self.logger.info(f"Benchmarking with {config.measurement_iterations} iterations.")
for _ in trange(config.measurement_iterations):
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = generate_fn(
max_new_tokens=config.num_tokens_to_generate,
gpu_monitor=(GPUMonitor(logger=self.logger) if config.gpu_monitoring else None),
)
result.accumulate(e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics)
self.logger.info("Benchmarking done. Cleaning up.")
# Profile if needed
if num_tokens_to_profile > 0:
self.profile_generate(num_tokens_to_profile, config.name)
return {
"metadata": BenchmarkMetadata(
model_id=model_id,
branch_name=self.branch_name,
commit_id=self.commit_id,
commit_message=self.commit_message,
),
"measurements": result,
"config": config,
}
# TODO: refactor `generate_batch` to handle streaming so we can use it here
def time_generate_batch(
self,
max_new_tokens: int,
gpu_monitor: GPUMonitor | None = None,
) -> tuple[float, list[float], str, GPURawMetrics | None]:
if gpu_monitor is not None:
gpu_monitor.start()
config = GenerationConfig(
max_new_tokens=max_new_tokens,
eos_token_id=self.tokenizer.eos_token_id,
pad_token_id=self.tokenizer.pad_token_id,
do_sample=True,
)
manager = self.model.init_continuous_batching(config)
manager.start()
try:
first_req_results = []
timestamps = []
wall_time_0 = time.perf_counter()
inputs = self.inputs["input_ids"].tolist()
manager.add_requests(inputs, max_new_tokens=max_new_tokens, streaming=True)
first_req_id = None
num_requests = len(inputs)
finished_requests = 0
while finished_requests < num_requests:
# NOTE: I don't like having the extra if stmt here, but hopefully won't degrade perf too much
result = manager.get_result()
if result:
timestamps.append(time.perf_counter() - wall_time_0)
if result.is_finished():
finished_requests += 1
if first_req_id is None:
first_req_id = result.request_id
if result.request_id == first_req_id:
first_req_results.append(result)
else:
if not manager.is_running():
raise RuntimeError("Generation thread exited unexpectedly")
wall_time_1 = time.perf_counter()
gpu_metrics = gpu_monitor.stop_and_collect() if gpu_monitor is not None else None
decoded_output = self.tokenizer.decode(
[res.generated_tokens[0] for res in first_req_results], skip_special_tokens=True
)
shape_and_decoded_output = f"{(1, len(first_req_results))} | {decoded_output}"
e2e_latency = wall_time_1 - wall_time_0
return e2e_latency, timestamps, shape_and_decoded_output, gpu_metrics
except Exception as e:
raise e
finally:
manager.stop()
def time_generate(
self,
max_new_tokens: int,
gpu_monitor: GPUMonitor | None = None,
) -> tuple[float, list[float], str, GPURawMetrics | None]:
"""Time the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
# Prepare gpu monitoring if needed
if gpu_monitor is not None:
gpu_monitor.start()
# Prepare streamer
streamer = BenchmarkStreamer()
# Generate and time
wall_time_0 = time.perf_counter()
outputs = self.model.generate(
**self.inputs,
max_new_tokens=max_new_tokens,
streamer=streamer,
)
wall_time_1 = time.perf_counter()
# Stop gpu monitoring if needed
gpu_metrics = gpu_monitor.stop_and_collect() if gpu_monitor is not None else None
# Check if generation had the right number of tokens
input_tokens = self.inputs["input_ids"].size(-1)
batch_size, output_tokens = outputs.shape
new_tokens = output_tokens - input_tokens
if new_tokens != max_new_tokens:
raise RuntimeError(f"Generated {new_tokens} tokens, expected {max_new_tokens}")
# Decode outputs
decoded_output = self.tokenizer.decode(outputs[0, input_tokens:], skip_special_tokens=True)
shape_and_decoded_output = f"{tuple(outputs.shape)} | {decoded_output}"
# Compute intermediate quantities
e2e_latency = wall_time_1 - wall_time_0
token_generation_times = [t - wall_time_0 for t in streamer.timestamps[1:]]
return e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics
def profile_generate(self, num_tokens_to_profile: int, config_name: str) -> None:
"""Profile the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
profiler = torch.profiler.profile(
activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
record_shapes=True,
)
with profiler as prof:
_ = self.model.generate(
**self.inputs,
max_new_tokens=num_tokens_to_profile,
)
if self.profile_dir is None:
self.profile_dir = self.output_dir + "_profiles"
os.makedirs(self.profile_dir, exist_ok=True)
prof.export_chrome_trace(f"{self.profile_dir}/{config_name}.json")
def run_benchmarks(
self,
model_id: str,
benchmark_configs: list[BenchmarkConfig],
num_tokens_to_profile: int = 0,
pretty_print_summary: bool = True,
) -> tuple[str, dict[str, Any]]:
"""Run multiple benchmarks for the given model ID and list of benchmark configs."""
all_results = {}
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
start_time = time.perf_counter()
n_configs = len(benchmark_configs)
for i, config in enumerate(benchmark_configs):
# Skip if already run
if config.hash in all_results:
self.logger.info(f"Skipping duplicate config {config.name} for model {model_id} ({i + 1}/{n_configs})")
continue
# Otherwise, run the benchmark
self.setup_benchmark(model_id, config)
self.logger.info(
f"Running benchmark of model {model_id} with scenario: {config.name} ({i + 1}/{n_configs})"
)
# Launch benchmark in a try/except block to avoid stopping the whole run if one benchmark fails
try:
results = self.run_benchmark(model_id, config, num_tokens_to_profile)
if results is not None:
all_results[config.hash] = results
except Exception as e:
self.logger.error(f"Error running with scenario: {config.name}:\n{repr(e)}")
# Cleanup model and save results
self.cleanup()
self.save_results(model_id, all_results, timestamp=timestamp)
if len(all_results) < 1:
raise RuntimeError("No benchmark was run succesfully")
if pretty_print_summary:
print()
print("=" * 100)
print(f"Finished benchmarks in {time.perf_counter() - start_time:.2f} seconds")
print(f"Total number of benchmarks: {len(all_results)}")
print("First run metadata:")
first_key = list(all_results.keys())[0]
first_metadata = all_results[first_key]["metadata"].to_dict()
hardware_info = first_metadata.pop("hardware_info")
pretty_print_dict(first_metadata | hardware_info, tabs=1)
for result in all_results.values():
print("=" * 100)
print(f"Config: {result['config'].infer_name(compact=False)}\n")
result["measurements"].pprint(
batch_size=result["config"].batch_size,
num_generated_tokens=result["config"].num_tokens_to_generate,
tabs=1,
)
print("=" * 100)
return (timestamp, all_results)
def save_results(self, model_name: str, results: dict, timestamp: str = "") -> str:
"""Save benchmark results to JSON file."""
# Create model-specific subdirectory
model_name = model_name.replace("/", "_")
model_dir = os.path.join(self.output_dir, model_name)
os.makedirs(model_dir, exist_ok=True)
# Create filename with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") if not timestamp else timestamp
filename = f"{model_name}_benchmark_{timestamp}.json"
filepath = os.path.join(model_dir, filename)
# Convert results to dict
converted_results = {}
for cfg_hash in results.keys():
converted_results[cfg_hash] = {
"metadata": results[cfg_hash]["metadata"].to_dict(),
"measurements": results[cfg_hash]["measurements"].to_dict(),
"config": results[cfg_hash]["config"].to_dict(),
}
# Save to JSON file
with open(filepath, "w") as f:
f.write(compact_json_numeric_arrays(converted_results))
self.logger.info(f"Results saved to {filepath}")
return filepath
def push_results_to_hub(self, dataset_id: str, results: dict[Any, Any], timestamp: str) -> None:
if PUSH_TO_HUB_TOKEN is None:
raise ValueError(
"PUSH_TO_HUB_TOKEN is not set, cannot push results to the Hub. When setting dataset_id, please also set the PUSH_TO_HUB_TOKEN environment variable."
)
n_results = len(results)
self.logger.info(f"Pushing {n_results} results to: {dataset_id}")
rows = []
for cfg_hash, entry in results.items():
row = {
"benchmark_config_hash": cfg_hash,
"config": entry["config"].to_dict(),
"measurements": entry["measurements"].to_dict(),
"metadata": entry["metadata"].to_dict(),
}
rows.append(row)
ds = Dataset.from_list(rows)
with tempfile.TemporaryDirectory() as tmp:
jsonl_path = os.path.join(tmp, "data.jsonl")
with open(jsonl_path, "w") as f:
json_lines = []
for ex in ds:
json_lines.append(json.dumps(ex, ensure_ascii=False))
f.write("\n".join(json_lines))
api = HfApi()
# NOTE: we expect the repository to already exist
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") if not timestamp else timestamp
file_name = f"benchmark_run_{timestamp}.jsonl"
api.upload_file(
path_or_fileobj=jsonl_path,
path_in_repo=file_name,
repo_id=dataset_id,
repo_type="dataset",
token=PUSH_TO_HUB_TOKEN,
)
self.logger.info(f"Succesfully uploaded results to: {dataset_id}")


@ -1,160 +0,0 @@
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any
import numpy as np
from .hardware_metrics import GPURawMetrics, HardwareInfo
def compute_basic_statistics(measurements: list[float]) -> dict[str, float]:
return {
"avg": np.mean(measurements),
"std": np.std(measurements),
"min": np.min(measurements),
"med": np.median(measurements),
"max": np.max(measurements),
"p95": np.percentile(measurements, 95),
}
def add_unit_to_duration(stats: dict[str, float]) -> dict[str, str]:
for key in list(stats.keys()):
value = stats[key]
if value > 3600:
stats[key] = f"{(value / 3600):.2f}hr"
elif value > 60:
stats[key] = f"{(value / 60):.2f}min"
elif value > 1:
stats[key] = f"{value:.2f}s"
elif value > 1e-3:
stats[key] = f"{(value * 1e3):.2f}ms"
elif value > 1e-6:
stats[key] = f"{(value * 1e6):.2f}us"
else:
stats[key] = f"{(value * 1e9):.2f}ns"
return stats
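# Illustrative example (not part of the original file): the formatter above picks the largest fitting unit,
# e.g. add_unit_to_duration({"avg": 0.0123, "p95": 75.0}) -> {"avg": "12.30ms", "p95": "1.25min"}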
def equalize_lengths_and_collate(stats: dict[str, dict[str, str]]) -> dict[str, str]:
"""Note: This operation is destructive as it will update values in place before returning a new correctly formatted dict"""
keys = ["avg", "std", "min", "med", "max", "p95"]
for key in keys:
max_length = max(len(stat[key]) for stat in stats.values())
for stat in stats.values():
stat[key] = stat[key].ljust(max_length, " ")
return {name: " ".join([f"{key}={stat[key]}" for key in keys]) for name, stat in stats.items()}
def pretty_print_dict(data: dict[str, str], tabs: int = 0) -> None:
max_key_length = max([len(key) for key in data.keys()])
for key, value in data.items():
tabs_str = " " * tabs
padded_key = key.ljust(max_key_length + 1, ".")
print(f"{tabs_str}{padded_key}: {value}")
@dataclass
class BenchmarkMetadata:
"""Metadata collected for each benchmark run."""
model_id: str
timestamp: str
branch_name: str
commit_id: str
commit_message: str
hardware_info: HardwareInfo
def __init__(self, model_id: str, commit_id: str, branch_name: str = "main", commit_message: str = "") -> None:
self.model_id = model_id
self.timestamp = datetime.now(timezone.utc).isoformat()
self.branch_name = branch_name
self.commit_id = commit_id
self.commit_message = commit_message
self.hardware_info = HardwareInfo()
def to_dict(self) -> dict[str, Any]:
return {
"model_id": self.model_id,
"timestamp": self.timestamp,
"branch_name": self.branch_name,
"commit_id": self.commit_id,
"commit_message": self.commit_message,
"hardware_info": self.hardware_info.to_dict(),
}
class BenchmarkResult:
"""Result from a series of benchmark runs."""
def __init__(self) -> None:
self.e2e_latency = []
self.token_generation_times = [] # time at which each token was generated (relative to start of the generation)
self.shape_and_decoded_outputs = []
self.gpu_metrics = []
def accumulate(
self,
e2e_latency: float,
token_generation_times: list[float],
shape_and_decoded_output: str,
gpu_metrics: GPURawMetrics | None,
) -> None:
self.e2e_latency.append(e2e_latency)
self.token_generation_times.append(token_generation_times)
self.shape_and_decoded_outputs.append(shape_and_decoded_output)
self.gpu_metrics.append(gpu_metrics)
def to_dict(self) -> dict[str, None | int | float]:
# Save GPU metrics as None if it contains only None values
if all(gm is None for gm in self.gpu_metrics):
gpu_metrics = None
else:
gpu_metrics = [gm.to_dict() for gm in self.gpu_metrics]
return {
"e2e_latency": self.e2e_latency,
"token_generation_times": self.token_generation_times,
"shape_and_decoded_outputs": self.shape_and_decoded_outputs,
"gpu_metrics": gpu_metrics,
}
@classmethod
def from_dict(cls, data: dict[str, None | int | float]) -> "BenchmarkResult":
# Handle GPU metrics, which is saved as None if it contains only None values
if data["gpu_metrics"] is None:
gpu_metrics = [None for _ in range(len(data["e2e_latency"]))]
else:
gpu_metrics = [GPURawMetrics.from_dict(gm) for gm in data["gpu_metrics"]]
# Create a new instance and accumulate the data
new_instance = cls()
for i in range(len(data["e2e_latency"])):
new_instance.accumulate(
e2e_latency=data["e2e_latency"][i],
token_generation_times=data["token_generation_times"][i],
shape_and_decoded_output=data["shape_and_decoded_outputs"][i],
gpu_metrics=gpu_metrics[i],
)
return new_instance
def get_measured_ttft(self) -> list[float]:
return [dt[0] for dt in self.token_generation_times if len(dt) > 0]
def get_measured_itl(self) -> list[float]:
return [(dt[-1] - dt[0]) / (len(dt) - 1) for dt in self.token_generation_times if len(dt) > 1]
def get_throughput(self, total_generated_tokens: int) -> list[float]:
return [total_generated_tokens / e2e_latency for e2e_latency in self.e2e_latency]
def pprint(self, batch_size: int = 0, num_generated_tokens: int = 0, tabs: int = 0) -> None:
measurements = {
"E2E Latency": add_unit_to_duration(compute_basic_statistics(self.e2e_latency)),
"Time to First Token": add_unit_to_duration(compute_basic_statistics(self.get_measured_ttft())),
}
itl_values = self.get_measured_itl()
if len(itl_values) > 0:
measurements["Inter-Token Latency"] = add_unit_to_duration(compute_basic_statistics(itl_values))
if batch_size > 0:
throughput_stats = compute_basic_statistics(self.get_throughput(batch_size * num_generated_tokens))
measurements["Throughput"] = {key: f"{value:.2f}tok/s" for key, value in throughput_stats.items()}
dict_to_pprint = equalize_lengths_and_collate(measurements)
pretty_print_dict(dict_to_pprint, tabs=tabs)
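# Illustrative sketch (not part of the original file): made-up measurements showing how accumulated
# timings map to the reported TTFT, ITL and throughput values.
def _example_result_usage():  # hypothetical helper, for illustration only
    result = BenchmarkResult()
    result.accumulate(
        e2e_latency=1.2,
        token_generation_times=[0.05, 0.10, 0.15, 0.20],
        shape_and_decoded_output="(1, 132) | example output",
        gpu_metrics=None,
    )
    assert result.get_measured_ttft() == [0.05]  # first token arrived 50ms after generation started
    assert abs(result.get_measured_itl()[0] - 0.05) < 1e-9  # (0.20 - 0.05) / 3 tokens
    result.pprint(batch_size=1, num_generated_tokens=4)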


@ -1,171 +0,0 @@
import json
import logging
import subprocess
import sys
import threading
import time
from dataclasses import dataclass
from enum import Enum
from logging import Logger
import gpustat
import psutil
import torch
# Data class to hold the hardware information
def get_device_name_and_memory_total() -> tuple[str, float]:
"""Returns the name and memory total of GPU 0."""
device_name = torch.cuda.get_device_properties(0).name
device_memory_total = torch.cuda.get_device_properties(0).total_memory / 1024**3
return device_name, device_memory_total
class HardwareInfo:
"""A class to hold information about the hardware."""
def __init__(self) -> None:
# Retrieve GPU stats
try:
self.gpu_name, self.gpu_memory_total_gb = get_device_name_and_memory_total()
except Exception:
self.gpu_name, self.gpu_memory_total_gb = None, None
# Retrieve python, torch and CUDA version
self.python_version = f"{sys.version.split()[0]}"
self.torch_version = torch.__version__
if hasattr(torch, "cuda") and torch.cuda.is_available():
self.cuda_version = torch.version.cuda
else:
self.cuda_version = None
# Retrieve general hardware information
self.cpu_count = psutil.cpu_count()
self.memory_total_mb = int(psutil.virtual_memory().total / (1024 * 1024))
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"gpu_name": self.gpu_name,
"gpu_memory_total_gb": self.gpu_memory_total_gb,
"python_version": self.python_version,
"torch_version": self.torch_version,
}
# Functions to get information about the GPU
def get_amd_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an AMD GPU, both in percent"""
rocm_smi_output = subprocess.check_output(["rocm-smi", "--json", "--showuse", "--showmeminfo", "VRAM"])
gpu_stats = json.loads(rocm_smi_output.decode("utf-8"))
gpu_stats = [
(card_id, stats["GPU use (%)"], stats["VRAM Total Used Memory (B)"]) for card_id, stats in gpu_stats.items()
]
gpu_stats.sort(key=lambda x: x[1], reverse=True)
return int(gpu_stats[0][1]), float(gpu_stats[0][2]) / 1024**3
def get_nvidia_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an NVIDIA GPU, both in percent"""
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_stats = gpu_stats[0]
return int(gpu_stats["utilization.gpu"]), float(gpu_stats["memory.used"]) / 1024**3
class GPUStatsCollector:
"""A class to get statistics about the GPU. It serves as a wrapper that holds the GPU total memory and its name,
which is used to call the right function to get the utilization and memory used."""
def __init__(self) -> None:
self.device_name, self.device_memory_total = get_device_name_and_memory_total()
# Monkey patch the get_utilization_and_memory_used method based on the GPU type
if "amd" in self.device_name.lower():
self.get_utilization_and_memory_used = get_amd_gpu_stats
elif "nvidia" in self.device_name.lower():
self.get_utilization_and_memory_used = get_nvidia_gpu_stats
else:
raise RuntimeError(f"Unsupported GPU: {self.device_name}")
def get_measurements(self) -> tuple[int, float]:
"""Get the utilization and memory used of the GPU, both in percent"""
raise NotImplementedError("This method is meant to be monkey patched during __init__")
# Simple data classes to hold the raw GPU metrics
class GPUMonitoringStatus(Enum):
"""Status of GPU monitoring."""
SUCCESS = "success"
FAILED = "failed"
NO_GPUS_AVAILABLE = "no_gpus_available"
NO_SAMPLES_COLLECTED = "no_samples_collected"
@dataclass
class GPURawMetrics:
"""Raw values for GPU utilization and memory used."""
utilization: list[float] # in percent
memory_used: list[float] # in GB
timestamps: list[float] # in seconds
timestamp_0: float # in seconds
monitoring_status: GPUMonitoringStatus
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"utilization": self.utilization,
"memory_used": self.memory_used,
"timestamps": self.timestamps,
"timestamp_0": self.timestamp_0,
"monitoring_status": self.monitoring_status.value,
}
# Main class, used to monitor the GPU utilization during benchmark execution
class GPUMonitor:
"""Monitor GPU utilization during benchmark execution."""
def __init__(self, sample_interval_sec: float = 0.1, logger: Logger | None = None):
self.sample_interval_sec = sample_interval_sec
self.logger = logger if logger is not None else logging.getLogger(__name__)
self.num_available_gpus = torch.cuda.device_count()
if self.num_available_gpus == 0:
raise RuntimeError("No GPUs detected by torch.cuda.device_count().")
self.gpu_stats_getter = GPUStatsCollector()
def start(self):
"""Start monitoring GPU metrics."""
# Clear the stop event to enable monitoring
self.stop_event = threading.Event()
self.gpu_utilization = []
self.gpu_memory_used = []
self.timestamps = []
self.thread = threading.Thread(target=self._monitor_loop)
self.thread.start()
self.logger.debug("GPU monitoring started")
def stop_and_collect(self) -> GPURawMetrics:
"""Stop monitoring and return collected metrics."""
self.stop_event.set()
self.thread.join()
if self.gpu_utilization:
timestamp_0 = self.timestamps[0]
metrics = GPURawMetrics(
utilization=self.gpu_utilization,
memory_used=self.gpu_memory_used,
timestamps=[t - timestamp_0 for t in self.timestamps],
timestamp_0=timestamp_0,
monitoring_status=GPUMonitoringStatus.SUCCESS,
)
self.logger.debug(f"GPU monitoring completed: {len(self.gpu_utilization)} samples collected")
else:
# GPURawMetrics has no field defaults, so pass empty measurements explicitly when no samples were collected
metrics = GPURawMetrics(
utilization=[],
memory_used=[],
timestamps=[],
timestamp_0=0.0,
monitoring_status=GPUMonitoringStatus.NO_SAMPLES_COLLECTED,
)
return metrics
def _monitor_loop(self):
"""Background monitoring loop using threading.Event for communication."""
while not self.stop_event.is_set():
utilization, memory_used = self.gpu_stats_getter.get_utilization_and_memory_used()
self.gpu_utilization.append(utilization)
self.gpu_memory_used.append(memory_used)
self.timestamps.append(time.time())
if self.stop_event.wait(timeout=self.sample_interval_sec):
break
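# Illustrative sketch (not part of the original file): typical monitoring flow around a workload,
# assuming a supported NVIDIA or AMD GPU is visible.
def _example_monitoring():  # hypothetical helper, for illustration only
    monitor = GPUMonitor(sample_interval_sec=0.1)
    monitor.start()
    time.sleep(1.0)  # stand-in for the benchmarked workload
    metrics = monitor.stop_and_collect()
    print(metrics.monitoring_status.value, f"{len(metrics.utilization)} samples")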


@ -2,5 +2,6 @@ numpy>=1.21.0
psutil>=5.8.0
gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
huggingface_hub>=0.16.0


@ -19,95 +19,477 @@ in the ./benches directory, organizing outputs into model-specific subfolders.
"""
import argparse
import importlib.util
import json
import logging
import os
import sys
import uuid
from framework.benchmark_config import adapt_configs, get_config_by_level
from framework.benchmark_runner import BenchmarkRunner
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
if __name__ == "__main__":
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", type=str, default=None, help="Output dir for benchmark results")
parser.add_argument("--log-level", type=str, choices=["DEBUG", "INFO", "WARNING", "ERROR"], default="INFO")
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup", "-w", type=int, default=3, help="Number of warmup iterations")
parser.add_argument("--iterations", "-i", type=int, default=10, help="Number of measurement iterations")
parser.add_argument("--batch-size", "-b", type=int, nargs="+", help="Batch size")
parser.add_argument("--sequence-length", "-s", type=int, nargs="+", help="Sequence length")
parser.add_argument("--num-tokens-to-generate", "-n", type=int, nargs="+", help="Number of tokens to generate")
parser.add_argument(
"--level",
type=int,
default=1,
help="Level of coverage for the benchmark. 0: only the main config, 1: a few important configs, 2: a config for"
" each attn implementation an option, 3: cross-generate all combinations of configs, 4: cross-generate all"
" combinations of configs w/ all compile modes",
)
parser.add_argument("--num-tokens-to-profile", "-p", type=int, default=0, help="Number of tokens to profile")
parser.add_argument("--branch-name", type=str, help="Git branch name")
parser.add_argument("--commit-id", type=str, help="Git commit ID (if not provided, will auto-detect from git)")
parser.add_argument("--commit-message", type=str, help="Git commit message")
parser.add_argument(
"--no-gpu-monitoring", action="store_true", help="Disables GPU monitoring during benchmark runs"
)
parser.add_argument(
"--push-result-to-dataset",
type=str,
default=None,
help="Name of the dataset to push results to. If not provided, results are not pushed to the Hub.",
)
args = parser.parse_args()
# Setup logging
benchmark_run_uuid = str(uuid.uuid4())[:8]
numeric_level = getattr(logging, args.log_level.upper())
def setup_logging(log_level: str = "INFO", enable_file_logging: bool = False) -> logging.Logger:
"""Setup logging configuration."""
numeric_level = getattr(logging, log_level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError(f"Invalid log level: {log_level}")
handlers = [logging.StreamHandler(sys.stdout)]
if enable_file_logging:
handlers.append(logging.FileHandler(f"benchmark_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"))
logging.basicConfig(
level=numeric_level, format="[%(levelname)s - %(asctime)s] %(name)s: %(message)s", handlers=handlers
)
logger = logging.getLogger("benchmark_v2")
return logging.getLogger(__name__)
def discover_benchmarks(benches_dir: str) -> list[dict[str, Any]]:
"""
Discover all benchmark modules in the benches directory.
Returns:
List of dictionaries containing benchmark module info
"""
benchmarks = []
benches_path = Path(benches_dir)
if not benches_path.exists():
raise FileNotFoundError(f"Benches directory not found: {benches_dir}")
for py_file in benches_path.glob("*.py"):
if py_file.name.startswith("__"):
continue
module_name = py_file.stem
try:
# Import the module
spec = importlib.util.spec_from_file_location(module_name, py_file)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Check if it has a benchmark runner function
if hasattr(module, f"run_{module_name}"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, f"run_{module_name}"),
}
)
elif hasattr(module, "run_benchmark"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, "run_benchmark"),
}
)
else:
logging.warning(f"No runner function found in {py_file}")
except Exception as e:
logging.error(f"Failed to import {py_file}: {e}")
return benchmarks
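# Illustrative sketch (assumption about the expected contract, not original code): a module such as
# benches/llama.py is picked up by discover_benchmarks() if it exposes either
#     def run_llama(logger, output_dir, **kwargs): ...
# or
#     def run_benchmark(logger, output_dir, **kwargs): ...
# run_single_benchmark() below then forwards only the keyword arguments the runner's signature accepts.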
def run_single_benchmark(
benchmark_info: dict[str, Any], output_dir: str, logger: logging.Logger, **kwargs
) -> Optional[str]:
"""
Run a single benchmark and return the output file path.
Args:
benchmark_info: Dictionary containing benchmark module info
output_dir: Base output directory
logger: Logger instance
**kwargs: Additional arguments to pass to the benchmark
Returns:
Path to the output file if successful, None otherwise
"""
benchmark_name = benchmark_info["name"]
runner_func = benchmark_info["runner_function"]
logger.info(f"Running benchmark: {benchmark_name}")
try:
# Check function signature to determine what arguments to pass
import inspect
sig = inspect.signature(runner_func)
# Prepare arguments based on function signature
func_kwargs = {"logger": logger, "output_dir": output_dir}
# Add other kwargs if the function accepts them
for param_name in sig.parameters:
if param_name in kwargs:
func_kwargs[param_name] = kwargs[param_name]
# Filter kwargs to only include parameters the function accepts
# If function has **kwargs, include all provided kwargs
has_var_kwargs = any(param.kind == param.VAR_KEYWORD for param in sig.parameters.values())
if has_var_kwargs:
valid_kwargs = {**func_kwargs, **kwargs}
else:
valid_kwargs = {k: v for k, v in func_kwargs.items() if k in sig.parameters}
# Run the benchmark
result = runner_func(**valid_kwargs)
if isinstance(result, str):
# Function returned a file path
return result
else:
logger.info(f"Benchmark {benchmark_name} completed successfully")
return "completed"
except Exception as e:
logger.error(f"Benchmark {benchmark_name} failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return None
def generate_summary_report(
output_dir: str,
benchmark_results: dict[str, Any],
logger: logging.Logger,
benchmark_run_uuid: Optional[str] = None,
) -> str:
"""Generate a summary report of all benchmark runs."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.json")
summary_data = {
"run_metadata": {
"timestamp": datetime.utcnow().isoformat(),
"benchmark_run_uuid": benchmark_run_uuid,
"total_benchmarks": len(benchmark_results),
"successful_benchmarks": len([r for r in benchmark_results.values() if r is not None]),
"failed_benchmarks": len([r for r in benchmark_results.values() if r is None]),
},
"benchmark_results": benchmark_results,
"output_directory": output_dir,
}
with open(summary_file, "w") as f:
json.dump(summary_data, f, indent=2, default=str)
logger.info(f"Summary report saved to: {summary_file}")
return summary_file
def upload_results_to_hf_dataset(
output_dir: str,
summary_file: str,
dataset_name: str,
run_id: Optional[str] = None,
token: Optional[str] = None,
logger: Optional[logging.Logger] = None,
) -> Optional[str]:
"""
Upload benchmark results to a HuggingFace Dataset.
Based on upload_collated_report() from utils/collated_reports.py
Args:
output_dir: Local output directory containing results
summary_file: Path to the summary file
dataset_name: Name of the HuggingFace dataset to upload to
run_id: Unique run identifier (if None, will generate one)
token: HuggingFace token for authentication (if None, will use environment variables)
logger: Logger instance
Returns:
The run_id used for the upload, None if upload failed
"""
if logger is None:
logger = logging.getLogger(__name__)
import os
from huggingface_hub import HfApi
api = HfApi()
if run_id is None:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
run_id = f"{github_run_number}-{github_run_id}"
date_folder = datetime.now().strftime("%Y-%m-%d")
github_event_name = os.getenv("GITHUB_EVENT_NAME")
if github_event_name != "schedule":
# Non-scheduled runs go under a runs subfolder
repo_path = f"{date_folder}/runs/{run_id}/benchmark_results"
else:
# Scheduled runs go directly under the date
repo_path = f"{date_folder}/{run_id}/benchmark_results"
logger.info(f"Uploading benchmark results to dataset '{dataset_name}' at path '{repo_path}'")
try:
# Upload all files in the output directory
from pathlib import Path
output_path = Path(output_dir)
for file_path in output_path.rglob("*"):
if file_path.is_file():
# Calculate relative path from output_dir
relative_path = file_path.relative_to(output_path)
path_in_repo = f"{repo_path}/{relative_path}"
logger.debug(f"Uploading {file_path} to {path_in_repo}")
api.upload_file(
path_or_fileobj=str(file_path),
path_in_repo=path_in_repo,
repo_id=dataset_name,
repo_type="dataset",
token=token,
commit_message=f"Upload benchmark results for run {run_id}",
)
logger.info(
f"Successfully uploaded results to: https://huggingface.co/datasets/{dataset_name}/tree/main/{repo_path}"
)
return run_id
except Exception as upload_error:
logger.error(f"Failed to upload results: {upload_error}")
import traceback
logger.debug(traceback.format_exc())
return None
def main():
"""Main entry point for the benchmarking script."""
# Generate a unique UUID for this benchmark run
benchmark_run_uuid = str(uuid.uuid4())[:8]
parser = argparse.ArgumentParser(
description="Run all benchmarks in the ./benches directory",
epilog="""
Examples:
# Run all available benchmarks
python3 run_benchmarks.py
# Run with specific model and upload to HuggingFace Dataset
python3 run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf --upload-to-hf username/benchmark-results
# Run with custom run ID and upload to HuggingFace Dataset
python3 run_benchmarks.py --run-id experiment_v1 --upload-to-hf org/benchmarks
# Run only specific benchmarks with file logging
python3 run_benchmarks.py --include llama --enable-file-logging
""", # noqa: W293
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--output-dir",
type=str,
default="benchmark_results",
help="Base output directory for benchmark results (default: benchmark_results)",
)
parser.add_argument(
"--benches-dir",
type=str,
default="./benches",
help="Directory containing benchmark implementations (default: ./benches)",
)
parser.add_argument(
"--log-level",
type=str,
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
default="INFO",
help="Logging level (default: INFO)",
)
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup-iterations", type=int, default=3, help="Number of warmup iterations (default: 3)")
parser.add_argument(
"--measurement-iterations", type=int, default=5, help="Number of measurement iterations (default: 5)"
)
parser.add_argument(
"--num-tokens-to-generate",
type=int,
default=100,
help="Number of tokens to generate in benchmarks (default: 100)",
)
parser.add_argument("--include", type=str, nargs="*", help="Only run benchmarks matching these names")
parser.add_argument("--exclude", type=str, nargs="*", help="Exclude benchmarks matching these names")
parser.add_argument("--enable-file-logging", action="store_true", help="Enable file logging (disabled by default)")
parser.add_argument(
"--commit-id", type=str, help="Git commit ID for metadata (if not provided, will auto-detect from git)"
)
parser.add_argument(
"--push-to-hub",
type=str,
help="Upload results to HuggingFace Dataset (provide dataset name, e.g., 'username/benchmark-results')",
)
parser.add_argument(
"--run-id", type=str, help="Custom run ID for organizing results (if not provided, will generate a unique ID)"
)
parser.add_argument(
"--token",
type=str,
help="HuggingFace token for dataset uploads (if not provided, will use HF_TOKEN environment variable)",
)
args = parser.parse_args()
# Setup logging
logger = setup_logging(args.log_level, args.enable_file_logging)
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Benches directory: {args.benches_dir}")
# We cannot compute ITL if we don't have at least two measurements
if any(n <= 1 for n in args.num_tokens_to_generate):
raise ValueError("--num_tokens_to_generate arguments should be larger than 1")
# Create output directory
os.makedirs(args.output_dir, exist_ok=True)
# Error out if one of the arguments is not provided
if len(args.batch_size) * len(args.sequence_length) * len(args.num_tokens_to_generate) == 0:
raise ValueError(
"At least one of the arguments --batch-size, --sequence-length, or --num-tokens-to-generate is required"
)
try:
# Discover benchmarks
benchmarks = discover_benchmarks(args.benches_dir)
logger.info(f"Discovered {len(benchmarks)} benchmark(s): {[b['name'] for b in benchmarks]}")
# Get the configs for the given coverage level
configs = get_config_by_level(args.level)
# Adapt the configs to the given arguments
configs = adapt_configs(
configs,
args.warmup,
args.iterations,
args.batch_size,
args.sequence_length,
args.num_tokens_to_generate,
not args.no_gpu_monitoring,
)
if not benchmarks:
logger.warning("No benchmarks found!")
return 1
runner = BenchmarkRunner(logger, args.output_dir, args.branch_name, args.commit_id, args.commit_message)
timestamp, results = runner.run_benchmarks(
args.model_id, configs, args.num_tokens_to_profile, pretty_print_summary=True
)
# Filter benchmarks based on include/exclude
filtered_benchmarks = benchmarks
dataset_id = args.push_result_to_dataset
if dataset_id is not None and len(results) > 0:
runner.push_results_to_hub(dataset_id, results, timestamp)
if args.include:
filtered_benchmarks = [
b for b in filtered_benchmarks if any(pattern in b["name"] for pattern in args.include)
]
logger.info(f"Filtered to include: {[b['name'] for b in filtered_benchmarks]}")
if args.exclude:
filtered_benchmarks = [
b for b in filtered_benchmarks if not any(pattern in b["name"] for pattern in args.exclude)
]
logger.info(f"After exclusion: {[b['name'] for b in filtered_benchmarks]}")
if not filtered_benchmarks:
logger.warning("No benchmarks remaining after filtering!")
return 1
# Prepare common kwargs for benchmarks
benchmark_kwargs = {
"warmup_iterations": args.warmup_iterations,
"measurement_iterations": args.measurement_iterations,
"num_tokens_to_generate": args.num_tokens_to_generate,
}
if args.model_id:
benchmark_kwargs["model_id"] = args.model_id
# Add commit_id if provided
if args.commit_id:
benchmark_kwargs["commit_id"] = args.commit_id
# Run benchmarks
benchmark_results = {}
successful_count = 0
for benchmark_info in filtered_benchmarks:
result = run_single_benchmark(benchmark_info, args.output_dir, logger, **benchmark_kwargs)
benchmark_results[benchmark_info["name"]] = result
if result is not None:
successful_count += 1
# Generate summary report
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger, benchmark_run_uuid)
# Upload results to HuggingFace Dataset if requested
upload_run_id = None
if args.push_to_hub:
logger.info("=" * 60)
logger.info("UPLOADING TO HUGGINGFACE DATASET")
logger.info("=" * 60)
# Use provided run_id or fallback to benchmark run UUID
effective_run_id = args.run_id or benchmark_run_uuid
upload_run_id = upload_results_to_hf_dataset(
output_dir=args.output_dir,
summary_file=summary_file,
dataset_name=args.push_to_hub,
run_id=effective_run_id,
token=args.token,
logger=logger,
)
if upload_run_id:
logger.info(f"Upload completed with run ID: {upload_run_id}")
else:
logger.warning("Upload failed - continuing with local results")
# Final summary
total_benchmarks = len(filtered_benchmarks)
failed_count = total_benchmarks - successful_count
logger.info("=" * 60)
logger.info("BENCHMARK RUN SUMMARY")
logger.info("=" * 60)
logger.info(f"Total benchmarks: {total_benchmarks}")
logger.info(f"Successful: {successful_count}")
logger.info(f"Failed: {failed_count}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Summary report: {summary_file}")
if args.push_to_hub:
if upload_run_id:
logger.info(f"HuggingFace Dataset: {args.push_to_hub}")
logger.info(f"Run ID: {upload_run_id}")
logger.info(
f"View results: https://huggingface.co/datasets/{args.push_to_hub}/tree/main/{datetime.now().strftime('%Y-%m-%d')}/runs/{upload_run_id}"
)
else:
logger.warning("Upload to HuggingFace Dataset failed")
if failed_count > 0:
logger.warning(f"{failed_count} benchmark(s) failed. Check logs for details.")
return 1
else:
logger.info("All benchmarks completed successfully!")
return 0
except Exception as e:
logger.error(f"Benchmark run failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return 1
if __name__ == "__main__":
sys.exit(main())


@ -58,12 +58,15 @@ NOT_DEVICE_TESTS = {
"test_model_get_set_embeddings",
"test_model_main_input_name",
"test_correct_missing_keys",
"test_tie_model_weights",
"test_can_use_safetensors",
"test_load_save_without_tied_weights",
"test_tied_weights_keys",
"test_model_weights_reload_no_missing_tied_weights",
"test_can_load_ignoring_mismatched_shapes",
"test_model_is_small",
"test_tf_from_pt_safetensors",
"test_flax_from_pt_safetensors",
"ModelTest::test_pipeline_", # None of the pipeline tests from PipelineTesterMixin (of which XxxModelTest inherits from) are running on device
"ModelTester::test_pipeline_",
"/repo_utils/",
@ -87,10 +90,8 @@ def pytest_configure(config):
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
config.addinivalue_line("markers", "flash_attn_test: mark test which tests flash attention functionality")
config.addinivalue_line("markers", "flash_attn_3_test: mark test which tests flash attention 3 functionality")
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "true"
os.environ['DISABLE_SAFETENSORS_CONVERSION'] = 'true'
def pytest_collection_modifyitems(items):


@ -6,8 +6,10 @@ RUN apt-get update && apt-get install -y time git g++ pkg-config make git-lfs
ENV UV_PYTHON=/usr/local/bin/python
RUN pip install uv && uv pip install --no-cache-dir -U pip setuptools GitPython
RUN uv pip install --no-cache-dir --upgrade 'torch' 'torchaudio' 'torchvision' --index-url https://download.pytorch.org/whl/cpu
# tensorflow pin matching setup.py
RUN uv pip install --no-cache-dir pypi-kenlm
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[quality,testing,torch-speech,vision]"
RUN uv pip install --no-cache-dir "tensorflow-cpu<2.16" "tf-keras<2.16"
RUN uv pip install --no-cache-dir "git+https://github.com/huggingface/transformers.git@${REF}#egg=transformers[flax,quality,testing,torch-speech,vision]"
RUN git lfs install
RUN uv pip uninstall transformers


@ -9,14 +9,11 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.9.0'
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# This needs to be compatible with the above `PYTORCH`.
ARG TORCHCODEC='0.8.0'
ARG FLASH_ATTN='false'
# Disable kernel mapping for now until all tests pass
ENV DISABLE_KERNEL_MAPPING=1
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
@ -26,48 +23,16 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev]
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. For `torchcodec`, use `cpu` as we don't have `libnvcuvid.so` on the host runner. See https://github.com/meta-pytorch/torchcodec/issues/912
# **Important**: We need to specify `torchcodec` version if the torch version is not the latest stable one.
# 3. `set -e` means "exit immediately if any command fails".
RUN set -e; \
# Determine torch version
if [ ${#PYTORCH} -gt 0 ] && [ "$PYTORCH" != "pre" ]; then \
VERSION="torch==${PYTORCH}.*"; \
TORCHCODEC_VERSION="torchcodec==${TORCHCODEC}.*"; \
else \
VERSION="torch"; \
TORCHCODEC_VERSION="torchcodec"; \
fi; \
\
# Log the version being installed
echo "Installing torch version: $VERSION"; \
\
# Install PyTorch packages
if [ "$PYTORCH" != "pre" ]; then \
python3 -m pip install --no-cache-dir -U \
$VERSION \
torchvision \
torchaudio \
--extra-index-url https://download.pytorch.org/whl/$CUDA; \
# We need to specify the version if the torch version is not the latest stable one.
python3 -m pip install --no-cache-dir -U \
$TORCHCODEC_VERSION --extra-index-url https://download.pytorch.org/whl/cpu; \
else \
python3 -m pip install --no-cache-dir -U --pre \
torch \
torchvision \
torchaudio \
--extra-index-url https://download.pytorch.org/whl/nightly/$CUDA; \
python3 -m pip install --no-cache-dir -U --pre \
torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/cpu; \
fi
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA && python3 -m pip uninstall -y tensorflow tensorflow_text tensorflow_probability
RUN python3 -m pip uninstall -y flax jax
RUN python3 -m pip install --no-cache-dir -U timm
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN python3 -m pip install --no-cache-dir pytesseract
@ -92,7 +57,7 @@ RUN python3 -m pip install --no-cache-dir bitsandbytes
RUN python3 -m pip install --no-cache-dir quanto
# After using A10 as CI runner, let's run FA2 tests
RUN [ "$FLASH_ATTN" != "false" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
# TODO (ydshieh): check this again
# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests


@ -10,7 +10,7 @@ RUN apt-get -y update && apt-get install -y libsndfile1-dev && apt install -y te
# Torch needs to be installed before deepspeed
RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed]
RUN python3 -m pip install --no-cache-dir --no-build-isolation torchvision git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install --no-cache-dir torchvision git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# Test if the image could successfully build the doc. before publishing the image


@ -15,6 +15,7 @@ RUN apt update && \
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
python3 -m pip install --no-cache-dir \
jupyter \
tensorflow \
torch
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels


@ -0,0 +1,59 @@
ARG BASE_DOCKER_IMAGE
FROM $BASE_DOCKER_IMAGE
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
# Use login shell to read variables from `~/.profile` (to pass dynamic created variables between RUN commands)
SHELL ["sh", "-lc"]
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs libaio-dev
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime]
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
ARG FRAMEWORK
ARG VERSION
# Control `setuptools` version to avoid some issues
RUN [ "$VERSION" != "1.10" ] && python3 -m pip install -U setuptools || python3 -m pip install -U "setuptools<=59.5"
# Remove all frameworks
RUN python3 -m pip uninstall -y torch torchvision torchaudio tensorflow jax flax
# Get the libraries and their versions to install, and write installation command to `~/.profile`.
RUN python3 ./transformers/utils/past_ci_versions.py --framework $FRAMEWORK --version $VERSION
# Install the target framework
RUN echo "INSTALL_CMD = $INSTALL_CMD"
RUN $INSTALL_CMD
RUN [ "$FRAMEWORK" != "pytorch" ] && echo "`deepspeed-testing` installation is skipped" || python3 -m pip install --no-cache-dir ./transformers[deepspeed-testing]
# Remove `accelerate`: it requires `torch`, and this causes import issues for TF-only testing
# We will install `accelerate@main` in Past CI workflow file
RUN python3 -m pip uninstall -y accelerate
# Uninstall `torch-tensorrt` and `apex` shipped with the base image
RUN python3 -m pip uninstall -y torch-tensorrt apex
# Pre-build **nightly** release of DeepSpeed, so it would be ready for testing (otherwise, the 1st deepspeed test will timeout)
RUN python3 -m pip uninstall -y deepspeed
# This has to be run inside the GPU VMs running the tests. (So far, it fails here due to GPU checks during compilation.)
# Issue: https://github.com/deepspeedai/DeepSpeed/issues/2010
# RUN git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build && \
# DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 DS_BUILD_UTILS=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop


@ -1,4 +1,4 @@
FROM rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.7.1
FROM rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -10,8 +10,8 @@ RUN apt update && \
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy importlib-metadata setuptools wheel ninja pytesseract "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy
RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
ARG REF=main
WORKDIR /
@ -23,6 +23,9 @@ RUN git clone https://github.com/huggingface/transformers && cd transformers &&
# Install transformers
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video,audio]
# Remove tensorflow and flax as they are no longer supported by transformers
RUN python3 -m pip uninstall -y tensorflow flax
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
@ -39,7 +42,6 @@ RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"
# Install flash attention from source. Tested with commit 6387433156558135a998d5568a9d74c1778666d8
RUN git clone https://github.com/ROCm/flash-attention/ -b tridao && \
cd flash-attention && \
GPU_ARCHS="gfx942" python setup.py install
# GPU_ARCHS builds for MI300, MI325 but not MI355: we would need to add `;gfx950` but it takes too long to build.
GPU_ARCHS="gfx942" python setup.py install
RUN python3 -m pip install --no-cache-dir einops


@ -29,7 +29,7 @@ RUN python3 -m pip uninstall -y apex torch torchvision torchaudio
RUN python3 -m pip install torch==$PYTORCH torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO --index-url https://download.pytorch.org/whl/rocm$ROCM --no-cache-dir
# Pre-build DeepSpeed, so it's be ready for testing (to avoid timeout)
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache-dir -v --disable-pip-version-check 2>&1
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache-dir -v --disable-pip-version-check 2>&1
ARG REF=main
WORKDIR /


@ -21,7 +21,7 @@ RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'p
# Install latest release PyTorch
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip uninstall -y torch torchvision torchaudio torchcodec && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -43,7 +43,7 @@ RUN python3 -m pip uninstall -y deepspeed
# This has to be run (again) inside the GPU VMs running the tests.
# The installation works here, but some tests fail if we don't pre-build deepspeed again in the VMs running the tests.
# TODO: Find out why tests fail.
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
# `kernels` may give different outputs (within 1e-5 range) even with the same model (weights) and the same inputs
RUN python3 -m pip uninstall -y kernels

View File

@ -25,6 +25,8 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN [ ${#TORCH_VISION} -gt 0 ] && VERSION='torchvision=='$TORCH_VISION'.*' || VERSION='torchvision'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN [ ${#TORCH_AUDIO} -gt 0 ] && VERSION='torchaudio=='$TORCH_AUDIO'.*' || VERSION='torchaudio'; python3 -m pip install --no-cache-dir -U $VERSION --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y tensorflow flax
RUN python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"

View File

@ -3,10 +3,11 @@ LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VER=3.12
ARG PYTHON_VER=3.11
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get remove -y python3.10 && apt-get autoremove -y
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
@ -22,6 +23,7 @@ RUN apt-get update && \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
@ -33,6 +35,7 @@ RUN apt-get update && \
rsync \
sudo \
libnl-genl-3-200 \
xpu-smi \
unzip \
ffmpeg \
tesseract-ocr \
@ -42,47 +45,34 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get install -y \
linux-headers-$(uname -r) linux-modules-extra-$(uname -r) \
linux-headers-$(uname -r) \
linux-modules-extra-$(uname -r) \
flex bison \
intel-fw-gpu intel-i915-dkms xpu-smi intel-ocloc clinfo\
intel-fw-gpu intel-i915-dkms xpu-smi \
intel-opencl-icd libze-intel-gpu1 libze1 \
intel-media-va-driver-non-free libmfx-gen1 libvpl2 \
libegl-mesa0 libegl1 libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo intel-ocloc \
libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev libze-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Use a virtual env because Ubuntu-24 does not allow pip on the original python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VER} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install --upgrade pip
RUN pip install triton==3.3.0
RUN pip install --upgrade pip wheel
RUN pip install triton==3.4.0
RUN pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install torch==2.8.0+xpu torchvision==0.23.0+xpu torchaudio==2.8.0+xpu --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install evaluate torchdata pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree
RUN pip install gguf hqq compressed_tensors gptqmodel mergekit autoawq deepspeed torchao onnx
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft
RUN pip install torchcodec torchdata --no-cache-dir
RUN pip install evaluate pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree setuptools
RUN pip install gptqmodel --no-build-isolation
RUN pip install gguf hqq compressed_tensors autoawq deepspeed torchao onnx auto_round
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft diffusers trl kernels
# install liger-kernel
RUN pip install git+https://github.com/linkedin/Liger-Kernel.git --extra-index-url https://download.pytorch.org/whl/test/xpu
# install mergekit
RUN pip install --break-system-packages git+https://github.com/arcee-ai/mergekit.git@v0.1.3
# install bitsandbytes
RUN pip install git+https://github.com/bitsandbytes-foundation/bitsandbytes.git

View File

@ -12,6 +12,8 @@ SHELL ["sh", "-lc"]
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# Disable kernel mapping for quantization tests
ENV DISABLE_KERNEL_MAPPING=1
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
@ -24,7 +26,7 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -50,7 +52,7 @@ RUN python3 -m pip install --no-cache-dir hqq
RUN python3 -m pip install --no-cache-dir gguf
# Add autoawq for quantization testing
RUN python3 -m pip install --no-cache-dir --no-build-isolation autoawq[kernels]
RUN python3 -m pip install --no-cache-dir autoawq[kernels]
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir optimum-quanto
@ -81,7 +83,7 @@ RUN python3 -m pip uninstall -y flash-attn
RUN cd transformers && python3 setup.py develop
# Add fp-quant for quantization testing
RUN python3 -m pip install --no-cache-dir "fp-quant>=0.3.2"
RUN python3 -m pip install --no-cache-dir "fp-quant>=0.2.0"
# Low usage or incompatible lib, will enable later on

View File

@ -0,0 +1,25 @@
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-tensorflow,testing]
# If set to nothing, will install the latest version
ARG TENSORFLOW='2.13'
RUN [ ${#TENSORFLOW} -gt 0 ] && VERSION='tensorflow=='$TENSORFLOW'.*' || VERSION='tensorflow'; python3 -m pip install --no-cache-dir -U $VERSION
RUN python3 -m pip uninstall -y torch flax
RUN python3 -m pip install -U "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir -U "tensorflow_probability<0.22"
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop

View File

@ -24,7 +24,7 @@ pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to work around it.
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
Then you need to install our special tool that builds the documentation:
@ -38,7 +38,7 @@ pip install git+https://github.com/huggingface/doc-builder
## Building the documentation
Once you have set up the `doc-builder` and additional packages, you can generate the documentation by
Once you have setup the `doc-builder` and additional packages, you can generate the documentation by
typing the following command:
```bash
@ -295,11 +295,12 @@ Here's an example of a tuple return, comprising several objects:
Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate them to this dataset.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
## Styling the docstring
We have an automatic script running with the `make style` command that will make sure that:
We have an automatic script running with the `make style` comment that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library

View File

@ -123,6 +123,10 @@
title: تشغيل التدريب على Amazon SageMaker
- local: serialization
title: التصدير إلى ONNX
- local: tflite
title: التصدير إلى TFLite
- local: torchscript
title: التصدير إلى TorchScript
- local: notebooks
title: دفاتر الملاحظات مع الأمثلة
- local: community
@ -180,6 +184,8 @@
# title: التدريب الفعال على وحدة المعالجة المركزية (CPU)
# - local: perf_train_cpu_many
# title: التدريب الموزع لوحدة المعالجة المركزية (CPU)
# - local: perf_train_tpu_tf
# title: التدريب على (TPU) باستخدام TensorFlow
# - local: perf_train_special
# title: تدريب PyTorch على Apple silicon
# - local: perf_hardware
@ -197,6 +203,8 @@
# title: إنشاء نموذج كبير
# - local: debugging
# title: تصحيح الأخطاء البرمجية
# - local: tf_xla
# title: تكامل XLA لنماذج TensorFlow
# - local: perf_torch_compile
# title: تحسين الاستدلال باستخدام `torch.compile()`
# title: الأداء وقابلية التوسع
@ -252,12 +260,16 @@
# title: التكوين
# - local: main_classes/data_collator
# title: مجمع البيانات
# - local: main_classes/keras_callbacks
# title: استدعاءات Keras
# - local: main_classes/logging
# title: التسجيل
# - local: main_classes/model
# title: النماذج
# - local: main_classes/text_generation
# title: توليد النصوص
# - local: main_classes/onnx
# title: ONNX
# - local: main_classes/optimizer_schedules
# title: التحسين
# - local: main_classes/output

View File

@ -52,7 +52,7 @@
<figcaption class="mt-2 text-center text-sm text-gray-500">الصورة توضح مخطط مراحل نموذج Swin.</figcaption>
</div>
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PreTrainedConfig.from_pretrained`]:
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PretrainedConfig.from_pretrained`]:
* `out_indices` هو فهرس الطبقة التي تريد الحصول على خريطة الميزات منها
* `out_features` هو اسم الطبقة التي تريد الحصول على خريطة الميزات منها
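A minimal sketch of how these two parameters come into play, assuming the Swin checkpoint used elsewhere in this guide and a random tensor standing in for a preprocessed image:

```py
>>> import torch
>>> from transformers import AutoBackbone

>>> # request the feature map of stage 1 via `out_indices`
>>> backbone = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(1,))

>>> pixel_values = torch.randn(1, 3, 224, 224)  # stand-in for a real preprocessed image
>>> outputs = backbone(pixel_values)
>>> feature_maps = outputs.feature_maps  # tuple with one tensor per requested stage
```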
@ -115,6 +115,8 @@
## النموذج التلقائي (AutoModel)
<frameworkcontent>
<pt>
تسمح لك فئات `AutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -141,4 +143,25 @@
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</pt>
<tf>
أخيرًا، تسمح لك فئات `TFAutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام نفس نقطة التفتيش لتحميل بنية لمهمة مختلفة:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `TFAutoModelFor` لتحميل نسخ لنماذج مُدربة مسبقًا. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، ستتعرف على كيفية استخدام المُجزّئ اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</tf>
</frameworkcontent>

View File

@ -54,19 +54,19 @@ DistilBertConfig {
```
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PreTrainedConfig.from_pretrained`] :
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PretrainedConfig.from_pretrained`] :
```py
>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PreTrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PretrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
```py
>>> my_config.save_pretrained(save_directory="./your_model_save_path")
```
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PreTrainedConfig.from_pretrained`]:
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PretrainedConfig.from_pretrained`]:
```py
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
@ -81,6 +81,8 @@ DistilBertConfig {
الخطوة التالية هي إنشاء [نموذج](main_classes/models). النموذج - ويُشار إليه أحيانًا باسم البنية - يُحدد وظيفة كل طبقة والعمليات الحسابية المُنفذة. تُستخدم خصائص مثل `num_hidden_layers` من التكوين لتحديد هذه البنية. تشترك جميع النماذج في فئة أساسية واحدة هي [`PreTrainedModel`] وبعض الوظائف المُشتركة مثل تغيير حجم مُدخلات الكلمات وتقليص رؤوس آلية الانتباه الذاتي. بالإضافة إلى ذلك، فإن جميع النماذج هي فئات فرعية إما من [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html) أو [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) أو [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html). هذا يعني أن النماذج متوافقة مع كل استخدام لإطار العمل الخاص بها.
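As an illustrative aside (not part of the original walkthrough), one of the shared utilities mentioned above (resizing the input embeddings) looks roughly like this in PyTorch; the added token is hypothetical:

```py
>>> from transformers import AutoTokenizer, DistilBertModel

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")

>>> # add a hypothetical new token, then resize the input embeddings to match the new vocabulary size
>>> tokenizer.add_tokens(["<my_new_token>"])
>>> model.resize_token_embeddings(len(tokenizer))
```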
<frameworkcontent>
<pt>
قم بتحميل خصائص التكوين المخصصة الخاصة بك في النموذج:
```py
@ -103,11 +105,39 @@ DistilBertConfig {
```py
>>> model = DistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</pt>
<tf>
قم بتحميل خصائص التكوين المُخصصة الخاصة بك في النموذج:
```py
>>> from transformers import TFDistilBertModel
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/my_config.json")
>>> tf_model = TFDistilBertModel(my_config)
```
هذا ينشئ نموذجًا بقيم عشوائية بدلاً من الأوزان المُدربة مسبقًا. لن يكون هذا النموذج مفيدًا حتى يتم تدريبه. تُعد عملية التدريب مكلفة وتستغرق وقتًا طويلاً. من الأفضل بشكل عام استخدام نموذج مُدرب مسبقًا للحصول على نتائج أفضل بشكل أسرع، مع استخدام جزء بسيط فقط من الموارد المطلوبة للتدريب.
قم بإنشاء نموذج مُدرب مسبقًا باستخدام [`~TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased")
```
عندما تقوم بتحميل الأوزان المُدربة مسبقًا،يتم تحميل إعدادات النموذج الافتراضي تلقائيًا إذا كان النموذج من مكتبة 🤗 Transformers. ومع ذلك، يمكنك أيضًا استبدال - بعض أو كل - إعدادات النموذج الافتراضية بإعداداتك الخاصة:
```py
>>> tf_model = TFDistilBertModel.from_pretrained("distilbert/distilbert-base-uncased"، config=my_config)
```
</tf>
</frameworkcontent>
### رؤوس النموذج
في هذه المرحلة، لديك نموذج DistilBERT الأساسي الذي يُخرج *الحالات الكامنة*. تُمرَّر هذه الحالات الكامنة كمدخلات لرأس النموذج لإنتاج المخرجات النهائية. توفر مكتبة 🤗 Transformers رأس نموذج مختلفًا لكل مهمة طالما أن النموذج يدعم المهمة (أي لا يمكنك استخدام DistilBERT لمهمة تسلسل إلى تسلسل مثل الترجمة).
<frameworkcontent>
<pt>
على سبيل المثال، [`DistilBertForSequenceClassification`] هو نموذج DistilBERT الأساس مزودًا برأس تصنيف تسلسلي. يُشكّل رأس التصنيف التسلسلي طبقة خطية فوق المخرجات المجمعة.
```py
@ -123,6 +153,25 @@ DistilBertConfig {
>>> model = DistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</pt>
<tf>
على سبيل المثال، [`TFDistilBertForSequenceClassification`] هو نموذج DistilBERT الأساسي برأس تصنيف تسلسل. رأس التصنيف التسلسلي هو طبقة خطية أعلى المخرجات المجمعة.
```py
>>> from transformers import TFDistilBertForSequenceClassification
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
أعد استخدام هذا نقطة التحقق لمهمة أخرى عن طريق التبديل إلى رأس نموذج مختلف. لمهمة الإجابة على الأسئلة، ستستخدم رأس النموذج [`TFDistilBertForQuestionAnswering`]. رأس الإجابة على الأسئلة مشابه لرأس التصنيف التسلسلي باستثناء أنه طبقة خطية أعلى حالات الإخراج المخفية.
```py
>>> from transformers import TFDistilBertForQuestionAnswering
>>> tf_model = TFDistilBertForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
</tf>
</frameworkcontent>
## مجزئ النصوص

View File

@ -20,11 +20,11 @@
في مثالنا، سنعدّل بعض الوسائط في فئة ResNet التي قد نرغب في ضبطها. ستعطينا التكوينات المختلفة أنواع ResNets المختلفة الممكنة. سنقوم بتخزين هذه الوسائط بعد التحقق من صحته.
```python
from transformers import PreTrainedConfig
from transformers import PretrainedConfig
from typing import List
class ResnetConfig(PreTrainedConfig):
class ResnetConfig(PretrainedConfig):
model_type = "resnet"
def __init__(
@ -58,11 +58,11 @@ class ResnetConfig(PreTrainedConfig):
```
الأشياء الثلاثة المهمة التي يجب تذكرها عند كتابة تكوينك الخاص هي:
- يجب أن ترث من `PreTrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PreTrainedConfig` أي معاملات إضافية kwargs،
- يجب أن ترث من `PretrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PretrainedConfig` أي معاملات إضافية kwargs،
- يجب تمرير هذه المعاملات الإضافية إلى دالة `__init__` في الفئة الأساسية الأعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين الثاني والثالث يأتيان من حقيقة أن `PreTrainedConfig` لديه حقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين الثاني والثالث يأتيان من حقيقة أن `PretrainedConfig` لديه حقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
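To make these rules concrete, here is a minimal, hypothetical configuration that follows all three (the class and field names are illustrative and independent of the ResNet example above):

```py
>>> from transformers import PretrainedConfig

>>> class TinyConvConfig(PretrainedConfig):
...     model_type = "tiny-conv"
...
...     def __init__(self, hidden_size=64, num_layers=4, num_classes=1000, **kwargs):
...         self.hidden_size = hidden_size
...         self.num_layers = num_layers
...         self.num_classes = num_classes
...         # forward any extra fields (id2label, torch_dtype, ...) to the parent class
...         super().__init__(**kwargs)

>>> config = TinyConvConfig(num_layers=6, id2label={0: "cat", 1: "dog"})
```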
تحديد `model_type` لتكوينك (هنا `model_type="resnet"`) ليس إلزاميًا، ما لم ترغب في
تسجيل نموذجك باستخدام الفئات التلقائية (راجع القسم الأخير).
@ -82,7 +82,7 @@ resnet50d_config.save_pretrained("custom-resnet")
resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
```
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PreTrainedConfig`]، مثل [`~PreTrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PretrainedConfig`]، مثل [`~PretrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
## كتابة نموذج مخصص

View File

@ -60,10 +60,10 @@ pip install transformers bitsandbytes>=0.39.0 -q
أولاً، تحتاج إلى تحميل النموذج.
```py
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... )
```
@ -113,12 +113,12 @@ pip install transformers bitsandbytes>=0.39.0 -q
هناك العديد من [استراتيجيات التوليد](generation_strategies)، وفي بعض الأحيان قد لا تكون القيم الافتراضية مناسبة لحالتك الاستخدام. إذا لم تكن الإخراج الخاصة بك متوافقة مع ما تتوقعه، فقد قمنا بإنشاء قائمة بأكثر الأخطاء الشائعة وكيفية تجنبها.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> tokenizer.pad_token = tokenizer.eos_token # Most LLMs don't have a pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... )
```
@ -192,7 +192,7 @@ LLMs هي [معماريات فك التشفير فقط](https://huggingface.co/l
```python
>>> tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
>>> model = AutoModelForCausalLM.from_pretrained(
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
... )
>>> set_seed(0)
>>> prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""

View File

@ -231,7 +231,7 @@ flush()
دعنا نرى ما هو استهلاك ذاكرة GPU الذروة الذي يوفره تكميم 4 بت. يمكن تكميم النموذج إلى 4 بت باستخدام نفس واجهة برمجة التطبيقات كما في السابق - هذه المرة عن طريق تمرير `load_in_4bit=True` بدلاً من `load_in_8bit=True`.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", quantization_config=BitsAndBytesConfig(load_in_4bit=True), pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", load_in_4bit=True, pad_token_id=0)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
@ -329,6 +329,174 @@ $$ \textbf{O}_i \leftarrow s^a_{ij} * \textbf{O}_i + s^b_{ij} * \mathbf{V}_{j} \
لنلقِ نظرة على مثال عملي.
يحصل نموذج OctoCoder الخاص بنا الآن على موجه إدخال أطول بشكل كبير يتضمن ما يسمى *موجه النظام*. تُستخدم موجهات النظام لتوجيه LLM إلى مساعد أفضل مصمم لمهام المستخدمين.
فيما يلي، نستخدم موجه النظام الذي سيجعل OctoCoder مساعد ترميز أفضل.
```python
system_prompt = """Below are a series of dialogues between various people and an AI technical assistant.
The assistant tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble but knowledgeable.
The assistant is happy to help with code questions and will do their best to understand exactly what is needed.
It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer.
That said, the assistant is practical and really does its best, and doesn't let caution get too much in the way of being useful.
The Starcoder models are a series of 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2) (excluding opt-out requests).
The model uses Multi Query Attention, was trained using the Fill-in-the-Middle objective, and with 8,192 tokens context window for a trillion tokens of heavily deduplicated data.
-----
Question: Write a function that takes two lists and returns a list that has alternating elements from each input list.
Answer: Sure. Here is a function that does that.
def alternating(list1, list2):
results = []
for i in range(len(list1)):
results.append(list1[i])
results.append(list2[i])
return results
Question: Can you write some test cases for this function?
Answer: Sure, here are some tests.
assert alternating([10, 20, 30], [1, 2, 3]) == [10, 1, 20, 2, 30, 3]
assert alternating([True, False], [4, 5]) == [True, 4, False, 5]
assert alternating([], []) == []
Question: Modify the function so that it returns all input elements when the lists have uneven length. The elements from the longer list should be at the end.
Answer: Here is the modified function.
def alternating(list1, list2):
results = []
for i in range(min(len(list1), len(list2))):
results.append(list1[i])
results.append(list2[i])
if len(list1) > len(list2):
results.extend(list1[i+1:])
else:
results.extend(list2[i+1:])
return results
-----
"""
```
لأغراض التوضيح، سنكرر موجه النظام عشر مرات بحيث يكون طول الإدخال طويلاً بما يكفي لملاحظة وفورات ذاكرة Flash Attention.
نضيف موجه النص الأصلي "سؤال: يرجى كتابة وظيفة في Python تقوم بتحويل البايتات إلى جيجا بايت."
```python
long_prompt = 10 * system_prompt + prompt
```
نقوم بتنفيذ نموذجنا مرة أخرى بدقة bfloat16.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("bigcode/octocoder")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```
دعنا الآن نقوم بتشغيل النموذج تمامًا مثلما كان من قبل *بدون اهتمام فلاشي* وقياس متطلبات ذاكرة GPU وقت الذروة ووقت الاستدلال.
```python
import time
start_time = time.time()
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
Generated in 10.96854019165039 seconds.
Sure. Here is a function to do that.

def bytes_to_giga(bytes):
  return bytes / 1024 / 1024 / 1024

Answer: Sure. Here is a function to do that.

def
```
نحصل على نفس الإخراج كما كان من قبل، ولكن هذه المرة، يقوم النموذج بتكرار الإجابة عدة مرات حتى يتم قطعها عند 60 رمزًا. ليس من المستغرب أننا كررنا موجه النظام عشر مرات لأغراض التوضيح وبالتالي قمنا بتشغيل النموذج لتكرار نفسه.
**ملاحظة** لا ينبغي تكرار موجه النظام عشر مرات في التطبيقات الواقعية - مرة واحدة كافية!
دعنا نقيس متطلبات ذاكرة GPU وقت الذروة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
37.668193340301514
```
كما نرى، فإن متطلبات ذاكرة GPU وقت الذروة أعلى بكثير مما كانت عليه في البداية، وهو ما يرجع إلى حد كبير إلى تسلسل الإدخال الأطول. أيضًا، يستغرق التوليد أكثر من دقيقة بقليل الآن.
نستدعي `flush()` لتحرير ذاكرة GPU لتجربتنا التالية.
```python
flush()
```
للمقارنة، دعونا نقوم بتشغيل نفس الدالة، ولكن مع تمكين الاهتمام فلاش بدلاً من ذلك.
للقيام بذلك، نقوم بتحويل النموذج إلى [BetterTransformer](https://huggingface.co/docs/optimum/bettertransformer/overview)، ومن خلال ذلك نقوم بتمكين [SDPA self-attention](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) في PyTorch، والتي بدورها قادرة على استخدام الاهتمام فلاش.
```python
model.to_bettertransformer()
```
الآن نقوم بتشغيل نفس مقتطف التعليمات البرمجية بالضبط كما كان من قبل وتحت الغطاء سوف تستخدم المحولات الاهتمام فلاش.
```py
start_time = time.time()
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
Generated in 3.0211617946624756 seconds.
Sure. Here is a function to do that.

def bytes_to_giga(bytes):
  return bytes / 1024 / 1024 / 1024

Answer: Sure. Here is a function to do that.

def
```
نحصل على نفس النتيجة بالضبط كما كان من قبل، ولكن يمكننا ملاحظة تسريع كبير بفضل الاهتمام فلاش.
دعنا نقيس استهلاك الذاكرة لآخر مرة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
32.617331981658936
```
وبذلك نعود تقريبًا إلى ذروة ذاكرة GPU الأصلية لدينا البالغة 29GB.
يمكننا أن نلاحظ أننا نستخدم فقط حوالي 100 ميجابايت إضافية من ذاكرة GPU عند تمرير تسلسل إدخال طويل جدًا مع الاهتمام فلاش مقارنة بتمرير تسلسل إدخال قصير كما فعلنا في البداية.
```py
flush()
```
لمزيد من المعلومات حول كيفية استخدام Flash Attention، يرجى الاطلاع على [صفحة التوثيق هذه](https://huggingface.co/docs/transformers/en/perf_infer_gpu_one#flashattention-2).
## 3. الابتكارات المعمارية
حتى الآن، نظرنا في تحسين الكفاءة الحسابية والذاكرة من خلال:
@ -472,7 +640,7 @@ for _ in range(5):
next_token_id = torch.argmax(next_logits, dim=-1)
print("shape of input_ids", next_token_id.shape)
print("length of key-value cache", past_key_values.get_seq_length()) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
print("length of key-value cache", len(past_key_values[0][0])) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
generated_tokens.append(next_token_id.item())
generated_text = tokenizer.batch_decode(generated_tokens)
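As a side note, the `Cache` object whose `get_seq_length()` method appears in the updated line above can also be inspected outside the loop; a small sketch using a small stand-in checkpoint (not the model used in this guide):

```py
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilgpt2")
>>> model = AutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")

>>> inputs = tokenizer("The key-value cache stores past keys and values", return_tensors="pt")
>>> cache = DynamicCache()

>>> with torch.no_grad():
...     _ = model(**inputs, past_key_values=cache, use_cache=True)

>>> cache.get_seq_length()  # number of tokens whose keys/values are cached so far
```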

View File

@ -65,15 +65,43 @@ pip install huggingface_hub
تحويل نقطة التحقق لإطار عمل آخر أمر سهل. تأكد من تثبيت PyTorch و TensorFlow (راجع [هنا](installation) لتعليمات التثبيت)، ثم ابحث عن النموذج الملائم لمهمتك في الإطار الآخر.
<frameworkcontent>
<pt>
حدد `from_tf=True` لتحويل نقطة تحقق من TensorFlow إلى PyTorch:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
حدد `from_pt=True` لتحويل نقطة تحقق من PyTorch إلى TensorFlow:
```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
بعد ذلك، يمكنك حفظ نموذج TensorFlow الجديد بنقطة التحقق الجديدة:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
إذا كان النموذج متاحًا في Flax، فيمكنك أيضًا تحويل نقطة تحقق من PyTorch إلى Flax:
```py
>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
... "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## دفع نموذج أثناء التدريب
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/>
مشاركة نموذجك على Hub أمر بسيط للغاية؛ كل ما عليك هو إضافة معلمة أو استدعاء رد إضافي. كما تذكر من درس [التدريب الدقيق](training)، فإن فئة [`TrainingArguments`] هي المكان الذي تحدد فيه المعلمات الفائقة وخيارات التدريب الإضافية. ومن بين خيارات التدريب هذه القدرة على دفع النموذج مباشرة إلى Hub. قم بتعيين `push_to_hub=True` في [`TrainingArguments`]:
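For reference, a minimal sketch of that flag (the checkpoint and output directory below are placeholders, and the datasets you would normally pass to the [`Trainer`] are omitted):

```py
>>> from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

>>> # a small stand-in checkpoint; in practice this is the model you are fine-tuning
>>> model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")

>>> # push_to_hub=True tells the Trainer to upload checkpoints and the final model to the Hub
>>> training_args = TrainingArguments(output_dir="my-awesome-model", push_to_hub=True)

>>> # pass your train/eval datasets here as usual; they are omitted in this sketch
>>> trainer = Trainer(model=model, args=training_args)
```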
@ -99,6 +127,29 @@ pip install huggingface_hub
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
شارك نموذجًا على Hub باستخدام [`PushToHubCallback`]. في دالة [`PushToHubCallback`], أضف:
- دليل إخراج لنموذجك.
- مُجزّئ اللغوي.
- `hub_model_id`، والذي هو اسم مستخدم Hub واسم النموذج الخاص بك.
```py
>>> from transformers import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```
أضف الاستدعاء إلى [`fit`](https://keras.io/api/models/model_training_apis/)، وسيقوم 🤗 Transformers بدفع النموذج المدرب إلى Hub:
```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## استخدام دالة `push_to_hub`
@ -169,4 +220,4 @@ pip install huggingface_hub
* قم بإنشاء ملف `README.md` وتحميله يدويًا.
* انقر فوق الزر **Edit model card** في مستودع نموذجك.
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).

View File

@ -152,6 +152,8 @@ pip install datasets
قم بتعيين معلمة `return_tensors` إلى إما `pt` لـ PyTorch، أو `tf` لـ TensorFlow:
<frameworkcontent>
<pt>
```py
>>> batch_sentences = [
@ -171,6 +173,33 @@ pip install datasets
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
</pt>
<tf>
```py
>>> batch_sentences = [
... "But what about second breakfast?",
... "Don't think he knows about second breakfast, Pip.",
... "What about elevensies?",
... ]
>>> encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(encoded_input)
{'input_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0],
[101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
[101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(2, 9), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
```
</tf>
</frameworkcontent>
<Tip>

View File

@ -12,10 +12,20 @@
ستحتاج أيضًا إلى تثبيت إطار عمل التعلم الآلي المفضل لديك:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
## خط الأنابيب
@ -112,6 +122,8 @@ label: NEGATIVE, with score: 0.5309
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
استخدم [`AutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `AutoClass` في القسم التالي):
```py
@ -120,6 +132,18 @@ label: NEGATIVE, with score: 0.5309
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
استخدم [`TFAutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `TFAutoClass` في القسم التالي):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
حدد النموذج والمعالج في [`pipeline`]. الآن يمكنك تطبيق `classifier` على النص الفرنسي:
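A small sketch of what that looks like, reusing the `model` and `tokenizer` loaded just above (the French sentence is only an example input):

```py
>>> from transformers import pipeline

>>> # build a sentiment-analysis pipeline around the multilingual model and tokenizer from the previous step
>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
```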
@ -168,6 +192,8 @@ label: NEGATIVE, with score: 0.5309
يمكن للمجزئ أيضًا قبول قائمة من المدخلات، والقيام بـ "حشو" و"تقصير" النص لإرجاعه كدفعة بطول موحد:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
@ -178,6 +204,20 @@ label: NEGATIVE, with score: 0.5309
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
<Tip>
@ -187,6 +227,8 @@ label: NEGATIVE, with score: 0.5309
### AutoModel
<frameworkcontent>
<pt>
تقدم مكتبة 🤗 Transformers طريقة بسيطة وموحدة لتحميل نماذج مدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`AutoModel`] كما لو كنت تقوم بتحميل [`AutoTokenizer`]. الفرق الوحيد هو اختيار فئة [`AutoModel`] المناسبة للمهمة. بالنسبة لتصنيف النص (أو التسلسل)، يجب عليك تحميل [`AutoModelForSequenceClassification`]:
```py
@ -222,6 +264,39 @@ label: NEGATIVE, with score: 0.5309
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
يوفر 🤗 Transformers طريقة بسيطة وموحدة لتحميل مثيلات مُدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`TFAutoModel`] مثل تحميل [`AutoTokenizer`]. والفرق الوحيد هو تحديد [`TFAutoModel`] الصحيح للمهمة. للتصنيف النصي (أو التسلسلي)، يجب تحميل [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
راجع [ملخص المهام](./task_summary) للمهام المدعومة بواسطة فئة [`AutoModel`].
</Tip>
الآن، مرر دفعة المدخلات المعالجة مسبقًا مباشرة إلى النموذج. يمكنك تمرير المصفوفات كما هي:
```py
>>> tf_outputs = tf_model(tf_batch)
```
يقوم النموذج بإخراج التنشيطات النهائية في سمة `logits`. طبق دالة softmax على `logits` لاسترداد الاحتمالات:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
@ -231,6 +306,8 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
### حفظ النموذج
<frameworkcontent>
<pt>
بمجرد ضبط نموذجك، يمكنك حفظه مع برنامج الترميز الخاص به باستخدام [`PreTrainedModel.save_pretrained`]:
```py
@ -244,9 +321,28 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
بمجرد ضبط نموذجك، يمكنك حفظه مع برنامج الترميز الخاص به باستخدام [`TFPreTrainedModel.save_pretrained`]:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
عندما تكون مستعدًا لاستخدام النموذج مرة أخرى، أعد تحميله باستخدام [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
من الميزات الرائعة في 🤗 Transformers القدرة على حفظ نموذج وإعادة تحميله كنموذج PyTorch أو TensorFlow. يمكن أن يحول معامل `from_pt` أو `from_tf` النموذج من إطار عمل إلى آخر:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
@ -254,6 +350,17 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</tf>
</frameworkcontent>
## إنشاء نماذج مخصصة
@ -268,6 +375,8 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
قم بإنشاء نموذج من تكوينك المخصص باستخدام [`AutoModel.from_config`]:
```py
@ -275,6 +384,17 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
قم بإنشاء نموذج من تكوينك المخصص باستخدام [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
الق نظرة على دليل [إنشاء بنية مخصصة](./create_a_model) لمزيد من المعلومات حول بناء التكوينات المخصصة.

View File

@ -76,6 +76,8 @@ pip install -r requirements.txt
## تشغيل نص برمجي
<frameworkcontent>
<pt>
- يقوم النص البرمجي التوضيحي بتنزيل مجموعة بيانات ومعالجتها مسبقًا من مكتبة 🤗 [Datasets](https://huggingface.co/docs/datasets).
- ثم يقوم النص البرمجي بضبط نموذج بيانات دقيق باستخدام [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) على بنية تدعم الملخص.
@ -93,8 +95,31 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
- يقوم النص البرمجي التوضيحي بتنزيل مجموعة بيانات ومعالجتها مسبقًا من مكتبة 🤗 [Datasets](https://huggingface.co/docs/datasets/).
- ثم يقوم النص البرمجي بضبط نموذج بيانات دقيق باستخدام Keras على بنية تدعم الملخص.
- يوضح المثال التالي كيفية ضبط نموذج [T5-small](https://huggingface.co/google-t5/t5-small) على مجموعة بيانات [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail).
- يتطلب نموذج T5 معامل `source_prefix` إضافيًا بسبب الطريقة التي تم تدريبه بها. تتيح هذه المطالبة لـ T5 معرفة أن هذه مهمة تلخيص.
```bash
python examples/tensorflow/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## التدريب الموزع والدقة المختلطة
@ -116,6 +141,7 @@ torchrun \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -123,6 +149,8 @@ torchrun \
## تشغيل نص برمجي على وحدة معالجة الدقة الفائقة (TPU)
<frameworkcontent>
<pt>
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. يدعم PyTorch وحدات معالجة الدقة الفائقة (TPUs) مع [XLA](https://www.tensorflow.org/xla) مجمع الدقة الفائقة للتعلم العميق (راجع [هنا](https://github.com/pytorch/xla/blob/master/README.md) لمزيد من التفاصيل). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتشغيل نص `xla_spawn.py` البرمجي واستخدم معامل `num_cores` لتعيين عدد وحدات معالجة الدقة الفائقة (TPU) التي تريد استخدامها.
@ -138,8 +166,28 @@ python xla_spawn.py --num_cores 8 \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. تستخدم نصوص TensorFlow البرمجية استراتيجية [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) للتدريب على وحدات معالجة الدقة الفائقة (TPUs). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتمرير اسم مورد وحدة معالجة الدقة الفائقة (TPU) إلى حجة `tpu`.
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## تشغيل نص برمجي باستخدام 🤗 Accelerate
@ -194,6 +242,7 @@ python examples/pytorch/summarization/run_summarization.py \
--summary_column summary_column_name \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--predict_with_generate
@ -221,6 +270,7 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -234,6 +284,8 @@ examples/pytorch/summarization/run_summarization.py -h
خيار آخر مفيد لتمكينه هو استئناف التدريب من نقطة تفتيش سابقة. سيضمن ذلك أنك تستطيع الاستمرار من حيث توقفت دون البدء من جديد إذا تم مقاطعة تدريبك. هناك طريقتان لاستئناف التدريب من نقطة تفتيش.
تستخدم الطريقة الأولى المعلمة `output_dir previous_output_dir` لاستئناف التدريب من أحدث نقطة تفتيش مخزنة في `output_dir`. في هذه الحالة، يجب عليك إزالة `overwrite_output_dir`:
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
@ -245,6 +297,24 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--output_dir previous_output_dir \
--predict_with_generate
```
تستخدم الطريقة الثانية معلمة `resume_from_checkpoint path_to_specific_checkpoint` لاستئناف التدريب من مجلد نقطة تفتيش محددة.
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--resume_from_checkpoint path_to_specific_checkpoint \
--predict_with_generate
```
@ -276,5 +346,6 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

View File

@ -32,7 +32,7 @@
لتصدير نموذج 🤗 Transformers إلى ONNX، قم أولاً بتثبيت اعتماد إضافي:
```bash
pip install optimum-onnx
pip install optimum[exporters]
```
للاطلاع على جميع المعامﻻت المتاحة، يرجى الرجوع إلى [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli)، أو عرض المساعدة في سطر الأوامر:
@ -111,3 +111,60 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s
### تصدير نموذج لهندسة غير مدعومة
إذا كنت ترغب في المساهمة من خلال إضافة دعم لنموذج لا يُمكن تصديره حاليًا، فيجب عليك أولاً التحقق مما إذا كان مدعومًا في [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview)، وإذا لم يكن مدعومًا، [فيمكنك المساهمة في 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) مُباشرةً.
### تصدير نموذج باستخدام `transformers.onnx`
<Tip warning={true}>
لم يعد يتم دعم `transformers.onnx`؛ يُرجى تصدير النماذج باستخدام 🤗 Optimum كما هو موضح أعلاه. ستتم إزالة هذا القسم في الإصدارات القادمة.
</Tip>
لتصدير نموذج 🤗 Transformers إلى ONNX باستخدام `transformers.onnx`، ثبّت التبعيات الإضافية:
```bash
pip install transformers[onnx]
```
استخدم حزمة `transformers.onnx` كنموذج Python لتصدير نقطة حفظ باستخدام تكوين جاهز:
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
يُصدّر هذا رسمًا بيانيًا ONNX لنقطة الحفظ المُحددة بواسطة وسيطة `--model`. مرر أي نقطة حفظ على 🤗 Hub أو نقطة حفظ مُخزنة محليًا.
يُمكن بعد ذلك تشغيل ملف `model.onnx` الناتج على أحد المُسرعات العديدة التي تدعم معيار ONNX. على سبيل المثال، قم بتحميل وتشغيل النموذج باستخدام ONNX Runtime كما يلي:
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # يتوقع ONNX Runtime مصفوفات NumPy كمدخلات
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
يُمكن الحصول على أسماء المخرجات المطلوبة (مثل `["last_hidden_state"]`) من خلال إلقاء نظرة على تكوين ONNX لكل نموذج. على سبيل المثال، بالنسبة لـ DistilBERT، لدينا:
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
العمليات مُتطابقة لنقاط الحفظ TensorFlow على Hub. على سبيل المثال، صدّر نقطة حفظ TensorFlow خالصة كما يلي:
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
لتصدير نموذج مُخزن محليًا، احفظ أوزان النموذج والمُجزّئ اللغوي في نفس الدليل (على سبيل المثال `local-pt-checkpoint`)، ثم قم بتصديره إلى ONNX عن طريق توجيه وسيط `--model` لحزمة `transformers.onnx` إلى الدليل المطلوب:
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -182,6 +182,8 @@ pip install transformers datasets evaluate
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForLanguageModeling`]. من الأفضل أن تقوم بـ *الحشو الديناميكي* للجمل إلى الطول الأطول في الدفعة أثناء التجميع، بدلاً من حشو كامل المجموعة من البيانات إلى الطول الأقصى.
<frameworkcontent>
<pt>
استخدم رمز نهاية التسلسل كرمز للحشو، وحدد `mlm_probability` لحجب الرموز بشكل عشوائي عند كل تكرار للبيانات:
```py
@ -191,9 +193,23 @@ pip install transformers datasets evaluate
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```
</pt>
<tf>
استخدم رمز نهاية التسلسل كرمز للحشو، وحدد `mlm_probability` لحجب الرموز بشكل عشوائي عند كل تكرار للبيانات:
```py
>>> from transformers import DataCollatorForLanguageModeling
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False, return_tensors="tf")
```
</tf>
</frameworkcontent>
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -251,6 +267,75 @@ Perplexity: 49.61
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن على دراية بتدريب نموذج باستخدام Keras، اطلع على [البرنامج التعليمي الأساسي](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لتدريب نموذج في TensorFlow، ابدأ بإعداد دالة المحسن، وجدول معدل التعلم، وبعض معاملات التدريب:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
ثم يمكنك تحميل DistilGPT2 باستخدام [`TFAutoModelForCausalLM`]:
```py
>>> from transformers import TFAutoModelForCausalLM
>>> model = TFAutoModelForCausalLM.from_pretrained("distilbert/distilgpt2")
```
حول مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... lm_dataset["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... lm_dataset["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة الافتراضية، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # لا يوجد حجة للخسارة!
```
يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومجمّع البيانات في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_eli5_clm-model",
... tokenizer=tokenizer,
... )
```
أخيراً، أنت جاهز لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة، وعدد العصور، والتعليقات الخاصة بك لتدريب النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -280,6 +365,8 @@ Perplexity: 49.61
[{'generated_text': "Somatic hypermutation allows the immune system to be able to effectively reverse the damage caused by an infection.\n\n\nThe damage caused by an infection is caused by the immune system's ability to perform its own self-correcting tasks."}]
```
<frameworkcontent>
<pt>
قسم النص وإرجع `input_ids` كتنسورات PyTorch:
```py
@ -305,3 +392,31 @@ Perplexity: 49.61
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
["Somatic hypermutation allows the immune system to react to drugs with the ability to adapt to a different environmental situation. In other words, a system of 'hypermutation' can help the immune system to adapt to a different environmental situation or in some cases even a single life. In contrast, researchers at the University of Massachusetts-Boston have found that 'hypermutation' is much stronger in mice than in humans but can be found in humans, and that it's not completely unknown to the immune system. A study on how the immune system"]
```
</pt>
<tf>
قم بتقسيم النص وإرجاع `input_ids` كـ TensorFlow tensors:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_eli5_clm-model")
>>> inputs = tokenizer(prompt, return_tensors="tf").input_ids
```
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء الملخص. للمزيد من التفاصيل حول استراتيجيات توليد النص المختلفة والبارامترات للتحكم في التوليد، راجع صفحة [استراتيجيات توليد النص](../generation_strategies).
```py
>>> from transformers import TFAutoModelForCausalLM
>>> model = TFAutoModelForCausalLM.from_pretrained("username/my_awesome_eli5_clm-model")
>>> outputs = model.generate(input_ids=inputs, max_new_tokens=100, do_sample=True, top_k=50, top_p=0.95)
```
فك ترميز الرموز المولدة مرة أخرى إلى نص:
```py
>>> tokenizer.batch_decode(outputs, skip_special_tokens=True)
['Somatic hypermutation allows the immune system to detect the presence of other viruses as they become more prevalent. Therefore, researchers have identified a high proportion of human viruses. The proportion of virus-associated viruses in our study increases with age. Therefore, we propose a simple algorithm to detect the presence of these new viruses in our samples as a sign of improved immunity. A first study based on this algorithm, which will be published in Science on Friday, aims to show that this finding could translate into the development of a better vaccine that is more effective for']
```
</tf>
</frameworkcontent>

View File

@ -176,6 +176,8 @@ pip install transformers datasets evaluate
الآن، قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForLanguageModeling`]. من الأكثر كفاءة أن تقوم بـ *الحشو الديناميكي* ليصل طولها إلى أطول جملة في الدفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الطول الأقصى.
<frameworkcontent>
<pt>
استخدم رمز نهاية التسلسل كرمز الحشو وحدد `mlm_probability` لحجب الرموز عشوائياً كل مرة تكرر فيها البيانات:
@ -185,9 +187,23 @@ pip install transformers datasets evaluate
>>> tokenizer.pad_token = tokenizer.eos_token
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
```
</pt>
<tf>
استخدم رمز نهاية التسلسل كرمز الحشو وحدد `mlm_probability` لحجب الرموز عشوائياً كل مرة تكرر فيها البيانات:
```py
>>> from transformers import DataCollatorForLanguageModeling
>>> data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15, return_tensors="tf")
```
</tf>
</frameworkcontent>
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -247,6 +263,75 @@ Perplexity: 8.76
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن على دراية بتعديل نموذج باستخدام Keras، ألق نظرة على الدليل الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لتعديل نموذج في TensorFlow، ابدأ بإعداد دالة محسن، وجدول معدل التعلم، وبعض معلمات التدريب:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
ثم يمكنك تحميل DistilRoBERTa باستخدام [`TFAutoModelForMaskedLM`]:
```py
>>> from transformers import TFAutoModelForMaskedLM
>>> model = TFAutoModelForMaskedLM.from_pretrained("distilbert/distilroberta-base")
```
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... lm_dataset["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... lm_dataset["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن نماذج Transformers لديها جميعها دالة خسارة افتراضية ذات صلة بالمهمة، لذلك لا تحتاج إلى تحديد واحدة ما لم تكن تريد ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # لا توجد حجة للخسارة!
```
يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومعالج الرموز في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_eli5_mlm_model",
... tokenizer=tokenizer,
... )
```
أخيراً، أنت مستعد لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق، وعدد العصور، والتعليقات الخاصة بك لتعديل النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=[callback])
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائياً إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -287,6 +372,8 @@ Perplexity: 8.76
'sequence': 'The Milky Way is a small galaxy.'}]
```
<frameworkcontent>
<pt>
قم بتجزئة النص وإرجاع `input_ids` كمتجهات PyTorch. ستحتاج أيضًا إلى تحديد موضع رمز `<mask>`:
```py
@ -318,3 +405,38 @@ The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.
```
</pt>
<tf>
قم بتقسيم النص إلى رموز وإرجاع `input_ids` كـ TensorFlow tensors. ستحتاج أيضًا إلى تحديد موضع رمز `<mask>`:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_eli5_mlm_model")
>>> inputs = tokenizer(text, return_tensors="tf")
>>> mask_token_index = tf.where(inputs["input_ids"] == tokenizer.mask_token_id)[0, 1]
```
قم بتمرير المدخلات إلى النموذج وإرجاع `logits` للرمز المقنع:
```py
>>> from transformers import TFAutoModelForMaskedLM
>>> model = TFAutoModelForMaskedLM.from_pretrained("username/my_awesome_eli5_mlm_model")
>>> logits = model(**inputs).logits
>>> mask_token_logits = logits[0, mask_token_index, :]
```
ثم قم بإرجاع الرموز الثلاثة المقنعة ذات الاحتمالية الأعلى وطباعتها:
```py
>>> top_3_tokens = tf.math.top_k(mask_token_logits, 3).indices.numpy()
>>> for token in top_3_tokens:
... print(text.replace(tokenizer.mask_token, tokenizer.decode([token])))
The Milky Way is a spiral galaxy.
The Milky Way is a massive galaxy.
The Milky Way is a small galaxy.
```
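If you also want to see how confident the model is in each fill, a small extension of the loop above (a sketch, not part of the original guide) applies a softmax to the mask-position logits and prints the probability next to each candidate sentence:
```py
>>> probs = tf.nn.softmax(mask_token_logits, axis=-1)
>>> top_3 = tf.math.top_k(probs, 3)
>>> for token_id, p in zip(top_3.indices.numpy(), top_3.values.numpy()):
...     print(f"{text.replace(tokenizer.mask_token, tokenizer.decode([token_id]))}  (p={p:.3f})")
```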
</tf>
</frameworkcontent>

View File

@ -116,6 +116,8 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
يقوم `DataCollatorForMultipleChoice` بتجميع جميع مدخلات النموذج، ويطبق الحشو، ثم يعيد تجميع النتائج في شكلها الأصلي:
<frameworkcontent>
<pt>
```py
>>> from dataclasses import dataclass
@ -156,6 +158,50 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
... batch["labels"] = torch.tensor(labels, dtype=torch.int64)
... return batch
```
</pt>
<tf>
```py
>>> from dataclasses import dataclass
>>> from transformers.tokenization_utils_base import PreTrainedTokenizerBase, PaddingStrategy
>>> from typing import Optional, Union
>>> import tensorflow as tf
>>> @dataclass
... class DataCollatorForMultipleChoice:
... """
... Data collator that will dynamically pad the inputs for multiple choice received.
... """
... tokenizer: PreTrainedTokenizerBase
... padding: Union[bool, str, PaddingStrategy] = True
... max_length: Optional[int] = None
... pad_to_multiple_of: Optional[int] = None
... def __call__(self, features):
... label_name = "label" if "label" in features[0].keys() else "labels"
... labels = [feature.pop(label_name) for feature in features]
... batch_size = len(features)
... num_choices = len(features[0]["input_ids"])
... flattened_features = [
... [{k: v[i] for k, v in feature.items()} for i in range(num_choices)] for feature in features
... ]
... flattened_features = sum(flattened_features, [])
... batch = self.tokenizer.pad(
... flattened_features,
... padding=self.padding,
... max_length=self.max_length,
... pad_to_multiple_of=self.pad_to_multiple_of,
... return_tensors="tf",
... )
... batch = {k: tf.reshape(v, (batch_size, num_choices, -1)) for k, v in batch.items()}
... batch["labels"] = tf.convert_to_tensor(labels, dtype=tf.int64)
... return batch
```
</tf>
</frameworkcontent>
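To verify that the collator behaves as described, a short sketch (not from the original guide; the prompt and candidate sentences are made up) builds a single two-choice feature and checks the shape of the padded batch:
```py
>>> prompt = "The chef is"
>>> candidates = ["cooking dinner in the kitchen.", "flying a kite at the beach."]
>>> feature = dict(tokenizer([prompt, prompt], candidates))  # one example with two choices
>>> feature["label"] = 0
>>> batch = DataCollatorForMultipleChoice(tokenizer=tokenizer)([feature])
>>> print(batch["input_ids"].shape)  # (batch_size, num_choices, padded_seq_len) -> (1, 2, ...)
```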
## التقييم (Evaluate)
@ -182,6 +228,8 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -235,6 +283,93 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام Keras، فراجع الدرس الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لضبط نموذج في TensorFlow، ابدأ بإعداد دالة مُحسِّن وجدول معدل التعلم وبعض معلمات التدريب:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_train_epochs = 2
>>> total_train_steps = (len(tokenized_swag["train"]) // batch_size) * num_train_epochs
>>> optimizer, schedule = create_optimizer(init_lr=5e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```
ثم يمكنك تحميل BERT باستخدام [`TFAutoModelForMultipleChoice`]:
```py
>>> from transformers import TFAutoModelForMultipleChoice
>>> model = TFAutoModelForMultipleChoice.from_pretrained("google-bert/bert-base-uncased")
```
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> data_collator = DataCollatorForMultipleChoice(tokenizer=tokenizer)
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_swag["train"],
... shuffle=True,
... batch_size=batch_size,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_swag["validation"],
... shuffle=False,
... batch_size=batch_size,
... collate_fn=data_collator,
... )
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers تحتوي على دالة خسارة مناسبة للمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>> model.compile(optimizer=optimizer) # لا توجد وسيطة خسارة!
```
الخطوتان الأخيرتان قبل بدء التدريب هما: حساب دقة التنبؤات، وتوفير طريقة لرفع النموذج إلى Hub. ويمكن تحقيق ذلك باستخدام [استدعاءات Keras](../main_classes/keras_callbacks)
مرر دالتك `compute_metrics` إلى [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
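For reference, the `compute_metrics` function passed to the callback is the one defined in this guide's Evaluate section (truncated in this diff); a minimal accuracy-based version looks roughly like this:
```py
>>> import evaluate
>>> import numpy as np

>>> accuracy = evaluate.load("accuracy")

>>> def compute_metrics(eval_pred):
...     predictions, labels = eval_pred
...     predictions = np.argmax(predictions, axis=1)
...     return accuracy.compute(predictions=predictions, references=labels)
```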
حدد مكان دفع نموذجك ومعالجك في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_model",
... tokenizer=tokenizer,
... )
```
ثم قم بتضمين الاستدعاءات معًا:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
أخيرًا، أنت جاهز لبدء تدريب نموذجك! استدعِ[`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب والاستدعاءات لضبط النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=2, callbacks=callbacks)
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -255,6 +390,8 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
>>> candidate2 = "The law applies to baguettes."
```
<frameworkcontent>
<pt>
قم بتحليل كل مطالبة وزوج إجابة مرشح وأعد تنسورات PyTorch. يجب عليك أيضًا إنشاء بعض `العلامات`:
```py
@ -282,3 +419,34 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
>>> predicted_class
0
```
</pt>
<tf>
قم بتحليل كل مطالبة وزوج إجابة مرشح وأعد موترات TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_swag_model")
>>> inputs = tokenizer([[prompt, candidate1], [prompt, candidate2]], return_tensors="tf", padding=True)
```
مرر مدخلاتك إلى النموذج وأعد القيم logits:
```py
>>> from transformers import TFAutoModelForMultipleChoice
>>> model = TFAutoModelForMultipleChoice.from_pretrained("username/my_awesome_swag_model")
>>> inputs = {k: tf.expand_dims(v, 0) for k, v in inputs.items()}
>>> outputs = model(inputs)
>>> logits = outputs.logits
```
استخرج الفئة ذات الاحتمالية الأكبر:
```py
>>> predicted_class = int(tf.math.argmax(logits, axis=-1)[0])
>>> predicted_class
0
```
</tf>
</frameworkcontent>

View File

@ -167,15 +167,29 @@ pip install transformers datasets evaluate
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DefaultDataCollator`]. بخلاف مجمّعات البيانات الأخرى في 🤗 Transformers، لا يطبق [`DefaultDataCollator`] أي معالجة مسبقة إضافية مثل الحشو.
<frameworkcontent>
<pt>
```py
>>> from transformers import DefaultDataCollator
>>> data_collator = DefaultDataCollator()
```
</pt>
<tf>
```py
>>> from transformers import DefaultDataCollator
>>> data_collator = DefaultDataCollator(return_tensors="tf")
```
</tf>
</frameworkcontent>
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -226,6 +240,82 @@ pip install transformers datasets evaluate
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام Keras، فألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لضبط نموذج في TensorFlow، ابدأ بإعداد دالة مُحسِّن، وجدول معدل التعلم، وبعض المعاملات الفائقة للتدريب:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_epochs = 2
>>> total_train_steps = (len(tokenized_squad["train"]) // batch_size) * num_epochs
>>> optimizer, schedule = create_optimizer(
... init_lr=2e-5,
... num_warmup_steps=0,
... num_train_steps=total_train_steps,
... )
```
ثم يمكنك تحميل DistilBERT باستخدام [`TFAutoModelForQuestionAnswering`]:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("distilbert/distilbert-base-uncased")
```
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_squad["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_squad["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer)
```
آخر شيء يجب إعداده قبل بدء التدريب هو توفير طريقة لدفع نموذجك إلى Hub. يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومعالجك المعجمي في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> callback = PushToHubCallback(
... output_dir="my_awesome_qa_model",
... tokenizer=tokenizer,
... )
```
أخيرًا، أنت جاهز لبدء تدريب نموذجك! استدعِ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة، وعدد الحقب، واستدعاءاتك (callbacks) لضبط النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=[callback])
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -267,6 +357,8 @@ pip install transformers datasets evaluate
يمكنك أيضًا تكرار نتائج `pipeline` يدويًا إذا أردت:
<frameworkcontent>
<pt>
قسّم النص وأرجع تنسورات PyTorch:
@ -302,3 +394,39 @@ pip install transformers datasets evaluate
>>> tokenizer.decode(predict_answer_tokens)
'176 billion parameters and can generate text in 46 languages natural languages and 13'
```
</pt>
<tf>
قم بتحليل النص المعجمي وأعد موترات TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("my_awesome_qa_model")
>>> inputs = tokenizer(question, context, return_tensors="tf")
```
مرر مدخلاتك إلى النموذج وأعد `logits`:
```py
>>> from transformers import TFAutoModelForQuestionAnswering
>>> model = TFAutoModelForQuestionAnswering.from_pretrained("my_awesome_qa_model")
>>> outputs = model(**inputs)
```
احصل على أعلى احتمال من مخرجات النموذج لموضعي البداية والنهاية:
```py
>>> answer_start_index = int(tf.math.argmax(outputs.start_logits, axis=-1)[0])
>>> answer_end_index = int(tf.math.argmax(outputs.end_logits, axis=-1)[0])
```
استخلاص الإجابة من الرموز المتوقعة:
```py
>>> predict_answer_tokens = inputs.input_ids[0, answer_start_index : answer_end_index + 1]
>>> tokenizer.decode(predict_answer_tokens)
'176 billion parameters and can generate text in 46 languages natural languages and 13'
```
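Optionally (a sketch that is not part of the original guide), you can turn the start and end logits into a rough confidence score for the extracted span by multiplying the softmax probabilities of the selected positions:
```py
>>> start_probs = tf.nn.softmax(outputs.start_logits, axis=-1)
>>> end_probs = tf.nn.softmax(outputs.end_logits, axis=-1)
>>> score = float(start_probs[0, answer_start_index] * end_probs[0, answer_end_index])
>>> print(f"answer confidence ~ {score:.3f}")
```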
</tf>
</frameworkcontent>

View File

@ -92,12 +92,24 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorWithPadding`]. الأكثر كفاءة هو استخدام الحشو الديناميكي لجعل الجمل متساوية في الطول داخل كل دفعة، بدلًا من حشو كامل البيانات إلى الحد الأقصى للطول.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorWithPadding
>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorWithPadding
>>> data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
```
</tf>
</frameworkcontent>
## التقييم(Evaluate)
@ -131,6 +143,8 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> label2id = {"NEGATIVE": 0, "POSITIVE": 1}
```
<frameworkcontent>
<pt>
<Tip>
إذا لم تكن على دراية بضبط نموذج دقيق باستخدام [`Trainer`], فالق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
@ -191,6 +205,98 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن على دراية بضبط نموذج باستخدام Keras، قم بالاطلاع على البرنامج التعليمي الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لضبط نموذج في TensorFlow، ابدأ بإعداد دالة المحسن، وجدول معدل التعلم، وبعض معلمات التدريب:
```py
>>> from transformers import create_optimizer
>>> import tensorflow as tf
>>> batch_size = 16
>>> num_epochs = 5
>>> batches_per_epoch = len(tokenized_imdb["train"]) // batch_size
>>> total_train_steps = int(batches_per_epoch * num_epochs)
>>> optimizer, schedule = create_optimizer(init_lr=2e-5, num_warmup_steps=0, num_train_steps=total_train_steps)
```
ثم يمكنك تحميل DistilBERT مع [`TFAutoModelForSequenceClassification`] بالإضافة إلى عدد التصنيفات المتوقعة، وتعيينات التسميات:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(
... "distilbert/distilbert-base-uncased", num_labels=2, id2label=id2label, label2id=label2id
... )
```
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_imdb["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_imdb["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
آخر أمرين يجب إعدادهما قبل بدء التدريب هو حساب الدقة من التوقعات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم ذلك باستخدام [Keras callbacks](../main_classes/keras_callbacks).
قم بتمرير دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
حدد مكان دفع نموذجك والمجزئ اللغوي في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_model",
... tokenizer=tokenizer,
... )
```
ثم اجمع الاستدعاءات معًا:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
أخيرًا، أنت مستعد لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق، وعدد الحقبات، واستدعاءاتك لضبط النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -222,6 +328,8 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
يمكنك أيضًا تكرار نتائج `pipeline` يدويًا إذا أردت:
<frameworkcontent>
<pt>
قم بتجزئة النص وإرجاع تنسورات PyTorch:
```py
@ -248,3 +356,32 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> model.config.id2label[predicted_class_id]
'POSITIVE'
```
</pt>
<tf>
قم بتجزئة النص وإرجاع تنسورات TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_model")
>>> inputs = tokenizer(text, return_tensors="tf")
```
قم بتمرير مدخلاتك إلى النموذج وإرجاع `logits`:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("stevhliu/my_awesome_model")
>>> logits = model(**inputs).logits
```
استخرج الفئة ذات الاحتمالية الأعلى، واستخدم `id2label` لتحويلها إلى تصنيف نصي:
```py
>>> predicted_class_id = int(tf.math.argmax(logits, axis=-1)[0])
>>> model.config.id2label[predicted_class_id]
'POSITIVE'
```
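To see the full probability distribution rather than only the winning class (a small sketch, not in the original guide), apply a softmax to the logits and map each index through `id2label`:
```py
>>> probs = tf.nn.softmax(logits, axis=-1)[0]
>>> for class_id, p in enumerate(probs.numpy()):
...     print(model.config.id2label[class_id], round(float(p), 3))
```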
</tf>
</frameworkcontent>

View File

@ -118,12 +118,24 @@ pip install transformers datasets evaluate rouge_score
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForSeq2Seq`]. الأكثر كفاءة *الحشو الديناميكي* للجمل إلى أطول طول في دفعة أثناء عملية التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الحد الأقصى للطول.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
```
</tf>
</frameworkcontent>
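The collator does more than pad the `input_ids`: it also pads the `labels`, using -100 so that padded label positions are ignored by the loss. A tiny sketch with made-up token ids (not from the original guide) shows this:
```py
>>> features = [
...     {"input_ids": [100, 200, 300], "labels": [7, 8, 9, 10]},
...     {"input_ids": [100, 200], "labels": [7, 8]},
... ]
>>> batch = data_collator(features)
>>> print(batch["labels"])  # the shorter label sequence is padded with -100
```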
## التقييم (Evaluate)
@ -158,6 +170,8 @@ pip install transformers datasets evaluate rouge_score
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -212,6 +226,91 @@ pip install transformers datasets evaluate rouge_score
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام Keras، فألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لضبط نموذج في TensorFlow، ابدأ بإعداد دالة مُحسِّن وجدول معدل التعلم وبعض معلمات التدريب:
```py
>>> from transformers import create_optimizer, AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
ثم يمكنك تحميل T5 باستخدام [`TFAutoModelForSeq2SeqLM`]:
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
```
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_billsum["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... tokenized_billsum["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة افتراضيًا، لذلك لست بحاجة إلى تحديد واحدة ما لم تكن ترغب في ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
آخر شيئين يجب إعدادهما قبل بدء التدريب هما حساب درجة ROUGE من التنبؤات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم كلاهما باستخدام [استدعاءات Keras](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_test_set)
```
حدد مكان دفع نموذجك ومُحلِّلك اللغوي في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_billsum_model",
... tokenizer=tokenizer,
... )
```
ثم اجمع استدعاءاتك معًا:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
أخيرًا، أنت جاهز لبدء تدريب نموذجك! استدعِ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب واستدعاءاتك لضبط النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -242,6 +341,8 @@ pip install transformers datasets evaluate rouge_score
يمكنك أيضًا تكرار نتائج `pipeline` يدويًا إذا أردت:
<frameworkcontent>
<pt>
قسم النص وإرجع `input_ids` كتنسورات PyTorch:
```py
@ -266,3 +367,31 @@ pip install transformers datasets evaluate rouge_score
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'
```
</pt>
<tf>
قسم النص وإرجع `input_ids` كتنسورات TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_billsum_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids
```
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء التلخيص. لمزيد من التفاصيل حول استراتيجيات توليد النص المختلفة والمعلمات للتحكم في التوليد، راجع واجهة برمجة تطبيقات [توليد النص](../main_classes/text_generation).
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("username/my_awesome_billsum_model")
>>> outputs = model.generate(inputs, max_new_tokens=100, do_sample=False)
```
فك تشفير معرفات الرموز المولدة مرة أخرى إلى نص:
```py
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it's the most aggressive action on tackling the climate crisis in american history. it will ask the ultra-wealthy and corporations to pay their fair share.'
```
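The generation call above uses greedy decoding; as the linked text generation guide describes, other strategies are available. For example (illustrative values, not from the original guide), beam search with a mild length penalty often produces more fluent summaries:
```py
>>> outputs = model.generate(
...     inputs,
...     max_new_tokens=100,
...     num_beams=4,          # keep 4 candidate sequences at each step
...     length_penalty=1.5,   # values > 1.0 favor slightly longer summaries
...     early_stopping=True,
... )
>>> print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```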
</tf>
</frameworkcontent>

View File

@ -151,11 +151,22 @@ pip install transformers datasets evaluate seqeval
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForTokenClassification`]. من الأفضل استخدام *الحشو الديناميكي* للجمل إلى أطول طول في الدفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بالكامل إلى الطول الأقصى.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForTokenClassification
>>> data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForTokenClassification
>>> data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer, return_tensors="tf")
```
</tf>
</frameworkcontent>
## التقييم(Evaluate)
@ -235,6 +246,8 @@ pip install transformers datasets evaluate seqeval
... }
```
<frameworkcontent>
<pt>
<Tip>
إذا لم تكن على دراية بتعديل نموذج باستخدام [`Trainer`], ألق نظرة على الدليل التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
@ -289,6 +302,101 @@ pip install transformers datasets evaluate seqeval
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن على دراية بتعديل نموذج باستخدام Keras، ألق نظرة على الدليل التعليمي الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
للتعديل على نموذج في TensorFlow، ابدأ بإعداد دالة محسن، وجدول معدل التعلم، وبعض معلمات التدريب:
```py
>>> from transformers import create_optimizer
>>> batch_size = 16
>>> num_train_epochs = 3
>>> num_train_steps = (len(tokenized_wnut["train"]) // batch_size) * num_train_epochs
>>> optimizer, lr_schedule = create_optimizer(
... init_lr=2e-5,
... num_train_steps=num_train_steps,
... weight_decay_rate=0.01,
... num_warmup_steps=0,
... )
```
ثم يمكنك تحميل DistilBERT مع [`TFAutoModelForTokenClassification`] إلى جانب عدد التسميات المتوقعة، وتخطيطات التسميات:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained(
... "distilbert/distilbert-base-uncased", num_labels=13, id2label=id2label, label2id=label2id
... )
```
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` مع [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_wnut["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_validation_set = model.prepare_tf_dataset(
... tokenized_wnut["validation"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
هيّئ النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن نماذج Transformers تتضمن دالة خسارة افتراضية مرتبطة بالمهمة، لذلك لا تحتاج إلى تحديد واحدة إلا إذا كنت ترغب في ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
آخر أمرين يجب إعدادهما قبل بدء التدريب هو حساب درجات seqeval من التنبؤات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم ذلك باستخدام [Keras callbacks](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_validation_set)
```
حدد مكان دفع نموذجك والمحلل اللغوي في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_wnut_model",
... tokenizer=tokenizer,
... )
```
ثم جمّع callbacks الخاصة بك معًا:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
أخيرًا، أنت جاهز الآن لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع بيانات التدريب والتحقق، وعدد الحقبات، وcallbacks لتعديل النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_validation_set, epochs=3, callbacks=callbacks)
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -349,6 +457,8 @@ pip install transformers datasets evaluate seqeval
يمكنك أيضًا تكرار نتائج `pipeline` يدويًا إذا أردت:
<frameworkcontent>
<pt>
قسّم النص إلى رموز وأرجع المُوتّرات بلغة PyTorch:
```py
@ -392,3 +502,49 @@ pip install transformers datasets evaluate seqeval
'O',
'O']
```
</pt>
<tf>
قسّم النص إلى رموز وأرجع المُوتّرات ب TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("stevhliu/my_awesome_wnut_model")
>>> inputs = tokenizer(text, return_tensors="tf")
```
مرر مدخلاتك إلى النموذج واحصل على `logits`:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("stevhliu/my_awesome_wnut_model")
>>> logits = model(**inputs).logits
```
استخرج الفئة ذات الاحتمالية الأعلى، واستخدم جدول `id2label` الخاصة بالنموذج لتحويلها إلى تسمية نصية:
```py
>>> predicted_token_class_ids = tf.math.argmax(logits, axis=-1)
>>> predicted_token_class = [model.config.id2label[t] for t in predicted_token_class_ids[0].numpy().tolist()]
>>> predicted_token_class
['O',
'O',
'B-location',
'I-location',
'B-group',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'O',
'B-location',
'B-location',
'O',
'O']
```
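If you prefer entity-level output instead of one tag per token, a short post-processing sketch (not part of the original guide) can merge consecutive B-/I- tags back into labeled spans:
```py
>>> tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].numpy().tolist())
>>> entities, current = [], None
>>> for token, tag in zip(tokens, predicted_token_class):
...     if tag.startswith("B-"):
...         current = (tag[2:], [token])
...         entities.append(current)
...     elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
...         current[1].append(token)
...     else:
...         current = None
>>> print([(label, tokenizer.convert_tokens_to_string(toks)) for label, toks in entities])
```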
</tf>
</frameworkcontent>

View File

@ -113,12 +113,24 @@ pip install transformers datasets evaluate sacrebleu
الآن أنشئ دفعة من الأمثلة باستخدام [`DataCollatorForSeq2Seq`]. من الأكثر كفاءة *الحشو الديناميكي* للجمل إلى أطول طول في دفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الحد الأقصى للطول.
<frameworkcontent>
<pt>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint)
```
</pt>
<tf>
```py
>>> from transformers import DataCollatorForSeq2Seq
>>> data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=checkpoint, return_tensors="tf")
```
</tf>
</frameworkcontent>
## التقييم (Evaluate)
@ -165,6 +177,8 @@ pip install transformers datasets evaluate sacrebleu
## التدريب (Train)
<frameworkcontent>
<pt>
<Tip>
@ -219,6 +233,91 @@ pip install transformers datasets evaluate sacrebleu
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام Keras، فألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-a-tensorflow-model-with-keras)!
</Tip>
لضبط نموذج في TensorFlow، ابدأ بإعداد دالة مُحسِّن وجدول معدل تعلم وبعض المعلمات الفائقة للتدريب:
```py
>>> from transformers import AdamWeightDecay
>>> optimizer = AdamWeightDecay(learning_rate=2e-5, weight_decay_rate=0.01)
```
ثم يمكنك تحميل T5 باستخدام [`TFAutoModelForSeq2SeqLM`]:
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained(checkpoint)
```
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>> tf_train_set = model.prepare_tf_dataset(
... tokenized_books["train"],
... shuffle=True,
... batch_size=16,
... collate_fn=data_collator,
... )
>>> tf_test_set = model.prepare_tf_dataset(
... tokenized_books["test"],
... shuffle=False,
... batch_size=16,
... collate_fn=data_collator,
... )
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers تحتوي على دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة إلا إذا كنت ترغب في ذلك:
```py
>>> import tensorflow as tf
>>> model.compile(optimizer=optimizer) # No loss argument!
```
آخر شيئين يجب إعدادهما قبل بدء التدريب هما حساب مقياس SacreBLEU من التوقعات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم كلاهما باستخدام [استدعاءات Keras](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
```py
>>> from transformers.keras_callbacks import KerasMetricCallback
>>> metric_callback = KerasMetricCallback(metric_fn=compute_metrics, eval_dataset=tf_test_set)
```
حدد مكان دفع نموذجك ومعالجك اللغوي في [`~transformers.PushToHubCallback`]:
```py
>>> from transformers.keras_callbacks import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="my_awesome_opus_books_model",
... tokenizer=tokenizer,
... )
```
ثم اجمع استدعاءاتك معًا:
```py
>>> callbacks = [metric_callback, push_to_hub_callback]
```
أخيرًا، أنت جاهز لبدء تدريب نموذجك! استدعِ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب واستدعاءاتك لضبط النموذج:
```py
>>> model.fit(x=tf_train_set, validation_data=tf_test_set, epochs=3, callbacks=callbacks)
```
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -252,6 +351,8 @@ pip install transformers datasets evaluate sacrebleu
يمكنك أيضًا تكرار نتائج `pipeline` يدويًا إذا أردت:
<frameworkcontent>
<pt>
قم بتحويل النص إلى رموز وإرجاع `input_ids` كموترات PyTorch:
```py
@ -276,3 +377,31 @@ pip install transformers datasets evaluate sacrebleu
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lignées partagent des ressources avec des bactéries enfixant l'azote.'
```
</pt>
<tf>
قم بتحويل النص إلى رموز وإرجاع `input_ids` كموترات TensorFlow:
```py
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("username/my_awesome_opus_books_model")
>>> inputs = tokenizer(text, return_tensors="tf").input_ids
```
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء الترجمة. لمزيد من التفاصيل حول استراتيجيات توليد النصوص المختلفة والمعلمات للتحكم في التوليد، تحقق من واجهة برمجة تطبيقات [توليد النصوص](../main_classes/text_generation).
```py
>>> from transformers import TFAutoModelForSeq2SeqLM
>>> model = TFAutoModelForSeq2SeqLM.from_pretrained("username/my_awesome_opus_books_model")
>>> outputs = model.generate(inputs, max_new_tokens=40, do_sample=True, top_k=30, top_p=0.95)
```
فك تشفير معرفات الرموز المولدة مرة أخرى إلى نص:
```py
>>> tokenizer.decode(outputs[0], skip_special_tokens=True)
'Les lugumes partagent les ressources avec des bactéries fixatrices d'azote.'
```
</tf>
</frameworkcontent>

docs/source/ar/tflite.md (new file, 40 lines)
View File

@ -0,0 +1,40 @@
# التصدير إلى TFLite
[TensorFlow Lite](https://www.tensorflow.org/lite/guide) هو إطار عمل خفيف الوزن لنشر نماذج التعلم الآلي على الأجهزة المحدودة الموارد، مثل الهواتف المحمولة، والأنظمة المدمجة، وأجهزة إنترنت الأشياء (IoT). تم تصميم TFLite لتشغيل النماذج وتحسينها بكفاءة على هذه الأجهزة ذات الطاقة الحاسوبية والذاكرة واستهلاك الطاقة المحدودة.
يُمثَّل نموذج TensorFlow Lite بتنسيق محمول فعال خاص يُعرَّف بامتداد الملف `.tflite`.
🤗 Optimum يقدم وظيفة لتصدير نماذج 🤗 Transformers إلى TFLite من خلال الوحدة النمطية `exporters.tflite`. بالنسبة لقائمة هندسات النماذج المدعومة، يرجى الرجوع إلى [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/tflite/overview).
لتصدير نموذج إلى TFLite، قم بتثبيت متطلبات البرنامج المطلوبة:
```bash
pip install optimum[exporters-tf]
```
للاطلاع على جميع المعاملات المتاحة، راجع [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model)، أو اعرض المساعدة في سطر الأوامر:
```bash
optimum-cli export tflite --help
```
لتصدير نقطة حفظ نموذج من 🤗 Hub، على سبيل المثال `google-bert/bert-base-uncased`، قم بتشغيل الأمر التالي:
```bash
optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/
```
ستظهر لك السجلات التي تُبيّن التقدم وموقع حفظ ملف `model.tflite` الناتج، كما في المثال التالي:
```bash
Validating TFLite model...
-[✓] TFLite model output names match reference model (logits)
- Validating TFLite Model output "logits":
-[✓] (1, 128, 30522) matches (1, 128, 30522)
-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
The exported model was saved at: bert_tflite
```
يُبيّن المثال أعلاه كيفية تصدير نقطة حفظ من 🤗 Hub. عند تصدير نموذج محلي، تأكد أولاً من حفظ ملفات أوزان النموذج والمُجزئ اللغوي في نفس المسار (`local_path`). عند استخدام CLI، قم بتمرير `local_path` إلى معامل `model` بدلاً من اسم نقطة الحفظ على 🤗 Hub.
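As a follow-up (a sketch based only on the standard TensorFlow Lite Python API; the path assumes the `bert_tflite/` output directory produced by the command above), the exported `model.tflite` can be loaded and run with `tf.lite.Interpreter`:
```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="bert_tflite/model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed zero-valued dummy inputs shaped like the export (batch 1, sequence_length 128).
for detail in input_details:
    interpreter.set_tensor(detail["index"], np.zeros(detail["shape"], dtype=detail["dtype"]))

interpreter.invoke()
logits = interpreter.get_tensor(output_details[0]["index"])
print(logits.shape)  # (1, 128, 30522) for google-bert/bert-base-uncased
```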

View File

@ -0,0 +1,154 @@
# التصدير إلى TorchScript
<Tip>
هذه هي بداية تجاربنا مع TorchScript ولا زلنا نستكشف قدراته مع نماذج المدخلات المتغيرة الحجم. إنه مجال اهتمامنا وسنعمق تحليلنا في الإصدارات القادمة، مع المزيد من الأمثلة البرمجية، وتنفيذ أكثر مرونة، ومقاييس مقارنة بين الأكواد القائمة على Python مع أكواد TorchScript المُجمّعة.
</Tip>
وفقًا لـ [وثائق TorchScript](https://pytorch.org/docs/stable/jit.html):
> TorchScript هي طريقة لإنشاء نماذج قابلة للتسلسل والتحسين من تعليمات PyTorch البرمجية.
هناك وحدتان من PyTorch، [JIT and TRACE](https://pytorch.org/docs/stable/jit.html)، تتيحان للمطورين تصدير نماذجهم لإعادة استخدامها في برامج أخرى مثل برامج C++ المُحسّنة للأداء.
نقدم واجهة تتيح لك تصدير نماذج 🤗 Transformers إلى TorchScript بحيث يمكن إعادة استخدامها في بيئة مختلفة عن برامج Python القائمة على PyTorch. هنا نشرح كيفية تصدير نماذجنا واستخدامها باستخدام TorchScript.
يتطلب تصدير نموذج أمرين:
- تهيئة مثيل للنموذج باستخدام علامة `torchscript`
- تمرير مُدخلات وهمية (dummy inputs) خلال النموذج
تنطوي هذه الضرورات على عدة أمور يجب على المطورين توخي الحذر بشأنها كما هو مفصل أدناه.
## علامة TorchScript والأوزان المرتبطة
علامة `torchscript` ضرورية لأن معظم نماذج اللغة 🤗 Transformers لها أوزان مرتبطة بين طبقة `Embedding` وطبقة `Decoding`. لا يسمح لك TorchScript بتصدير النماذج ذات الأوزان المرتبطة، لذلك من الضروري فصل الأوزان ونسخها مسبقًا.
النماذج المُهيأة باستخدام علامة `torchscript` لها طبقة `Embedding` وطبقة `Decoding` منفصلتين، مما يعني أنه لا ينبغي تدريبها لاحقًا. سيؤدي التدريب إلى عدم تزامن الطبقتين، مما يؤدي إلى نتائج غير متوقعة.
هذا لا ينطبق على النماذج التي لا تحتوي على رأس نموذج اللغة، حيث لا تملك أوزانًا مرتبطة. يمكن تصدير هذه النماذج بأمان دون علامة `torchscript`.
## المدخلات الوهمية والأطوال القياسية
تُستخدم المُدخلات الوهمية لتمرير أمامي خلال النموذج. أثناء انتشار قيم المُدخلات عبر الطبقات، يتتبع PyTorch العمليات المختلفة التي يتم تنفيذها على كل مصفوفة(tensor). ثم يتم استخدام هذه العمليات المُسجلة بعد ذلك لإنشاء *أثر* النموذج.
يتم إنشاء التتبع بالنسبة لأبعاد المُدخلات. وبالتالي، فهو مُقيّد بأبعاد المُدخلات الوهمية، ولن يعمل لأي طول تسلسل أو حجم دفعة مختلف. عند المحاولة بحجم مختلف، يتم رفع الخطأ التالي:
```
`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`
```
نوصي بتتبع النموذج باستخدام حجم مُدخلات وهمية لا يقل عن أكبر مُدخل سيتم تقديمه للنموذج أثناء الاستدلال. يمكن أن تساعد الحشوة (padding) في ملء القيم المفقودة. ومع ذلك، نظرًا لتتبع النموذج بحجم مُدخل أكبر، ستكون أبعاد المصفوفة كبيرة أيضًا، مما ينتج عنه المزيد من الحسابات.
انتبه إلى إجمالي عدد العمليات المُنفذة على كل مُدخل وتابع الأداء عن كثب عند تصدير نماذج متغيرة طول التسلسل.
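For example (an illustrative sketch, not from the original document), you can pad the dummy input to the maximum sequence length you expect at inference time before tracing:
```python
import torch
from transformers import BertModel, BertTokenizer

max_length = 64  # assumed upper bound for inference-time inputs
tokenizer = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
encoded = tokenizer(
    "Who was Jim Henson?",
    padding="max_length",
    max_length=max_length,
    return_tensors="pt",
)

model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)
model.eval()

# The trace is now tied to inputs of shape (1, 64); shorter real inputs must be padded to match.
traced_model = torch.jit.trace(model, [encoded["input_ids"], encoded["attention_mask"]])
print(traced_model(encoded["input_ids"], encoded["attention_mask"])[0].shape)  # (1, 64, 768)
```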
## استخدام TorchScript في Python
يوضح هذا القسم كيفية حفظ النماذج وتحميلها، بالإضافة إلى كيفية استخدام التتبع للاستدلال.
### حفظ نموذج
لتصدير `BertModel` باستخدام TorchScript، قم بتهيئة `BertModel` باستخدام فئة `BertConfig` ثم احفظه على القرص تحت اسم الملف `traced_bert.pt`:
```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = "[MASK]"
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(
vocab_size_or_config_json_file=32000,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
torchscript=True,
)
# Instantiating the model
model = BertModel(config)
# The model needs to be in evaluation mode
model.eval()
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```
### تحميل نموذج
يمكنك الآن تحميل `BertModel` المُحفظ سابقًا، `traced_bert.pt`، من القرص واستخدامه على `dummy_input` المُهيأ سابقًا:
```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
```
### استخدام نموذج مُتتبع للاستدلال
استخدم النموذج المُتتبع للاستدلال باستخدام أسلوب `__call__` الخاص به:
```python
traced_model(tokens_tensor, segments_tensors)
```
## نشر نماذج Hugging Face TorchScript على AWS باستخدام Neuron SDK
قدمت AWS عائلة [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) من الأجهزة لتوفير استدلال تعلم آلي عالي الأداء ومنخفض التكلفة في البيئة السحابية. تعمل أجهزة Inf1 بواسطة شريحة Inferentia من AWS، وهي مُسرّع أجهزة مُخصص متخصص في أعباء عمل الاستدلال للتعلم العميق. [AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#) هي SDK لـ Inferentia تدعم تتبع نماذج المحولات وتحسينها للنشر على Inf1. توفر Neuron SDK ما يلي:
1. واجهة برمجة تطبيقات سهلة الاستخدام مع تغيير سطر واحد من التعليمات البرمجية لتتبع نموذج TorchScript وتحسينه للاستدلال في البيئة السحابية.
2. تحسينات أداء جاهزة للاستخدام من أجل [تحسين التكلفة والأداء](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/).
3. دعم نماذج Hugging Face المحولات المبنية باستخدام إما [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html) أو [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).
### الآثار المترتبة
تعمل نماذج المحولات المستندة إلى بنية [BERT (تمثيلات الترميز ثنائية الاتجاه من المحولات)](https://huggingface.co/docs/transformers/main/model_doc/bert) أو متغيراتها مثل [distilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert) و [roBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta) بشكل أفضل على Inf1 للمهام غير التوليدية مثل الإجابة على الأسئلة الاستخراجية، وتصنيف التسلسلات، وتصنيف الرموز (tokens). ومع ذلك، يمكن تكييف مهام توليد النصوص للعمل على Inf1 وفقًا لهذا [برنامج تعليمي AWS Neuron MarianMT](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html). يمكن العثور على مزيد من المعلومات حول النماذج التي يمكن تحويلها جاهزة على Inferentia في قسم [ملاءمة بنية النموذج](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia) من وثائق Neuron.
### التبعيات (Dependencies)
يتطلب استخدام AWS Neuron لتحويل النماذج [بيئة SDK Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide) والتي تأتي مسبقًا على [AMI للتعلم العميق من AWS](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
### تحويل نموذج لـ AWS Neuron
قم بتحويل نموذج لـ AWS NEURON باستخدام نفس التعليمات البرمجية من [استخدام TorchScript في Python](torchscript#using-torchscript-in-python) لتتبع `BertModel`. قم باستيراد امتداد إطار عمل `torch.neuron` للوصول إلى مكونات Neuron SDK من خلال واجهة برمجة تطبيقات Python:
```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
import torch.neuron
```
كل ما عليك فعله هو تعديل السطر التالي:
```diff
- torch.jit.trace(model, [tokens_tensor, segments_tensors])
+ torch.neuron.trace(model, [tokens_tensor, segments_tensors])
```
يتيح ذلك لـ Neuron SDK تتبع النموذج وتحسينه لمثيلات Inf1.
لمعرفة المزيد حول ميزات AWS Neuron SDK والأدوات ودروس البرامج التعليمية والتحديثات الأخيرة، يرجى الاطلاع على [وثائق AWS NeuronSDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).

View File

@ -611,6 +611,7 @@ accelerate launch \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
يمكنك أيضًا تحديد المعلمات من ملف `config_file.yaml` مباشرة في سطر الأوامر:
@ -633,6 +634,7 @@ accelerate launch --num_processes=2 \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
اطلع على برنامج تعليمي [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) لمعرفة المزيد حول `accelerate_launch` والتكوينات المخصصة.

View File

@ -58,6 +58,8 @@
في شريط التنقل الأيمن للقفز إلى الإطار الذي تريده - وإذا كنت تريد إخفاء كل المحتوى لإطار معين،
فاستخدم الزر في الركن العلوي الأيمن من كتلة الإطار!
<frameworkcontent>
<pt>
<Youtube id="nvBXf7s7vTI"/>
## التدريب باستخدام PyTorch Trainer
@ -137,10 +139,124 @@
```py
>>> trainer.train()
```
</pt>
<tf>
<a id='keras'></a>
<Youtube id="rnTGBy2ax1c"/>
## تدريب نموذج TensorFlow باستخدام Keras
يمكنك أيضًا تدريب نماذج 🤗 Transformers في TensorFlow باستخدام واجهة برمجة تطبيقات Keras!
### تحميل البيانات لـ Keras
عندما تريد تدريب نموذج 🤗 Transformers باستخدام واجهة برمجة تطبيقات Keras، فأنت بحاجة إلى تحويل مجموعة البيانات الخاصة بك إلى تنسيق يفهمه
Keras. إذا كانت مجموعة البيانات الخاصة بك صغيرة، فيمكنك ببساطة تحويلها بالكامل إلى مصفوفات NumPy وإرسالها إلى Keras.
دعونا نجرب ذلك أولاً قبل أن نقوم بأي شيء أكثر تعقيدًا.
أولاً، قم بتحميل مجموعة بيانات. سنستخدم مجموعة بيانات CoLA من معيار [GLUE benchmark](https://huggingface.co/datasets/glue)،
نظرًا لأنها مهمة تصنيف نص ثنائية بسيطة، وسنأخذ قسم التدريب فقط الآن.
```py
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # خذ فقط قسم التدريب الآن
```
بعد ذلك، قم بتحميل أداة المُجزّئ اللغوي وقم بترميز البيانات كمصفوفات NumPy. لاحظ أن التصنيفات هي بالفعل قائمة من 0 و 1،
لذا يمكننا ببساطة تحويل ذلك مباشرة إلى مصفوفة NumPy بدون ترميز!
```py
from transformers import AutoTokenizer
import numpy as np
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"]) # Label is already an array of 0 and 1
```
أخيرًا، قم بتحميل النموذج وتجميعه (compile) وملاءمته (fit). لاحظ أن نماذج Transformers تحتوي جميعها على دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذا فأنت لست بحاجة إلى تحديد واحدة ما لم ترغب في ذلك:
```py
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# تحميل وتجميع النموذج الخاص بنا
model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# معدلات التعلم المنخفضة أفضل غالبًا لضبط النماذج الدقيقة
model.compile(optimizer=Adam(3e-5)) # لا توجد دالة خسارة!
model.fit(tokenized_data, labels)
```
<Tip>
أنت لست مضطرًا لتمرير دالة خسارة إلى نماذجك عند تجميعها! تختار نماذج Hugging Face تلقائيًا
دالة خسارة مناسبة لمهمتها وهندسة نموذجها إذا تُركت هذه الحجة فارغة. يمكنك دائمًا
تجاوز ذلك عن طريق تحديد دالة خسارة بنفسك إذا كنت تريد ذلك!
</Tip>
يعمل هذا النهج بشكل رائع لمجموعات البيانات الصغيرة، ولكن بالنسبة لمجموعات البيانات الأكبر، فقد تجد أنه يصبح مشكلة. لماذا؟
لأن المصفوفة المرمزة والتصنيفات يجب أن يتم تحميلها بالكامل في الذاكرة، ولأن NumPy لا يتعامل مع
المصفوفات "غير المنتظمة"، لذا يجب حشو كل عينة إلى طول أطول عينة في مجموعة البيانات بأكملها. سيؤدي ذلك إلى زيادة حجم المصفوفة لديك، وستُبطئ رموز الحشو الزائدة عملية التدريب أيضًا!
### تحميل البيانات كـ tf.data.Dataset
إذا كنت تريد تجنب إبطاء التدريب، فيمكنك تحميل بياناتك كـ `tf.data.Dataset` بدلاً من ذلك. على الرغم من أنه يمكنك كتابة خط أنابيب `tf.data` الخاص بك إذا كنت تريد، إلا أن لدينا طريقتين مختصرتين للقيام بذلك:
- [`~TFPreTrainedModel.prepare_tf_dataset`]: هذه هي الطريقة التي نوصي بها في معظم الحالات. نظرًا لأنه طريقة
على نموذجك، فيمكنه فحص النموذج لتحديد الأعمدة القابلة للاستخدام كمدخلات للنموذج تلقائيًا،
واستبعاد الأعمدة الأخرى لإنشاء مجموعة بيانات أبسط وأكثر كفاءة.
- [`~datasets.Dataset.to_tf_dataset`]: هذه الطريقة أكثر أساسية، وهي مفيدة عندما تريد التحكم بدقة في كيفية
إنشاء مجموعة البيانات الخاصة بك، عن طريق تحديد أعمدة `columns` و `label_cols` المحددة التي سيتم تضمينها.
قبل أن تتمكن من استخدام [`~TFPreTrainedModel.prepare_tf_dataset`]، ستحتاج إلى إضافة مخرجات المُجزئ إلى مجموعة البيانات الخاصة بك كأعمدة، كما هو موضح في
عينة التعليمات البرمجية التالية:
```py
def tokenize_dataset(data):
# ستتم إضافة مفاتيح القاموس الذي تمت إعادته كأعمدة إلى مجموعة البيانات
return tokenizer(data["text"])
dataset = dataset.map(tokenize_dataset)
```
تذكر أن مجموعات بيانات Hugging Face يتم تخزينها على القرص بشكل افتراضي، لذا فلن يؤدي ذلك إلى تضخيم استخدام الذاكرة لديك! بمجرد إضافة الأعمدة، يمكنك بث الدفعات من مجموعة البيانات وإضافة الحشو إلى كل دفعة، مما يقلل بشكل كبير من عدد رموز الحشو مقارنة بحشو مجموعة البيانات بأكملها.
```py
>>> tf_dataset = model.prepare_tf_dataset(dataset["train"], batch_size=16, shuffle=True, tokenizer=tokenizer)
```
لاحظ أنه في عينة التعليمات البرمجية أعلاه، تحتاج إلى تمرير المُجزئ اللغوي إلى `prepare_tf_dataset` حتى تتمكن من حشو الدُفعات بشكل صحيح أثناء تحميلها.
إذا كانت جميع العينات في مجموعة البيانات الخاصة بك بنفس الطول ولم يكن الحشو ضروريًا، فيمكنك تخطي هذا المعامل.
إذا كنت بحاجة إلى القيام بشيء أكثر تعقيدًا من مجرد ترميز العينات (على سبيل المثال، إفساد الرموز للنمذجة اللغوية المُقنعة)،
فيمكنك استخدام معامل `collate_fn` بدلاً من ذلك لتمرير دالة يتم استدعاؤها لتحويل
قائمة العينات إلى دفعة وتطبيق أي معالجة مسبقة تريدها. راجع أمثلةنا [examples](https://github.com/huggingface/transformers/tree/main/examples) أو
[دفاتر الملاحظات](https://huggingface.co/docs/transformers/notebooks) لرؤية هذا النهج في العمل.
بمجرد إنشاء `tf.data.Dataset`، يمكنك تجميع النموذج وملاءمته كما في السابق:
```py
model.compile(optimizer=Adam(3e-5)) # No loss argument!
model.fit(tf_dataset)
```
</tf>
</frameworkcontent>
<a id='pytorch_native'></a>
## تدريب في PyTorch الأصلي
<frameworkcontent>
<pt>
<Youtube id="Dh9CL8fyG80"/>
[`Trainer`] يهتم بحلقة التدريب ويسمح لك بضبط نموذج في سطر واحد من التعليمات البرمجية. بالنسبة للمستخدمين الذين يفضلون كتابة حلقة التدريب الخاصة بهم، يمكنك أيضًا ضبط نموذج 🤗 Transformers في PyTorch الأصلي.
@ -281,6 +397,8 @@ torch.cuda.empty_cache()
>>> metric.compute()
```
</pt>
</frameworkcontent>
<a id='additional-resources'></a>
@ -291,4 +409,4 @@ torch.cuda.empty_cache()
- [🤗 أمثلة المحولات](https://github.com/huggingface/transformers/tree/main/examples) تتضمن
النصوص البرمجية لتدريب مهام NLP الشائعة في PyTorch وTensorFlow.
- [🤗 دفاتر ملاحظات المحولات](notebooks) يحتوي على دفاتر ملاحظات مختلفة حول كيفية ضبط نموذج لمهمة محددة في PyTorch وTensorFlow.
- [🤗 دفاتر ملاحظات المحولات](notebooks) يحتوي على دفاتر ملاحظات مختلفة حول كيفية ضبط نموذج لمهمة محددة في PyTorch وTensorFlow.

View File

@ -53,7 +53,7 @@ Lassen Sie uns daher ein wenig tiefer in das allgemeine Design der Bibliothek ei
### Überblick über die Modelle
Um ein Modell erfolgreich hinzuzufügen, ist es wichtig, die Interaktion zwischen Ihrem Modell und seiner Konfiguration zu verstehen,
[`PreTrainedModel`] und [`PreTrainedConfig`]. Als Beispiel werden wir
[`PreTrainedModel`] und [`PretrainedConfig`]. Als Beispiel werden wir
das Modell, das zu 🤗 Transformers hinzugefügt werden soll, `BrandNewBert` nennen.
Schauen wir uns das mal an:
@ -81,10 +81,10 @@ model.config # model has access to its config
```
Ähnlich wie das Modell erbt die Konfiguration grundlegende Serialisierungs- und Deserialisierungsfunktionalitäten von
[`PreTrainedConfig`]. Beachten Sie, dass die Konfiguration und das Modell immer in zwei verschiedene Formate serialisiert werden
[`PretrainedConfig`]. Beachten Sie, dass die Konfiguration und das Modell immer in zwei verschiedene Formate serialisiert werden
unterschiedliche Formate serialisiert werden - das Modell in eine *pytorch_model.bin* Datei und die Konfiguration in eine *config.json* Datei. Aufruf von
[`~PreTrainedModel.save_pretrained`] wird automatisch
[`~PreTrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden.
[`~PretrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden.
### Code-Stil

View File

@ -81,6 +81,8 @@ Laden Sie einen Prozessor mit [`AutoProcessor.from_pretrained`]:
## AutoModel
<frameworkcontent>
<pt>
Mit den `AutoModelFor`-Klassen können Sie schließlich ein vortrainiertes Modell für eine bestimmte Aufgabe laden (siehe [hier](model_doc/auto) für eine vollständige Liste der verfügbaren Aufgaben). Laden Sie zum Beispiel ein Modell für die Sequenzklassifikation mit [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -106,3 +108,24 @@ TensorFlow- und Flax-Checkpoints sind nicht betroffen und können in PyTorch-Arc
</Tip>
Im Allgemeinen empfehlen wir die Verwendung der Klasse "AutoTokenizer" und der Klasse "AutoModelFor", um trainierte Instanzen von Modellen zu laden. Dadurch wird sichergestellt, dass Sie jedes Mal die richtige Architektur laden. Im nächsten [Tutorial] (Vorverarbeitung) erfahren Sie, wie Sie Ihren neu geladenen Tokenizer, Feature Extractor und Prozessor verwenden, um einen Datensatz für die Feinabstimmung vorzuverarbeiten.
</pt>
<tf>
Mit den Klassen `TFAutoModelFor` schließlich können Sie ein vortrainiertes Modell für eine bestimmte Aufgabe laden (siehe [hier](model_doc/auto) für eine vollständige Liste der verfügbaren Aufgaben). Laden Sie zum Beispiel ein Modell für die Sequenzklassifikation mit [`TFAutoModelForSequenceClassification.from_pretrained`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Sie können denselben Prüfpunkt problemlos wiederverwenden, um eine Architektur für eine andere Aufgabe zu laden:
```py
>>> from transformers import TFAutoModelForTokenClassification
>>> model = TFAutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
Im Allgemeinen empfehlen wir, die Klasse "AutoTokenizer" und die Klasse "TFAutoModelFor" zu verwenden, um vortrainierte Instanzen von Modellen zu laden. Dadurch wird sichergestellt, dass Sie jedes Mal die richtige Architektur laden. Im nächsten [Tutorial] (Vorverarbeitung) erfahren Sie, wie Sie Ihren neu geladenen Tokenizer, Feature Extractor und Prozessor verwenden, um einen Datensatz für die Feinabstimmung vorzuverarbeiten.
</tf>
</frameworkcontent>

View File

@ -78,10 +78,10 @@ Wenn Sie an der grundlegenden Verwendung von LLMs interessiert sind, ist unsere
Zunächst müssen Sie das Modell laden.
```py
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> from transformers import AutoModelForCausalLM
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... )
```
@ -119,12 +119,12 @@ Und das war's! Mit ein paar Zeilen Code können Sie sich die Macht eines LLM zun
Es gibt viele [Generierungsstrategien](generation_strategies), und manchmal sind die Standardwerte für Ihren Anwendungsfall vielleicht nicht geeignet. Wenn Ihre Ausgaben nicht mit dem übereinstimmen, was Sie erwarten, haben wir eine Liste der häufigsten Fallstricke erstellt und wie Sie diese vermeiden können.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b")
>>> tokenizer.pad_token = tokenizer.eos_token # Llama has no pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... )
```

View File

@ -79,15 +79,43 @@ Um sicherzustellen, dass Ihr Modell von jemandem verwendet werden kann, der mit
Die Konvertierung eines Checkpoints für ein anderes Framework ist einfach. Stellen Sie sicher, dass Sie PyTorch und TensorFlow installiert haben (siehe [hier](installation) für Installationsanweisungen), und finden Sie dann das spezifische Modell für Ihre Aufgabe in dem anderen Framework.
<frameworkcontent>
<pt>
Geben Sie `from_tf=True` an, um einen Prüfpunkt von TensorFlow nach PyTorch zu konvertieren:
```py
>>> pt_model = DistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_tf=True)
>>> pt_model.save_pretrained("path/to/awesome-name-you-picked")
```
</pt>
<tf>
Geben Sie `from_pt=True` an, um einen Prüfpunkt von PyTorch nach TensorFlow zu konvertieren:
```py
>>> tf_model = TFDistilBertForSequenceClassification.from_pretrained("path/to/awesome-name-you-picked", from_pt=True)
```
Dann können Sie Ihr neues TensorFlow-Modell mit seinem neuen Checkpoint speichern:
```py
>>> tf_model.save_pretrained("path/to/awesome-name-you-picked")
```
</tf>
<jax>
Wenn ein Modell in Flax verfügbar ist, können Sie auch einen Kontrollpunkt von PyTorch nach Flax konvertieren:
```py
>>> flax_model = FlaxDistilBertForSequenceClassification.from_pretrained(
... "path/to/awesome-name-you-picked", from_pt=True
... )
```
</jax>
</frameworkcontent>
## Ein Modell während des Trainings hochladen
<frameworkcontent>
<pt>
<Youtube id="Z1-XMy-GNLQ"/>
Die Weitergabe eines Modells an den Hub ist so einfach wie das Hinzufügen eines zusätzlichen Parameters oder Rückrufs. Erinnern Sie sich an das [Feinabstimmungs-Tutorial](training), in der Klasse [`TrainingArguments`] geben Sie Hyperparameter und zusätzliche Trainingsoptionen an. Eine dieser Trainingsoptionen beinhaltet die Möglichkeit, ein Modell direkt an den Hub zu pushen. Setzen Sie `push_to_hub=True` in Ihrer [`TrainingArguments`]:
@ -113,6 +141,29 @@ Nach der Feinabstimmung Ihres Modells rufen Sie [`~transformers.Trainer.push_to_
```py
>>> trainer.push_to_hub()
```
</pt>
<tf>
Geben Sie ein Modell mit [`PushToHubCallback`] an den Hub weiter. In der [`PushToHubCallback`] Funktion, fügen Sie hinzu:
- Ein Ausgabeverzeichnis für Ihr Modell.
- Einen Tokenizer.
- Die `hub_model_id`, die Ihr Hub-Benutzername und Modellname ist.
```py
>>> from transformers import PushToHubCallback
>>> push_to_hub_callback = PushToHubCallback(
... output_dir="./your_model_save_path", tokenizer=tokenizer, hub_model_id="your-username/my-awesome-model"
... )
```
Add the callback to [`fit`](https://keras.io/api/models/model_training_apis/), and 🤗 Transformers will push the trained model to the Hub:
```py
>>> model.fit(tf_train_dataset, validation_data=tf_validation_dataset, epochs=3, callbacks=push_to_hub_callback)
```
</tf>
</frameworkcontent>
## Use the `push_to_hub` function
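As a quick illustration of what this section covers, calling `push_to_hub` on a model uploads it to a repository under your namespace. A minimal sketch (the repository name is hypothetical, and you need to be logged in to the Hub):
```py
>>> from transformers import AutoModelForSequenceClassification

>>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased")
>>> pt_model.push_to_hub("my-awesome-model")  # creates or updates your-username/my-awesome-model on the Hub
```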
@ -178,4 +229,4 @@ Um sicherzustellen, dass die Benutzer die Fähigkeiten, Grenzen, möglichen Verz
* Manually creating and uploading a `README.md` file.
* Clicking on the **Edit model card** button in your model repository.
Take a look at the DistilBert [model card](https://huggingface.co/distilbert/distilbert-base-uncased) for a good example of the type of information a model card should include. For more details about other options you can configure in the `README.md` file, such as a model's carbon footprint or widget examples, refer to the documentation [here](https://huggingface.co/docs/hub/models-cards).

View File

@ -153,6 +153,8 @@ Schließlich möchten Sie, dass der Tokenizer die tatsächlichen Tensoren zurüc
Set the `return_tensors` parameter to either `pt` for PyTorch or `tf` for TensorFlow:
<frameworkcontent>
<pt>
```py
>>> batch_sentences = [
@ -172,6 +174,32 @@ Setzen Sie den Parameter `return_tensors` entweder auf `pt` für PyTorch, oder `
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
</pt>
<tf>
```py
>>> batch_sentences = [
... "But what about second breakfast?",
... "Don't think he knows about second breakfast, Pip.",
... "What about elevensies?",
... ]
>>> encoded_input = tokenizer(batch_sentences, padding=True, truncation=True, return_tensors="tf")
>>> print(encoded_input)
{'input_ids': <tf.Tensor: shape=(3, 15), dtype=int32, numpy=
array([[101, 1252, 1184, 1164, 1248, 6462, 136, 102, 0, 0, 0, 0, 0, 0, 0],
[101, 1790, 112, 189, 1341, 1119, 3520, 1164, 1248, 6462, 117, 21902, 1643, 119, 102],
[101, 1327, 1164, 5450, 23434, 136, 102, 0, 0, 0, 0, 0, 0, 0, 0]],
dtype=int32)>,
'token_type_ids': <tf.Tensor: shape=(3, 15), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>,
'attention_mask': <tf.Tensor: shape=(3, 15), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)>}
```
</tf>
</frameworkcontent>
## Audio

View File

@ -66,10 +66,20 @@ Im folgenden Beispiel werden Sie die [`pipeline`] für die Stimmungsanalyse verw
Install the following dependencies if you haven't already:
<frameworkcontent>
<pt>
```bash
pip install torch
```
</pt>
<tf>
```bash
pip install tensorflow
```
</tf>
</frameworkcontent>
Import [`pipeline`] and specify the task you want to solve:
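For example, for the sentiment-analysis task used here, a minimal sketch looks like this:
```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis")
```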
@ -144,6 +154,8 @@ Die [`pipeline`] kann jedes Modell aus dem [Model Hub](https://huggingface.co/mo
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
```
<frameworkcontent>
<pt>
Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on an `AutoClass` below):
```py
@ -152,6 +164,18 @@ Use the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the
>>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</pt>
<tf>
Use the [`TFAutoModelForSequenceClassification`] and [`AutoTokenizer`] to load the pretrained model and its associated tokenizer (more on a `TFAutoClass` below):
```py
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
```
</tf>
</frameworkcontent>
Then you can specify the model and tokenizer in the [`pipeline`] and apply the `classifier` to your target text:
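A sketch of what that looks like, reusing the `model` and `tokenizer` loaded above (the example sentence is illustrative; the checkpoint is multilingual):
```py
>>> from transformers import pipeline

>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
```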
@ -202,6 +226,8 @@ Der Tokenizer gibt ein Wörterbuch zurück, das Folgendes enthält:
Just like the [`pipeline`], the tokenizer accepts a list of inputs. In addition, the tokenizer can also pad and truncate the text to return a batch of uniform length:
<frameworkcontent>
<pt>
```py
>>> pt_batch = tokenizer(
@ -212,11 +238,27 @@ Genau wie die [`pipeline`] akzeptiert der Tokenizer eine Liste von Eingaben. Dar
... return_tensors="pt",
... )
```
</pt>
<tf>
```py
>>> tf_batch = tokenizer(
... ["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."],
... padding=True,
... truncation=True,
... max_length=512,
... return_tensors="tf",
... )
```
</tf>
</frameworkcontent>
Read the [preprocessing](./preprocessing) tutorial for more details about tokenization.
### AutoModel
<frameworkcontent>
<pt>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] the same way you would load an [`AutoTokenizer`]. The only difference is selecting the right [`AutoModel`] for the task. Since you are doing text or sequence classification, load [`AutoModelForSequenceClassification`]:
```py
@ -248,6 +290,39 @@ Das Modell gibt die endgültigen Aktivierungen in dem Attribut "logits" aus. Wen
tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
</pt>
<tf>
🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load a [`TFAutoModel`] the same way you would load an [`AutoTokenizer`]. The only difference is selecting the right [`TFAutoModel`] for the task. Since you are doing text or sequence classification, load [`TFAutoModelForSequenceClassification`]:
```py
>>> from transformers import TFAutoModelForSequenceClassification
>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
```
<Tip>
Check the [task summary](./task_summary) to see which [`AutoModel`] class to use for which task.
</Tip>
Now you can pass your preprocessed batch of inputs directly to the model:
```py
>>> tf_outputs = tf_model(tf_batch)
```
The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:
```py
>>> import tensorflow as tf
>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
>>> tf_predictions # doctest: +IGNORE_RESULT
```
</tf>
</frameworkcontent>
<Tip>
@ -267,6 +342,8 @@ Die Modellausgänge verhalten sich auch wie ein Tupel oder ein Wörterbuch (z.B.
### Save a model
<frameworkcontent>
<pt>
Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:
```py
@ -280,9 +357,28 @@ Wenn Sie bereit sind, das Modell erneut zu verwenden, laden Sie es mit [`PreTrai
```py
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
```
</pt>
<tf>
Once your model is fine-tuned, you can save it with its tokenizer using [`TFPreTrainedModel.save_pretrained`]:
```py
>>> tf_save_directory = "./tf_save_pretrained"
>>> tokenizer.save_pretrained(tf_save_directory) # doctest: +IGNORE_RESULT
>>> tf_model.save_pretrained(tf_save_directory)
```
When you are ready to use the model again, reload it with [`TFPreTrainedModel.from_pretrained`]:
```py
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
```
</tf>
</frameworkcontent>
A particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:
<frameworkcontent>
<pt>
```py
>>> from transformers import AutoModel
@ -290,6 +386,17 @@ Ein besonders cooles 🤗 Transformers-Feature ist die Möglichkeit, ein Modell
>>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
```
</pt>
<tf>
```py
>>> from transformers import TFAutoModel
>>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
```
</tf>
</frameworkcontent>
## Custom model builds
@ -303,6 +410,8 @@ Beginnen Sie mit dem Import von [`AutoConfig`] und laden Sie dann das trainierte
>>> my_config = AutoConfig.from_pretrained("distilbert/distilbert-base-uncased", n_heads=12)
```
<frameworkcontent>
<pt>
Create a model from your custom configuration with [`AutoModel.from_config`]:
```py
@ -310,6 +419,17 @@ Create a model from your custom configuration with [`AutoModel.from_config`]:
>>> my_model = AutoModel.from_config(my_config)
```
</pt>
<tf>
Create a model from your custom configuration with [`TFAutoModel.from_config`]:
```py
>>> from transformers import TFAutoModel
>>> my_model = TFAutoModel.from_config(my_config)
```
</tf>
</frameworkcontent>
For more information about building custom configurations, take a look at the [Create a custom architecture](./create_a_model) guide.

View File

@ -85,6 +85,8 @@ pip install -r requirements.txt
## Run a script
<frameworkcontent>
<pt>
The example script downloads and preprocesses a dataset from the 🤗 [Datasets](https://huggingface.co/docs/datasets/) library. Then the script fine-tunes a dataset with the [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) on an architecture that supports summarization. The following example shows how to fine-tune [T5-small](https://huggingface.co/google-t5/t5-small) on the [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) dataset. The T5 model requires an additional `source_prefix` argument because of how it was trained. This prompt lets T5 know this is a summarization task.
```bash
@ -98,8 +100,27 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
The example script downloads and preprocesses a dataset from the 🤗 [Datasets](https://huggingface.co/docs/datasets/) library. Then the script fine-tunes a dataset with Keras on an architecture that supports summarization. The following example shows how to fine-tune [T5-small](https://huggingface.co/google-t5/t5-small) on the [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail) dataset. The T5 model requires an additional `source_prefix` argument because of how it was trained. This prompt lets T5 know this is a summarization task.
```bash
python examples/tensorflow/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Distributed training and mixed precision
@ -121,6 +142,7 @@ torchrun \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -128,6 +150,8 @@ TensorFlow-Skripte verwenden eine [`MirroredStrategy`](https://www.tensorflow.or
## Run a script on a TPU
<frameworkcontent>
<pt>
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. PyTorch supports TPUs with the [XLA](https://www.tensorflow.org/xla) deep learning compiler (see [here](https://github.com/pytorch/xla/blob/master/README.md) for more details). To use a TPU, launch the `xla_spawn.py` script and use the `num_cores` argument to set the number of TPU cores you want to use.
```bash
@ -142,8 +166,28 @@ python xla_spawn.py --num_cores 8 \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
</pt>
<tf>
Tensor Processing Units (TPUs) are specifically designed to accelerate performance. TensorFlow scripts use a [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) for training on TPUs. To use a TPU, pass the name of the TPU resource to the `tpu` argument.
```bash
python run_summarization.py \
--tpu name_of_tpu_resource \
--model_name_or_path google-t5/t5-small \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size 8 \
--per_device_eval_batch_size 16 \
--num_train_epochs 3 \
--do_train \
--do_eval
```
</tf>
</frameworkcontent>
## Run a script with 🤗 Accelerate
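As background for this section, the core 🤗 Accelerate training pattern looks roughly like the following minimal sketch (a stand-in model and random data, not the actual example-script code):
```py
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Stand-in model, optimizer and batch; the example scripts build these from real datasets.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

batch = torch.randn(8, 4, device=accelerator.device)
loss = model(batch).sum()
accelerator.backward(loss)  # replaces loss.backward() so mixed precision / distributed setups work
optimizer.step()
optimizer.zero_grad()
```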
@ -198,6 +242,7 @@ python examples/pytorch/summarization/run_summarization.py \
--summary_column summary_column_name \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--predict_with_generate
@ -225,6 +270,7 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -238,6 +284,8 @@ examples/pytorch/summarization/run_summarization.py -h
Another helpful option to enable is resuming training from a previous checkpoint. This way you can pick up where you left off if your training gets interrupted, without starting over from scratch. There are two methods to resume training from a checkpoint.
The first method uses the `output_dir previous_output_dir` argument to resume training from the latest checkpoint stored in `output_dir`. In this case, you should remove `overwrite_output_dir`:
```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
@ -249,6 +297,24 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--output_dir previous_output_dir \
--predict_with_generate
```
The second method uses the `resume_from_checkpoint path_to_specific_checkpoint` argument to resume training from a specific checkpoint folder.
```bash
python examples/pytorch/summarization/run_summarization.py \
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--resume_from_checkpoint path_to_specific_checkpoint \
--predict_with_generate
```
@ -280,5 +346,6 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

View File

At this point, you should follow the section corresponding to the framework you want to use. You can use the links
in the right sidebar to jump to the section you want - and if you want to hide all of the content for a given framework,
just click the button at the top-right of that framework's block!
<frameworkcontent>
<pt>
<Youtube id="nvBXf7s7vTI"/>
## Train with PyTorch Trainer
@ -153,11 +155,128 @@ Anschließend können Sie Ihr Modell durch den Aufruf von [`~transformers.Traine
```py
>>> trainer.train()
```
</pt>
<tf>
<a id='keras'></a>
<Youtube id="rnTGBy2ax1c"/>
## Train a TensorFlow model with Keras
You can also train 🤗 Transformers models in TensorFlow with the Keras API!
### Loading data for Keras
When you want to train a 🤗 Transformers model with the Keras API, you need to convert your dataset to a format that
Keras understands. If your dataset is small, you can just convert the whole thing to NumPy arrays and pass it to Keras.
Let's try that first before we do anything more complicated.
First, load a dataset. We'll use the CoLA dataset from the [GLUE benchmark](https://huggingface.co/datasets/glue),
since it's a simple binary text classification task, and just take the training split for now.
```py
from datasets import load_dataset
dataset = load_dataset("glue", "cola")
dataset = dataset["train"] # Just take the training split for now
```
Next, load a tokenizer and tokenize the data as NumPy arrays. Note that the labels are already a list of 0s and 1s,
so we can convert them directly to a NumPy array without tokenization!
```py
from transformers import AutoTokenizer
import numpy as np

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")
tokenized_data = tokenizer(dataset["sentence"], return_tensors="np", padding=True)
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data = dict(tokenized_data)
labels = np.array(dataset["label"])  # Label is already an array of 0 and 1
```
Finally, load, [`compile`](https://keras.io/api/models/model_training_apis/#compile-method), and [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) the model:
```py
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.optimizers import Adam
# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
model.compile(optimizer=Adam(3e-5))
model.fit(tokenized_data, labels)
```
<Tip>
You don't have to pass a loss argument to your models when you `compile()` them! Hugging Face models automatically
choose a loss that is appropriate for their task and model architecture if this argument is left blank. You can always override this by specifying a loss yourself if you want to!
</Tip>
This approach works great for smaller datasets, but for larger datasets it can become a problem. Why?
Because the tokenized array and the labels would have to be fully loaded into memory, and because NumPy cannot handle
"jagged" arrays, every tokenized sample would have to be padded to the length of the longest sample in the whole
dataset. That makes the array even bigger, and all those padding tokens slow down training too!
### Loading data as a tf.data.Dataset
If you want to avoid slowing down training, you can load your data as a `tf.data.Dataset` instead. You can write your own
`tf.data` pipeline if you want to, but we have two convenient methods for doing this:
- [`~TFPreTrainedModel.prepare_tf_dataset`]: This is the method we recommend in most cases. Because it is a method
on your model, it can inspect the model to automatically figure out which columns are usable as model inputs, and
discard the others to make a simpler, more performant dataset.
- [`~datasets.Dataset.to_tf_dataset`]: This method is more low-level, and is useful when you want to control exactly how your
dataset is created, by specifying exactly which `columns` and `label_cols` to include.
Before you can use [`~TFPreTrainedModel.prepare_tf_dataset`], you will need to add the tokenizer outputs to your dataset as columns, as shown in
the following code sample:
```py
def tokenize_dataset(data):
    # Keys of the returned dictionary will be added to the dataset as columns
    return tokenizer(data["sentence"])
dataset = dataset.map(tokenize_dataset)
```
Remember that Hugging Face datasets are stored on disk by default, so this will not inflate your memory usage! Once the
columns have been added, you can stream batches from the dataset and add padding to each batch, which greatly reduces the number of padding tokens compared to padding the entire dataset.
```py
>>> tf_dataset = model.prepare_tf_dataset(dataset, batch_size=16, shuffle=True, tokenizer=tokenizer)
```
Note that in the code sample above, you need to pass the tokenizer to `prepare_tf_dataset` so it can correctly pad batches as they are loaded.
If all the samples in your dataset are the same length and no padding is necessary, you can skip this argument.
If you need to do something more complex than just padding samples (e.g. corrupting tokens for masked language
modelling), you can use the `collate_fn` argument instead to pass a function that will be called to transform the
list of samples into a batch and apply any preprocessing you want. See our
[examples](https://github.com/huggingface/transformers/tree/main/examples) or
[notebooks](https://huggingface.co/docs/transformers/notebooks) to see this approach in action.
Once you've created a `tf.data.Dataset`, you can compile and fit the model as before:
```py
model.compile(optimizer=Adam(3e-5))
model.fit(tf_dataset)
```
</tf>
</frameworkcontent>
<a id='pytorch_native'></a>
## Train in native PyTorch
<frameworkcontent>
<pt>
<Youtube id="Dh9CL8fyG80"/>
[`Trainer`] takes care of the training loop and lets you fine-tune a model in a single line of code. For users who prefer to write their own training loop, you can also fine-tune a 🤗 Transformers model in native PyTorch.
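In outline, a native loop is just the standard PyTorch pattern applied to a 🤗 Transformers model; a condensed sketch under those assumptions (tiny dummy batch, checkpoint name only for illustration):
```py
import torch
from torch.optim import AdamW
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased", num_labels=2)

# A tiny two-example batch; a real loop would iterate over a DataLoader instead.
batch = tokenizer(["I loved it!", "Terrible experience."], padding=True, return_tensors="pt")
batch["labels"] = torch.tensor([1, 0])

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
outputs = model(**batch)   # forward pass returns the loss when labels are provided
outputs.loss.backward()    # backward pass
optimizer.step()           # update the weights
optimizer.zero_grad()
```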
@ -299,6 +418,8 @@ Genauso wie Sie eine Bewertungsfunktion zu [`Trainer`] hinzugefügt haben, müss
>>> metric.compute()
```
</pt>
</frameworkcontent>
<a id='additional-resources'></a>
@ -309,4 +430,4 @@ Weitere Beispiele für die Feinabstimmung finden Sie unter:
- [🤗 Transformers Examples](https://github.com/huggingface/transformers/tree/main/examples) includes scripts
to train common NLP tasks in PyTorch and TensorFlow.
- [🤗 Transformers Notebooks](notebooks) contains various notebooks on how to fine-tune a model for specific tasks in PyTorch and TensorFlow.

View File

@ -88,8 +88,6 @@
title: Tool use
- local: chat_templating_writing
title: Writing a chat template
- local: chat_response_parsing
title: Response parsing
title: Chat with models
- sections:
- local: serving
@ -119,8 +117,6 @@
title: Tools
- local: transformers_as_backend
title: Inference server backends
- local: continuous_batching
title: Continuous Batching
title: Inference
- isExpanded: false
sections:
@ -220,17 +216,16 @@
- local: quantization/contribute
title: Contribute
title: Quantization
- isExpanded: false
sections:
- local: kernel_doc/overview
title: Kernels in transformers
title: Kernels
- isExpanded: false
sections:
- local: serialization
title: ONNX
- local: tflite
title: LiteRT
- local: executorch
title: ExecuTorch
- local: torchscript
title: TorchScript
title: Export to production
- isExpanded: false
sections:
@ -286,8 +281,6 @@
title: Knowledge Distillation for Computer Vision
- local: tasks/keypoint_matching
title: Keypoint matching
- local: tasks/training_vision_backbone
title: Training vision models using Backbone API
title: Computer vision
- sections:
- local: tasks/image_captioning
@ -345,12 +338,16 @@
title: Configuration
- local: main_classes/data_collator
title: Data Collator
- local: main_classes/keras_callbacks
title: Keras callbacks
- local: main_classes/logging
title: Logging
- local: main_classes/model
title: Models
- local: main_classes/text_generation
title: Text Generation
- local: main_classes/onnx
title: ONNX
- local: main_classes/optimizer_schedules
title: Optimization
- local: main_classes/output
@ -377,8 +374,6 @@
title: Image Processor
- local: main_classes/video_processor
title: Video Processor
- local: main_classes/kernels
title: Kernels
title: Main Classes
- sections:
- sections:
@ -548,6 +543,8 @@
title: Helium
- local: model_doc/herbert
title: HerBERT
- local: model_doc/hgnet_v2
title: HGNet-V2
- local: model_doc/hunyuan_v1_dense
title: HunYuanDenseV1
- local: model_doc/hunyuan_v1_moe
@ -564,8 +561,6 @@
title: LED
- local: model_doc/lfm2
title: LFM2
- local: model_doc/lfm2_moe
title: LFM2Moe
- local: model_doc/llama
title: LLaMA
- local: model_doc/llama2
@ -1008,8 +1003,6 @@
title: AltCLIP
- local: model_doc/aria
title: Aria
- local: model_doc/audioflamingo3
title: AudioFlamingo3
- local: model_doc/aya_vision
title: AyaVision
- local: model_doc/blip
@ -1030,8 +1023,6 @@
title: CLIPSeg
- local: model_doc/clvp
title: CLVP
- local: model_doc/cwm
title: Code World Model (CWM)
- local: model_doc/cohere2_vision
title: Cohere2Vision
- local: model_doc/colpali
@ -1192,8 +1183,6 @@
title: TVP
- local: model_doc/udop
title: UDOP
- local: model_doc/video_llama_3
title: VideoLlama3
- local: model_doc/video_llava
title: VideoLlava
- local: model_doc/vilt
@ -1259,8 +1248,6 @@
title: Importing Utilities
- local: internal/time_series_utils
title: Utilities for Time Series
- local: internal/rope_utils
title: Rotary Embeddings Utilities
title: Internal helpers
- sections:
- local: reference/environment_variables

View File

@ -55,7 +55,6 @@ deepspeed --num_gpus 2 trainer-program.py ...
</hfoptions>
## Order of accelerators
To select specific accelerators to use and their order, use the environment variable appropriate for your hardware. This is often set on the command line for each run, but can also be added to your `~/.bashrc` or other startup config file.
For example, if there are 4 accelerators (0, 1, 2, 3) and you only want to run accelerators 0 and 2:
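For NVIDIA GPUs this is typically the `CUDA_VISIBLE_DEVICES` variable; a small Python illustration of the same idea (it must be set before CUDA is initialized):
```py
import os

# Expose only physical GPUs 0 and 2 to this process; set this before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2"

import torch

print(torch.cuda.device_count())  # reports 2 on a machine with 4 GPUs
```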

View File

@ -51,7 +51,7 @@ This section describes how the model and configuration classes interact and the
### Model and configuration
All Transformers' models inherit from a base [`PreTrainedModel`] and [`PretrainedConfig`] class. The configuration is the model's blueprint.
There is never more than two levels of abstraction for any model to keep the code readable. The example model here, BrandNewLlama, inherits from `BrandNewLlamaPreTrainedModel` and [`PreTrainedModel`]. It is important that a new model only depends on [`PreTrainedModel`] so that it can use the [`~PreTrainedModel.from_pretrained`] and [`~PreTrainedModel.save_pretrained`] methods.
@ -66,9 +66,9 @@ model = BrandNewLlamaModel.from_pretrained("username/brand_new_llama")
model.config
```
[`PretrainedConfig`] provides the [`~PretrainedConfig.from_pretrained`] and [`~PretrainedConfig.save_pretrained`] methods.
When you use [`PreTrainedModel.save_pretrained`], it automatically calls [`PretrainedConfig.save_pretrained`] so that both the model and configuration are saved together.
A model is saved to a `model.safetensors` file and a configuration is saved to a `config.json` file.
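Continuing the `BrandNewLlamaModel` placeholder used in this guide, a minimal sketch of that round trip (the local directory name is illustrative):
```py
model = BrandNewLlamaModel.from_pretrained("username/brand_new_llama")
model.save_pretrained("./brand_new_llama")  # writes model.safetensors and config.json together
model = BrandNewLlamaModel.from_pretrained("./brand_new_llama")  # reloads both the weights and the configuration
```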
@ -278,7 +278,7 @@ Every Transformers model output should have a precision or error tolerance of *1
Here are some tips for an efficient debugging environment.
- To debug intermediate results, it depends on the machine learning framework the original model repository is using. For PyTorch, you should write a script to decompose the original model into smaller sub-components to retrieve the intermediate values (see the sketch after this list). For TensorFlow, you may need to use [tf.print](https://www.tensorflow.org/api_docs/python/tf/print). For Flax, make sure the model is *not jitted* during the forward pass (refer to this GitHub [Issue](https://github.com/google/jax/issues/196) for more details).
- It is faster to debug with a smaller pretrained checkpoint versus a larger checkpoint where the forward pass takes more than 10 seconds. If only large checkpoints are available, create a dummy model with randomly initialized weights and save those weights to compare against the Transformers implementation.
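For the PyTorch case, one way to do that decomposition is with forward hooks; a minimal sketch with a stand-in model (not a real checkpoint):
```py
import torch
from torch import nn

# Stand-in model; in practice this would be the original model you are porting.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

activations = {}

def save_output(name):
    def hook(module, args, output):
        activations[name] = output.detach()  # keep the intermediate value for comparison
    return hook

for name, module in model.named_modules():
    if name:  # skip the top-level container itself
        module.register_forward_hook(save_output(name))

_ = model(torch.randn(1, 4))
print({name: tensor.shape for name, tensor in activations.items()})
```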

Some files were not shown because too many files have changed in this diff.