Compare commits

..

412 Commits

Author SHA1 Message Date
193acecc87 final check 2025-11-05 22:19:31 +01:00
bb65d2d953 Fix pr_slow_ci_suggestion.yml after #42023 (#42049)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 22:10:12 +01:00
57bdb4a680 Cleanup workflow - part 1 (#42023)
* part 1

* part 2

* part 3

* part 4

* part 5

* fix 1

* check 1

* part 6

* part 7

* part 8

* part 9

* part 10: rename file

* OK: new_model_pr_merged_notification.yml

* part 11

* fix 2

* revert check

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 21:01:06 +01:00
1a0ae4bb81 Remove some custom datasets defined in codebase (#41511)
* how bad it woud be anyway?

* let's break all

* delete

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 18:26:47 +01:00
5689dd6b8e update huggingface_hub dependency version (#42033)
* update huggingface_hub version

* nit
2025-11-05 16:22:22 +01:00
571352d378 🔴 Isolate prefill from generation loops (#40652)
* isolate-prefill: squash

* prefill inside decoding methods

* simplify autocompile helpers
2025-11-05 14:40:01 +00:00
2418196ef4 Fix the order of methods in processor loading (#42031)
* fix the order

* add a test
2025-11-05 15:33:07 +01:00
561233cabf Change trigger time for AMD CI (#42034)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 14:17:12 +01:00
36b640562b extend fp_quant cases to xpu (#41833)
* extend fp_quant UTs to xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-11-05 10:50:31 +00:00
0c4a202408 [tests] Add Context-parallel CI tests (#41860)
* intial

* simplify tests

* add test_cp_equivalence

* removed fsdp_transformer_layer_cls_to_wrap

* use DataCollatorForLanguageModeling

* remove use_cache=False.

* changes from review

* make script self contained

* moved to fsdp folder

* fix class name
2025-11-05 11:40:51 +01:00
20396951af CodeQL workflow for security analysis (#42015)
* CodeQL workflow for security analysis

Created CodeQL workflow to use reusable workflow from internal and simplified configuration.

* Update CodeQL workflow for main branch only and remving python from analysis

Restrict CodeQL analysis to 'actions' language only.

* Disable pull_request trigger in CodeQL workflow temporarly

Comment out pull_request trigger for CodeQL workflow
2025-11-05 10:59:37 +01:00
3c4cdd549d fix deeepspeed in AMD docker file (#42025)
fix deeepspeed in AMD docker

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-05 10:40:29 +01:00
020e713ac8 [FPQuant] MXFP8 and MXFP4 backwards support (#41897)
* FP-Quant backwards

* fp-quant v0.3.0 docker

* availability version bump

* fp_quant==0.3.1

* fp_quant v0.3.2
2025-11-04 16:52:47 +00:00
371ef0f4a2 [v5] Deprecate Text2Text and related pipelines (#41996)
* Deprecate Text2Text and related pipelines

* Try a restructure

* make fixup

* logging -> logger
2025-11-04 16:47:06 +00:00
6efc1799c1 [kernels] Fix XPU layernorm kernel (#41583)
* fix

* add comment

* better fix

* style

* Update src/transformers/modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-11-04 16:59:07 +01:00
325810e7fc add fuyu fast image processors (#41817)
* added fast processor for fuyu (#36978)

* updated docs for fuyu model (#36978)

* updated test_image_processing  and image_processing_fuyu_fast

* updated fuyu.md and image_processing_fuyu_fast (#36978)

* updated test_image_processing_fuyu (#36978)

* formatted image_processing_fuyu_fast and test_image_processing_fuyu (#36978)

* updated tests and fuyu fast image processing (#36978)

* Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

* fixed format (#36978)

* formatted files (#36978)

* formatted files

* revert unnecessary changes

* clean up and process by group

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-11-04 15:45:02 +00:00
9a19171fad Add GLPNImageProcessorFast (#41725)
* Add GLPNImageProcessorFast for torch backend

* Address review feedback

- Simplified to_dict() method
- Keep tensors as torch instead of converting to numpy for heterogeneous shapes
- Removed unnecessary shape guards in post_process_depth_estimation
- Improved variable names (tgt -> target_size, d -> resized)
- Removed unnecessary GLPNImageProcessorKwargs class

* Address review feedback

- Simplified to_dict() method
- Keep tensors as torch instead of converting to numpy for heterogeneous shapes
- Removed unnecessary shape guards in post_process_depth_estimation
- Improved variable names (tgt -> target_size, d -> resized)
- Removed unnecessary GLPNImageProcessorKwargs class

* commits after 2nd review

* Address all review feedback and add explicit batched test

- Simplified to_dict() with descriptive variable names (d->output_dict)
- Fixed resize operation: changed from crop to proper resize with interpolation
- Added padding for heterogeneous batch shapes in both slow and fast processors
- Fused rescale and normalize operations for efficiency
- Improved all variable names (tgt->target_size, d->depth_4d->resized)
- Added GLPNImageProcessorKwargs class in slow processor and imported in fast
- Renamed test_equivalence_slow_fast to test_slow_fast_equivalence
- Added explicit test_slow_fast_equivalence_batched test
- All 20 tests passing

* using padding from utils

* simplify glpn image processor fast

* fix docstring

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-11-04 15:44:52 +00:00
26fca86312 Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors (#41871)
* Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors

* Fix default initialization of image_rows and image_cols in Idefics3 and SmolVLM processors
2025-11-04 15:42:47 +00:00
900cf9d33b Fix issue with from pretrained and kwargs in image processors (#41997)
* accept kwargs in image proc from_pretrained

* only use kwargs that are in cls.valid_kwargs

* remove specific logic for _from_auto

* add image_seq_length to Images_kwargs for backward compatibility

* fix missing image kwargs in pix2struct
2025-11-04 10:35:39 -05:00
154d5101a4 add back logging_dir (#42013)
* add back

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-04 16:22:58 +01:00
e3d4fa692e Fix continuous batching tests (#42012)
* Fix continuous batching tests

* make fixup
2025-11-04 15:10:35 +00:00
dd4e048e75 Reduce the number of benchmark in the CI (#42008)
Changed how benchmark cfgs are chosen
2025-11-04 14:07:17 +01:00
6ff4fabd9d Correct syntax error in trainer.md (#42001)
A comma is missing between two parameters in the signature of compute_loss function.
2025-11-04 12:36:54 +00:00
6d4450e341 Fix torch+deepspeed docker file (#41985)
* fix

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-04 10:41:22 +00:00
aee5c2384a DOC Fix typo in argument name: pseudoquant (#41994)
The correct argument name is pseudoquantization. Since there is no error
on passing wrong arguments name (which is arguably an anti-pattern),
this is difficult for users to debug.
2025-11-04 10:48:39 +01:00
5b6c209bc5 [kernels] change import time in KernelConfig (#42004)
* change import time

* style
2025-11-04 10:26:24 +01:00
258c76e4dc Fix run slow v2: empty report when there is only one model (#42002)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-04 06:46:21 +01:00
64397a8301 Fixed wrong padding value in OWLv2 (#41938)
* Update image_processing_owlv2_fast.py

fixed padding value

* fixed padding value

* Change padding constant value from 0.5 to 0.0

* Fixed missed padding value in modular_owlv2.py

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-11-03 18:46:28 -05:00
cd309610c0 Integrate colqwen2.5 using colqwen2 modelling code (#40600)
* adding option for 2.5

* minor - arg in conversion script

* getting started on modelling.py

* minor - shouldve been using modular

* adressing comments + fixing datatype/device _get method

* minor

* commiting suggestion

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* docs + first test

* ruff fix

* minor fix

* ruff fix

* model fix

* model fix

* fine-grained check, with a hardcoded score from the original Hf implementation.

* minor ruff

* update tests values with CI hardware

* adding 2.5 to conversion script

* Apply style fixes

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-03 18:31:07 -05:00
dd8f231495 fix 3 failed test cases for video_llama_3 model on Intel XPU (#41931)
* fix 3 failed test cases for video_llama_3 model on Intel XPU

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust format

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-11-03 18:18:20 +01:00
1619a3475f fix (CI): Refactor SSH runners (#41991)
* Change ssh runner type

* Add wait step to SSH runner workflow

* Rename wait step to wait2 in ssh-runner.yml

* Remove wait step from ssh-runner.yml

Removed the wait step from the SSH runner workflow.

* Update runner type for single GPU A10 instance

* Update SSH runner version to 1.90.3

* Add sha256sum to ssh-runner workflow

* Update runner type and remove unused steps
2025-11-03 18:16:32 +01:00
ff0f7d6498 More data in benchmarking (#41848)
* Reduce scope of cross-generate

* Rm generate_sall configs

* Workflow benchmarks more

* Prevent crash when FA is not installed
2025-11-03 18:05:26 +01:00
80305364e2 Move the Mi355 to regular docker (#41989)
* Move the Mi355 to regular docker

* Disable gfx950 compilation for FA on AMD
2025-11-03 16:41:06 +01:00
a623cda427 [kernels] Add Tests & CI for kernels (#41765)
* first commit

* add tests

* add kernel config

* add more tests

* add ci

* small fix

* change branch name

* update tests

* nit

* change test name

* revert jobs

* addressing review

* reenable all jobs

* address second review
2025-11-03 16:36:52 +01:00
7d5160bd7a Fix torchcodec version in quantization docker file (#41988)
check

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-03 15:27:47 +01:00
22e39dfb31 docs: add continuous batching page (#41847)
* docs: add continuous batching page

* docs(cb): add `generate_batch` example

* docs(cb): add `opentelemtry` and `serving` section

* feat: add `TODO` note about opentelemetry dependency

* docs(cb): add supported features

* docs(cb): add unsupported features

* docs(cb): add `ContinuousBatchingManager` example

* docs(cb): x reference CB in optimizing inference
2025-11-03 15:19:30 +01:00
63fbd50fb4 fix: dict[RopeParameters] to dict[str, RopeParameters] (#41963) 2025-11-03 14:09:27 +00:00
b433ec8b50 test tensor parallel: make tests for dense model more robust (#41968)
* make test forward and backward more robust

* refactor compile part of test tensor parallel

* linting

* pass rank around instead of calling it over and over

* Run slow v2 (#41914)

* Super

* Super

* Super

* Super

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `detectron2` installation in docker files (#41975)

* detectron2 - part 1

* detectron2 - part 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `autoawq[kernels]` installation in quantization docker file (#41978)

fix autoawq[kernels]

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* add support for saving encoder only so any parakeet model can be loaded for inference (#41969)

* add support for saving encoder only so any decoder model can be loaded

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* use convolution_bias

* convert modular

* convolution_bias in convertion script

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-11-03 13:56:26 +01:00
3c16c1ae43 Use indices as position_ids in modernebert (#41789)
* Use indices as position_ids in modernebert

* Move position_ids init to the branch
2025-11-03 12:10:24 +01:00
b9f90dc388 add support for saving encoder only so any parakeet model can be loaded for inference (#41969)
* add support for saving encoder only so any decoder model can be loaded

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* use convolution_bias

* convert modular

* convolution_bias in convertion script

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
2025-11-02 18:21:41 +00:00
37a6296283 Fix autoawq[kernels] installation in quantization docker file (#41978)
fix autoawq[kernels]

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-02 18:54:55 +01:00
0ed6d51ae8 Fix detectron2 installation in docker files (#41975)
* detectron2 - part 1

* detectron2 - part 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-02 14:20:36 +01:00
8fb854cac8 Run slow v2 (#41914)
* Super

* Super

* Super

* Super

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-11-01 19:40:40 +01:00
a0bf5a82ee Fix typo in image_processing_lfm2_vl_fast (#41940)
fix typo
2025-10-31 11:02:39 -04:00
6fb6d3c0fb make recurrent_gemma and voxtral cases pass on xpu (#41958)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-31 13:55:33 +00:00
5f8d02f2f1 [v5] Return a BatchEncoding dict from apply_chat_template by default (#41626)
* Flip the default return type for `apply_chat_template` to match the underlying tokenizer

* Remove test_tokenization_for_chat tests, which no longer do anything useful

* Remove test_tokenization_for_chat tests, which no longer do anything useful

* Fix test_encode_message tests

* Fix test_encode_message tests

* Return dicts for Processor too

* Fix mistral-common tests

* Catch one of the processors too

* revert test bug!

* nit fix

* nit fix
2025-10-31 13:50:26 +00:00
4418728dfa V4.57.1 training ci: Refactor test_tensor_parallel.py (#41918)
* refactor test to not depends on subprocess (this way we can easily debug test with breakpoint)

* make test more robust by testing on more process (2 4 8)

* remove 8 gpus tests because llama is too tiny to apply TP then => RuntimeError. This will imply bigger llama for test but since TP=2/4 works already, no need

* linting
2025-10-31 14:46:45 +01:00
0a8ab33f7a Fix: prevent .gitignore truncation in run_clm_no_trainer.py (#41957)
* fix: update gitignore update flow

* fix: remove whitespace
2025-10-31 12:17:09 +00:00
90d1b67db1 fix prepare_config_and_inputs_for_common bug in llava test (#41942)
fix bug

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-31 10:02:39 +01:00
02c324f43f Fix: Gemma3TextConfig rope scaling assignments (#41934)
* Fix: Gemma3TextConfig rope scaling assignments

* Fix: type annotation for rope_parameters
2025-10-30 12:23:54 +00:00
b47b35637f Fix rope_parameters for gemma3 weights conversion script (#41922)
Fix rope_parameters for gemma3 weights conversion script.

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
2025-10-30 11:49:18 +00:00
e7e7eca06b fix some ut failures on XPU w/ torch 2.9 (#41941)
* fix some ut failures on XPU w/ torch 2.9

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-30 11:23:57 +01:00
cad7eeeb5e Minor fix in docker image build workflow (#41949)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-30 11:02:11 +01:00
76fc50a152 Cache latest pytorch amd image locally on mi325 CI runner cluster (#41926) 2025-10-29 19:45:37 +01:00
a43b36cf80 fix some ut failures on XPU w/ torch 2.9 (#41923)
* fix 6 ut failures on XPU w/ torch 2.9

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix UT failures for 4 models on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-29 16:15:33 +01:00
10d557123b Update some workflow files (#41892)
* update

* update

* final check

* final check

* final clean

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-29 14:42:05 +01:00
259d174e36 Fix Florence2 conversion script model_type KeyError (#41866)
hopefully fixed florence2_language keyerror
2025-10-29 13:07:30 +00:00
38df1e946d Allow parse_response to accept token IDs (#41849)
* Allow tokenizer.parse_response() to accept IDs/arrays directly

* Allow tokenizer.parse_response() to accept IDs/arrays directly
2025-10-29 13:04:57 +00:00
5462376a5c Fix invalid examples in QwenVL model docstrings and add Qwen3VL example (#41812) 2025-10-29 12:34:13 +00:00
e6142ad8d2 Add 6 huggingface notebooks on AMD dev cloud (#41883)
* Add 6 huggingface notebooks on AMD dev cloud

* Change all AMD huggingface notebook links to https protocol.

---------

Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>
2025-10-29 12:31:53 +00:00
21dfd6e716 evaluate>=0.4.6 is needed (#41920)
* evaluate>=0.4.6 is needed

* update

Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>

---------

Signed-off-by: Stas Bekman <stas.bekman@snowflake.com>
Co-authored-by: Stas Bekman <stas.bekman@snowflake.com>
2025-10-29 12:20:53 +00:00
b22d0d07ac speed up loading checkpoints for zero stage 3 (#41850)
* update

* update

* update

---------

Co-authored-by: Robert Irvine <robert@seamlessml.com>
2025-10-29 11:59:08 +01:00
4d0b6758b9 Fix: avoid duplicate token in maybe_load_adapters (#41903) 2025-10-28 15:07:23 +00:00
2f9e3ae7f5 make lfm2_moe integration test pass on XPU (#41796)
* make lfm2_moe integration test pass on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_lfm2_moe.py

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-28 15:50:18 +01:00
1f0b490a2c revert changes in _is_package_available (#41891)
* update

* rm comment
2025-10-27 13:59:18 +01:00
8472ac6836 Fix installation cmds in docs (#41887)
* doc fixes

* Fix decorator

* up

* Revert changes
2025-10-27 11:08:05 +00:00
bf91715637 Fix torch.no_grad decorator in VLMS (#41888)
Fix decorator
2025-10-27 11:07:15 +00:00
77e8b9f8df Adds Universal Intelligence to awesome transformers documentation 2025-10-25 18:31:21 +02:00
e2e8dbed13 CI workflow for Flash Attn (#41857)
ci for flash attn

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-25 09:45:47 +02:00
7a833d1ccd 🚨 [Clip] Fix masking and enable flash attention on all model types (#41750)
* fix

* make kwargs fully passed and adjust with outputs xxx

* propogate metaclip 2

* propogate mlcd and fix test

* style

* fix repo consistency, need to add ignore rules as those are building blocks

* style

* oops

* fix mlcd
2025-10-24 20:44:10 +02:00
8bde822a86 Fix TypeError: find_adapter_config_file() got an unexpected keyword argument '_adapter_model_path' (#41604)
* Pass original dict instead of copy to maybe_load_adapters

* Revert "Pass original dict instead of copy to maybe_load_adapters"

This reverts commit 26fe1b3f35419fdc14932dfbda6bb39e4bdb9b34.

* Return cleaned version of adapter_kwargs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-10-24 19:52:14 +02:00
9bb51b311f Share embedding modules in BART, not only weights (#41821)
* Share embedding modules in BART, not only weights

Embedding modules are now shared between encoder, decoder
and shared - it is the same module, like in the T5 implementation.

This has the benefit that it does not matter which module is returned
in `get_input_embeddings`, the caller of the latter can be sure that
modifications done to that (e.g., hooks) apply to the embeddings.

Background: While revamping the gradient checkpointing tests in PEFT via
peft#2860 we found that the gradient enable step
(`modeling_utils.enable_input_require_grads`) does not work for BART.
This leads to gradient checkpointing with `use_reentrant=True` to
fail as it will not detect any gradients. The reason for this is that
the returned value by `get_input_embeddings` (`self.shared`) is not
the module that is called in the encoder, therefore any hooks added
to `self.shared` are not run - in this case the hook set by
`enable_input_require_grads`.

Since the background is a missing hook I've added a test that tests
directly the ability to define hooks and their ability to being called.

* Add explanatory comment

* Don't initialize embeddings when not neccessary

* make fix-copies

---------

Co-authored-by: nemo <git@ningu.net>
2025-10-24 17:22:02 +02:00
090a8946c6 Fix const parsing for dict inputs in chat schemas (#41824)
* Fix const parsing for dict inputs

* make fixup
2025-10-24 15:14:06 +01:00
4faf675232 Fix Qwen2Audio flash attention mask format for generation (#41843)
* Fix Qwen2Audio flash attention mask format for generation

* use create_bidirectional_mask instead

* fix

* fix

* empty

* fix
2025-10-24 14:45:48 +02:00
bb6028cb79 Fix MXFP4 quantizer to support variable num_local_experts and hidden_size (#41795)
Fix MXFP4 quantizer to support variable num_local_experts
2025-10-24 14:18:52 +02:00
7935b869dc Remove redundant code from Qwen3VLProcessor (#41836)
* Remove redundant code from Qwen3VLProcessor

* same modification to modular_qwen3_vl.py
2025-10-24 11:08:49 +00:00
c27efe6e65 further reducing flakiness in utils/check_bad_commit.py (#41658) (#41815)
* 111

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-24 11:36:01 +02:00
8c291846f5 extend 2 blip2 and falcon_h1 test cases to xpu (#41825)
* extend 2 blip2 and falcon_h1 test cases to xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:15:15 +02:00
beb71b7575 extend 2 trainer test cases to xpu (#41829)
extend a trainer test cases to xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:11:15 +02:00
82451cbb30 extend bitnet cases to xpu, all 8 cases pass (#41831)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-24 11:05:12 +02:00
9c20660138 unpin torch/torchcodec for CircleCI (#41839)
CirCleCI with torch 2.9

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-24 08:19:38 +02:00
e4b920b3cf [Parakeet] add output_attention_mask (#41694)
* add output_attention_mask

* style
2025-10-23 23:09:20 +00:00
81b4f9882c transformers serve quantization docs + some api fixes for bitsandbytes (#41253)
* doc

* fix api

* fix

* fix

* fix

* fix args

* minor doc fix

* fix

* style

* rm check for now

* fix

* style

* Update docs/source/en/serving.md

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* add log and update value

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-23 16:00:54 +00:00
2a3f66d9d2 Deprecate warmup_ratio (#41326)
* dep

* style

* deprecate warmup_ratio

* better

* fix

* Revert "style"

This reverts commit cf4f9e7c4f7837a88eea6eeabf8b4dfe9455f6dc.

* Revert "dep"

This reverts commit 1800beb13f407ddb881d0af936860643e84ba085.

* update version
2025-10-23 17:17:21 +02:00
ca01fe4d13 transformers cli default flag fix (#41761) 2025-10-23 13:33:55 +00:00
f780932e05 Fixed some grammar mistakes (#41802)
Added spaces between words, fixed a typo and other errors
2025-10-23 12:39:58 +00:00
e7c5a60368 Fixed grammar mistakes (#41799)
* Fixed grammar mistakes

fixed a couple grammar mistakes

* Update README.md

* Change phrasing a bit more

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2025-10-23 12:34:02 +00:00
91b5a680c0 [Trainer] remove env vars (#41697)
* remove env var

* style

* fix value

* update

* fix

* style

* fix

* maybe this time

* rm tests

* fix
2025-10-23 14:17:20 +02:00
d4562bb8ae Fix Qwen3Next dtype API usage (#41735)
Replace torch.get_current_dtype() with torch.get_default_dtype() to fix FLA compatibility
2025-10-23 12:02:02 +00:00
e46c2ff32e Add a safeguard around a flaky test in gemma2 (#41811)
* Fix _compile flag in flex attn integration

* Revert fix and add precaution around test
2025-10-23 12:36:50 +02:00
3b6ddbcb88 make apollo test case pass (#41805)
make apollo test cases pass

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-23 12:07:31 +02:00
ff04520266 Bump AMD docker (#41792) 2025-10-23 10:44:20 +02:00
01f5ac70a3 flash attn pytest marker (#41781)
* flash attn marker

* 111

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-23 08:39:19 +00:00
2c5b888c95 [Onnx docs] Remove some traces (#41791)
fix
2025-10-23 10:34:25 +02:00
0eb372ba19 [quantization] fix torchao tests after 0.14.0 release (#41777)
* initial commit

* clean int4_weight_only

* make style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-23 08:26:44 +00:00
87be559508 Fix attention mask in mamba layers (#41790)
* not all mamba models are like LFM

* compile friendly

* adjust slow tests expectation

* naming
2025-10-22 18:15:38 +02:00
2ca506ca1d Fix chat schema tests (#41793)
* Fix chat schema tests

* make fixup
2025-10-22 15:00:49 +00:00
5426947e3a fix type annotation typo in docstring (#41788) 2025-10-22 13:58:18 +00:00
93671b4444 Swap columns and rows of the grid layout in LFM2-VL (#41755)
* swap columns and rows of the grid layout

* update integration tests

* fix the test case

* revert batched test change
2025-10-22 12:52:06 +00:00
18a3349a9f [quantization] Skip Fp8 tests when hardware capability < 8.9 (#41785)
* skipping tests

* style

* nit
2025-10-22 13:33:28 +02:00
e9f241bf89 [quantization] fix compressed_tensors tests (#41780)
fixing tests
2025-10-22 12:37:07 +02:00
7cd1d2b66c [v5] Delete legacy chat template saving (#41648)
* delete lagcy chat template saving

* fix tests

* fix qwen audio
2025-10-22 09:40:55 +00:00
48a36c96da fix: Gemma 3 weights conversion vision and multimodal projector paths (#41767)
fix: Gemma 3 vision and multimodal projector paths
2025-10-22 09:38:56 +00:00
9a27302803 Fix CUDA index out of bounds for q_idx in VLM token type masking for Gemma3, PaliGemma, and example modular (#41757)
* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking

* Fix CUDA index out of bounds for q_idx in modular modeling_new_task_model

* Revert "Fix CUDA index out of bounds for q_idx in Gemma3 token type masking"

This reverts commit f8e5c2a42c305aebd00c46161bf22f520009c8fc.

* Fix CUDA index out of bounds for q_idx in PaliGemma token type masking

* Fix CUDA index out of bounds for q_idx in Gemma3 token type masking
2025-10-22 11:29:47 +02:00
4f8781f84f Remove invalid @staticmethod from module-level get_device_and_memory_breakdown (#41747)
Remove staticmethod decorator from function
2025-10-22 10:52:29 +02:00
a8cece13e2 Fix bark after #41445 (#41645)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-22 09:21:45 +02:00
2e67a9b602 Add LightGlue fast image processor (#41670)
* add fast image processor skel

* add working lightglue fast image processor + tests

* remove plot_keypoint_matching
2025-10-22 00:33:16 +02:00
264cce9e0a Chat response parsing (#40894)
* Initial commit

* Adding more tests, bugfixes, starting tool tests

* Add support for JSON parsers and some tool tests

* stash commit

* stash commit

* stash commit

* stash commit

* stash commit

* Fix cohere schema, fix a lot of the recursive parser code

* GPT-OSS passing too!

* Update tests

* make fixup

* Offset tracking partially done

* stash commit

* stash commit

* Assistant masking Just Works

* make fixup

* stash commit

* stash commit

* JMESPath approach

* stash commit before i rip this PR apart

* Remove broken offset code

* Remove broken offset code

* Update chat parsing code and add tests for Ernie + fix Cohere tests for new format

* Implement tokenizer method

* jmespath dependency handling

* Completed TODOs

* Add support to TextGenerationPipeline

* Update GPT-OSS schema and test cases

* make fixup

* Fix typing (??)

* missing future import

* Use old typing in tokenization_utils_base.py

* put jmespath in various extras

* Remove accidental newline

* Guard tests correctly

* Remove require_jinja on the schema tests since we don't actually apply chat templates there

* make fixup

* fix some bad linter changes

* Fix docstring

* Push draft documentation

* Extend tests, more documentation

* make fixup

* docs docs docs

* Add Processor support

* Add to toctree

* Flag markdown correctly

* Remove double backslashes in docs for simplicity

* Simplify node-regex-to-dict

* Add support to ImageTextToTextPipeline

* Add support to ImageTextToTextPipeline and save/loading support in Processors

* Begin reworking docs to start fitting in response parsing

* Fix rebase

* Expand documentation further

* Expand documentation further

* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section

* Refactor x-regex-to-dict to x-regex-key-value, update the parser logic docs section

* More docs update

* Update TextGenerationPipeline to support tools properly

* Some rebase fixes

* Re-add is_jmespath_available

* Re-add is_jmespath_available

* Add Qwen3 parser and test, add maybe-json support

* Rollback processor changes - we'll wait for legacy saving to be deprecated

* Make fixup

* Revert ImageTextToText changes for now

* Add pipeline test

* make fixup

* Resolve a todo

* Resolve more TODOs and clean up the spec a little

* Add ref in the tools doc

* Update docs/source/en/chat_response_parsing.md

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>

* Update src/transformers/utils/chat_parsing_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add a docstring for parse_response

* Add function docstring and reference it in the docs

* Fix generate link

* Revert Processor changes for now

* Use updated GPT-OSS format

* Print the dict keys instead of the whole dict so the example doesn't become too big

---------

Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-10-21 17:26:18 +01:00
3f2db2c205 Simplify pipeline padding logic (#41667)
* Remove a lot of unnecessary pad logic

* Remove unnecessary clone() calls since we're just doing a slice assignment

* Just make the full tensor instead of adding to a zeros tensor
2025-10-21 17:01:48 +01:00
1d651c749e Modernize CLIP modeling code (#41546)
* stranded

* update modular

* modularities

* update

* fx broken

* fx stillb roken

* update

* missed this

* fix metaclip
2025-10-21 16:04:43 +02:00
f39355ec23 [v5] Remove deprecated tranformers.onnx (#41700)
* Remove deprecated tranformers.onnx

* Remove transformers.onnx related doc

* style

* shouldn't have been removed

* fix mismatch between metaclip2 modular en config file

* remove onnx config from not_doctested.txt

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-21 15:22:41 +02:00
5995435d96 Update type hints in modeling_rope_utils.py to use | syntax (#41714)
* Update type hints to use | syntax for Union types

- Replace Union[str, os.PathLike] with str | os.PathLike
- Replace Optional[Union[str, dict]] with str | dict | None
- Keep Union for forward references like 'torch.dtype'
- Update imports to remove unused Union import where possible

This modernizes the type hints to use Python 3.10+ syntax while maintaining
compatibility with forward references.

* Update type hints in modeling_rope_utils.py to use | syntax

- Replace Union[float, dict[str, float]] with float | dict[str, float]
- Remove unused Union import
- Maintain backward compatibility

This modernizes the type hints to use Python 3.10+ syntax.
2025-10-21 12:05:11 +00:00
2383f3fcbb Fix graphormer model compilation with Cython 3.1.4 (#41671)
Hitting this kind of error when running:

```
cython src/transformers/models/deprecated/graphormer/algos_graphormer.pyx
```

```
Error compiling Cython file:
------------------------------------------------------------
...
    (nrows, ncols) = path.shape
    assert nrows == ncols
    cdef unsigned int n = nrows
    cdef unsigned int max_dist_copy = max_dist

    path_copy = path.astype(long, order='C', casting='safe', copy=True)
                            ^
------------------------------------------------------------
src/transformers/models/deprecated/graphormer/algos_graphormer.pyx:88:28: undeclared name not builtin: long
```

This appears to have changed between cython==3.0 and cython==3.1.  AFAICT the
correct type to use here would be `int`.  Switching to it makes the command
succeed and generate an algos_graphormer.c file.
2025-10-21 12:02:23 +00:00
c4e88f78ca Reduce warning noise caused by Tensor.new_tensor (#41748) 2025-10-21 11:54:13 +00:00
2fe4a30340 [kernels] Add version to function mapping (#41685)
add version
2025-10-21 13:27:18 +02:00
ede7976cd2 Fixed incorrect model_type for qwen2vl and qwen2.5vl when config is saved and loaded again (#41758)
* fixed incorrect model_type for qwen2vl and qwen2.5vl

* added tests
2025-10-21 10:54:58 +00:00
ee3a1002e2 [v5] Delete videos from image processing classes (#41607)
* delete

* why there were video tests in image file

* fix tests and copies

* docs and autto class
2025-10-21 12:03:31 +02:00
4e50b8459d upgrade xpu docker file to torch 2.8 (#41551)
* upgrade xpu docker file to torch 2.8

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update Dockerfile

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-21 10:03:04 +02:00
9aab965b1e Add vision contribution guide (#41456)
* vision contrib guide

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* Update CONTRIBUTING.md

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>

* update tiny things

---------

Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
2025-10-20 18:56:47 +02:00
1a034ce1d2 Docs(zh-hans): Refine wording for professionalism in README (#40943)
* [docs] Polish Chinese README translation by replacing informal terms  with professional vocabulary

* [docs] Polish Simplified Chinese README for better professionalism and consistency

- Replace "抱抱脸" with "Hugging Face" to align with standard usage in Chinese developer community
- Replace "流水线" with "pipeline" to maintain consistency with code and technical terminology
- Add proper code formatting (`pipeline`) for API references to match Traditional Chinese version
- Update translation dictionary to reflect these standardized terms
- Improve overall readability and technical accuracy for Chinese developers

These changes enhance the professionalism of the documentation while maintaining consistency with established technical terminology used by the Chinese developer community.
2025-10-20 08:39:49 -07:00
6850ba853f Small Fix for imports (#41411)
small fix
2025-10-20 17:21:04 +02:00
bf0bce8d5f Apply RUFF PIE rules (#41727)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-20 13:32:23 +00:00
2cf8f833b0 Fix documentation issues (#41726)
Fix more documentation issues

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-20 13:31:02 +00:00
517197f795 Update type hints in tokenization_utils.py to use | syntax (#41713)
* Update type hints to use | syntax for Union types

- Replace Union[str, os.PathLike] with str | os.PathLike
- Replace Optional[Union[str, dict]] with str | dict | None
- Keep Union for forward references like 'torch.dtype'
- Update imports to remove unused Union import where possible

This modernizes the type hints to use Python 3.10+ syntax while maintaining
compatibility with forward references.

* Update type hints in tokenization_utils.py to use | syntax

- Replace Union[AddedToken, str] with AddedToken | str
- Replace Union[list[str], list[AddedToken]] with list[str] | list[AddedToken]
- Replace Union[str, list[str]] with str | list[str]
- Replace Union[int, list[int]] with int | list[int]
- Update error messages to use | syntax
- Maintain backward compatibility

This modernizes the type hints to use Python 3.10+ syntax.

* Fix error message formatting in tokenization_utils.py

- Fix error message to use Union syntax instead of | syntax in string
- This prevents potential issues with error message formatting
- Maintains type hint modernization while fixing error messages
2025-10-20 13:24:16 +00:00
9d4ee18e25 [doc] remove broken notebooks on AMD Dev Cloud (#41743)
revert
2025-10-20 14:36:53 +02:00
818f7f10e4 Revert "Remove upper version bound of pandas" (#41744)
Revert "Remove upper version bound of pandas (#41677)"

This reverts commit a15d77cd0c3e4c1d9f0a196da5996b735eead37e.
2025-10-20 14:25:32 +02:00
ce4ffeeb6c Fix typo in LFM-VL (#41742)
oops, remove untrelated commits
2025-10-20 13:55:41 +02:00
cb6f03fce4 Fix Qwen3-Omni inference when mixing video and image inputs in one batch (#41741)
* Fix qwen3omni inference when mixing video and image inputs in one batch

* Fix `router_aux_loss_coef`

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-20 11:35:02 +00:00
8fc5420913 Gemma3 conversion script maintenance (#41704)
* conversion: add include_vision_encoder flag (default true)

* conversion: update for inverted model.language_model weight path

* conversion: revert include_vision_encoder to True by default

* conversion: add chat template path flag
2025-10-20 12:52:22 +02:00
71db0d49e9 feat: add benchmark v2 ci with results pushed to dataset (#41672) 2025-10-20 08:56:58 +01:00
307c523854 further improve utils/check_bad_commit.py (#41658) (#41690)
* fix

* Update utils/check_bad_commit.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-10-17 23:07:00 +02:00
448c553ccb Update run_name docs in TrainingArguments (#41705)
Update run_name docs in TrainingArguments
2025-10-17 20:40:03 +00:00
cb4d4f5b75 pin torchcodec on CI docker image (#41703)
pin

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-17 20:50:18 +02:00
ac81541778 🌐 [i18n-KO] Translated gemma3n.md to Korean (#40873)
* fix: manual edits

* Apply suggestions from code review

Apply suggestions from code review and make additional revisions

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>

* Apply suggestions from code review

Apply suggestions from code review — updated inline links for related text

* Apply suggestions from code review

Apply suggestions from code review - final

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-17 09:57:05 -07:00
e7592f2508 [docs] Manual tp-plan (#41674)
* manual tp-plan

* feedback
2025-10-17 09:38:26 -07:00
347a0f9e83 Simplify GQA conditions in sdpa_attention.py (#41699)
Removed unnecessary checks for key being a torch.fx.Proxy in GQA conditions because fx tracing is no longer supported, and torch.export supports enable_gqa.
2025-10-17 16:36:38 +00:00
7e204ad121 [Attn] Allow dynamic causality in SDPA via Kwargs (#41692)
* is causal as kwarg

* Update src/transformers/integrations/sdpa_attention.py

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-17 15:51:51 +00:00
a15d77cd0c Remove upper version bound of pandas (#41677)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 17:31:41 +02:00
12a50f294d Enable FURB rules in ruff (#41395)
* Apply ruff FURB rules

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Enable ruff FURB rules

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 15:00:40 +00:00
39b6d3bf7e Remove skipped tests without parents (#41691)
remove
2025-10-17 16:25:40 +02:00
75da795d8f 🚨 Remove torch.fx support (#41683)
* remove all

* fix comments

* better checks

* doc
2025-10-17 16:12:46 +02:00
080d704af1 Fix Pylint warnings (#41644)
* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Raise with an exception

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 13:09:42 +00:00
c01ceffeb4 Enable faiss-cpu on Windows (#41678)
faiss-cpu is supported on Windows

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 13:00:57 +00:00
10de06dace 🚨 [v5] Refactor RoPE for layer types (#39847)
* update

* batch update model code

* typos

* too many diffs, dump

* dump again

* another dump

* fix copies

* make `rope_scaling_dict` self attr

* fix a few more tests

* another update

* fix a few more tests, hopefully last ones

* fox copies

* fix copies again

* fix newly added models, I hate rebasing on main

* update config files

* modular files

* fix rope utils test

* docstring has to be indented more, why?

* oops forgot to update some modualr files

* copy from doesn't copy decorators?

* fix overriden test as well

* add a new test

* fix failing tests again

* update docstrings

* fix phi3

* fix two models

* fix copies

* forgot to add

* stupid bug from modular conversion

* fix slow tests

* update to call rotary emb once per model forward

* 3K tests failing?!

* update

* update more models

* fix copies

* fix the rest of tests hopefully

* fix after rebase

* fix the rope tests

* fix docs omni

* change a bit

* models with layer types

* why it was deleted?

* fix a few tests

* fix last test!

* delete extra empty lines

* add a test case

* more changes

* fix models

* typing hint for nested rope params

* missed when resolving conflicts

* delete layer types and fix typo

* fix copies

* fix copies

* update docs text

* docs

* huuge update all models

* fix copies

* rename attr to align with new format

* delete redundant rope tests

* trigger ci

* update the case

* this is why i hate rebasing

* maybe fixed?

* oops

* now fix?

* fix last tests and copies

* fix copies?

* fix minimax and gemma3n

* update typo

* deprecation end version

* final fix copies :fingers-crossed:

* oh my, add the docs in toctree

* oke, this is really the last fix

* fix copies and hope that tests won't start failing again

* use rope scaling if saved

* fix slow tests

* fix cwm and unrelated deepseek

* fix last

* update

* hope it works now, it took so long

* lets keep None for now, I will try to remove after checking tests

* some more fixes, i find and replace does not always find all cases

* last fix of tests

* arthur's comment for extra foreward kwargs

* delete unused code

* fix slow qwen tests

* delete layer types from models

* faulty modular conversion

* fix qwen omni

* fix copies and style

* address my comment

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-17 14:57:27 +02:00
def9a7ef05 Use | for Optional and Union typing (#41675)
Use | for Optional and Union typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 12:52:47 +00:00
0beda2aa3a Fix MarkDown syntax (#41676)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-17 12:44:27 +00:00
0b3aef1da9 🚨 Remove torchscript support (#41688)
* remove a lot

* remove the rest

* doc
2025-10-17 13:38:27 +02:00
7370a1babd path validation for security reason (#41256)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-17 12:36:04 +02:00
151d6adc86 Remove require_torch_bf16_gpu (#40979)
* More cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove more functions

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-17 12:35:19 +02:00
252d7cd952 Remove deprecated use_auth_token parameter (#41666)
* Remove deprecated use_auth_token

* code styl

* fix test

* Update examples/pytorch/speech-recognition/README.md
2025-10-17 09:57:46 +00:00
415cb37708 torch 2.9 still don't ❤️ torchcodec 0.8 💔 (#41686)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-17 11:57:28 +02:00
1eb45cd61d Fix ckpt in docs (#41659)
* fix ckpt in docs

* fix config ckpt
2025-10-17 11:00:34 +02:00
354567d955 Adding superglue fast image processing (#41394)
* Default implementation - no time improvement

* Improved implementation - apparently 2 times faster with only simple function refactor

* elementary torch first approach, still need further implementation of torch first method

* torch-first approach finished

* refactor processor

* refactor test

* partial doc update

* EfficientLoFTRImageProcessorFast based implementation

* EfficientLoFTRImageProcessorFast based implementation

* Logic checked - Test Passed - Validated execution speed

* use modular for efficientloftr

* fix import

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-16 19:34:09 +00:00
4dd4133d32 🌐 [i18n-KO] Translated ko-LFM2.md to Korean (#41502)
* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/lfm2.md

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/model_doc/lfm2.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/model_doc/lfm2.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

* Update docs/source/ko/model_doc/lfm2.md

Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>

---------

Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Ahnjj_DEV <ahnjj.dev@gmail.com>
2025-10-16 11:29:04 -07:00
eefbf4ac8b 🌐 [i18n-KO] Translated llama4.md to Korean (#40396)
* docs: ko: llama4.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

* Update docs/source/ko/model_doc/llama4.md

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>

---------

Co-authored-by: TaskerJang <bymyself103@naver.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
2025-10-16 11:28:27 -07:00
50ca781d78 🌐 [i18n-KO] Translated code_llama.md to Korean (#40558)
* docs: ko: code_llama.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>

---------

Co-authored-by: Harheem Kim <49297157+harheem@users.noreply.github.com>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
2025-10-16 11:27:46 -07:00
8739fc05c4 [i18n-KO] Translated big_bird.md to Korean (#40445)
* docs: ko: BigBird.md

* feat: nmt draft

* fix: manual edits
2025-10-16 11:23:56 -07:00
77b5ad65ee 🌐 [i18n-KO] Translated sam_hq.md to Korean (#41340)
* fix: manual edits

* Apply suggestions from code review

Apply suggestions from code review

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>

* Apply suggestions from code review

Apply suggestions from code review

Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>

---------

Co-authored-by: HyunSang Jang <tasker.dev103@gmail.com>
Co-authored-by: Woojun Jung <46880056+jungnerd@users.noreply.github.com>
2025-10-16 11:10:16 -07:00
a9731a725e 🌐 [i18n-KO] Translated chat_extras.md to Korean (#39863)
* docs: ko: chat_extras.md

* feat: nmt draft

* fix: manual edits

* Apply suggestions from code review

* Apply suggestions from code review

* Update docs/source/ko/chat_extras.md
2025-10-16 10:41:03 -07:00
bdbc2d037b [Trainer] [Breaking change] use_cache default to False (#41585)
* use_cache default to `False` when training

* style

* Fix comment

* add checks

* style

* set

* switch
2025-10-16 18:51:36 +02:00
fe11cbb808 Erroring when KernelConfig is passed without use_kernels = True (#41657)
* update

* update
2025-10-16 18:08:46 +02:00
6344371a91 improve utils/check_bad_commit.py (#41658)
* robust

* robust

* robust

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-16 15:51:19 +00:00
a408384a88 Improve package version check (#41661)
fix
2025-10-16 17:31:58 +02:00
f7c33abab3 Small changes to benchmarking script (#41662) 2025-10-16 17:25:49 +02:00
9839d57a02 Fix serving continuous batching (#41624)
* udpate-serving-cb

* style

* style

* check none

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-10-16 17:24:21 +02:00
e85d5ab2bb Fix dtype casting with quantization (#41665)
fix dtype casting
2025-10-16 17:19:32 +02:00
1c36d407d5 Add in-out modalities as class attribute per model (#41366)
* update all models

* fix copies

* explanation comment

* better notation in omni model

* style

* fix copies

* output_modalities under generation mixin

* fix copies

* oh, glm4v also needs conversion
2025-10-16 17:11:06 +02:00
0215846d98 Switch to CB if cache_implementation == paged (#41655)
* Add a switch to CB in case of paged cache

* Added paged as a valid cache implem

* Added a fallback on inputs_ids as a name

* Rookie mistake

* Removed paged from cache implems

* Added warning about some  beam search args

* Moved up CB warning
2025-10-16 17:00:18 +02:00
9e99198e5e Use | for Optional and Union typing (#41646)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 14:29:54 +00:00
bf815e9b5e [Masks] Fix mask handling in eager for vision models (#41625)
add mask handling in case of models that do use it
2025-10-16 16:27:26 +02:00
vb
4a43e3d57c purge HF_HUB_ENABLE_HF_TRANSFER; promote Xet (#41656) 2025-10-16 16:17:09 +02:00
8725ce10ed [Fix] Deepseek V3 expert bias routing (#41647)
* [Fix] Deepseek V3 expert bias routing

* [Fix] fix-copies

* [Fix] Run make style
2025-10-16 14:04:48 +00:00
1fb3fc4db0 [kernels] refactor function kernel calling (#41577)
* refactor function kernel callling

* nit

* don't pass the mapping

* use _kernels_available

* rm import
2025-10-16 15:43:02 +02:00
9176af574a Double router compute? (#41653)
* weird double router compute?

* flip it
2025-10-16 15:17:21 +02:00
503c933f36 Fix confusing cls assignment (#41642)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 13:01:07 +00:00
2aff20aff6 Fix typos in documentation (#41641)
Fix typos

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 12:58:46 +00:00
981370c038 Format MarkDown documentation and tiny fixes (#41638)
* Fix MarkDown syntax

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-16 12:58:06 +00:00
eef9fb2af3 Fix EncoderDecoder cache (#41612)
* Fix EncoderDecoder cache

* Add the option for the ddp data tuples to have 2 elems

* Modifiy the order of the KV and sliding

* Adapted RAG and Whisper to new EncoderDecoderCache

* A single comma

* Remove kwargs in map

* Fixed order in manual injection cache test

* Slight changes to support legacy format

* Removed Nonnes
2025-10-16 14:55:41 +02:00
35dc8f0a2e Adjust device logging level and add minor fixes (#41636)
This commit addresses a noisy warning and improves the robustness of the base pipeline implementation.

- The device placement message in the pipeline base class has been changed from a `warning` to a `debug` log. This reduces log noise for users who are aware of their device setup, while still providing the information for debugging purposes.

- Additionally, potential `UnboundLocalError` exceptions in the `_pad` and `check_model_type` functions have been prevented by initializing variables before their conditional assignment.
2025-10-16 12:47:39 +00:00
2935a1be19 Fix fp32_ln for various models (#41605)
* Add is_causal to KosmosTextAttention

* Move get target_dtype to be imported elsewhere

* Fix fp32 flash attention bug in bark

* Fix is_causal in mllama

* Fix fp32 issue on StableLM

* Fix repo-consistency
2025-10-16 14:18:49 +02:00
b9bd8c45a1 [CI] Build translated docs (#41632)
fix
2025-10-16 14:01:33 +02:00
baecdb8a97 [Ernie 4.5 Moe] Fix Moe and offloading (#41385)
fix
2025-10-16 13:59:01 +02:00
44539827d5 [Executorch] Simplify for encoder models (#41627)
* Trigger Build

* revert extra treatment for executorch as we default to no vmapping now
2025-10-16 13:57:52 +02:00
143acfe2ce fix check inputs for text2text pipeline (#41556)
fix check inputs

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-16 11:42:41 +00:00
67fae90519 Fix FP-Quant quantization fallback CPU dispatch. (#41619)
* fp_quant fix

* Update quantizer_fp_quant.py
2025-10-16 11:41:01 +00:00
af2a66ced9 Migrate transformers cli to Typer (#41487)
* Add typer-slim as explicit dependency

* Migrate CLI to Typer

* code quality

* bump release candidate

* adapt test_cli.py

* Remove ./commands + adapt tests

* fix quality

* consistency

* doctested

* do not serve model in chat

* style

* will it fix them?

* fix test

* capitalize classes

* Rebase

* Rebase

* tests + fixup

tests + fixup

* csutom error message

* fix ?

* should be good

* fix caplog globally

* inner caplog

* last attempt

* Retry

* Let's try with capsys disabled

---------

Co-authored-by: Lysandre <hi@lysand.re>
2025-10-16 13:29:42 +02:00
a59124e27e Add missing dates to docs (#41576)
add dates
2025-10-16 09:32:28 +00:00
81f97b17d2 Remove randomly added script (#41650)
remove
2025-10-16 11:23:53 +02:00
c0a5cf19ad Fix tokenization test (#41649)
fix
2025-10-16 11:14:20 +02:00
3ef6f2c415 Allow passing tp_plan in from_pretrained directly (#41435)
* start

* allow passing it

* fix plans

* fix

* fix

* style

* style

* fix

* add_test

* oupsi indent

* fix

* fix

* fix for CI without accelerator

* fix import
2025-10-16 11:12:07 +02:00
59efd86da2 Add aux loss for GLM-4.5V (#41564)
* add aux

* update

* update config to text_config

* use qwen data class to avoid repeat again

* format

* update

* use 1e-4

* update

* update for remove init

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
2025-10-16 09:04:21 +00:00
7b7d17f9bf 🚨 [v5] Toggle the serialization format in processors (#41474)
* toggle the serialization

* prob this fixes it

* fix tests

* typo

* delete legacy save entirely

* remove extra nesting in if

* revert test and serialzie a public attr instead of private
2025-10-16 10:19:22 +02:00
e20df45bf6 Add Backbone API fine-tuning tutorial (#41590)
---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-15 18:42:32 +02:00
19df66dcba Update executorch.md (#41582)
* Update executorch.md

* Update executorch.md

* Update executorch.md

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-15 09:01:46 -07:00
9f71e3a604 [docs] Duplicate entry (#41591)
fix
2025-10-15 17:02:36 +02:00
bc9900562d Fix quantization base class (#41613)
* fix

* fix

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-15 16:58:17 +02:00
72fd67929b Remove deprecated code (#41616)
remove

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-15 16:57:52 +02:00
da382917aa Remove the head masking block in some vision models (#41620)
* old

* new

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 15:51:01 +02:00
313afcc468 [chat template] update when "push_to_hub" (#39815)
* update templates push to hub

* rvert jinja suffix and move it to processor file
2025-10-15 13:49:59 +00:00
7bba4d1202 Fix video processing channel format (#41603)
fix
2025-10-15 15:48:01 +02:00
ab92534377 enable sdpa enable gqa logic for Ascend NPU (#41601)
* enable gqa logic for Ascend NPU

* remove redundant comments

* fix comments about Ascend NPU

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
2025-10-15 13:45:28 +00:00
56a727dde5 Add fast path for bidirectional mask creation to fix regression (#41586)
* fixed performance regression

* also fixed the older_torch function

* Update src/transformers/masking_utils.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

* more general

* fix slicing

* fix data dependent

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-15 15:30:39 +02:00
dc6fdeb705 Update a dataset reop link (#41618)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 14:41:38 +02:00
3953b65440 Reinstate early CUDA init fix (#41617)
* Reinstate early CUDA init fix

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Delay import further

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-15 14:41:10 +02:00
96d245a83d torch 2.9 don't ❤️ torchcodec 💔 (#41610)
pin

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-15 14:34:00 +02:00
bb0c3af995 More markdown file fixes (#41599)
* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-15 12:29:27 +00:00
70e871959c Fix trainer simple tests (#41449)
* fix

* fix ray

* train to tune

* breaking changes wrt generation config

* Fix !

* fix

* fix

* fix deepspeed !

* fix

* fix

* fix

* improve logic

* revert and fix

* revert comment

* oups

* revert change

* fix

* style

* typo in comment

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-15 14:09:00 +02:00
c4210796e0 Import expand_device_map instead of redefining it (#41608)
remove it
2025-10-15 14:00:09 +02:00
fcd1ccdb78 [Docs] Fix changed references (#41614)
* fix

* fix

* other ln
2025-10-15 13:59:13 +02:00
2b2c20f315 Update issue template (#41573)
* update

* fix
2025-10-15 13:54:37 +02:00
e2122c4bcb remove ray_scope and check_quantized_param (#41587)
remove
2025-10-15 13:10:35 +02:00
e89cef6625 fix some case failures lead by "torch.compile recompiled part of th… (#41558)
* fix some case failures lead by "`torch.compile` recompiled part of the forward pass" in xpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* update comment

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-10-15 10:45:29 +00:00
26b7f66850 Add logits_to_keep to many older CausalLM models (#41335)
* Add logits_to_keep to CausalLM models

* Skip failing test for git model

* Remove unused return_dict from kosmos2 signature

* Revert BlipForQuestionAnswering
2025-10-15 11:56:01 +02:00
5db730786d [device_map] Accelerate loading by computing device_map much faster (#41548)
* start

* add the important fix

* continue

* big cleanup

* type hints

* add method

* fix typehints

* typehints

* fix

* oupsi

* remove space

* improve function

* CI
2025-10-15 11:18:57 +02:00
13a35a5057 Enable non-streaming mode in transformers serve (#41446)
* Enable non-streaming in transformers serve

Remove typos

Remove typos

Remove typos

* Fix tests

* Arthur review
2025-10-15 09:37:26 +02:00
94df0e6560 Benchmark overhaul (#41408)
* Big refactor, still classes to move around and script to re-complexify

* Move to streamer, isolate benches, propagate num tokens

* Some refacto

* Added compile mode to name

* Re-order

* Move to dt_tokens

* Better format

* Fix and disable use_cache by default

* Fixed compile and SDPA backend default

* Refactor results format

* Added default compile mode

* Always use cache

* Fixed cache and added flex

* Plan for missing modules

* Experiments: no cg and shuffle

* Disable compile for FA

* Remove wall time, add sweep mode, get git commit

* Review compliance, start

* Apply suggestions from code review

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Update benchmark_v2/framework/benchmark_runner.py

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>

* Disable workflow

* Pretty print

* Added some pretty names to have pretty logs

* Review n2 compliance (end?)

* Style and end of PR

---------

Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
2025-10-14 21:41:43 +02:00
9e4199ede3 Gemma3 fixes (#41572)
* Multiple device error fix

* FA2 equivalence fix

* Move the train fwd in cfg test

* Style

* Added comment

* Made the comment more clear
2025-10-14 18:33:27 +02:00
4c8d293599 Fix typsetting and content of llm_tutorial_optimization.md (#41172)
* Fix typsetting of llm_tutorial_optimization

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix errors

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-14 08:40:26 -07:00
a99b1be3c7 Revert some breaking changes bnb (#41581)
fix
2025-10-14 16:28:16 +02:00
82cae9eb52 Add __iter__ to DynamicCache (#41569)
* Add __iter__ to DynamicCache

* Fix tests that use ddp init
2025-10-14 16:16:32 +02:00
4fad35ee4a [VisionEncoderDecoderModel] Update loss function (#40863)
Update loss function
2025-10-14 16:03:00 +02:00
ae6f6cc3e0 Revert "add rmsnorm kernels support for Intel XPU" (#41579)
Revert "add rmsnorm kernels support for Intel XPU (#41563)"

This reverts commit fd787c5f6d667d3e00def70f588972af4437f631.
2025-10-14 15:49:33 +02:00
fd787c5f6d add rmsnorm kernels support for Intel XPU (#41563)
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-10-14 13:26:09 +00:00
4e4f2af586 Add conditional checks to _check_and_adjust_attn_implementation() (#41542) 2025-10-14 13:00:07 +00:00
3648fde486 Add DINOv3Backbone for ConvNext variant (#40651)
---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-10-14 14:57:04 +02:00
abf5b57a68 delete some tokenizer tests using pickle (#41514)
* hate pickle

* hate pickle

* hate pickle

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-14 14:50:51 +02:00
8fe4db5399 [kernels] rm mra kernels (#41507)
* fix modeling

* remove kernel

* fix style
2025-10-14 13:34:04 +02:00
c620c38bb0 [Qwen3VLMoe] Fixed: Expected self.dtype to be equal to src.dtype - routing_weights casting (#41420)
* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* Fixed Expected self.dtype to be equal to src.dtype on eval

* generated modeling_qwen3_vl_moe.py file

* Fixed Ernie_4_5_MoE router casting

* Fixed routing_weights dtype casting (ernie4_5_moe, hunyuan_v1_moe, qwen2_moe, qwen3_moe, qwen3_next,qwen3_omni_moe)

* rollback hunyuan_v1_moe changes

---------

Co-authored-by: Daniel Oliveira <daniel-oliveira-11@hotmail.com>
Co-authored-by: Daniel Oliveira <36623265+daniel3303@users.noreply.github.com>
2025-10-14 13:14:49 +02:00
0798797ec9 Fix an import error with PreTrainModel (#41571) 2025-10-14 13:13:37 +02:00
0566b6f5bd Patch MistralCommonTokenizer (#41439)
* Fix token_to_id and add add_generation_prompt

* Fix spm download

* Refactor spm

* Try another possibly non-gated spm

* Improve get_vocab

* lint

* Improve get_vocab

* Add warn to piece_to_id

* Improve from_pretrained raise and revert model spm

* Revert fast
2025-10-14 11:13:19 +00:00
b3e3c3dc93 [Qwen3VL] fix device mismatch error for FSDP2 training (#41536)
For FSDP2, parameters might be on a meta device, and the weight.device attribute may
not accurately reflect where the actual computation will happen during forward passes.

```log
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 776, in forward
    pos_embeds = self.fast_pos_embed_interpolate(grid_thw)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 745, in fast_pos_embed_interpolate
    pos_embeds = self.pos_embed(idx_tensor) * weight_tensor[:, :, None]
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/module.py", line 1879, in _call_impl
    return inner()
           ^^^^^^^
  File "torch/nn/modules/module.py", line 1827, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "torch/nn/modules/sparse.py", line 192, in forward
    return F.embedding(
           ^^^^^^^^^^^^
  File "torch/nn/functional.py", line 2546, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but got index is on cpu, different from other tensors on cuda:0 (when checking argument in method wrapper_CUDA__index_select)
```
https://github.com/volcengine/verl/pull/3686#issuecomment-3380981817

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-14 10:28:25 +00:00
b84c0b31c6 Remove references to AutoModelForVision2Seq (#41513)
* Since Vision2Seq is deprecated, remove it from pipelines and docstrings

* Catch some more references
2025-10-13 17:00:07 +01:00
1ee3b288a6 [from_pretrained] Small refactor from_pretrained: move around unrelated stuff (#41445)
* drafts

* up

* simplify modeling utils

* more simplifications

* type kwargs

* up

* move more accelerate related stuff

* safeguarding?

* nits

* remove func when func is NOPE

* more

* nits

* styling

* yups

* up

* ups

* revert

* protect trainer utils iport

* fix doc

* Update src/transformers/integrations/peft.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* review

* update

* ?

* fixx

* update

* super small update

* ups

* style

* this is stupid

* 🤦 well this was the issue

* small nit

* fix

* nit

* damn the missing return

* one last stupid fix

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-13 16:33:32 +02:00
cad74496ca [model] Add VideoLLaMA3 implementation (#40499)
* Add VideoLLaMA3 implementation

* Run style fix

* Switch to modular

* Fix config and smart_resize

* Fix

* Fix

* Fix style

* Fix

* Ruff fix

* Rename

* Rename

* Fix

* Clean

* Fix consistency

* Add doc

* Fix

* Fix

* Fix doc

* Update generated code

* remove test_initialization

* fix tests

* simplify

* tests

* Add VideoLlama3IntegrationTest

* replace asserts

* fix tests

---------

Co-authored-by: steven-ccq <55176896+steven-ccq@users.noreply.github.com>
Co-authored-by: steven-ccq <1456320989@qq.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 15:54:34 +02:00
3813a8e3a1 Add VideoMAE video processor (#41534)
* Add video processor for VideoMAE

* Document VideoMAE video processor

* Add regression tests for VideoMAE video processor

* refactor: Use direct batch key access for pixel_values_videos

* test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor

* docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)
2025-10-13 15:42:27 +02:00
66d8d7a077 Fixed typos and formatting (#34215)
#hacktoberfest
2025-10-13 13:38:06 +00:00
d621be8286 🚨 [v5] generate delegates default cache initialization to the model (#41505) 2025-10-13 13:20:48 +01:00
d7c9fbdb64 Enable modular files from other libraries (#41372)
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-13 13:48:32 +02:00
41e763decd Add AMD developer cloud support (#41126)
* Add AMD developer cloud support

* Add AMD remote svg link.

* Update notebooks/README.md

Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>

---------

Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com>
2025-10-13 12:17:24 +02:00
cf1e9834ec Restore cuda graphs to continuous batching (#41421)
* Type hints and small fixes

* Remove unusued params

* Made slice inputs the default

* ruffed

* Updated some var name and moved index slicing

* Logging arg in example

* Added some padding debug var and reformat out cg

* First working CG, fixe size

* Working flexible CG

* CG are compatible with all implementations

* Fixed CG API

* Update example

* Documentation

* Fix padding tokens in FA

* Review compliance

* Better doc around weird bug

* Style

* Fix for sliding with CG
2025-10-13 11:57:56 +02:00
6c901bdc0e [SAM] Fix typing hints (#41506)
fix
2025-10-13 11:52:00 +02:00
58f9e13313 Fixed Type-hints in function defintions (#41525)
* Explicitly annotate default None parameters as Optional

* make style.

* make style.

* Fixed check_copies.

* fix consistency.
2025-10-13 11:48:37 +02:00
eb28242251 Add MLlama fast image processor (#41391)
* Merge conflict

* add fast processor

* add fast processor

* make style

* add new convert rgb

* use nested group by shape in mllama fast, add support for multiple inputs in group by shape

* refactor after review

---------

Co-authored-by: Vincent <phamvinh257@gmail.com>
2025-10-13 09:16:05 +00:00
65cb8fac6d [Qwen3VL] fix: hidden_states in place modification error (#41535)
```
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 941, in forward
    hidden_states = self._deepstack_process(
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py", line 960, in _deepstack_process
    hidden_states[visual_pos_masks, :] = local_this
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Output 0 of SliceBackward0 is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.
```

Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-10-13 10:50:14 +02:00
3927ffed31 [testing] reduce runtime of HunYuanMoEV1IntegrationTest:test_model_generation (#41373)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 22:27:01 +02:00
7164924a7e Fix Latex typesetting in documentation (#41177)
Fix Latex typsetting in documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-10 08:54:27 -07:00
26a5368c44 Allow optuna's catch kwargs passthrough (#41496)
* allow optuna's catch kwargs passthrough

* apply ruff formatting

---------

Co-authored-by: nicha <nicha.api@nectec.or.th>
2025-10-10 13:58:07 +00:00
feca4f3de7 remove tpu_num_cores (#41383)
* remove-tpu-num-cores

* fix

* let's remove it

* style

* Update examples/legacy/seq2seq/finetune_tpu.sh

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-10 15:53:28 +02:00
c6042a4169 Remove outdated flags (#41512)
remove flags
2025-10-10 14:34:47 +02:00
dfd4121cd4 add Trainer import to .md in appropriate cell block for training.ipynb transformers_doc (#41484)
add Trainer import to .md in appropriate cell block for docs
2025-10-10 12:04:07 +00:00
60f6ec438a Fix detectron2 import (#41510)
* fix

* fix

* typo
2025-10-10 13:33:47 +02:00
f9f8bf5a10 Revert local_rank deletion and some cleaning (#41504)
* forgot those

* clean

* Fix

* merge

* fix

* fix
2025-10-10 12:23:04 +02:00
b4067472ae Bump to hfh 1.0.0.rc5 to fix test (#41508) 2025-10-10 12:12:08 +02:00
bc529a3368 More trainer cleaning (#41489)
clean
2025-10-10 11:55:43 +02:00
b92fc0c6e1 [QoL] modular conversion shows LoC saved (#41500)
smol qol conversion
2025-10-10 11:55:23 +02:00
2eae7c7452 Set truncation to False in Qwen3Omni to avoid default truncation (#41473)
* Set `truncation` to `False` in Qwen3Omni to avoid default truncation

* move `padding` and `truncation` to audio default args

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
2025-10-10 09:55:18 +00:00
c5094a4f97 [voxtral] language detection + skipping lang:xx (#41225)
* proc + doc update

* improve doc

* add lang:xx in decode

* update voxtral test

* nit

* nit

* update test value

* use regex
2025-10-10 09:18:30 +00:00
f4487ec521 fix gemma3n case failure (#41426)
* fix gemma3n case failure

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update dependency_versions_table.py

* change the case argument passing way to make the case PASS,
generation_config way need re-visit

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-10 09:15:27 +00:00
e8194fe84f Fix some tests (#41503)
* fix

* fix

* doc
2025-10-10 11:05:09 +02:00
9556b36b2f [causallm tester] automate pipeline mappings + bloom tests (#41318) 2025-10-10 10:02:00 +01:00
5aca530b34 [Parakeet] unnecessary warning & auto mapping (#41412)
* add parakeet to CONFIG_MAPPING_NAMES

* TOKENIZER_MAPPING_NAMES update

* fix auto tokenizer

* update

* fix
2025-10-10 11:00:15 +02:00
4f323369db Fixed tiny incorrect imports in glm4v (#41483)
Fixed tiny import issue in glm4v
2025-10-10 08:57:01 +00:00
f5f3457278 Try to remove pickle - BloomTokenizerFast (#41466)
* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

* pickle 1

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-10 10:52:51 +02:00
3585737746 [kernels] rm yoso kernel (#41495)
* disable kernel mapping

* rm kernel

* delete files

* style

* typo
2025-10-10 10:50:12 +02:00
b543679d0e [kernels] Remove RWKV kernel finally ! (#41493)
* rm kernel

* fix style
2025-10-10 10:32:05 +02:00
ac7777be16 fix bnb model loading (#41499) 2025-10-10 08:27:29 +00:00
17c31a98ac Streaming should be handled at the request-level rather than at the istance level (#41444)
* Streaming should be handled at the request-level rather than at the instance level

* Add tests

* Require torch GPU
2025-10-10 10:24:55 +02:00
b28902c86b Remove DISABLE_KERNEL_MAPPING flag (#41475)
rm disable
2025-10-10 10:19:25 +02:00
d0271be18f Update philosophy (#41438)
* update philosophy

* Update docs/source/en/philosophy.md

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/philosophy.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* emphasis

---------

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-10-10 06:52:18 +00:00
0419ff881d Remove local_rank arg from TrainingArguments (#41382) 2025-10-09 18:54:12 +02:00
081391b20e deprecate jit_mode_eval (#41376) 2025-10-09 18:50:45 +02:00
1ddbbdef48 [Trainer] deprecate ray scope (#41403) 2025-10-09 18:50:00 +02:00
c20849bad1 [CI] Fix copies on main (#41486)
fix copies
2025-10-09 18:38:14 +02:00
776eea8612 deprecate overwrite_output_dir (#41323)
* dep

* style

* rm

* wut

* style
2025-10-09 18:36:19 +02:00
3839d51013 report_to default changed to "none" + cleaning deprecated env var (#41375)
* reporting

* fix

* fix
2025-10-09 18:28:48 +02:00
78f79ba5af Update GLM-4.6 doc (#41471)
Update glm4_moe.md
2025-10-09 09:18:05 -07:00
11c597b1b8 Remove deprecated args in Trainer for v5 (#41404)
remove deprecated code
2025-10-09 18:10:14 +02:00
b450d55a91 Remove past_index (#41384)
* remove-tpu-num-cores

* fix

* rm past index

* Revert "fix"

This reverts commit 7608a6c059210957d3a77812e66178c8b79a9313.

* Revert "remove-tpu-num-cores"

This reverts commit ef08a51d71389849851518d67d8ad6c9ea8f04fc.
2025-10-09 18:06:46 +02:00
1a3a5f5289 Remove SigOpt (#41479)
* remove sigopt

* style
2025-10-09 18:05:55 +02:00
823fab4860 Fix bnb fsdp loading for pre-quantized checkpoint (#41415)
* fix

* fix

* get_param_name

* fix device name
2025-10-09 18:05:35 +02:00
42d4e13a0b RT-Detr correct 2d positional embeddings for non-square images (#41380)
* Correct 2d positional embeddings for non-square images

* Simplify bug fix propagate changes to other models

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-09 17:58:22 +02:00
0eae41ad36 Add Code World Model (CWM) (#41199)
* [wip][cwm] Code World Model stubs and setup in HF Transformers

* [wip] Get other things working

* [wip] Working

* Tokenizer pad

* fix: cwm window attn

* temp remove test

* temp remove test

* Fixes

* Temporarily add auto config remapping option until VLLM 0.11 is out

* Fix model type and add layer validation

* Lint, remove CwmForSequenceClassification

* Lint, tests

* Remove CwmForSequenceClassification

* Lint

* Remove intermediary layer expors/doc errorss, fix tests

* Lint

* run python utils/sort_auto_mappings.py --check_only

* Remove Cwm processor mapping, get check_repo passing

* Remove CwmTextConfig from test

* Add docstring for CwmConfig

* remove global_window and window_pattern params from config

* Fix docstrings

* Revert change to auto docstring util

* lint

* Fixes minus test improvements

* Alter tests to simply check logits

* lint

* Have slow tests use repo, make CwmPretrainedModel passthrough

* Remove decoder layer implementation, use Llama3Decoder + CwmAttetion

* Use linear w/o bias for CwmAttention, add token-level integration test

* Don't ignore config attention bias

* Remove attention bias parameter entirely from config

---------

Co-authored-by: galco <galco@meta.com>
2025-10-09 17:57:45 +02:00
589fc29c9d enhance patched_tearDown to support python 3.11+ (#41429)
* enhance to support python 3.11+

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-09 21:19:29 +05:30
26b5b52676 [Fix] Fix test file error (#40973)
Fix test file error
2025-10-09 15:30:53 +00:00
34b861abd1 🚨 [Attention Masks] Bidirectional masks for encoder and encoder-decoder models (#41265)
* new masks

* fixes

* adjust comments

* fix unnecessary mask creation on sdpa

* simplify masks more

* propogate to other models

* style + repo consistency

* copies

* no comment

* fix attempt

* finally fix grounding dinos

* fix distilbert

* fix executorch

* move to own module

* address first few comments WIP

* revert device comments, simplify executorch further

* fix typo

* add a test for cuda graphs

* move cleanup...

* fix conflict with new main

* fix esm and evolla
2025-10-09 16:56:11 +02:00
b44d91570f [v5] remove load_in_4bit and load_in_8bit (#41287)
* [v5] remove load_in_4bit and load_in_8bit

* fix

* reveert

* fix

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-09 16:34:04 +02:00
d99069195b Cleaning hub kernels (#41477)
* disable kernel mapping

* cleaning

* revert

* fix style
2025-10-09 16:32:18 +02:00
bf38b2d11d Change RT-Detr docs to reflect fixed 640x640 input size (#41364)
* Update rt_detr docs to mention 640x640 input size

The authors of RT-Detr mention that the model was trained on 640x640 images and was meant to be used for inference on 640x640 images.
Also, the current implementation has certain quirks that make training/inferring on images of different sizes problematic. For example,
the pixel masks used for batches of varying image sizes are discarded. I've added a few lines in the docs to notify the user about these issues.

* Batching not possible with variable image sizes

* Remove reference to batching

---------

Co-authored-by: Konstantinos Pitas <kostasp210@gmail.com>
2025-10-09 14:29:16 +00:00
72a3fc275c Remove infer_device (#41088)
* Remove infer_device

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix docs using accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 14:05:39 +00:00
9ef804472b Pickle - part 2 (#41476)
* pickle 2

* pickle 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-09 13:46:53 +00:00
2b5e4c0d13 Import Callable from collections.abc (#41130)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 12:12:43 +00:00
add4df62ba Fix tests fsdp (#41422)
* Fix tests

* fix !

* fix
2025-10-09 14:09:52 +02:00
3e87072666 Fix auto model configuration for encoder of perceptionlm (#41464)
* fix auto model configuration for encoder of perceptionlm

* delete perception_encoder auto registrations
2025-10-09 14:08:03 +02:00
f0544d7e7c Remove KERAS_NLP_IMPORT_ERROR (#41468)
Remove unused variables of error messages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-09 11:58:30 +00:00
d1c6310d6a 🚨 [v5] Rendundant code in nested configs (#41314)
* batch update models

* delete even more

* fix modular super init location

* fix

* fix copies

* fix again, these have force-set values in configs

* fix copies
2025-10-09 13:47:44 +02:00
927aa8bef2 [kernels] Cleanup deta kernel (#41470)
* cleanup deta kernel

* fix modeling
2025-10-09 13:17:42 +02:00
1951f3be8e Update GLM-4.1V MMRope implementation (#41182)
* update for 4D mask

* update

* Update modular_glm4v.py

* 1

* Revert "1"

This reverts commit d13a763e876fa049c5fb70a8b3447b335dbb6098.

* update as glm4v logtic

* update

* 1

* update

* Create convert_glm4v_moe_mgt_weights_to_hf.py

* update

* update
2025-10-09 12:15:47 +02:00
f50fd7fb6b [v5] rm utils/tf_ops/ (#41402)
rm utils/tf_ops/
2025-10-09 10:27:47 +01:00
be3fa93b29 Subconfig is a class attribute (#41308)
* delete

* fix this test

* fix copies

* oke, more tests to fix

* fix last tests on DPT

* deleted accidentally
2025-10-09 10:46:44 +02:00
8137dbdbbd 🚨 [v5] Rename left traces of past_key_value in BERT-like models (#41448)
rename everything
2025-10-09 10:44:44 +02:00
7aa888b7fa Fix doc (#41457)
* dummy

* remove
2025-10-08 20:13:21 +02:00
bfe2b623ef Fix generate outputs and simplify cache tests (#41440)
* start refactoring

* simplify

* tests

* tests

* fix

* zamba

* final fix

* fix
2025-10-08 19:04:18 +02:00
b9be8a8775 enable some falcon-mamba uts on xpu (#41428)
* enable some falcon-mamba uts on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 18:48:04 +02:00
bef73bf8d7 Update hqq.md (#41452)
mistake in loading model
2025-10-08 07:44:56 -07:00
89a4115a6b Validate processing kwargs with @strict from huggingface_hub (#40793)
* initial design draft

* delete

* fix a few tests

* fix

* fix the rest of tests

* common-kwargs

* why the runner complains about typing with "|"?

* revert

* forgot to delete

* update

* fix last issues

* add more detalis in docs

* pin the latest hub release

* fix tests for new models

* also fast image processor

* fix copies

* image processing ast validated

* fix more tests

* typo.and fix copies

* bump

* style

* fix some tests

* fix copies

* pin rc4 and mark all TypedDict as non-total

* delete typed dict adaptor

* address comments

* delete optionals
2025-10-08 16:14:09 +02:00
82ffeb28ad Add Top-H decoding (entropy-bounded truncation) as a LogitsWarper for text generation (#40837)
* init

* added TopH

* Update TopH logits_process.py

* Update logits_process.py

* Update test_logits_process.py

* Update test_logits_process.py

* added test No. 4

* Resolving __init__.py issues

* Resolving configuration_utils.py Issues

* Resolving logits_process.py Issues

* Resolving utils.py Issues

* Resolving test_logits_process.py Issues

* Resolving __init__.py issues

* Resolving logits_process.py Issues

* Resolving __init__.py issues

* Updated Docs

* Updated Docstring

* style: autoformat with make fixup

* Fixing Docstring

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Using torch.distributions.Categorical

* Improve torch_dtype checks (#40808)

* Improve torch_dtype checks

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Apply suggestions from code review

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Add VideoProcessors to auto-backend requirements (#40843)

* add it

* fix existing ones

* add perception to auto_mapping...

* Adds Causal Conv 1D kernel for mamba models (#40765)

* add kernel

* make style

* keep causal-conv1d

* small fix

* small fix

* fix modular converter

* modular fix + lazy loading

* revert changes modular

* nit

* hub kernels update

* update

* small nit

* Update no split modules in T5Gemma model (#40810)

* Update no split modules in T5Gemma model

* Update no_split_modules also for T5Gemma modular

* Remove model_split_percents from test cases

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Replace image classification loss functions to `self.loss_function` (#40764)

* Fix the misalignment between the l2norm in GDN of Qwen3-Next and the implementation in the FLA library. (#40842)

* align torch implementation of gdn with fla.

* fix fla import.

* fix

* remove unused attr

* fixes

* strictly align l2norm in Qwen3-Next with FLA implementation.

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Fixes for continuous batching (#40828)

* Fix for CB attn mask and refactor

* Tests for CB (not all passing)

* Passing tests and a logger fix

* Fixed the KV metrics that were broken when we moved to hybrid alloc

* Fix circular import and style

* Added tests for FA

* Unfolded test to have device expectations

* Fixes for H100

* more fixes for h100

* H100 are good

* Style

* Adding some comments from #40831

* Rename test

* Avoid 1 letter variables

* Dictonnary is only removed during kwargs

* Test for supported sample

* Fix a unvoluntary slice

* Fixes for non-sliced inputs and small example improvments

* Slice inputs is more understandabe

* Style

* [tests] re-enable aria fast tests (#40846)

* rise from the dead

* test

* [SAM2] Fix inconsistent results with original implementation with input boxes (#40800)

* Fix inconsistencies with box input inference with original repo

* remove print

* always pad

* fix modular

* [Sam2Video] Fix video inference with batched boxes and add test (#40797)

fix video inference with batched boxes and add test

* add: differential privacy research model (#40851)

* VaultGemma

* Removing Sequence and Token classification models. Removing integration tests for now

* Remove pass-only modular code. style fixes

* Update vaultgemma.md

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Update docs/source/en/model_doc/vaultgemma.md

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Add links to model doc

* Correct model doc usage examples

* Updating model doc to describe differences from Gemma 2

* Update model_doc links

* Adding integration tests

* style fixes

* repo consistency

* attribute exception

---------

Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* [test] Fix test_eager_matches_sdpa incorrectly skipped (#40852)

* ouput_attentions in typed kwargs

* correct typing in GenericForTokenClassification

* improve

* [tests] move generative tests away from `test_modeling_common.py` (#40854)

move tests

* [generate] Always use decoder config to init cache (#40772)

* mega derp

* fix

* always use the decoder

* Use checkpoint in auto_class_docstring (#40844)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix TrainingArguments.parallelism_config NameError with accelerate<1.10.1 (#40818)

Fix ParallelismConfig type for accelerate < 1.10.1

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Redirect MI355 CI results to dummy dataset (#40862)

* [Bug fix #40813] Fix base_model_tp_plan of Starcoder2 model. (#40814)

Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>

* [docstrings / type hints] Update outdated annotations for `past_key_values`  (#40803)

* some fixes

* nits

* indentation

* indentation

* a bunch of type hints

* bulk changes

* fix florence kwargs  (#40826)

* fix: XIELU act parameters not being casted to correct dtype (#40812)

* Update model tags and integration references in bug report (#40881)

* [Qwen3 Next] Use numerically stable `rsqrt` (#40848)

use numerically stable inverse

* Adding Support for Qwen3-VL Series (#40795)

* add qwen3vl series

* make fixup

* fix import

* re-protect import

* fix it finally (need to merge main into the branch)

* skip processor test (need the checkpoint)

* oups typo

* simplify modular

* remove unecesary attr

* fix layer

* remove unused rope_deltas args

* reuse image def

* remove unnesesary imports

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* [`VaultGemma`] Update expectations in integration tests (#40855)

* fix tests

* style

* Fix modular consistency (#40883)

* reapply modular

* add missing one

* 🔴 Move variable output controls to `_prepare_generation_config ` (#40715)

* move checks to validate steps where possible

* fix csm and other models that override _sample

* ops dia you again

* opsie

* joao review

* Move variable output controls to `prepare_inputs_for_generation`

* fix a bunch of models

* back to basics

* final touches

* Clarify passing is_causal in sdpa_attention_paged_forward (#40838)

* Correctly pass is_causal in sdpa_attention_paged_forward

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add comment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.expm1 and torch.log1p for better numerical results (#40860)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Fast PromptDepthAnything Processor (#40602)

* Test & import setup

* First version passing tests

* Ruff

* Dummy post processing

* Add numerical test

* Adjust

* Doc

* Ruff

* remove unused arg

* Refine interpolation method and push test script

* update bench

* Comments

* Update src/transformers/models/auto/image_processing_auto.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Remove benchmrk script

* Update docstrings

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Update src/transformers/models/prompt_depth_anything/image_processing_prompt_depth_anything_fast.py

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* doc

* further process kwargs

* remove it

* remove

* Remove to dict

* remove crop middle

* Remove param specific handling

* Update testing logic

* remove ensure multiple of as kwargs

* fix formatting

* Remove none default and get image size

* Move stuff to _preprocess_image_like_inputs and refacto

* Clean

* ruff

* End of file & comments

* ruff again

* Padding fixed

* Remove comments to pass tests

* Remove prompt depth from kwargs

* Adjust output_size logic

* Docstring for preprocess

* auto_docstring for preprocess

* pass as an arg

* update test batched

* stack images

* remove prompt scale to meter

* return tensors back in preprocess

* remove copying of images

* Update behavior to match old processoer

* Fix batch size of tests

* fix test and fast

* Fix slow processor

* Put tests back to pytorch

* remove check and modify batched tests

* test do_pad + slow processor fix

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* Fix deta loading & dataclass (#40878)

* fix

* fix 2

* Remove dict branch of attention_mask in sdpa_attention_paged_forward (#40882)

Remove dict branch of attention_mask

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* 🌐 [i18n-KO] Translated smolvlm.md to Korean (#40414)

* fix: manual edits

* Apply suggestions from code review

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/model_doc/smolvlm.md

* Update docs/source/ko/_toctree.yml

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* 🌐 [i18n-KO] Translated `imageprocessor.md` to Korean (#39557)

* feat: manual translation

* docs: fix ko/_toctree.yml

* Apply suggestions from code review

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>

* Update docs/source/ko/image_processors.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* [generate] remove docs of a feature that no longer exists (#40895)

* Make debugging failing tests (check and update expect output values) easier 🔥  (#40727)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fixing the call to kernelize (#40628)

* fix

* style

* overload train and eval

* add getter and setter

* Fix getter  regression (#40824)

* test things

* style

* move tests to a sane place

* Fix flaky `Gemma3nAudioFeatureExtractionTest::test_dither` (#40902)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [cache] Merge static sliding and static chunked layer (#40893)

* merge

* get rid of tensors in get_mask_sizes!!

* remove branch

* add comment explanation

* re-add the class with deprecation cycle

* Harmonize CacheLayer names (#40892)

* unify naming

* style

* doc as well

* post rebase fix

* style

* style

* revert

* [cache] Only use scalars in `get_mask_sizes` (#40907)

* remove tensor ops

* style

* style

* Set seed for `Glm4vIntegrationTest` (#40905)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add Olmo3 model (#40778)

* transformers add-new-model-like for Olmo3

* Implement modular Olmo3

* Update Olmo3 tests

* Copy Olmo2 weight converter to Olmo3

* Implement Olmo3 weight converter

* Fix code quality errors

* Remove unused import

* Address rope-related PR comments

* Update Olmo3 model doc with minimal details

* Fix Olmo3 rope test failure

* Fix 7B integration test

* remove dummy EncodingFast (#40864)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve module name handling for local custom code (#40809)

* Improve module name handling for local custom code

* Use `%lazy` in logging messages

* Revert "Use `%lazy` in logging messages"

This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.

* Add notes for sanitization rule in docstring

* Remove too many underscores

* Update src/transformers/dynamic_module_utils.py

* Update src/transformers/dynamic_module_utils.py

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Remove `runner_map` (#40880)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* disable `test_fast_is_faster_than_slow` (#40909)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [gemma3] `Gemma3ForConditionalGeneration` compatible with assisted generation (#40791)

* gemma3vision compatible with assisted generation

* docstring

* BC

* docstring

* failing checks

* make fixup

* apply changes to modular

* misc fixes

* is_initialized

* fix poor rebase

* [generate] misc fixes (#40906)

misc fixes

* 🔴Make `center_crop` fast equivalent to slow (#40856)

make center_crop fast equivalent to slow

* Fix dtype in Paligemma (#40912)

* fix dtypes

* fix copies

* delete unused attr

* [Docs] Adding documentation of MXFP4 Quantization (#40885)

* adding mxfp4 quantization docs

* review suggestions

* Apply suggestions from code review

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Processor load with multi-processing (#40786)

push

* [Llama4] Remove `image_sizes` arg and deprecate `vision_feature_layer` (#40832)

* Remove unused arg

* deprecate

* revrt one change

* get set go

* version correction

* fix

* make style

* comment

* Fix #40067: Add dedicated UMT5 support to GGUF loader (config, tokenizer, test) (#40218)

* Fix #40067 : add UMT5 support in GGUF loader (config, tokenizer, test)

* chore: fix code formatting and linting issues

* refactor: move UMT5 GGUF test to quantization directory and clean up comments

* chore: trigger CI pipeline

* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.

* Add regression check to UMT5 encoder GGUF test

Verify encoder output against reference tensor values with appropriate tolerances for stability.

* Update tests/quantization/ggml/test_ggml.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/ggml/test_ggml.py

remove comments

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* [torchao safetensors] renaming get_state_dict function (#40774)

renaming get_state_dict function

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Adding activation kernels (#40890)

* first commit

* add mode

* revert modeling

* add compile

* rm print

* Minor fix for #40727 (#40929)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Add support for Florence-2 training (#40914)

* Support training florence2

* update doc and testing model to florence-community

* fix florence-2 test, use head dim 16 instead of 8 for fa2

* skip test_sdpa_can_dispatch_on_flash

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add LongCat-Flash (#40730)

* working draft for LongCat

* BC changes to deepseek_v3 for modular

* format

* various modularities

* better tp plan

* better init

* minor changes

* make modular better

* clean up patterns

* Revert a couple of modular commits, because we won't convert in the end

* make things explicit.

* draft test

* toctree, tests and imports

* drop

* woops

* make better things

* update test

* update

* fixes

* style and CI

* convert stuff

* up

* ah, yes, that

* enable gen tests

* fix cache shape in test (sum of 2 things)

* fix tests

* comments

* re-Identitise

* minimize changes

* better defaults

* modular betterment

* fix configuration, add documentation

* fix init

* add integration tests

* add info

* simplify

* update slow tests

* fix

* style

* some additional long tests

* cpu-only long test

* fix last tests?

* urg

* cleaner tests why not

* fix

* improve slow tests, no skip

* style

* don't upcast

* one skip

* finally fix parallelism

* [DOC] Add missing dates in model cards (#40922)

add missing dates

* [models] remove unused `import torch.utils.checkpoint`  (#40934)

* Intel CPU dockerfile (#40806)

* upload intel cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update cpu dockerfile

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* update label name

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* docs(i18n): Correct the descriptive text in the README_zh-hans.md (#40941)

* Fix trainer tests (#40823)

* fix liger

* fix

* more

* fix

* fix hp

* fix

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* Fix `Glm4vMoeIntegrationTest` (#40930)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Raise error instead of warning when using meta device in from_pretrained (#40942)

* raise instead of warning

* add timm

* remove

* Consistent naming for images kwargs (#40834)

* use consistent naming for padding

* no validation on pad size

* add warnings

* fix

* fox copies

* another fix

* fix some tests

* fix more tests

* fix lasts tests

* fix copies

* better docstring

* delete print

* Remove nested import logic for torchvision (#40940)

* remove nested import logic for torchvision

* remove unnecessary protected imports

* remove unnecessarry protected import in modular (and modeling)

* fix wrongly remove protected imports

* Fix `Glm4vModelTest::test_eager_matches_fa2_generate` (#40947)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update expected values for some `test_speculative_generation` (#40949)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Standardize audio embedding function name for audio multimodal models (#40919)

* Standardize audio embedding function name for audio multimodal models

* PR review

* Add FlexOlmo model (#40921)

* transformers add-new-model-like

* Add FlexOlmo implementation

* Update FlexOlmo docs

* Set default tokenization for flex olmo

* Update FlexOlmo tests

* Update attention comment

* Remove unneeded use of `sliding_window`

* Don't list dropout in eager_paged_attention_forward (#40924)

Remove dropout argument

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update expected values for one more `test_speculative_generation` after #40949 (#40967)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)

* fix(trainer): ensure final checkpoint is saved when resuming training

* add test

* make style && slight fix of test

* make style again

* move test code to test_trainer

* remove outdated test file

* Apply style fixes

---------

Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Add new model LFM2-VL (#40624)

* Add LFM2-VL support

* add tests

* linting, formatting, misc review changes

* add siglip2 to auto config and instantiate it in lfm2-vl configuration

* decouple image processor from processor

* remove torch import from configuration

* replace | with Optional

* remove layer truncation from modeling file

* fix copies

* update everything

* fix test case to use tiny model

* update the test cases

* fix finally the image processor and add slow tests

* fixup

* typo in docs

* fix tests

* the doc name uses underscore

* address comments from Yoni

* delete tests and unsuffling

* relative import

* do we really handle imports better now?

* fix test

* slow tests

* found a bug in ordering + slow tests

* fix copies

* dont run compile test

---------

Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>

* Fix outdated version checks of accelerator (#40969)

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix outdated version checks of accelerator

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)

use skip_predictor in vjepa2 `get_vision_features`

* [Trainer] Fix DP loss (#40799)

* fix

* style

* Fix fp16

* style

---------

Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>

* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)

* fix(timm): Add exception handling for unknown Gemma3n model

* nit: Let’s cater to this specific issue

* nit: Simplify error handling

* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)

* fix merge conflicts

* change token typing

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>

* [tests] Really use small models in all fast tests (#40945)

* start

* xcodec

* chameleon

* start

* layoutlm2

* layoutlm

* remove skip

* oups

* timm_wrapper

* add default

* doc

* consistency

* Add captured actual outputs to CI artifacts (#40965)

* fix

* fix

* Remove `# TODO: ???` as it make me `???`

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Revert change in `compile_friendly_resize` (#40645)

fix

* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Using torch.distributions.Categorical

* Remove `set_model_tester_for_less_flaky_tests` (#40982)

remove

* Benchmarking v2 GH workflows (#40716)

* WIP benchmark v2 workflow

* Container was missing

* Change to sandbox branch name

* Wrong place for image name

* Variable declarations

* Remove references to file logging

* Remove unnecessary step

* Fix deps install

* Syntax

* Add workdir

* Add upload feature

* typo

* No need for hf_transfer

* Pass in runner

* Runner config

* Runner config

* Runner config

* Runner config

* Runner config

* mi325 caller

* Name workflow runs properly

* Copy-paste error

* Add final repo IDs and schedule

* Review comments

* Remove wf params

* Remove parametrization from worfkflow files

* Fix callers

* Change push trigger to pull_request + label

* Add back schedule event

* Push to the same dataset

* Simplify parameter description

* 🔴[`Attention`] Bert-based Models Attention Refactor (#38301)

* clean start to bert refactor

* some test fixes

* style

* fix last tests

* be strict on positional embeddings, fixup according tests

* cache support

* more cache fixes, new causal API

* simplify masks, fix tests for gen

* flex attn, static cache support, round of fixes

* ?

* this time

* style

* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)

* roberta

* fixup sdpa remains

* attention split, simplify args and kwargs, better typing

* fix encoder decoder

* fix test

* modular roberta

* albert

* data2vectext, making it modular tomorrow

* modular data2vec text

* tmp disable

* xmod + cache position fixes

* whoops

* electra + markuplm, small fixes

* remove wrong copy

* xlm_roberta + some embedding fixes

* roberta prelayernorm

* RemBert: remove copy, maybe doing it later

* ernie

* fix roberta offloading

* camembert

* copy fixes

* bert generation + fixes on eager

* xlm roberta xl

* bridgetower (text) + seamlessv2 copy fixes

* rocbert + small fixes

* whoops

* small round of fixups

* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)

* the end of the tunnel?

* fixup nllbmoe + style

* we dont need this anymore

* megatron bert is barely used, low prio skip for now

* Modernize bert (template for others)

NOTE: trying to push this through, might be overdue if not in time possible

* check inputs for all others (if checkmarked)

* fix bridgetower

* style

* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)

* proper fix for bert to force intermediate dict outputs

* propagate to others

* style

* xlm roberta xl investigation, its the layernorm...

* mobile bert

* revert this, might cause issues with composed models

* review

* style

* Remove [[autodoc]] refs to TF/Flax objects (#40996)

* remove refs

* more

* ENH: Enable readline support for transformers chat (#40911)

ENH Enable readline support for chat

This small change enables GNU readline support for the transformers chat
command. This includes, among others:

- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
  ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n  ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l

Implementation

Although it may look strange, just importing readline is enough to
enable it in Python, see:

https://docs.python.org/3/library/functions.html#input

As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.

Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).

* [testing] test `num_hidden_layers` being small in model tester (#40992)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* blt wip (#38579)

* blt wip

* cpu version

* cpu friendly with full entropy model (real time patching)

* adding config file instead of args file

* enable MPS

* refactoring unused code

* single config class in config file

* inherit from PreTrainedModel

* refactor LMTransformer --> BLTPatcher

* add conversion script

* load from new checkpoing with form_pretrained

* fixed demo from_pretrained

* clean up

* clean a few comments

* cleanup folder

* clean up dir

* cleaned up modeling further

* rename classes

* adding transformers Attention class and RotaryEmbedding class

* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc

* seperate out patcher config, update modeling and conversion script

* rename vars to be more transformers-like

* rm unused functions

* adding cross attention from transformers

* pass arg

* rename weights

* updated conversion script

* overwritten commit! fixing PR

* apply feedback

* adding BLTRMSNorm like Llama

* add repeat_kv and eager_attention_forward copied from

* BLTMLP identical to MllamTextMLP

* clean up some args'

* more like mllama, but busier inits

* BLTTransformerLayer config

* decoder, encoder, global configs

* wip working on modular file

* cleaning up patch and configs

* clean up patcher helpers

* clean up patcher helpers further

* clean up

* some config renaming

* clean up unused configs

* clean up configs

* clean up configs

* update modular

* clean

* update demo

* config more like mllama, seperated subconfigs from subdicts

* read from config instead of self args

* update demo file

* model weights to causal lm weights

* missed file

* added tied weights keys

* BLTForCausalLM

* adding files after add-new-model-like

* update demo

* working on tests

* first running integration tests

* added integration tests

* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff

* tokenizer clean up

* modular file

* fixing rebase

* ruff

* adding correct basemodel output and updating config with checkpoint vals (for testing)

* BLTModelTests git status

* enabling inputs_embeds, although won't be equal to input_ids since need ids for patching logic

* fix sdpa == causal tests

* fix small model test and some gradient checkpointing

* skip training GC tests

* fix test

* updated modular

* update modular

* ruff

* adding modular + modeling

* modular

* more modern is_casual check

* cleaning up modular

* more modular reduction

* ruff

* modular fix

* fix styling

* return 2

* return 2

* fix some tests

* fix bltcrossattention after modular break

* some fixes / feedback

* try cache generate fix

* try cache generate fix

* fix generate tests

* attn_impl workaround

* refactoring to use recent TransformersKwargs changes

* fix hidden_states shape test

* refactor to new outputs

* simplify outputs a bit

* rm unneeded decoderlayer overwriting

* rename blt

* forgot tokenizer test renamed

* Reorder

* Reorder

* working on modular

* updates from modular

* new modular

* ruff and such

* update pretrainedmodel modular

* using cohere2 apply_rotary_pos_emb

* small changes

* apply feedback r2

* fix cross_attention

* apply more feedback

* update modeling fix

* load submodules from pretrainedmodel

* set initializer_range to subconfigs

* rm cross_attnetion_states pass when not needed

* add 7b projection layer support

* check repo

* make copies

* lost cohere2 rotate_half

* ruff

* copies?

* don't tie weights for submodules

* tie weights setting

* check docstrings

* apply feedback

* rebase

* rebased modeling

* update docs

* applying feedback

* few more fixes

* fix can_record_outputs

* fast tokenizer

* no more modulelist

* tok auto

* rm tokenizersss

* fix docs

* ruff

* fix after rebase

* fix test, configs are not subscriptable

---------

Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>

* [docs] rm stray tf/flax autodocs references (#40999)

rm tf references

* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)

* fix

* fixup inits

* oops

* fixup gemma

* fixup modular order

* how does this keep happen lol

* vaultgemma is new i forgot

* remove init check

* Make `EfficientLoFTRModelTest` faster (#41000)

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typoes in src and tests (#40845)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)

* Fix model cards and modalities in toctree

* fix new models

* RUFF fix on CI scripts (#40805)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix dict like init for ModelOutput (#41002)

* fix dict like init

* style

* 🚨 [v5] remove generate output retrocompatibility aliases (#40998)

remove old type aliases

* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)

* update test (and overwrites)

* better test comment

* 0 as a default for

* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* 🚨 [v5] remove deprecated entry point (#40997)

* remove old entry point

* update references to transformers-cli

* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)

* fix: bug that made early stop change order of matches

* fix: applied code suggestion

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* fix: applied code suggestion to modular

* fix: integration tests

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Fix `PhimoeIntegrationTest` (#41007)

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix Glm4v test (#41011)

fix

* Update after #41007 (#41014)

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix benchmark runner argument name (#41012)

* Adding support for Qwen3Omni (#41025)

* Add Qwen3Omni

* make fix-copies, import properly

* nit

* fix wrong setup. Why was audio_token_id renamed ?

* upds

* more processing fixes

* yup

* fix more generation tests

* down to 1?

* fix import issue

* style, update check repo

* up

* fix quality at my best

* final quality?

* fix doc building

* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE

* SKIP THE TEMPLATE ONE

---------

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>

* Making compute_loss_func always take priority in Trainer (#40632)

* logger warn, if-else logic improved

* redundant if condition fix

* Modify Qwen3Omni parameter name since VL changed it (#41045)

Modify parameter name since VL changed it

Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>

* Fix Qwen video tests (#41049)

fix test

* [testing] Fix `qwen2_audio` (#41018)

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix typing of tuples (#41028)

* Fix tuple typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove optax (#41030)

Remove optax dep

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in English/Chinese documentation (#41031)

* Fix typos and formatting in English docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos and formatting in Chinese docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use torch.autocast (#40975)

* Use torch.autocast

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* docs: improved RoPE function Docstrings (#41004)

* docs: improved RoPE functuon docstrings

* Update src/transformers/modeling_rope_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix condition for emitting warning when generation exceeds max model length (#40775)

correct warning when generation exceeds max model length

Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>

* Fix outdated torch version check (#40925)

Update torch minimum version check to 2.2

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove doc of tf and flax (#41029)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)

* Add whole word masking

* Vectorize whole word masking functions

* Unit test whole word masking

* Remove support for TF in whole word masking

* [testing] Fix `seed_oss` (#41052)

* fix

* fix

* fix

* fix

* fix

* fix

* Update tests/models/seed_oss/test_modeling_seed_oss.py

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Remove repeated import (#40937)

* Remove repeated import

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix conflict

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify unnecessary Optional typing (#40839)

Remove Optional

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add write token for uploading benchmark results to the Hub (#41047)

* Separate write token for Hub upload

* Address review comments

* Address review comments

* Ci utils (#40978)

* Add CI reports dir to gitignore

* Add utils to run local CI

* Review compliance

* Style

* License

* Remove <frameworkcontent> and <pt> tags from documentation (#41055)

* Remove <frameworkcontent> and <pt> tags

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Revert changes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update docs/source/en/model_doc/madlad-400.md

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Fix CI jobs being all red 🔴 (false positive) (#41059)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Update quantization CI (#41068)

* fix

* new everything

* fix

* [i18n-bn] Add Bengali language README file (#40935)

* [i18n-bn] Add Bengali language README file and update links in existing language files

* Update Bengali README for clarity and consistency in model descriptions

* Improve documentation and errors in Mamba2-based models (#41063)

* fix bug in Mamba2 docs

* correct 'because on of' issue

* link to other Mamba2 model types

* github URL is not changed

* update error message in generated files

* Update team member list for some CI workflows (#41094)

* update list

* update list

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* fix crash when using chat to send 2+ request to gptoss (#40536)

Signed-off-by: Wang, Yi <yi.a.wang@intel.com>

* Minor addition, no split modules for VideoMAEE (#41051)

* added no split modules

* fixed typo

---------

Co-authored-by: Raushan Turganbay <raushan@huggingface.co>

* Switch to `python:3.10-slim` for CircleCI docker images (#41067)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix argument name in benchmarking script (#41086)

* Fix argument name in benchmarking script

* Adjust vars

* Remove mention of TensorFlow/Flax/JAX from English documentation (#41058)

Remove mention of TensorFlow from English documentation

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typos in documentation (#41087)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing (#40788)

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix optional typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix schema typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

* Fix typing

* Fix typing

* Fix typing

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Improve typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix quote string of np.ndarray

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

* Format

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove unused arguments (#40916)

* Fix unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove tf and flax from Chinese documentation (#41057)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* fix wrong height and width when read video use torchvision (#41091)

* docs: Fix Tool Use links and remove dead RAG links (#41104)

docs: Fix tool use links. Remove dead RAG links. Fix style

* 🚨 [generate] update paligemma mask updates (and other assisted generation-related fixes) (#40917)

* tmp

* fix modular inheritance

* nit

* paligemma 1 doesn't have swa

* use same pattern as in models with hybrid layers

* PR comments

* helium also needs layer_typed (bc it relies on gemma)

* paligemma/gemma3: same mask creation fn in fwd and generate

* propagate changes to helium (gemma-based)

* tmp commit

* slow paligemma tests passing, let's see what breaks

* fix test_left_padding_compatibility

* tmp commit

* tmp commit

* rebase error

* docs

* reduce diff

* like this?

* t5gemma

* better comment

* shorter diff

* exception

* ffs type

* optional

* shorter modular_gemma.py

* helium model actually needs no changes -- the tester is the issue

* t5gemma modular config

* a few more modular; paligemma BC

* fix processor issues?

* rm config exception

* lift warning in gemma

* [tests] gpt2 + `CausalLMModelTester` (#41003)

* tmp commit

* tmp commit

* tmp commit

* rm old GPT2ModelTester

* nit bug

* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns

* vision_encoder_decoder

* Fix `_get_test_info` for inherited tests (#41106)

* fix _get_test_info

* fix patched

* add comment

* ruff

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Remove bad test skips (#41109)

* remove bad skips

* remove more

* fix inits

* Format empty lines and white space in markdown files. (#41100)

* Remove additional white space and empty lines from markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Add empty lines around code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)

Update ruff to 0.13.1 target it to Python 3.10 and apply its fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>

* 🚨 [V5] Remove deprecated training arguments  (#41017)

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove deprecated training arguments from V5

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix comments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Support loading LFM2 GGUF (#41111)

* add gguf config mapping for lfm2

* add lfm2 tensor process to unsqueeze conv weights

* adjust values from gguf config to HF config

* add test for lfm2 gguf

* ruff

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [torchao safetensors] integrate torchao safetensors support with transformers  (#40735)

* enable torchao safetensors

* enable torchao safetensors support

* add more version checking

* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)

* fix mismatched dims for qwen3 next

* propagate changes

* chore: renamed tot_heads to total_sequence_length

* Apply suggestion from @vasqu

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* minor fix to modular qwen3 next file

---------

Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>

* Fix the error where a keyword argument appearing before *args (#41099)

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix broken `` expressions in markdown files (#41113)

Fix broken expressions in markdown files

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove self-assignment (#41062)

* Remove self-assignment

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Update src/transformers/integrations/flash_paged.py

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Clear pass

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* 🚨Refactor: Update text2text generation pipelines to use max_new_tokens… (#40928)

* Refactor: Update text2text generation pipelines to use max_new_tokens and resolve max_length warning

* docs(text2text_generation): 更新参数注释以反映现代生成实践

将max_length参数注释更新为max_new_tokens,以符合现代生成实践中指定生成新token数量的标准做法

* refactor(text2text_generation): Remove outdated input validation logic

* docs(text2text_generation): Revert incorrectly modified comment

* docs(text2text_generation): Revert incorrectly modified comment

* Fixed MXFP4 model storage issue (#41118)

* Fixed loading LongT5 from legacy checkpoints (#40724)

* Fixed loading LongT5 from legacy checkpoints

* Adapted the fix to work with missing lm_head

* dummy commit (#41133)

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

* dummy commit, nothing interesting

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix loading logic flaw with regards to unexpected and missing keys (#40850)

* Unexpected keys should be ignored at load with device map

* remove them all

* fix logic flaw

* fix

* simplify

* style

* fix

* revert caching allocator change

* add other test

* add nice doc

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* Using torch.distributions.Categorical

* Resolving logits_process.py Issues

* style: autoformat with make fixup

* Update logits_process.py removed defaults

* Variable H name -> cumulative_entropy

* Resolving format error

* Correction of the loop variables in logit processor

* Vectorized the loop in logits_process

* formatted  logits_process

* paper reference and stopping rule comment logits_process

* Trigger CI rerun

* Update logits_process.py

* added test_TopH_example_integration

* added test_TopH_example_integration

* Update README.md

* Restore CI config to match main (remove accidental changes)

* Restore CI config to match upstream main (no diffs)

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: ArminAzizi98 <147081650+ArminAzizi98@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Yuchao Zhang <418121364@qq.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Bo Zheng <368586905@qq.com>
Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Amer <amersinha@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: NanoCode012 <nano@axolotl.ai>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: 艾力可 <178652170+thalahors@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Manuel de Prada Corral <6536835+manueldeprada@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: HyunZ118 <156191095+HyunZ118@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: YONGSANG <71686691+4N3MONE@users.noreply.github.com>
Co-authored-by: Yijun Lee <119404328+yijun-lee@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
Co-authored-by: Shane A <shanea@allenai.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: vb <vaibhavs10@gmail.com>
Co-authored-by: Yaswanth Gali <82788246+yaswanth19@users.noreply.github.com>
Co-authored-by: Akshay Babbar <priv.akshay@outlook.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: Duc-Viet Hoang <vietyb00@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: jiqing-feng <jiqing.feng@intel.com>
Co-authored-by: lilin-1 <256404019@qq.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Jack <32371937+jackzhxng@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: souvikku <107592858+souvikku@users.noreply.github.com>
2025-10-08 13:37:51 +00:00
e064dc05c2 [testing] Fix JetMoeIntegrationTest (#41377)
* fix

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-08 13:11:53 +00:00
20282f13fa [JetMoe] Fix KV head repetition and padding free (#41423)
fix jetmoe
2025-10-08 14:27:22 +02:00
c528f50663 Remove Python 3.9 classifier (#41410)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:20:36 +00:00
8dfc8e8cfc 🤦 CB nit! (#41413)
* 🤦

* updates

* update cb simple

* merge

* up

* update

* fix

* up

* nit

* rumble this is annoying

* update

* update

* up

* fix

* ....

* cleanup a bit

* nit

* typo

* typing and typo

* nit

* updates

* up

* final fix!

* update

* fix more import issues

* nuke is paged

* up
2025-10-08 13:36:27 +02:00
2166e26cb1 [torchao] Add regex support for ModuleFqnToConfig (#41242)
* Add regex support for ModuleFqnToConfig

Summary:
Similar to https://github.com/pytorch/ao/pull/3084 we added regex support
in transformers so people can use regex to quantize the models.

See https://github.com/pytorch/ao/pull/3084 for docs and precedence of different
configurations

Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev

Test Plan:
pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex

Reviewers:

Subscribers:

Tasks:

Tags:

* Apply style fixes

* add assert for

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-08 11:05:15 +00:00
b13ee63b5a enable new model uts to xpu and fix some failures on xpu (#41386)
* enable new model uts to xpu and fix some failures on xpu

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add more

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_internvl.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* Update test_modeling_llava_next_video.py

* Update test_modeling_qwen3.py

* Update test_modeling_whisper.py

* Update test_modeling_whisper.py

* Update test_modeling_llava.py

* Update test_modeling_llava.py

* Update test_modeling_qwen2_5_omni.py

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-08 10:14:50 +00:00
1c5ac899e8 Use accelerator API to free device memory (#41195)
* Use accelerator API to free device memory

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Use clear_device_cache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Cleanup

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-08 12:11:18 +02:00
957b1f3696 Fixing comments in __init__ file (#41414)
nit
2025-10-08 12:07:26 +02:00
13791d8f48 [v5] Bump min version of bitsandbytes to 0.46.1 (#41283)
* bump bitsandbytes to 0.46.1

* huge cleanup

* style

* fix

* req

* fix

* importerror

* fix
2025-10-08 12:04:26 +02:00
7e475552be 🚨 [v5] Prune prune_heads (#41417)
* remove _prune_heads

* remove prune_heads

* finalize the purge

* remove another patterns
2025-10-08 10:25:13 +01:00
46db0edf3b 🚨🚨 Remove all traces of legacy cache format (#41378)
* remove

* more

* add back

* tests

* revert classes

* tests

* add exceptions

* reapply modular

* rename

* oupsi

* start with whisper

* fix tests

* fix

* fix

* fix

* typing
2025-10-08 11:14:44 +02:00
ee5488440b Tiny Cleanup - Removed duplicate class field definition's (#41293)
* Removed duplicate-class-field-definition
's using RUFF PIE794

* Removed duplicate-class-field-definition
's using RUFF PIE794

* Ruff format.

* Removed duplicate-class-field-definition

* Added New ruff rule to detect duplicate class field defs

* remove comment

* order

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-08 10:49:34 +02:00
34dcd73b57 v5 dev version (#41436) 2025-10-08 10:45:33 +02:00
3553f0bc23 Fix overriding common_kwargs defaults in processor calls (#41381)
* set common_kwargs defaults before updating with kwargs

* change order to override defaults common_kwargs
2025-10-07 23:13:56 -04:00
242eb9cbdc Remove deprecation warning (#41425)
* remove

* fix space
2025-10-07 19:21:14 +02:00
50090c3fc8 [v5] Delete left traces of feature extractor (#41321)
delete the left traces
2025-10-07 18:24:08 +02:00
ccbaa1670a Fix incorrect assignment in update_device_map for GPTQ quantizer (#41328)
Fix incorrect assignment in update_device_map for GPTQ quantizer

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-07 17:28:55 +02:00
c562c5d801 [v5] Bump accelerate to 1.1.0 (#41234)
* bump to 1.1.0 !

* bump accelerate

* fix

* None

* fixed !

* style
2025-10-07 17:18:32 +02:00
88e946e062 Fix early CUDA initialisation (#41409)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-07 14:37:17 +01:00
93464a0279 Prefer raising TypeError exception for invalid type (#41346)
* Fixed raising of TypeError exception for invalid type

* Fixed failing tests.
2025-10-07 13:11:42 +00:00
0c9a72e457 [Model] Lfm2Moe (#41401)
* [new-models] LFM2-MoE

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] add in template lfm2_moe doc files

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration] update configuration class

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm] minor: fix rotary_emb typo

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling] modular/modeling files for Lfm2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] fix Lfm2Moe modular/modeling

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] update configuration keys with latest config changes

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make fixup

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] address comments: dtype, mlp, buffers

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [configuration][lfm2_moe] add initializer_range

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modular][lfm2_moe] include init_weights to pass test_initialization

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests][causal_lm] include pos_emb as possible rope attribute

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] remove load_balancing_loss_func due to lack of support for hooking expert biases

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] make style

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [modeling][lfm2_moe] MoE refactor PR update in LFM2Moe

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2_moe: unit tests

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] update LFM2-8B-A1B repo id

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [tests] lfm2: update ModelTests for lfm2

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* Update LFM2 documentation

Updated the LFM2 documentation to reflect the addition of a new model size and clarified architectural details.

* Add Lfm2Moe documentation

Add Lfm2Moe model documentation with overview and example usage.

* [misc] fix ci

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [docs] remove trust_remote_code

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [misc] ci: fix modular

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* reapply modular

* simplify

* remove static address and inplace op

* simplify

* simplify a bit more the modular

* imports

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-07 15:09:58 +02:00
b4428d545f Fix test for model with dotted name and relative imports (#41343) 2025-10-07 13:55:54 +01:00
0464d9eb37 [Cache] lfm2 cache: allocate empty kv layers during init (#41396)
* [Cache] lfm2 cache: allocate empty kv layers during init

Signed-off-by: Paul Pak <paulpak58@gmail.com>

* [Cache] lfm2_cache: update modular file

Signed-off-by: Paul Pak <paulpak58@gmail.com>

---------

Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-10-07 14:01:31 +02:00
da7b8ce11f [kernels] Kernel Config (#41232)
* first config

* add kernel_config

* add import logic

* fixing style

* compare class name

* add comments

* rm import

* adding kernel md files

* add to toctree

* adding to main_classes

* simplify required config

* add to doc

* style

* store the mapping

* remove nested func

* add hub mixin

* fix

* imports

* fix
2025-10-07 13:58:20 +02:00
4763b8c5b8 Correct numerical regression in vision embeddings (#41374)
created modeling file
2025-10-07 13:43:24 +02:00
caa14e7dab fix resample in asr pipeline (#41298) 2025-10-06 17:31:10 +00:00
73f8c4b8ad fix asr ut failures (#41332)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-06 17:12:19 +00:00
57e82745f9 [v5] Sync Bert and Bart eager attention (#41248)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops

* sync attn impl

* fix style/copies

* fix distilbert

* remove dim check
2025-10-06 18:49:01 +02:00
505387c05b Update from pretrained error when loading (#33380)
* init commit

* style

* take comments into account

* mrege with main and simplify

* nits

* final

* small fixes

* fix

* super small update!

* add another test

* up up

* update

* fixes

* sort them by default
2025-10-06 16:10:19 +00:00
e00f46f16e serve: add non-streaming mode to /v1/responses; stream event parity; remove placeholder logprobs (#41353) 2025-10-06 16:04:17 +00:00
0395ed52ae [CB] Refactors the way we access paged (#41370)
* up

* refactor the way we handle paged attention

* affect serve as well

* update

* fix

* cup
2025-10-06 17:55:31 +02:00
39b0c9491b Remove unused function patameters (#41358)
Remove unused arguments

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 15:38:17 +00:00
11e4b5e5ee make some ut cases pass on xpu w/ latest torch (#41337)
* make some ut cases pass on xpu w/ latest torch

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* Update test_modeling_llava_onevision.py

* Apply style fixes

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-06 15:38:00 +00:00
fa36c973fc Remove unnecessary list comprehension (#41305)
Remove unnecessary comprehension

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-06 14:49:02 +00:00
7a1aeec36e Fixes in check_model_inputs, GPTBigCodeModel and ImageGPTModel (#40811)
* misc fixes

* fix

* Update src/transformers/models/imagegpt/modeling_imagegpt.py

* Apply suggestion from @IlyasMoutawwakil

* pickup use_cache from args input as well

* fix
2025-10-06 16:34:24 +02:00
297a41a6cf Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939 (#41284)
* Use canonical get_size_with_aspect_ratio (with max_size) from transformers.image_transforms to fix #37939

* Fix import sorting/style

* Fix import order

* Refactor: use canonical get_size_with_aspect_ratio across image processors (except YOLOS)

This commit updates image processing utilities in multiple model processors to use the shared
transformers.image_transforms.get_size_with_aspect_ratio for consistent resizing logic and
aspect ratio handling.

YOLOS processors are intentionally left unchanged in this commit to preserve their current
behavior and avoid breaking model-specific padding/resizing assumptions. YOLOS will be updated
in a dedicated follow-up PR once compatibility is fully verified.

* ruff fixes

* Fix check_copies.py references for get_size_with_aspect_ratio to use canonical transformers.image_transforms version

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
2025-10-06 10:15:56 -04:00
ae60c77689 Fix flash_attention.py: wrong argument passing for attn_implementation (#41347)
* Fix flash_attention.py: wrong argument passing for attn_implementation

The name of the attn type argument for `_flash_attention_forward()` should be `implementation`, instead of `attn_implementation` which currently uses in the function call. This would result in wrong type specification.

* modify the kwargs inside _flash_attention_forward

* fix the doc

* fix typo

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-06 15:36:40 +02:00
6bf6e36d3b [testing] update test_longcat_generation_cpu (#41368)
* fix

* Update tests/models/longcat_flash/test_modeling_longcat_flash.py

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2025-10-06 13:21:29 +00:00
4903cd4087 🚨 Remove BetterTransformer (#41367)
remove
2025-10-06 15:18:12 +02:00
a5700c497e Better typehints for apply_chat_template (#41355) 2025-10-06 13:14:03 +00:00
089d573aca Fix typo in model proposal template (#41352) 2025-10-06 13:06:50 +00:00
c27b67f0cd 🚨 [v5] Remove relative position embeddings (for bert like models) (#41170)
* remove from modeling files

* remaining changes

* style / copies

* revert deprecated models and fixup some models

* oops
2025-10-06 14:21:41 +02:00
a89bdcf5f1 Fixing a typo for BLT model (#41325) 2025-10-06 12:16:45 +00:00
0452f28544 [ModularChecker] QOL for the modular checker (#41361)
* update

* fancy table fancy prints

* download to cache folder, never need it everagain

* stule

* update based on review
2025-10-06 12:52:10 +02:00
9db58abd6e Check model inputs - hidden states (#40994)
* update all models

* fix copies

* skip aria tests

* update other models

* skip should be in test, not tester

* i think this is more descriptive as a name

* find and replace for new models
2025-10-06 11:48:52 +02:00
db711210d2 Fix trainer for py3.9 (#41359)
fix
2025-10-06 11:36:05 +02:00
163601c619 Standardize PretrainedConfig to PreTrainedConfig (#41300)
* replace

* add metaclass for full BC

* doc

* consistency

* update deprecation message

* revert
2025-10-06 11:34:02 +02:00
55b172b8eb 🚨 Bump to Python 3.10 and rework how we check 3rd-party libraries existence (#41268)
* cleanup

* add check

* fix

* remove all global variables

* fix

* add lru caches everywhere

* fix

* fix

* style

* improve

* reorder all functions

* fix order

* improve

* fix

* fix

* fix
2025-10-06 11:04:19 +02:00
1ec0b54414 Rope for Qwen2--5-vl (#41173)
qwen2--5-vl
2025-10-06 10:56:29 +02:00
0947b9042c Fixed tiny incorrect import in gemma3 (#41354)
Fixed tiny import issue in gemma3
2025-10-06 10:55:42 +02:00
e11a00a16f JetMoe Fix jetmoe after #40132 (#41324)
* update

* up
2025-10-04 11:02:13 +02:00
1bc75db9bd Fix lr_scheduler_parsing (#41322)
* fix

* fix
2025-10-03 17:51:17 +02:00
c2b3cc3e64 Fix jamba (#41309)
* reactivate tests

* first pass

* fix

* fix bias

* fix and simplify

* finally fix this stupid bug

* add skips

* remove bad stuff

* fix copies

* simplify
2025-10-03 16:54:19 +02:00
5abfa43f02 Security/fuyu (#41320)
remove reference to compromised repo
2025-10-03 14:13:41 +00:00
217ff1e4ef AutoAWQ tests (#41295)
* initial commit

* fix

* fix multi gpu

* fix expected output

* fix

* latest

* add comment

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-03 15:17:10 +02:00
5339f72b9b 🚨 [unbloating] unify TypedDict usage in processing (#40931)
* just squash commits into one

* fix style
2025-10-03 14:17:59 +02:00
42bcc81ba2 Minor security fix for ssh-runner.yml (#41317)
security issue

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 14:14:34 +02:00
cd4422922e Add modular detector (#41289)
* doc

* doc

* no remote code

* safe-ize the release + remove remote

* fixes

* add some documentation as well
2025-10-03 14:11:10 +02:00
59eba49237 download and use HF Hub Cache (#41181)
use hub cache

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-03 11:11:37 +02:00
de3ee737cf Fix README.md error when installing from source (#41303) 2025-10-02 16:08:27 -07:00
b914445f77 Italian translation for README.md (#41269)
chore: add Italian translation for README.md
2025-10-02 15:59:28 -07:00
41e5abac5c FIX: Bug in PEFT integration delete_adapter method (#41252)
The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.

Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fullfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.

(Note: I tested locally with that the test will pass with PEFT 0.18.0)

While working on this, I also cleaned up the following:

- The active_adapter property has been deprecated for more than 2 years
  (#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
  the docstrings, which have been addressed.

When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
2025-10-02 18:36:57 +02:00
da3c7d1d36 🚨 [DistilBert] Refactor Attention (#41163)
* refactor

* allow pos ids for flattened sequences
2025-10-02 17:50:48 +02:00
e54defcfc2 [Flex Attn] Fix lse x attention sinks logic (#41249)
fix
2025-10-02 17:49:39 +02:00
b3bd815786 Fix mxfp4 dequantization (#41292)
fix
2025-10-02 16:47:42 +02:00
e4930d6bde 🚨 [V5] Remove deprecated resume_download (#41122)
Remove deprecated `resume_download`

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 16:44:34 +02:00
7adb43e60a Build doc in 2 jobs: en and other languages (#41290)
* separate

* separate

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 14:33:57 +00:00
e1f1d32af0 Remove some previous team members from allow list of triggering Github Actions (#41263)
* delete

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-10-02 16:32:28 +02:00
1d7ebff398 Fix - remove deprecated args checking in deepspeed intergrations (#41282)
Remove deprecated args checking in deepspeed intergrations

Signed-off-by: nguyen599 <pnvmanh2123@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-10-02 13:59:50 +00:00
9d02602f0f Remove test_initialization (#41261)
remove it
2025-10-02 15:23:43 +02:00
248e7ef8bc [docs] remove references to recently deleted classes in non-en docs (onnx, feature processors) (#41286)
remove references to old classes
2025-10-02 12:59:28 +00:00
bc33fd3fc2 Add processor and intergration test for qwen3vl (#41277)
* support aux loss in qwen3vlmoe

* update qwen3vl processor test!

* add integration tests for qwen3vl-30a3

* remove duplicated decorator

* code clean

* fix consistency

* do not inherit from nn.Linear for better quantization

* pass check
2025-10-02 14:59:04 +02:00
639ad8ccd9 feat: use aws-highcpu-32-priv for amd docker img build (#41285)
* feat: use `aws-highcpu-32-priv` for amd docker img build

* feat: add `workflow_dispatch` event to docker build CI
2025-10-02 12:53:14 +00:00
894a2bdd8c Fix pylint generator warnings (#41258)
Fix pylint generator warnings

Signed-off-by: cyy <cyyever@outlook.com>
2025-10-02 12:35:42 +00:00
1cc9069551 Fix unnecessary single-item container checks (#41279)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:35:11 +00:00
4f286fbbf8 Biogptlogits (#41270)
added logits slicing to BioGpt for seq classifier

Signed-off-by: Aviral <aviralkamaljain@gmail.com>
2025-10-02 12:33:48 +00:00
1d91a8a454 Use max/min (#41280)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 12:15:27 +00:00
f1b64c5b06 Unify is_torchvision_v2_available with is_torchvision_available (#41259)
Fix is_torchvision_v2_available

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 11:56:37 +00:00
2f3e266692 fix async client for transformers chat (#41255)
* fix-client

* fix
2025-10-02 13:23:37 +02:00
313504bcdd 🚨 [v5] remove deprecated generate classes (constraints and beam scorers) (#41223)
rm
2025-10-02 12:11:11 +01:00
8f14300663 Allow private Space id for Trackio (#40948)
* allow prive space id for trackio

* complete docstring
2025-10-02 12:38:25 +02:00
734732140a Deprecate Trackio environment variables and deploy to Spaces by default (#40950)
* allow prive space id for trackio

* complete docstring

* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default

* style

* Enhance documentation for Trackio Space ID in TrainingArguments
2025-10-02 12:37:55 +02:00
7938e91faa MoE + vllm = 😻 (#40132)
* update modeling mixtral

* oups[13;2u

* fix

* better naming?

* compute softmax and top_k inside the experts

* update minamax as well

* models that will need an update

* more models that need a fix

* stash

* fix mixtral

* update olmoe

* update

* update

* current changes

* nits

* molmoe is now fixed

* olmoe is good to go!

* refactor qwen2_moe

* fixes

* fixed moe

* fix qwen2 modular

* nit

* qwen2_moie test script works

* tricky rope !

* fix qwen3

* DeepSeek v3 MoE Standardization (#40538)

* DeepSeek-v3

Shared

Shared

* Dependents of DS3

* Standardize GLM4V MoE (#40539)

* up

* Standardize VitPose's MoE (#40549)

* VitPose

* outside

* outside

* outside

* fix

* update dbrx

* dbrx... the magix

* Refactor Ernie 4.5's MoE (#40547)

* Isolate Ernie fixes

* fix moe

---------

Co-authored-by: Vasqu <antonprogamer@gmail.com>

* fix style

* style

* fix copies

* style

* latest changes

* fixes

* had to stage

* current updaters

* up

* another modular

* modular graniteMoe

* some update

* draft another modular moe

* updaters

* up

* fix nit

* q3 nit

* fix phi moe

* we're going up up up up its our mooooment

* fix switch transformers this time around

* up

* gptsan japanese is deprecated forget about it

* fix mixtral to not be a linear (gives us more freedom)

* update

* fix copies gone wrong try catch nothing

* fix mixtral

* new refactor again

* update aria as well

* up dbrx and deepseekv3

* nit

* fix phimoe?

* fix deepseek v3

* nits

* don't bother with this one please

* up olmoe

* ??

* fix olmoe

* yups

* fiupx

* ish

* hot patch

* new qwen3

* updates

* up

* nit

* fix copies

* fix

* nits

* we're going up up up

* nits

* switch_transformesr edge case

* lol modular gptsan?

* fix deepseek

* finally all modeling match modular

* update

* up

* up

* dang

* up

* up aria

* fix dbrx

* nits here and there

* finish fixing dbrx

* fix deepseek

* upd

* up

* fix flex olmo

* updated

* update jamba

* JAMBA is stil a bit todo

* forward forward

* fix dots11

* update

* fix hunyuan

* fix some other

* update phimoe

* fuck you phimoe you are now submitted

* submit granitemoe as well

* try to fix some other models, reduces some of the failures

* fix olmoe and qwem2moe

* up

* up

* fix qwen2_moe

* update modular make it again, simpler

* nits

* up

* up

* fix

* someswitch reductions

* up

* fix qwen3vl

* some fixes to jetmo

* these should be shipped to the modular to fix jetmoe

* fix most of the nllb failures

* more nllb fixes

* fix the modular

* remove nllb modular as it sucks for now

* ?

* fix granitemoe

* granitemoehybrid don't have rope

* use rope when rope, no rope when no rope

* updates

* finish fixing dumbgrainite

* fix most of minimax

* fix

* update modular

* ?

* up

* up jetmoe still broken

* up

* fix, now align the moe

* fix jetmoe

* fix styling and qwen3 repo consitency

* updatge

* up up

* update ruff?

* nits

* modeling is goot now for switch

* fix

* more fixses to switch!

* fix some siwtch test

* ?

* ?

* up

* fix switch modular!

* nit?

* uip

* subtest

* can't believe I wasted so much time on this...

* fix

* updates

* nits

* nit jamba is fucking annoying

* ?

* fix?

* oups

* good good

* styling

* up

* make sure qwen2 sliding works!

* fix dbrx small

* lol

* nits

* fix one test

* fix load balancing loss issue

* fix jamba

* fix nllbmoe

* fix jamba consistency and doc?

* up

* thse are correct

* up

* up

* up

* some of the final cleanup

* update

* up

* fix some revert in granimoe

* bring back attention multipliers for the granite family we'll see later on if they need removal

* small jamba fix docstring and typing

* fix phimoe

* yup

* fix unk returndict in granitemoes

* up

* fix qwen config

* fix phiemoe check quality

* nits

* update based on caught non relative imports!

* fix dbrx

* Apply suggestions from code review

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix copies

* fiuxp

* fix dot1 regression!

* fix phimoe issue

* fix phi moe

* fix float() for some models

* fix jamba regression

* ui

* more dtype issues

* fix deepseek2 and 3?

* proper update

* fix modular deepseek!

* jamba jambaaaaaa

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-02 12:12:44 +02:00
e6a8e7debe Fix binding of video frames to video placeholder in InternVL model (#41237)
* Fix binding video frames to video placeholder in prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add test on binding video frames to prompt

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix code style issues

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Fix broken tests on `InternVLProcessor`

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

* Add `return_tensors` to video processor defaults

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>

---------

Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
2025-10-02 09:43:35 +00:00
30b79effb5 Remove SageMakerTrainer (#41267)
* Remove SageMakerTrainer

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More removal

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 09:16:32 +00:00
aabf0a03cb Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)
* fix multi-video timestamp bug in qwen3vl,glm4v

* run make fix-copies to sync modular files

* run make fix-copies to sync modular files

---------

Co-authored-by: UBT <daqin.luo@ubtrobot.com>
2025-10-02 11:15:57 +02:00
bcdd5532bf Use regex defailed flags (#41264)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-02 08:34:09 +00:00
55d63e86ea fix asr pipeline ut failures (#41275)
* fix asr pipeline ut failures

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* make style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-02 10:32:03 +02:00
522b79a346 add more activation kernels, follow up (#40944)
* add more activation kernels

* fixing style

* fix version
2025-10-02 08:45:05 +02:00
9f2d5666f8 docs: update bitsandbytes platform support (#41266) 2025-10-01 14:27:19 -04:00
9d8f693c7e add peft team members to issue/pr template (#41262)
* add

* Update .github/PULL_REQUEST_TEMPLATE.md

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2025-10-01 17:26:59 +00:00
94bbf8e199 Resolve remote custom module path warnings (#41243) 2025-10-01 15:55:42 +00:00
c4b505d0f7 Don't convert to safetensors on the fly if the call is from testing (#41194)
* don't convert

* disable

* Update src/transformers/modeling_utils.py

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>

* fix

* disable

* disable

* disable

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-10-01 17:46:21 +02:00
01c9e1ba68 [t5gemma] fix get_text_config and related fixes (#40939)
* tmp commit

* t5gemma fixes
2025-10-01 15:55:26 +01:00
025531981c [FA3] Fix masking and loading logic in same process (#41217)
fix loading and fa3 masking
2025-10-01 16:36:12 +02:00
3256773974 FP-Quant NVFP4 and Python 3.9 support (#39876)
* quartet

* quartet qat -> quartet

* format

* bf16 backward

* interfaces

* forward_method

* quartet -> fp_quant

* style

* List -> list

* list typing

* fixed format and annotations

* test_fp_quant

* docstrings and default dtypes

* better docstring and removed noop checks

* docs

* pseudoquantization support to test on non-blackwell

* pseudoquant

* Pseudoquant docs

* Update docs/source/en/quantization/fp_quant.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update docs/source/en/quantization/fp_quant.md

* Update docs/source/en/quantization/fp_quant.md

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

* Update tests/quantization/fp_quant_integration/test_fp_quant.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* small test fixes

* dockerfile update

* spec link

* removed `_process_model_after_weight_loading`

* toctree

* nvfp4

* nvfp4 tests

* FP-Quant version bumped

* nvfp4 default and docs update

* trainable

* cpu if pseudoquant

* proper group size selection

* gsr

* qutlass requirement version bumo

* Upstream docker copy

* docs update

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
2025-10-01 13:58:22 +00:00
d848a3953a Remove all instances of is_safetensors_available (#41233)
* safetensors is a core dep

* fix

* ok

* simplify branching

* keep it for now

---------

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-10-01 13:57:28 +00:00
e4913bdf50 🚨 [v5] Remove SinkCache (#41107)
Remove SinkCache

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:46:55 +00:00
1c8f206ecc Fix pylint warnings (#41222)
* Remove unused variables

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove reimported packages

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Fix pylint warnings

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Simplify

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:16:22 +00:00
3016717f0d Use removeprefix and removesuffix (#41240)
* Use removeprefix and removesuffix

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* More fixes

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 13:13:04 +00:00
ca975f1cb8 [V5] Remove deprecated transformers.onnx (#41214)
* Remove deprecated transformers.onnx

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Remove onnx docs

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
2025-10-01 12:17:04 +00:00
1d1ac07893 [repo utils] Update models_to_deprecate.py (#41231)
* update models_to_deprecate

* exclude this file

* handle typos and aliases

* don't commit files

* PR suggestions; make fixup
2025-10-01 12:01:52 +00:00
bcec3e2175 fix TrainerIntegrationDeepSpeed UT failures (#41236)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-10-01 13:55:01 +02:00
ae879f67f8 🚨 [v5] Delete feature extractors used for vision (#41174)
* bye bye

* remove from docs

* do not use feature extractor here

* fix docs

* do not delete it

* forgot these
2025-10-01 13:20:58 +02:00
1c4d9982d3 Use math.log2 (#41241)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-10-01 09:52:31 +00:00
db1cc65c06 Video processor accepts single frames on cuda (#41218)
* fix

* why was is np if input is in torch
2025-10-01 10:55:11 +02:00
2927 changed files with 73094 additions and 108007 deletions

View File

@ -29,6 +29,7 @@ COMMON_ENV_VARIABLES = {
"RUN_PIPELINE_TESTS": False,
# will be adjust in `CircleCIJob.to_dict`.
"RUN_FLAKY": True,
"DISABLE_SAFETENSORS_CONVERSION": True,
}
# Disable the use of {"s": None} as the output is way too long, causing the navigation on CircleCI impractical
COMMON_PYTEST_OPTIONS = {"max-worker-restart": 0, "vvv": None, "rsfE":None}
@ -185,6 +186,7 @@ class CircleCIJob:
# During the CircleCI docker images build time, we might already (or not) download the data.
# If it's done already, the files are inside the directory `/test_data/`.
{"run": {"name": "fetch hub objects before pytest", "command": "cp -r /test_data/* . 2>/dev/null || true; python3 utils/fetch_hub_objects_for_ci.py"}},
{"run": {"name": "download and unzip hub cache", "command": 'curl -L -o huggingface-cache.tar.gz https://huggingface.co/datasets/hf-internal-testing/hf_hub_cache/resolve/main/huggingface-cache.tar.gz && apt-get install pigz && tar --use-compress-program="pigz -d -p 8" -xf huggingface-cache.tar.gz && mv -n hub/* /root/.cache/huggingface/hub/ && ls -la /root/.cache/huggingface/hub/'}},
{"run": {
"name": "Run tests",
"command": f"({timeout_cmd} python3 -m pytest {marker_cmd} -n {self.pytest_num_workers} {junit_flags} {repeat_on_failure_flags} {' '.join(pytest_flags)} $(cat splitted_tests.txt) | tee tests_output.txt)"}

View File

@ -48,19 +48,19 @@ body:
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- trainer: @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- distributed: @3outeille @ArthurZucker
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @SunMarc @zach-huggingface
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- quantization: @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:

View File

@ -51,19 +51,19 @@ Library:
- continuous batching: @remi-or @ArthurZucker @McPatate
- pipelines: @Rocketknight1
- tokenizers: @ArthurZucker and @itazap
- trainer: @zach-huggingface @SunMarc
- trainer: @SunMarc
- attention: @vasqu @ArthurZucker @CyrilVallez
- model loading (from pretrained, etc): @CyrilVallez
- distributed: @3outeille @ArthurZucker @S1ro1
- distributed: @3outeille @ArthurZucker
- CIs: @ydshieh
Integrations:
- deepspeed: HF Trainer/Accelerate: @SunMarc @zach-huggingface
- ray/raytune: @richardliaw, @amogkam
- Big Model Inference: @SunMarc
- quantization (bitsandbytes, autogpt): @SunMarc @MekkCyber
- quantization: @SunMarc @MekkCyber
- kernels: @MekkCyber @drbh
- peft: @BenjaminBossan @githubnemo
Devices/Backends:

View File

@ -22,7 +22,6 @@ tests/generation/ @gante
/src/transformers/models/auto/ @ArthurZucker
/src/transformers/utils/ @ArthurZucker @Rocketknight1
/src/transformers/loss/ @ArthurZucker
/src/transformers/onnx/ @michaelbenayoun
# Specific files come after the sections/globs, so they take priority
/.circleci/config.yml @ArthurZucker @ydshieh

View File

@ -12,6 +12,8 @@ concurrency:
env:
HF_HOME: /mnt/cache
DATASET_ID: hf-benchmarks/transformers
MODEL_ID: meta-llama/Llama-3.1-8B-Instruct
jobs:
benchmark:
@ -26,7 +28,7 @@ jobs:
(github.event_name == 'pull_request' && contains( github.event.pull_request.labels.*.name, 'run-benchmark') )||
(github.event_name == 'push' && github.ref == 'refs/heads/main')
container:
image: huggingface/transformers-pytorch-gpu
image: huggingface/transformers-all-latest-gpu
options: --gpus all --privileged --ipc host
steps:
- name: Get repo
@ -34,26 +36,12 @@ jobs:
with:
ref: ${{ github.event.pull_request.head.sha || github.sha }}
- name: Install libpq-dev & psql
run: |
apt update
apt install -y libpq-dev postgresql-client
- name: Install benchmark script dependencies
run: python3 -m pip install -r benchmark/requirements.txt
run: python3 -m pip install -r benchmark_v2/requirements.txt kernels
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]"
- name: Run database init script
run: |
psql -f benchmark/utils/init_db.sql
env:
PGDATABASE: metrics
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e ".[torch]" && python3 -m pip uninstall -y torchvision # temp fix
- name: Run benchmark
run: |
@ -64,13 +52,11 @@ jobs:
commit_id=$GITHUB_SHA
fi
commit_msg=$(git show -s --format=%s | cut -c1-70)
python3 benchmark/benchmarks_entrypoint.py "huggingface/transformers" "$BRANCH_NAME" "$commit_id" "$commit_msg"
python3 benchmark_v2/run_benchmarks.py -b 32 -s 128 -n 256 --level 2 --branch-name "$BRANCH_NAME" --commit-id "$commit_id" --commit-message "$commit_msg" --model-id "$MODEL_ID" --log-level INFO --push-result-to-dataset "$DATASET_ID"
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
PUSH_TO_HUB_TOKEN: ${{ secrets.PUSH_TO_HUB_TOKEN }}
# Enable this to see debug logs
# HF_HUB_VERBOSITY: debug
# TRANSFORMERS_VERBOSITY: debug
PGHOST: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGHOST }}
PGUSER: transformers_benchmarks
PGPASSWORD: ${{ secrets.TRANSFORMERS_BENCHMARKS_PGPASSWORD }}
BRANCH_NAME: ${{ github.head_ref || github.ref_name }}

View File

@ -1,35 +1,7 @@
name: Benchmark v2 Framework
on:
workflow_call:
inputs:
runner:
description: 'GH Actions runner group to use'
required: true
type: string
container_image:
description: 'Docker image to use'
required: true
type: string
container_options:
description: 'Container options to use'
required: true
type: string
commit_sha:
description: 'Commit SHA to benchmark'
required: false
type: string
default: ''
run_id:
description: 'Custom run ID for organizing results (auto-generated if not provided)'
required: false
type: string
default: ''
benchmark_repo_id:
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
workflow_dispatch:
env:
HF_HOME: /mnt/cache
@ -82,4 +54,4 @@ jobs:
--token '${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}' \
--log-level INFO
env:
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}

View File

@ -1,11 +1,7 @@
name: Benchmark v2 Scheduled Runner - A10 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
workflow_dispatch:
jobs:
benchmark-v2-default:
@ -13,9 +9,9 @@ jobs:
uses: ./.github/workflows/benchmark_v2.yml
with:
runner: aws-g5-4xlarge-cache-use1-public-80
container_image: huggingface/transformers-pytorch-gpu
container_image: huggingface/transformers-all-latest-gpu
container_options: --gpus all --privileged --ipc host --shm-size "16gb"
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit
secrets: inherit

View File

@ -1,11 +1,7 @@
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU
on:
schedule:
# Run daily at 16:30 UTC
- cron: "30 16 * * *"
pull_request:
types: [ opened, labeled, reopened, synchronize ]
workflow_dispatch:
jobs:
benchmark-v2-default:
@ -18,4 +14,4 @@ jobs:
commit_sha: ${{ github.sha }}
run_id: ${{ github.run_id }}
benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
secrets: inherit
secrets: inherit

View File

@ -5,6 +5,7 @@ on:
branches:
- build_ci_docker_image*
repository_dispatch:
workflow_dispatch:
workflow_call:
inputs:
image_postfix:
@ -44,33 +45,59 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-all-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu-push-ci docker build
title: 🤗 Results of the transformers-all-latest-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
flash-attn-ci-image:
name: "PyTorch with Flash Attn [dev]"
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-all-latest-gpu
build-args: |
REF=main
PYTORCH=2.8.0
TORCHCODEC=0.7.0
FLASH_ATTN=yes
push: true
tags: huggingface/transformers-all-latest-gpu${{ inputs.image_postfix }}:flash-attn
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-all-latest-gpu docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-torch-deepspeed-docker:
name: "Latest PyTorch + DeepSpeed"
runs-on:
group: aws-g4dn-2xlarge-cache
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
@ -103,51 +130,8 @@ jobs:
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
# Can't build 2 images in a single job `latest-torch-deepspeed-docker` (for `nvcr.io/nvidia`)
latest-torch-deepspeed-docker-for-push-ci-daily-build:
name: "Latest PyTorch + DeepSpeed (Push CI - Daily Build)"
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-latest-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the transformers-pytorch-deepspeed-latest-gpu-push-ci docker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
doc-builder:
name: "Doc builder"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
@ -180,48 +164,10 @@ jobs:
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch:
name: "Latest PyTorch [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
-
name: Check out code
uses: actions/checkout@v4
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Build and push
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-gpu
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-gpudocker build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
latest-pytorch-amd:
name: "Latest PyTorch (AMD) [dev]"
runs-on:
group: aws-general-8-plus
group: aws-highcpu-32-priv
steps:
-
name: Set up Docker Buildx
@ -244,29 +190,47 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-amd-gpu-push-ci
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
slack_channel: ${{ secrets.CI_SLACK_CHANNEL_DOCKER }}
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu-push-ci build
title: 🤗 Results of the huggingface/transformers-pytorch-amd-gpu build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
cache-latest-pytorch-amd:
name: "Cache Latest Pytorch (AMD) Image"
needs: latest-pytorch-amd
runs-on:
group: amd-mi325-1gpu
steps:
-
name: Login to DockerHub
uses: docker/login-action@v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_PASSWORD }}
-
name: Pull and save docker image to cache
run: |
image="huggingface/transformers-pytorch-amd-gpu"
final_path="/mnt/image-cache/transformers-pytorch-amd-gpu.tar"
tmp_path="${final_path}.tmp"
echo "Pulling image: ${image}"
docker pull "${image}"
echo "Saving to temp file: ${tmp_path}"
docker save "${image}" -o "${tmp_path}"
echo "Moving to final path: ${final_path}"
mv -f "${tmp_path}" "${final_path}"
echo "Cache populated successfully at ${final_path}"
latest-pytorch-deepspeed-amd:
name: "PyTorch + DeepSpeed (AMD) [dev]"
runs-on:
@ -293,19 +257,6 @@ jobs:
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu${{ inputs.image_postfix }}
# Push CI images still need to be re-built daily
-
name: Build and push (for Push CI) in a daily basis
# This condition allows `schedule` events, or `push` events that trigger this workflow NOT via `workflow_call`.
# The later case is useful for manual image building for debugging purpose. Use another tag in this case!
if: inputs.image_postfix != '-push-ci'
uses: docker/build-push-action@v5
with:
context: ./docker/transformers-pytorch-deepspeed-amd-gpu
build-args: |
REF=main
push: true
tags: huggingface/transformers-pytorch-deepspeed-amd-gpu-push-ci
- name: Post to Slack
if: always()
@ -318,8 +269,6 @@ jobs:
latest-quantization-torch-docker:
name: "Latest Pytorch + Quantization [dev]"
# Push CI doesn't need this image
if: inputs.image_postfix != '-push-ci'
runs-on:
group: aws-general-8-plus
steps:

View File

@ -16,8 +16,20 @@ jobs:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de en es fr hi it ja ko pt zh
languages: en
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
build_other_lang:
uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
with:
commit_sha: ${{ github.sha }}
package: transformers
notebook_folder: transformers_doc
languages: ar de es fr hi it ja ko pt zh
custom_container: huggingface/transformers-doc-builder
secrets:
token: ${{ secrets.HUGGINGFACE_PUSH }}
hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}

View File

@ -6,9 +6,6 @@ on:
docker:
required: true
type: string
start_sha:
required: true
type: string
job:
required: true
type: string
@ -24,7 +21,13 @@ on:
commit_sha:
required: false
type: string
pr_number:
required: false
type: string
outputs:
report:
description: "Content of the report of new failures"
value: ${{ jobs.process_new_failures_with_commit_info.outputs.report }}
env:
HF_HOME: /mnt/cache
@ -35,16 +38,20 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
check_new_failures:
name: " "
name: "Find commits for new failing tests"
strategy:
matrix:
run_idx: [1]
runs-on:
group: aws-g5-4xlarge-cache
outputs:
process: ${{ steps.check_file.outputs.process }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
@ -55,14 +62,19 @@ jobs:
path: /transformers/ci_results_${{ inputs.job }}
- name: Check file
id: check_file
working-directory: /transformers
env:
job: ${{ inputs.job }}
run: |
if [ -f ci_results_${{ inputs.job }}/new_failures.json ]; then
echo "`ci_results_${{ inputs.job }}/new_failures.json` exists, continue ..."
if [ -f "ci_results_${job}/new_failures.json" ]; then
echo "\`ci_results_${job}/new_failures.json\` exists, continue ..."
echo "process=true" >> $GITHUB_ENV
echo "process=true" >> $GITHUB_OUTPUT
else
echo "`ci_results_${{ inputs.job }}/new_failures.json` doesn't exist, abort."
echo "\`ci_results_${job}/new_failures.json\` doesn't exist, abort."
echo "process=false" >> $GITHUB_ENV
echo "process=false" >> $GITHUB_OUTPUT
fi
- uses: actions/download-artifact@v4
@ -81,27 +93,62 @@ jobs:
echo "PREV_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
if [ -f setup_values/other_workflow_run_id.txt ]; then
echo "OTHER_WORKFLOW_RUN_ID=$(cat setup_values/other_workflow_run_id.txt)" >> $GITHUB_ENV
else
echo "OTHER_WORKFLOW_RUN_ID=" >> $GITHUB_ENV
fi
- name: Update clone
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
- name: Get target commit
- name: Get `START_SHA`
working-directory: /transformers/utils
if: ${{ env.process == 'true' }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
echo "END_SHA=$(TOKEN=${{ secrets.ACCESS_REPO_INFO_TOKEN }} python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
echo "START_SHA=$commit_sha" >> $GITHUB_ENV
- name: Checkout to `start_sha`
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: git fetch && git checkout ${{ inputs.start_sha }}
# This is used if the CI is triggered from a pull request `self-comment-ci.yml` (after security check is verified)
- name: Extract the base commit on `main` (of the merge commit created by Github) if it is a PR
id: pr_info
if: ${{ env.process == 'true' && inputs.pr_number != '' }}
uses: actions/github-script@v6
with:
script: |
const { data: pr } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: ${{ inputs.pr_number }}
});
const { data: merge_commit } = await github.rest.repos.getCommit({
owner: pr.base.repo.owner.login,
repo: pr.base.repo.name,
ref: pr.merge_commit_sha,
});
core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
# Usually, `END_SHA` should be the commit of the last previous workflow run of the **SAME** (scheduled) workflow.
# (This is why we don't need to specify `workflow_id` which would be fetched automatically in the python script.)
- name: Get `END_SHA` from previous CI runs of the same workflow
working-directory: /transformers/utils
if: ${{ env.process == 'true' && inputs.pr_number == '' }}
env:
ACCESS_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
run: |
echo "END_SHA=$(TOKEN="$ACCESS_TOKEN" python3 -c 'import os; from get_previous_daily_ci import get_last_daily_ci_run_commit; commit=get_last_daily_ci_run_commit(token=os.environ["TOKEN"], workflow_run_id=os.environ["PREV_WORKFLOW_RUN_ID"]); print(commit)')" >> $GITHUB_ENV
# However, for workflow runs triggered by `issue_comment` (for pull requests), we want to check against the
# parent commit (on `main`) of the `merge_commit` (dynamically created by GitHub). In this case, the goal is to
# see if a reported failing test is actually ONLY failing on the `merge_commit`.
- name: Set `END_SHA`
if: ${{ env.process == 'true' && inputs.pr_number != '' }}
env:
merge_commit_base_sha: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
run: |
echo "END_SHA=$merge_commit_base_sha" >> $GITHUB_ENV
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -119,6 +166,10 @@ jobs:
run: |
python3 utils/print_env.py
- name: Install pytest-flakefinder
if: ${{ env.process == 'true' }}
run: python3 -m pip install pytest-flakefinder
- name: Show installed libraries and their versions
working-directory: /transformers
if: ${{ env.process == 'true' }}
@ -127,37 +178,78 @@ jobs:
- name: Check failed tests
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: python3 utils/check_bad_commit.py --start_commit ${{ inputs.start_sha }} --end_commit ${{ env.END_SHA }} --file ci_results_${{ inputs.job }}/new_failures.json --output_file new_failures_with_bad_commit.json
env:
job: ${{ inputs.job }}
run_idx: ${{ matrix.run_idx }}
run: python3 utils/check_bad_commit.py --start_commit "$START_SHA" --end_commit "$END_SHA" --file "ci_results_${job}/new_failures.json" --output_file "new_failures_with_bad_commit_${job}_${run_idx}.json"
- name: Show results
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
ls -l new_failures_with_bad_commit.json
cat new_failures_with_bad_commit.json
- name: Checkout back
working-directory: /transformers
if: ${{ env.process == 'true' }}
run: |
git checkout ${{ inputs.start_sha }}
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
JOB_NAME: ${{ inputs.job }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
job: ${{ inputs.job }}
run_idx: ${{ matrix.run_idx }}
run: |
python3 utils/process_bad_commit_report.py
ls -l "new_failures_with_bad_commit_${job}_${run_idx}.json"
cat "new_failures_with_bad_commit_${job}_${run_idx}.json"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}_${{ matrix.run_idx }}.json
process_new_failures_with_commit_info:
name: "process bad commit reports"
needs: check_new_failures
if: needs.check_new_failures.outputs.process == 'true'
runs-on:
group: aws-g5-4xlarge-cache
outputs:
report: ${{ steps.set_output.outputs.report }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- uses: actions/download-artifact@v4
with:
name: ci_results_${{ inputs.job }}
path: /transformers/ci_results_${{ inputs.job }}
- uses: actions/download-artifact@v4
with:
pattern: new_failures_with_bad_commit_${{ inputs.job }}*
path: /transformers/new_failures_with_bad_commit_${{ inputs.job }}
merge-multiple: true
- name: Check files
working-directory: /transformers
env:
job: ${{ inputs.job }}
run: |
ls -la /transformers
ls -la "/transformers/new_failures_with_bad_commit_${job}"
# Currently, we only run with a single runner by using `run_idx: [1]`. We might try to run with multiple runners
# to further reduce the false positive caused by flaky tests, which requires further processing to merge reports.
- name: Merge files
shell: bash
working-directory: /transformers
env:
job: ${{ inputs.job }}
run: |
cp "/transformers/new_failures_with_bad_commit_${job}/new_failures_with_bad_commit_${job}_1.json" new_failures_with_bad_commit.json
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
- name: Process report
shell: bash
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN: ${{ secrets.TRANSFORMERS_CI_RESULTS_UPLOAD_TOKEN }}
@ -170,15 +262,40 @@ jobs:
echo EOF
} >> "$GITHUB_ENV"
# The output is useful if a caller needs more processing, for example, we have a chain
# self-comment-ci.yml -> self-scheduled.yml -> this one (check_failed_tests.yml),
# and `self-comment-ci.yml` needs further processing before sending a GitHub comment to the pull request page.
- name: Show results & Set outputs
id: set_output
working-directory: /transformers
run: |
ls -l new_failures_with_bad_commit.json
cat new_failures_with_bad_commit.json
{
echo 'report<<EOF'
cat new_failures_with_bad_commit.json
echo '' # Force a newline
echo EOF
} >> "$GITHUB_OUTPUT"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: new_failures_with_bad_commit_${{ inputs.job }}
path: /transformers/new_failures_with_bad_commit.json
- name: Prepare Slack report title
working-directory: /transformers
if: ${{ env.process == 'true' }}
env:
ci_event: ${{ inputs.ci_event }}
job: ${{ inputs.job }}
run: |
pip install slack_sdk
echo "title=$(python3 -c 'import sys; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = "${{ inputs.ci_event }}"; job = "${{ inputs.job }}"; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
echo "title=$(python3 -c 'import sys; import os; sys.path.append("utils"); from utils.notification_service import job_to_test_map; ci_event = os.environ["ci_event"]; job = os.environ["job"]; test_name = job_to_test_map[job]; title = f"New failed tests of {ci_event}" + ":" + f" {test_name}"; print(title)')" >> $GITHUB_ENV
- name: Send processed report
if: ${{ env.process == 'true' && !endsWith(env.REPORT_TEXT, '{}') }}
if: ${{ !endsWith(env.REPORT_TEXT, '{}') }}
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
with:
# Slack channel id, channel name, or user id to post message.

22
.github/workflows/codeql.yml vendored Normal file
View File

@ -0,0 +1,22 @@
---
name: CodeQL Security Analysis
on:
push:
branches: ["main"]
# pull_request:
# branches: ["main"]
workflow_dispatch:
jobs:
codeql:
name: CodeQL Analysis
uses: huggingface/security-workflows/.github/workflows/codeql-reusable.yml@main
permissions:
security-events: write
packages: read
actions: read
contents: read
with:
languages: '["actions"]'
queries: 'security-extended,security-and-quality'

View File

@ -16,7 +16,6 @@ env:
RUN_SLOW: yes
OMP_NUM_THREADS: 16
MKL_NUM_THREADS: 16
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
jobs:

View File

@ -39,6 +39,9 @@ on:
PR_MERGE_COMMIT_SHA:
description: "The sha of the merge commit for the pull request (created by GitHub) in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
PR_MERGE_COMMIT_BASE_SHA:
description: "The sha of the parent commit of the the merge commit on the target branch in the base repository"
value: ${{ jobs.get-pr-info.outputs.PR_MERGE_COMMIT_BASE_SHA }}
PR_HEAD_COMMIT_DATE:
description: "The date of the head sha of the pull request branch in the head repository"
value: ${{ jobs.get-pr-info.outputs.PR_HEAD_COMMIT_DATE }}
@ -74,6 +77,7 @@ jobs:
PR_BASE_REF: ${{ steps.pr_info.outputs.base_ref }}
PR_HEAD_SHA: ${{ steps.pr_info.outputs.head_sha }}
PR_BASE_SHA: ${{ steps.pr_info.outputs.base_sha }}
PR_MERGE_COMMIT_BASE_SHA: ${{ steps.pr_info.outputs.merge_commit_base_sha }}
PR_MERGE_COMMIT_SHA: ${{ steps.pr_info.outputs.merge_commit_sha }}
PR_HEAD_COMMIT_DATE: ${{ steps.pr_info.outputs.head_commit_date }}
PR_MERGE_COMMIT_DATE: ${{ steps.pr_info.outputs.merge_commit_date }}
@ -122,6 +126,7 @@ jobs:
core.setOutput('base_ref', pr.base.ref);
core.setOutput('head_sha', pr.head.sha);
core.setOutput('base_sha', pr.base.sha);
core.setOutput('merge_commit_base_sha', merge_commit.parents[0].sha);
core.setOutput('merge_commit_sha', pr.merge_commit_sha);
core.setOutput('pr', pr);
@ -142,16 +147,21 @@ jobs:
date: merge_commit.commit.committer.date
});
console.log('PR Info:', {
pr_info: pr
});
- name: Convert dates to timestamps
id: get_timestamps
env:
head_commit_date: ${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date: ${{ steps.pr_info.outputs.merge_commit_date }}
run: |
head_commit_date=${{ steps.pr_info.outputs.head_commit_date }}
merge_commit_date=${{ steps.pr_info.outputs.merge_commit_date }}
echo $head_commit_date
echo $merge_commit_date
echo "$head_commit_date"
echo "$merge_commit_date"
head_commit_timestamp=$(date -d "$head_commit_date" +%s)
merge_commit_timestamp=$(date -d "$merge_commit_date" +%s)
echo $head_commit_timestamp
echo $merge_commit_timestamp
echo "$head_commit_timestamp"
echo "$merge_commit_timestamp"
echo "head_commit_timestamp=$head_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT
echo "merge_commit_timestamp=$merge_commit_timestamp" >> $GITHUB_OUTPUT

View File

@ -15,13 +15,19 @@ jobs:
steps:
- name: Get PR number
shell: bash
env:
issue_number: ${{ github.event.issue.number }}
is_pull_request_issue: ${{ github.event.issue.pull_request != null }}
pr_number: ${{ github.event.pull_request.number }}
is_pull_request: ${{ github.event.pull_request != null }}
event_number: ${{ github.event.number }}
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request.number }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.pull_request.number }}" >> $GITHUB_ENV
elif [[ "${{ github.event.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.number }}" >> $GITHUB_ENV
if [[ "$issue_number" != "" && "$is_pull_request_issue" == "true" ]]; then
echo "PR_NUMBER=$issue_number" >> $GITHUB_ENV
elif [[ "$pr_number" != "" ]]; then
echo "PR_NUMBER=$pr_number" >> $GITHUB_ENV
elif [[ "$is_pull_request" == "true" ]]; then
echo "PR_NUMBER=$event_number" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
@ -29,8 +35,8 @@ jobs:
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
echo "$PR_NUMBER"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"
run: echo "PR_NUMBER=$PR_NUMBER" >> "$GITHUB_OUTPUT"

View File

@ -28,6 +28,9 @@ on:
report_repo_id:
required: false
type: string
pytest_marker:
required: false
type: string
env:
HF_HOME: /mnt/cache
@ -38,7 +41,6 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -60,25 +62,33 @@ jobs:
steps:
- name: Echo input and matrix info
shell: bash
env:
folder_slices: ${{ inputs.folder_slices }}
matrix_folders: ${{ matrix.folders }}
slice_data: ${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
echo "$folder_slices"
echo "$matrix_folders"
echo "$slice_data"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
env:
matrix_folders_raw: ${{ matrix.folders }}
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders_raw"
matrix_folders="${matrix_folders_raw/'models/'/'models_'}"
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch origin "$commit_sha" && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -113,15 +123,17 @@ jobs:
id: set_machine_type
working-directory: /transformers
shell: bash
env:
input_machine_type: ${{ inputs.machine_type }}
run: |
echo "${{ inputs.machine_type }}"
echo "$input_machine_type"
if [ "${{ inputs.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
if [ "$input_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
elif [ "$input_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
machine_type="$input_machine_type"
fi
echo "$machine_type"
@ -130,15 +142,21 @@ jobs:
- name: Create report directory if it doesn't exist
shell: bash
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
echo "dummy" > /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/dummy.txt
ls -la /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
mkdir -p "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
echo "dummy" > "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/dummy.txt"
ls -la "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
- name: Run all tests on GPU
working-directory: /transformers
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
pytest_marker: ${{ inputs.pytest_marker }}
model: ${{ matrix.folders }}
run: |
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports python3 -m pytest -rsfE -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports tests/${{ matrix.folders }}" test_outputs.txt
script -q -c "PATCH_TESTING_METHODS_TO_COLLECT_OUTPUTS=yes _PATCHED_TESTING_METHODS_OUTPUT_DIR=/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports python3 -m pytest -rsfE -v -m '${pytest_marker}' --make-reports=${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports tests/${model}" test_outputs.txt
ls -la
# Extract the exit code from the output file
EXIT_CODE=$(tail -1 test_outputs.txt | grep -o 'COMMAND_EXIT_CODE="[0-9]*"' | cut -d'"' -f2)
@ -149,19 +167,25 @@ jobs:
# This step is only to show information on Github Actions log.
# Always mark this step as successful, even if the report directory or the file `failures_short.txt` in it doesn't exist
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/failures_short.txt
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/failures_short.txt"
- name: Captured information
if: ${{ failure() }}
continue-on-error: true
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
cat /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports/captured_info.txt
cat "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports/captured_info.txt"
- name: Copy test_outputs.txt
if: ${{ always() }}
continue-on-error: true
env:
report_name_prefix: ${{ inputs.report_name_prefix }}
run: |
cp /transformers/test_outputs.txt /transformers/reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
cp /transformers/test_outputs.txt "/transformers/reports/${machine_type}_${report_name_prefix}_${matrix_folders}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
@ -172,7 +196,7 @@ jobs:
collated_reports:
name: Collated Reports
if: ${{ always() }}
if: ${{ always() && inputs.runner_type != '' }}
needs: run_models_gpu
uses: huggingface/transformers/.github/workflows/collated-reports.yml@main
with:

View File

@ -26,7 +26,6 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

View File

@ -14,7 +14,7 @@ permissions: {}
jobs:
get-pr-number:
name: Get PR number
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'build-doc')) }}
uses: ./.github/workflows/get-pr-number.yml
get-pr-info:
@ -98,7 +98,7 @@ jobs:
commit_sha: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
package: transformers
languages: ar de en es fr hi it ko pt tr zh ja te
languages: ar de en es fr hi it ja ko pt zh
update_run_status:
name: Update Check Run Status

View File

@ -1,4 +1,4 @@
name: PR slow CI
name: PR slow CI - Suggestion
on:
pull_request_target:
types: [opened, synchronize, reopened]
@ -23,11 +23,26 @@ jobs:
outputs:
jobs: ${{ steps.get_jobs.outputs.jobs_to_run }}
steps:
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
env:
PR_FILES: ${{ needs.get-pr-info.outputs.PR_FILES }}
run: |
cat > pr_files.txt << EOF
$PR_FILES
EOF
- name: Get repository content
id: repo_content
uses: actions/github-script@v6
with:
script: |
const fs = require('node:fs');
const { data: tests_dir } = await github.rest.repos.getContent({
owner: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_OWNER }}',
repo: '${{ needs.get-pr-info.outputs.PR_HEAD_REPO_NAME }}',
@ -49,38 +64,10 @@ jobs:
ref: '${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}',
});
core.setOutput('tests_dir', tests_dir);
core.setOutput('tests_models_dir', tests_models_dir);
core.setOutput('tests_quantization_dir', tests_quantization_dir);
# This checkout to the main branch
- uses: actions/checkout@v4
with:
fetch-depth: "0"
- name: Write pr_files file
run: |
cat > pr_files.txt << 'EOF'
${{ needs.get-pr-info.outputs.PR_FILES }}
EOF
- name: Write tests_dir file
run: |
cat > tests_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_dir }}
EOF
- name: Write tests_models_dir file
run: |
cat > tests_models_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_models_dir }}
EOF
- name: Write tests_quantization_dir file
run: |
cat > tests_quantization_dir.txt << 'EOF'
${{ steps.repo_content.outputs.tests_quantization_dir }}
EOF
// Write to files instead of outputs
fs.writeFileSync('tests_dir.txt', JSON.stringify(tests_dir, null, 2));
fs.writeFileSync('tests_models_dir.txt', JSON.stringify(tests_models_dir, null, 2));
fs.writeFileSync('tests_quantization_dir.txt', JSON.stringify(tests_quantization_dir, null, 2));
- name: Run script to get jobs to run
id: get_jobs

View File

@ -149,9 +149,9 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-push"
docker: huggingface/transformers-all-latest-gpu
docker: huggingface/transformers-all-latest-gpu:flash-attn
ci_event: push
report_repo_id: hf-internal-testing/transformers_ci_push
commit_sha: ${{ github.sha }}
models: ${{ needs.get_modified_models.outputs.matrix }}
subdirs: ${{ needs.get_modified_models.outputs.matrix }}
secrets: inherit

View File

@ -20,66 +20,37 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
get-pr-number:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:
- name: Get PR number
shell: bash
run: |
if [[ "${{ github.event.issue.number }}" != "" && "${{ github.event.issue.pull_request }}" != "" ]]; then
echo "PR_NUMBER=${{ github.event.issue.number }}" >> $GITHUB_ENV
else
echo "PR_NUMBER=" >> $GITHUB_ENV
fi
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "eustlb", "MekkCyber", "vasqu", "ivarflakstad", "stevhliu", "ebezzam", "remi-or", "itazap"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
uses: ./.github/workflows/get-pr-number.yml
- name: Check PR number
shell: bash
run: |
echo "${{ env.PR_NUMBER }}"
- name: Set PR number
id: set_pr_number
run: echo "PR_NUMBER=${{ env.PR_NUMBER }}" >> "$GITHUB_OUTPUT"
get-sha:
runs-on: ubuntu-22.04
get-pr-info:
name: Get PR commit SHA
needs: get-pr-number
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
outputs:
PR_HEAD_SHA: ${{ steps.get_sha.outputs.PR_HEAD_SHA }}
PR_MERGE_SHA: ${{ steps.get_sha.outputs.PR_MERGE_SHA }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
uses: ./.github/workflows/get-pr-info.yml
with:
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
- name: Get SHA (and verify timestamps against the issue comment date)
id: get_sha
check-timestamps:
name: Check timestamps (security check)
runs-on: ubuntu-22.04
needs: get-pr-info
outputs:
PR_HEAD_SHA: ${{ needs.get-pr-info.outputs.PR_HEAD_SHA }}
PR_MERGE_SHA: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_SHA }}
steps:
- name: Verify `merge_commit` timestamp is older than the issue comment timestamp
env:
PR_NUMBER: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
COMMENT_DATE: ${{ github.event.comment.created_at }}
PR_MERGE_COMMIT_TIMESTAMP: ${{ needs.get-pr-info.outputs.PR_MERGE_COMMIT_TIMESTAMP }}
run: |
git fetch origin refs/pull/$PR_NUMBER/head:refs/remotes/pull/$PR_NUMBER/head
git checkout refs/remotes/pull/$PR_NUMBER/head
echo "PR_HEAD_SHA: $(git log -1 --format=%H)"
echo "PR_HEAD_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
git fetch origin refs/pull/$PR_NUMBER/merge:refs/remotes/pull/$PR_NUMBER/merge
git checkout refs/remotes/pull/$PR_NUMBER/merge
echo "PR_MERGE_SHA: $(git log -1 --format=%H)"
echo "PR_MERGE_SHA=$(git log -1 --format=%H)" >> "$GITHUB_OUTPUT"
PR_MERGE_COMMIT_TIMESTAMP=$(git log -1 --date=unix --format=%cd)
echo "PR_MERGE_COMMIT_TIMESTAMP: $PR_MERGE_COMMIT_TIMESTAMP"
COMMENT_TIMESTAMP=$(date -d "${COMMENT_DATE}" +"%s")
echo "COMMENT_DATE: $COMMENT_DATE"
echo "COMMENT_TIMESTAMP: $COMMENT_TIMESTAMP"
@ -88,13 +59,10 @@ jobs:
exit -1;
fi
# use a python script to handle this complex logic
# case 1: `run-slow` (auto. infer with limited number of models, but in particular, new model)
# case 2: `run-slow model_1, model_2`
# use a python script to handle this complex logic.
get-tests:
runs-on: ubuntu-22.04
needs: [get-pr-number, get-sha]
if: ${{ needs.get-pr-number.outputs.PR_NUMBER != ''}}
needs: [get-pr-number, check-timestamps]
outputs:
models: ${{ steps.models_to_run.outputs.models }}
quantizations: ${{ steps.models_to_run.outputs.quantizations }}
@ -102,11 +70,11 @@ jobs:
- uses: actions/checkout@v4
with:
fetch-depth: "0"
ref: "refs/pull/${{needs.get-pr-number.outputs.PR_NUMBER}}/merge"
ref: "refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge"
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
VERIFIED_PR_MERGE_SHA: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
@ -127,11 +95,33 @@ jobs:
- name: Show models to test
id: models_to_run
run: |
echo "${{ env.models }}"
echo "models=${{ env.models }}" >> $GITHUB_ENV
echo "models=${{ env.models }}" >> $GITHUB_OUTPUT
echo "${{ env.quantizations }}"
echo "quantizations=${{ env.quantizations }}" >> $GITHUB_OUTPUT
echo "$models"
echo "models=$models" >> $GITHUB_OUTPUT
echo "$quantizations"
echo "quantizations=$quantizations" >> $GITHUB_OUTPUT
# Report back if we are not able to get the tests (for example, security check is failing)
report_error_earlier:
name: Report error earlier
if: ${{ always() && needs.get-pr-info.result == 'success' && needs.get-tests.result != 'success' }}
needs: [get-pr-number, get-pr-info, get-tests]
permissions:
pull-requests: write
runs-on: ubuntu-22.04
steps:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/issues/${pr_number}/comments" \
-f body="💔 This comment contains \`run-slow\`, but unknown error occurred and [the workflow run]($GITHUB_RUN_URL) aborted!"
reply_to_comment:
name: Reply to the comment
@ -144,20 +134,20 @@ jobs:
- name: Reply to the comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
MODELS: ${{ needs.get-tests.outputs.models }}
BODY: "\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}"
BODY: '\n\nmodels: ${{ needs.get-tests.outputs.models }}\nquantizations: ${{ needs.get-tests.outputs.quantizations }}'
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/issues/${{ needs.get-pr-number.outputs.PR_NUMBER }}/comments \
-f "body=This comment contains run-slow, running the specified jobs: ${{ env.BODY }} ..."
"repos/${github_repository}/issues/${pr_number}/comments" \
-f body="This comment contains \`run-slow\`, running the specified jobs: $(echo -e "$BODY")"
create_run:
name: Create run
if: ${{ needs.get-tests.outputs.models != '[]' || needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-sha, get-tests, reply_to_comment]
needs: [check-timestamps, reply_to_comment]
permissions:
statuses: write
runs-on: ubuntu-22.04
@ -169,248 +159,196 @@ jobs:
# Create a commit status (pending) for a run of this workflow. The status has to be updated later in `update_run_status`.
# See https://docs.github.com/en/rest/commits/statuses?apiVersion=2022-11-28#create-a-commit-status
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
"repos/${github_repository}/statuses/${pr_head_sha}" \
-f "target_url=$GITHUB_RUN_URL" -f "state=pending" -f "description=Slow CI job" -f "context=pytest/custom-tests"
run_models_gpu:
name: Run all tests for the model
model-ci:
name: Model CI
if: ${{ needs.get-tests.outputs.models != '[]' }}
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.models) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ matrix.folders }}"
uses: ./.github/workflows/self-scheduled.yml
needs: [get-pr-number, check-timestamps, get-tests, create_run]
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-pr"
docker: huggingface/transformers-all-latest-gpu
ci_event: PR Comment CI
report_repo_id: hf-internal-testing/transformers_pr_ci
commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
subdirs: ${{ needs.get-tests.outputs.models }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
secrets: inherit
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: |
export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})"
echo $CUDA_VISIBLE_DEVICES
python3 -m pytest -v -rsfE --make-reports=${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
run_quantization_torch_gpu:
name: Run all tests for a quantization
quantization-ci:
name: Quantization CI
if: ${{ needs.get-tests.outputs.quantizations != '[]' }}
needs: [get-pr-number, get-sha, get-tests, create_run]
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.get-tests.outputs.quantizations) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-quantization-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'quantization/'/'quantization_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
uses: ./.github/workflows/self-scheduled.yml
needs: [get-pr-number, check-timestamps, get-tests, create_run]
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-pr"
docker: huggingface/transformers-quantization-latest-gpu
ci_event: PR Comment CI
report_repo_id: hf-internal-testing/transformers_pr_ci
commit_sha: ${{ needs.check-timestamps.outputs.PR_MERGE_SHA }}
subdirs: ${{ needs.get-tests.outputs.quantizations }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
secrets: inherit
- name: Checkout to PR merge commit
working-directory: /transformers
run: |
git fetch origin refs/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge:refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git checkout refs/remotes/pull/${{ needs.get-pr-number.outputs.PR_NUMBER }}/merge
git log -1 --format=%H
- name: Verify merge commit SHA
env:
VERIFIED_PR_MERGE_SHA: ${{ needs.get-sha.outputs.PR_MERGE_SHA }}
working-directory: /transformers
run: |
PR_MERGE_SHA=$(git log -1 --format=%H)
if [ $PR_MERGE_SHA != $VERIFIED_PR_MERGE_SHA ]; then
echo "The merged commit SHA is not the same as the verified one! Security issue detected, abort the workflow!";
exit -1;
fi
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run quantization tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Make sure report directory exists
shell: bash
run: |
mkdir -p /transformers/reports/${{ env.machine_type }}_run_quantization_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ env.machine_type }}_run_quantization_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_run_quantization_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
update_run_status:
name: Update Check Run Status
needs: [get-sha, create_run, run_models_gpu, run_quantization_torch_gpu]
report:
name: Check & Report
needs: [get-pr-number, check-timestamps, create_run, model-ci, quantization-ci]
permissions:
pull-requests: write
statuses: write
if: ${{ always() && needs.create_run.result == 'success' }}
runs-on: ubuntu-22.04
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
STATUS_OK: ${{ contains(fromJSON('["skipped", "success"]'), needs.run_models_gpu.result) && contains(fromJSON('["skipped", "success"]'), needs.run_quantization_torch_gpu.result) }}
steps:
- name: Get `run_models_gpu` job status
- name: Show reports from jobs
env:
MODEL_REPORT: ${{ needs.model-ci.outputs.report }}
QUANT_REPORT: ${{ needs.quantization-ci.outputs.report }}
run: |
echo "${{ needs.run_models_gpu.result }}"
echo "${{ needs.run_quantization_torch_gpu.result }}"
echo $STATUS_OK
if [ "$STATUS_OK" = "true" ]; then
echo "STATUS=success" >> $GITHUB_ENV
else
echo "STATUS=failure" >> $GITHUB_ENV
fi
echo "$MODEL_REPORT"
echo "$QUANT_REPORT"
- name: Update PR commit statuses
- name: Process and filter reports
env:
MODEL_REPORT: ${{ needs.model-ci.outputs.report }}
QUANT_REPORT: ${{ needs.quantization-ci.outputs.report }}
run: |
echo "${{ needs.run_models_gpu.result }}"
echo "${{ env.STATUS }}"
# Preprocess with Python
python3 << 'PYTHON_SCRIPT'
import json
import os
def filter_and_format_report(data):
"""
Filter out entries where commit is `None` (failing tests who status is not certain) and format as text
"""
lines = []
for model, model_result in data.items():
model_lines = []
for device, failures in model_result.items():
# Filter out None commits and extract just the test names
test_names = [
failure['test']
for failure in failures
if isinstance(failure, dict) and failure.get('commit') is not None
]
# Add tests to model lines
for idx, test_name in enumerate(test_names):
if idx == 0:
job_link = failures[idx]['job_link']
model_lines.append(f"- [{model}]({job_link}):")
model_lines.append(f" {test_name}")
# Only add model section if it has tests
if len(model_lines) > 0:
lines.extend(model_lines)
lines.append("") # Empty line between models
return "\n".join(lines).strip()
# Load and filter reports
model_report_str = os.environ.get('MODEL_REPORT', '{}')
quant_report_str = os.environ.get('QUANT_REPORT', '{}')
model_report = json.loads(model_report_str) if model_report_str else {}
quant_report = json.loads(quant_report_str) if quant_report_str else {}
formatted_model = filter_and_format_report(model_report)
formatted_quant = filter_and_format_report(quant_report)
# Write to files
with open('model_ci.txt', 'w') as f:
f.write(formatted_model)
if formatted_model:
f.write('\n')
with open('quantization_ci.txt', 'w') as f:
f.write(formatted_quant)
if formatted_quant:
f.write('\n')
PYTHON_SCRIPT
- name: Post results as PR comment
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_number: ${{ needs.get-pr-number.outputs.PR_NUMBER }}
model_ci_result: ${{ needs.model-ci.result }}
quantization_ci_result: ${{ needs.quantization-ci.result }}
run: |
{
echo '## CI Results'
echo "[Workflow Run ⚙️]($GITHUB_RUN_URL)"
echo ''
# Check if both jobs were skipped or cancelled
if [[ "$model_ci_result" == "skipped" || "$model_ci_result" == "cancelled" ]] && \
[[ "$quantization_ci_result" == "skipped" || "$quantization_ci_result" == "cancelled" ]]; then
echo '⚠️ No test being reported (jobs are skipped or cancelled)!'
echo "STATUS=error" >> $GITHUB_ENV
# Check if either file has content
elif [ -s model_ci.txt ] || [ -s quantization_ci.txt ]; then
echo "STATUS=failure" >> $GITHUB_ENV
# Check if model_ci.txt has content
if [ -s model_ci.txt ]; then
echo '### Model CI Report'
echo ''
echo '#### ❌ Failed tests'
echo ''
cat model_ci.txt
echo ''
fi
# Check if quantization_ci.txt has content
if [ -s quantization_ci.txt ]; then
echo '### Quantization CI Report'
echo ''
echo '#### ❌ Failed tests'
echo ''
cat quantization_ci.txt
echo ''
fi
else
echo "STATUS=success" >> $GITHUB_ENV
echo '✅ No failing test specific to this PR 🎉 !'
fi
} > comment_body.txt
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
repos/${{ github.repository }}/statuses/${{ needs.get-sha.outputs.PR_HEAD_SHA }} \
-f "target_url=$GITHUB_RUN_URL" -f "state=${{ env.STATUS }}" -f "description=Slow CI job" -f "context=pytest/custom-tests"
"repos/${github_repository}/issues/${pr_number}/comments" \
-F body=@comment_body.txt
- name: Update PR commit statuses
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_RUN_URL: https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }}
github_repository: ${{ github.repository }}
pr_head_sha: ${{ needs.check-timestamps.outputs.PR_HEAD_SHA }}
# The env. variable `STATUS` used here is set in the previous step
run: |
gh api \
--method POST \
-H "Accept: application/vnd.github+json" \
-H "X-GitHub-Api-Version: 2022-11-28" \
"repos/${github_repository}/statuses/${pr_head_sha}" \
-f "target_url=$GITHUB_RUN_URL" -f "state=$STATUS" -f "description=Slow CI job" -f "context=pytest/custom-tests"

View File

@ -51,6 +51,7 @@ jobs:
slack_report_channel: "#transformers-ci-past-future"
docker: huggingface/transformers-all-latest-torch-nightly-gpu
ci_event: Nightly CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci_with_torch_nightly
commit_sha: ${{ github.event.workflow_run.head_sha || github.sha }}
secrets: inherit

View File

@ -1,25 +0,0 @@
name: Self-hosted runner (AMD mi210 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi210
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi210
secrets: inherit

View File

@ -1,25 +0,0 @@
name: Self-hosted runner (AMD mi250 CI caller)
on:
#workflow_run:
# workflows: ["Self-hosted runner (push-caller)"]
# branches: ["main"]
# types: [completed]
push:
branches:
- run_amd_push_ci_caller*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
run_amd_ci:
name: AMD mi250
if: (cancelled() != true) && ((github.event_name == 'workflow_run') || ((github.event_name == 'push') && startsWith(github.ref_name, 'run_amd_push_ci_caller')))
uses: ./.github/workflows/self-push-amd.yml
with:
gpu_flavor: mi250
secrets: inherit

View File

@ -1,334 +0,0 @@
name: Self-hosted runner AMD GPU (push)
on:
workflow_call:
inputs:
gpu_flavor:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
jobs:
check_runner_status:
name: Check Runner Status
runs-on: ubuntu-22.04
steps:
- name: Checkout transformers
uses: actions/checkout@v4
with:
fetch-depth: 2
- name: Check Runner Status
run: python utils/check_self_hosted_runner.py --target_runners amd-mi210-single-gpu-ci-runner-docker --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
check_runners:
name: Check Runners
needs: check_runner_status
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
setup_gpu:
name: Setup
needs: check_runners
strategy:
matrix:
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
env:
# `CI_BRANCH_PUSH`: The branch name from the push event
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
# `CI_SHA_PUSH`: The commit SHA from the push event
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Fetch the tests to run
working-directory: /transformers
# TODO: add `git-python` in the docker images
run: |
pip install --upgrade git-python
python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v4
with:
name: test_fetched
path: /transformers/test_preparation.txt
- id: set-matrix
name: Organize tests into models
working-directory: /transformers
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "matrix=$keys" >> $GITHUB_OUTPUT
echo "test_map=$test_map" >> $GITHUB_OUTPUT
run_models_gpu:
name: Model tests
needs: setup_gpu
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup_gpu.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup_gpu.outputs.matrix) }}
machine_type: [single-gpu, multi-gpu]
runs-on: [self-hosted, amd-gpu, '${{ matrix.machine_type }}', '${{ inputs.gpu_flavor }}']
container:
image: huggingface/transformers-pytorch-amd-gpu-push-ci # <--- We test only for PyTorch for now
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports ${{ fromJson(needs.setup_gpu.outputs.test_map)[matrix.folders] }} -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ matrix.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
needs: [
check_runner_status,
check_runners,
setup_gpu,
run_models_gpu,
# run_tests_torch_cuda_extensions_single_gpu,
# run_tests_torch_cuda_extensions_multi_gpu
]
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Runner availability: ${{ needs.check_runner_status.result }}"
echo "Setup status: ${{ needs.setup_gpu.result }}"
echo "Runner status: ${{ needs.check_runners.result }}"
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- uses: actions/checkout@v4
# To avoid failure when multiple commits are merged into `main` in a short period of time.
# Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ...
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit)
with:
fetch-depth: 20
- name: Update clone using environment variables
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- uses: actions/download-artifact@v4
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_ID_AMD: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID_AMD }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: Push CI (AMD) - ${{ inputs.gpu_flavor }}
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
RUNNER_STATUS: ${{ needs.check_runner_status.result }}
RUNNER_ENV_STATUS: ${{ needs.check_runners.result }}
SETUP_STATUS: ${{ needs.setup_gpu.result }}
# We pass `needs.setup_gpu.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ needs.setup_gpu.outputs.matrix }}"

View File

@ -1,54 +0,0 @@
# Used to trigger self-push CI
name: Self-hosted runner (push-caller)
on:
push:
branches:
- main
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
jobs:
check-for-setup:
runs-on: ubuntu-22.04
name: Check if setup was changed
outputs:
changed: ${{ steps.was_changed.outputs.changed }}
steps:
- uses: actions/checkout@v4
with:
fetch-depth: "2"
- name: Get changed files
id: changed-files
uses: tj-actions/changed-files@1c8e6069583811afb28f97afeaf8e7da80c6be5c
- name: Was setup changed
id: was_changed
run: |
for file in ${{ steps.changed-files.outputs.all_changed_files }}; do
if [ `basename "${file}"` = "setup.py" ]; then
echo "changed=1" >> $GITHUB_OUTPUT
fi
done
build-docker-containers:
needs: check-for-setup
if: (github.event_name == 'push') && (needs.check-for-setup.outputs.changed == '1')
uses: ./.github/workflows/build-docker-images.yml
with:
image_postfix: "-push-ci"
secrets: inherit
run_push_ci:
name: Trigger Push CI
runs-on: ubuntu-22.04
if: ${{ always() }}
needs: build-docker-containers
steps:
- name: Trigger push CI via workflow_run
run: echo "Trigger push CI via workflow_run"

View File

@ -1,652 +0,0 @@
name: Self-hosted runner (push)
on:
workflow_run:
workflows: ["Self-hosted runner (push-caller)"]
branches: ["main"]
types: [completed]
push:
branches:
- ci_*
- ci-*
paths:
- "src/**"
- "tests/**"
- ".github/**"
- "templates/**"
- "utils/**"
repository_dispatch:
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
PYTEST_TIMEOUT: 60
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
setup:
name: Setup
strategy:
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
outputs:
matrix: ${{ steps.set-matrix.outputs.matrix }}
test_map: ${{ steps.set-matrix.outputs.test_map }}
env:
# `CI_BRANCH_PUSH`: The branch name from the push event
# `CI_BRANCH_WORKFLOW_RUN`: The name of the branch on which this workflow is triggered by `workflow_run` event
# `CI_SHA_PUSH`: The commit SHA from the push event
# `CI_SHA_WORKFLOW_RUN`: The commit SHA that triggers this workflow by `workflow_run` event
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# `CI_BRANCH`: The non-empty branch name from the above two (one and only one of them is empty)
# `CI_SHA`: The non-empty commit SHA from the above two (one and only one of them is empty)
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Cleanup
working-directory: /transformers
run: |
rm -rf tests/__pycache__
rm -rf tests/models/__pycache__
rm -rf reports
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Fetch the tests to run
working-directory: /transformers
# TODO: add `git-python` in the docker images
run: |
pip install --upgrade git-python
python3 utils/tests_fetcher.py --diff_with_last_commit | tee test_preparation.txt
- name: Report fetched tests
uses: actions/upload-artifact@v4
with:
name: test_fetched
path: /transformers/test_preparation.txt
- id: set-matrix
name: Organize tests into models
working-directory: /transformers
# The `keys` is used as GitHub actions matrix for jobs, i.e. `models/bert`, `tokenization`, `pipeline`, etc.
# The `test_map` is used to get the actual identified test files under each key.
# If no test to run (so no `test_map.json` file), create a dummy map (empty matrix will fail)
run: |
if [ -f test_map.json ]; then
keys=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); d = list(test_map.keys()); print(d)')
test_map=$(python3 -c 'import json; fp = open("test_map.json"); test_map = json.load(fp); fp.close(); print(test_map)')
else
keys=$(python3 -c 'keys = ["dummy"]; print(keys)')
test_map=$(python3 -c 'test_map = {"dummy": []}; print(test_map)')
fi
echo $keys
echo $test_map
echo "matrix=$keys" >> $GITHUB_OUTPUT
echo "test_map=$test_map" >> $GITHUB_OUTPUT
run_tests_single_gpu:
name: Model tests
needs: setup
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_multi_gpu:
name: Model tests
needs: setup
# `dummy` means there is no test to run
if: contains(fromJson(needs.setup.outputs.matrix), 'dummy') != true
strategy:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.matrix) }}
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-all-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
echo "${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
env:
MKL_SERVICE_FORCE_INTEL: 1
working-directory: /transformers
run: |
python3 -m pytest -n 2 --dist=loadfile -v --make-reports=${{ env.machine_type }}_tests_gpu_${{ matrix.folders }} ${{ fromJson(needs.setup.outputs.test_map)[matrix.folders] }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_all_tests_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_tests_gpu_${{ matrix.folders }}
run_tests_torch_cuda_extensions_single_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
run_tests_torch_cuda_extensions_multi_gpu:
name: Torch CUDA extension tests
needs: setup
if: contains(fromJson(needs.setup.outputs.matrix), 'deepspeed') || contains(fromJson(needs.setup.outputs.matrix), 'extended')
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-12xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-deepspeed-latest-gpu-push-ci
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- name: Set `machine_type` for report and artifact names
working-directory: /workspace/transformers
shell: bash
run: |
echo "${{ matrix.machine_type }}"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Update clone using environment variables
working-directory: /workspace/transformers
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /workspace/transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Remove cached torch extensions
run: rm -rf /github/home/.cache/torch_extensions/
# To avoid unknown test failures
- name: Pre build DeepSpeed *again*
working-directory: /workspace
run: |
python3 -m pip uninstall -y deepspeed
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
nvidia-smi
- name: Environment
working-directory: /workspace/transformers
run: |
python utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /workspace/transformers
run: pip freeze
- name: Run all non-slow selected tests on GPU
working-directory: /workspace/transformers
# TODO: Here we pass all tests in the 2 folders for simplicity. It's better to pass only the identified tests.
run: |
python -m pytest -n 1 --dist=loadfile -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
path: /workspace/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports
send_results:
name: Send results to webhook
runs-on: ubuntu-22.04
if: always()
needs: [
setup,
run_tests_single_gpu,
run_tests_multi_gpu,
run_tests_torch_cuda_extensions_single_gpu,
run_tests_torch_cuda_extensions_multi_gpu
]
env:
# For the meaning of these environment variables, see the job `Setup`
CI_BRANCH_PUSH: ${{ github.event.ref }}
CI_BRANCH_WORKFLOW_RUN: ${{ github.event.workflow_run.head_branch }}
CI_SHA_PUSH: ${{ github.event.head_commit.id }}
CI_SHA_WORKFLOW_RUN: ${{ github.event.workflow_run.head_sha }}
steps:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
echo "Setup status: ${{ needs.setup.result }}"
# Necessary to get the correct branch name and commit SHA for `workflow_run` event
# We also take into account the `push` event (we might want to test some changes in a branch)
- name: Prepare custom environment variables
shell: bash
# For the meaning of these environment variables, see the job `Setup`
run: |
CI_BRANCH_PUSH=${CI_BRANCH_PUSH/'refs/heads/'/''}
echo $CI_BRANCH_PUSH
echo $CI_BRANCH_WORKFLOW_RUN
echo $CI_SHA_PUSH
echo $CI_SHA_WORKFLOW_RUN
[[ ! -z "$CI_BRANCH_PUSH" ]] && echo "CI_BRANCH=$CI_BRANCH_PUSH" >> $GITHUB_ENV || echo "CI_BRANCH=$CI_BRANCH_WORKFLOW_RUN" >> $GITHUB_ENV
[[ ! -z "$CI_SHA_PUSH" ]] && echo "CI_SHA=$CI_SHA_PUSH" >> $GITHUB_ENV || echo "CI_SHA=$CI_SHA_WORKFLOW_RUN" >> $GITHUB_ENV
- name: print environment variables
run: |
echo "env.CI_BRANCH = ${{ env.CI_BRANCH }}"
echo "env.CI_SHA = ${{ env.CI_SHA }}"
- uses: actions/checkout@v4
# To avoid failure when multiple commits are merged into `main` in a short period of time.
# Checking out to an old commit beyond the fetch depth will get an error `fatal: reference is not a tree: ...
# (Only required for `workflow_run` event, where we get the latest HEAD on `main` instead of the event commit)
with:
fetch-depth: 20
- name: Update clone using environment variables
run: |
echo "original branch = $(git branch --show-current)"
git fetch && git checkout ${{ env.CI_BRANCH }}
echo "updated branch = $(git branch --show-current)"
git checkout ${{ env.CI_SHA }}
echo "log = $(git log -n 1)"
- uses: actions/download-artifact@v4
- name: Send message to Slack
env:
CI_SLACK_BOT_TOKEN: ${{ secrets.CI_SLACK_BOT_TOKEN }}
CI_SLACK_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
CI_SLACK_CHANNEL_ID_DAILY: ${{ secrets.CI_SLACK_CHANNEL_ID_DAILY }}
CI_SLACK_CHANNEL_DUMMY_TESTS: ${{ secrets.CI_SLACK_CHANNEL_DUMMY_TESTS }}
CI_SLACK_REPORT_CHANNEL_ID: ${{ secrets.CI_SLACK_CHANNEL_ID }}
ACCESS_REPO_INFO_TOKEN: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
CI_EVENT: push
CI_TITLE_PUSH: ${{ github.event.head_commit.message }}
CI_TITLE_WORKFLOW_RUN: ${{ github.event.workflow_run.head_commit.message }}
CI_SHA: ${{ env.CI_SHA }}
SETUP_STATUS: ${{ needs.setup.result }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
run: |
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
python utils/notification_service.py "${{ needs.setup.outputs.matrix }}"

View File

@ -2,7 +2,7 @@ name: Self-hosted runner (AMD scheduled CI caller)
on:
schedule:
- cron: "17 2 * * *"
- cron: "17 5 * * *"
jobs:
run_scheduled_amd_ci:

View File

@ -21,7 +21,7 @@ jobs:
job: run_models_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
@ -33,7 +33,7 @@ jobs:
job: run_pipelines_torch_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit
@ -45,7 +45,7 @@ jobs:
job: run_examples_gpu
slack_report_channel: "#amd-hf-ci"
runner_group: hfc-amd-mi355
docker: huggingface/testing-rocm7.0-preview
docker: huggingface/transformers-pytorch-amd-gpu
ci_event: Scheduled CI (AMD) - mi355
report_repo_id: hf-transformers-bot/transformers-ci-dummy
secrets: inherit

View File

@ -6,7 +6,7 @@ on:
- cron: "17 2 * * *"
push:
branches:
- run_nvidia_ci*
- check_cleanup_workflow
workflow_dispatch:
inputs:
prev_workflow_run_id:
@ -23,7 +23,7 @@ on:
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
prev_workflow_run_id: "19056134459"
other_workflow_run_id: ""
@ -33,10 +33,13 @@ jobs:
runs-on: ubuntu-22.04
steps:
- name: Setup
env:
prev_workflow_run_id: ${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}
other_workflow_run_id: ${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
echo "$prev_workflow_run_id" > "setup_values/prev_workflow_run_id.txt"
echo "$other_workflow_run_id" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
@ -49,72 +52,10 @@ jobs:
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-models"
slack_report_channel: "#transformers-ci-dummy"
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
torch-pipeline:
name: Torch pipeline CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-torch"
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-examples"
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
trainer-fsdp-ci:
name: Trainer/FSDP CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_trainer_and_fsdp_gpu
slack_report_channel: "#transformers-ci-daily-training"
docker: huggingface/transformers-all-latest-gpu
runner_type: "a10"
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-training"
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Daily CI
working-directory-prefix: /workspace
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit
quantization-ci:
name: Quantization CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-daily-quantization"
docker: huggingface/transformers-quantization-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
commit_sha: ${{ github.sha }}
secrets: inherit

View File

@ -0,0 +1,60 @@
name: Nvidia CI - Flash Attn
on:
repository_dispatch:
schedule:
- cron: "17 2 * * *"
push:
branches:
- run_nvidia_ci_flash_attn*
workflow_dispatch:
inputs:
prev_workflow_run_id:
description: 'previous workflow run id to compare'
type: string
required: false
default: ""
other_workflow_run_id:
description: 'other workflow run id to compare'
type: string
required: false
default: ""
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
jobs:
setup:
name: Setup
runs-on: ubuntu-22.04
steps:
- name: Setup
run: |
mkdir "setup_values"
echo "${{ inputs.prev_workflow_run_id || env.prev_workflow_run_id }}" > "setup_values/prev_workflow_run_id.txt"
echo "${{ inputs.other_workflow_run_id || env.other_workflow_run_id }}" > "setup_values/other_workflow_run_id.txt"
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: setup_values
path: setup_values
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled.yml
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-flash-attn"
docker: huggingface/transformers-all-latest-gpu:flash-attn
ci_event: Daily CI
runner_type: "a10"
report_repo_id: hf-internal-testing/transformers_flash_attn_ci
commit_sha: ${{ github.sha }}
pytest_marker: "flash_attn_test or flash_attn_3_test"
secrets: inherit

View File

@ -26,7 +26,6 @@ env:
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:

View File

@ -34,10 +34,20 @@ on:
runner_type:
required: false
type: string
models:
subdirs:
default: ""
required: false
type: string
pytest_marker:
required: false
type: string
pr_number:
required: false
type: string
outputs:
report:
description: "Content of the report of new failures"
value: ${{ jobs.check_new_failures.outputs.report }}
env:
HF_HOME: /mnt/cache
@ -48,10 +58,8 @@ env:
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
NUM_SLICES: 2
jobs:
setup:
@ -59,7 +67,7 @@ jobs:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu", "run_quantization_torch_gpu"]'), inputs.job)
strategy:
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -72,8 +80,11 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: |
git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
git fetch origin $commit_sha
git fetch && git checkout $commit_sha
- name: Cleanup
working-directory: /transformers
@ -90,11 +101,17 @@ jobs:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: /transformers/tests
env:
job: ${{ inputs.job }}
subdirs: ${{ inputs.subdirs }}
NUM_SLICES: 2
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --models '${{ inputs.models }}' --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
if [ "$job" = "run_models_gpu" ]; then
python3 ../utils/split_model_tests.py --subdirs "$subdirs" --num_splits "$NUM_SLICES" > folder_slices.txt
echo "folder_slices=$(cat folder_slices.txt)" >> $GITHUB_OUTPUT
python3 -c "import ast; folder_slices = ast.literal_eval(open('folder_slices.txt').read()); open('slice_ids.txt', 'w').write(str(list(range(len(folder_slices)))))"
echo "slice_ids=$(cat slice_ids.txt)" >> $GITHUB_OUTPUT
elif [ "$job" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
@ -103,8 +120,10 @@ jobs:
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
name: Identify quantization method to test
working-directory: /transformers/tests
env:
subdirs: ${{ inputs.subdirs || 'None' }}
run: |
echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT
echo "quantization_matrix=$(python3 -c 'import ast; import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); subdirs = ast.literal_eval(os.environ["subdirs"]); quantization_tests = [x.removeprefix("quantization/") for x in subdirs] if subdirs is not None else quantization_tests; d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))); print(d)')" >> $GITHUB_OUTPUT
- name: NVIDIA-SMI
run: |
@ -117,7 +136,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
@ -128,6 +147,7 @@ jobs:
commit_sha: ${{ inputs.commit_sha || github.sha }}
runner_type: ${{ inputs.runner_type }}
report_repo_id: ${{ inputs.report_repo_id }}
pytest_marker: ${{ inputs.pytest_marker }}
secrets: inherit
run_trainer_and_fsdp_gpu:
@ -137,7 +157,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
slice_id: [0, 1]
uses: ./.github/workflows/model_jobs.yml
with:
@ -157,16 +177,18 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: huggingface/transformers-pytorch-gpu
image: huggingface/transformers-all-latest-gpu
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -188,15 +210,17 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "${{ matrix.machine_type }}"
echo "$matrix_machine_type"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
@ -205,12 +229,12 @@ jobs:
- name: Run all pipeline tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -n 1 -v --dist=loadfile --make-reports=${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports tests/pipelines
python3 -m pytest -n 1 -v --dist=loadfile --make-reports="${machine_type}_run_pipelines_torch_gpu_test_reports" tests/pipelines
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports/failures_short.txt
run: cat "/transformers/reports/${machine_type}_run_pipelines_torch_gpu_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_torch_gpu_test_reports"
if: ${{ always() }}
@ -234,7 +258,9 @@ jobs:
steps:
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -256,15 +282,17 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "${{ matrix.machine_type }}"
echo "$matrix_machine_type"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
@ -274,12 +302,12 @@ jobs:
working-directory: /transformers
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_examples_gpu_test_reports examples/pytorch
python3 -m pytest -v --make-reports="${machine_type}_run_examples_gpu_test_reports" examples/pytorch
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
run: cat "/transformers/reports/${machine_type}_run_examples_gpu_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
@ -294,7 +322,7 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -303,7 +331,9 @@ jobs:
steps:
- name: Update clone
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: ${{ inputs.working-directory-prefix }}/transformers
@ -325,7 +355,7 @@ jobs:
working-directory: ${{ inputs.working-directory-prefix }}/
run: |
python3 -m pip uninstall -y deepspeed
DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
DS_DISABLE_NINJA=1 DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
# To avoid unknown test failures
- name: Pre build DeepSpeed *again* (for nightly & Past CI)
@ -335,7 +365,7 @@ jobs:
python3 -m pip uninstall -y deepspeed
rm -rf DeepSpeed
git clone https://github.com/deepspeedai/DeepSpeed && cd DeepSpeed && rm -rf build
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check
DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install . --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check
- name: NVIDIA-SMI
run: |
@ -353,15 +383,17 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: ${{ inputs.working-directory-prefix }}/transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "${{ matrix.machine_type }}"
echo "$matrix_machine_type"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
@ -370,12 +402,14 @@ jobs:
- name: Run all tests on GPU
working-directory: ${{ inputs.working-directory-prefix }}/transformers
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports tests/deepspeed tests/extended
python3 -m pytest -v --make-reports="${machine_type}_run_torch_cuda_extensions_gpu_test_reports" tests/deepspeed tests/extended
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat ${{ inputs.working-directory-prefix }}/transformers/reports/${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt
env:
working_directory_prefix: ${{ inputs.working-directory-prefix }}
run: cat "${working_directory_prefix}/transformers/reports/${machine_type}_run_torch_cuda_extensions_gpu_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_torch_cuda_extensions_gpu_test_reports"
if: ${{ always() }}
@ -393,7 +427,7 @@ jobs:
fail-fast: false
matrix:
folders: ${{ fromJson(needs.setup.outputs.quantization_matrix) }}
machine_type: [aws-g5-4xlarge-cache, aws-g5-12xlarge-cache]
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
@ -402,16 +436,19 @@ jobs:
steps:
- name: Echo folder ${{ matrix.folders }}
shell: bash
env:
matrix_folders_raw: ${{ matrix.folders }}
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'quantization/'/'quantization_'}
echo "$matrix_folders_raw"
matrix_folders="${matrix_folders_raw/'quantization/'/'quantization_'}"
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ inputs.commit_sha || github.sha }}
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
@ -433,15 +470,17 @@ jobs:
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "${{ matrix.machine_type }}"
echo "$matrix_machine_type"
if [ "${{ matrix.machine_type }}" = "aws-g5-4xlarge-cache" ]; then
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "aws-g5-12xlarge-cache" ]; then
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
@ -449,20 +488,96 @@ jobs:
- name: Run quantization tests on GPU
working-directory: /transformers
env:
folders: ${{ matrix.folders }}
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
python3 -m pytest -v --make-reports="${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports" tests/${folders}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
run: cat "/transformers/reports/${machine_type}_run_quantization_torch_gpu_${matrix_folders}_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ matrix.folders }}_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_quantization_torch_gpu_${{ env.matrix_folders }}_test_reports
run_kernels_gpu:
if: ${{ inputs.job == 'run_kernels_gpu' }}
name: Kernel tests
strategy:
fail-fast: false
matrix:
machine_type: [aws-g5-4xlarge-cache]
runs-on:
group: '${{ matrix.machine_type }}'
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ inputs.commit_sha || github.sha }}
run: git fetch && git checkout "$commit_sha"
- name: Reinstall transformers in edit mode
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .[testing]
- name: Install kernels
working-directory: /transformers
run: python3 -m pip install -U kernels
- name: NVIDIA-SMI
run: nvidia-smi
- name: Environment
working-directory: /transformers
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Set `machine_type` for report and artifact names
working-directory: /transformers
shell: bash
env:
matrix_machine_type: ${{ matrix.machine_type }}
run: |
echo "$matrix_machine_type"
if [ "$matrix_machine_type" = "aws-g5-4xlarge-cache" ]; then
machine_type=single-gpu
elif [ "$matrix_machine_type" = "aws-g5-12xlarge-cache" ]; then
machine_type=multi-gpu
else
machine_type="$matrix_machine_type"
fi
echo "$machine_type"
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run kernel tests on GPU
working-directory: /transformers
run: |
python3 -m pytest -v --make-reports="${machine_type}_run_kernels_gpu_test_reports" tests/kernels/test_kernels.py
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat "/transformers/reports/${machine_type}_run_kernels_gpu_test_reports/failures_short.txt"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_kernels_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_kernels_gpu_test_reports
path: /transformers/reports/${{ env.machine_type }}_run_kernels_gpu_test_reports
run_extract_warnings:
# Let's only do this for the job `run_models_gpu` to simplify the (already complex) logic.
@ -495,9 +610,12 @@ jobs:
working-directory: warnings_in_ci
- name: Extract warnings in CI artifacts
env:
github_run_id: ${{ github.run_id }}
access_token: ${{ secrets.ACCESS_REPO_INFO_TOKEN }}
run: |
python3 utils/extract_warnings.py --workflow_run_id ${{ github.run_id }} --output_dir warnings_in_ci --token ${{ secrets.ACCESS_REPO_INFO_TOKEN }} --from_gh
echo "$(python3 -c 'import os; import json; fp = open("warnings_in_ci/selected_warnings.json"); d = json.load(fp); d = "\n".join(d) ;print(d)')"
python3 utils/extract_warnings.py --workflow_run_id "$github_run_id" --output_dir warnings_in_ci --token "$access_token" --from_gh
echo "$(python3 -c 'import os; import json; fp = open("warnings_in_ci/selected_warnings.json"); d = json.load(fp); d = "\n".join(d); print(d)')"
- name: Upload artifact
if: ${{ always() }}
@ -516,6 +634,7 @@ jobs:
run_examples_gpu,
run_torch_cuda_extensions_gpu,
run_quantization_torch_gpu,
run_kernels_gpu,
run_extract_warnings
]
if: always() && !cancelled()
@ -535,16 +654,17 @@ jobs:
secrets: inherit
check_new_failures:
if: ${{ always() && inputs.ci_event == 'Daily CI' && needs.send_results.result == 'success' }}
if: ${{ always() && needs.send_results.result == 'success' }}
name: Check new failures
needs: send_results
uses: ./.github/workflows/check_failed_tests.yml
with:
docker: ${{ inputs.docker }}
start_sha: ${{ inputs.commit_sha || github.sha }}
commit_sha: ${{ inputs.commit_sha || github.sha }}
job: ${{ inputs.job }}
slack_report_channel: ${{ inputs.slack_report_channel }}
ci_event: ${{ inputs.ci_event }}
report_repo_id: ${{ inputs.report_repo_id }}
pr_number: ${{ inputs.pr_number }}
secrets: inherit

View File

@ -41,8 +41,10 @@ jobs:
- name: Preliminary job status
shell: bash
# For the meaning of these environment variables, see the job `Setup`
env:
setup_status: ${{ inputs.setup_status }}
run: |
echo "Setup status: ${{ inputs.setup_status }}"
echo "Setup status: $setup_status"
- uses: actions/checkout@v4
with:
@ -81,6 +83,8 @@ jobs:
CI_TEST_JOB: ${{ inputs.job }}
SETUP_STATUS: ${{ inputs.setup_status }}
REPORT_REPO_ID: ${{ inputs.report_repo_id }}
quantization_matrix: ${{ inputs.quantization_matrix }}
folder_slices: ${{ inputs.folder_slices }}
# We pass `needs.setup.outputs.matrix` as the argument. A processing in `notification_service.py` to change
# `models/bert` to `models_bert` is required, as the artifact names use `_` instead of `/`.
# For a job that doesn't depend on (i.e. `needs`) `setup`, the value for `inputs.folder_slices` would be an
@ -89,10 +93,10 @@ jobs:
pip install huggingface_hub
pip install slack_sdk
pip show slack_sdk
if [ "${{ inputs.quantization_matrix }}" != "" ]; then
python utils/notification_service.py "${{ inputs.quantization_matrix }}"
if [ "$quantization_matrix" != "" ]; then
python utils/notification_service.py "$quantization_matrix"
else
python utils/notification_service.py "${{ inputs.folder_slices }}"
python utils/notification_service.py "$folder_slices"
fi
# Upload complete failure tables, as they might be big and only truncated versions could be sent to Slack.

View File

@ -4,7 +4,7 @@ on:
workflow_dispatch:
inputs:
runner_type:
description: 'Type of runner to test (a10 or t4)'
description: 'Type of runner to test (a10)'
required: true
docker_image:
description: 'Name of the Docker image'
@ -20,7 +20,6 @@ env:
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes # For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
@ -33,15 +32,14 @@ jobs:
steps:
- name: Get runner to use
shell: bash
env:
NUM_GPUS: ${{ github.event.inputs.num_gpus }}
RUNNER_TYPE: ${{ github.event.inputs.runner_type }}
run: |
if [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
echo "RUNNER=aws-g4dn-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "t4" ]]; then
echo "RUNNER=aws-g4dn-12xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "single" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache" >> $GITHUB_ENV
elif [[ "${{ github.event.inputs.num_gpus }}" == "multi" && "${{ github.event.inputs.runner_type }}" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache" >> $GITHUB_ENV
if [[ "$NUM_GPUS" == "single" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-4xlarge-cache-ssh" >> $GITHUB_ENV
elif [[ "$NUM_GPUS" == "multi" && "$RUNNER_TYPE" == "a10" ]]; then
echo "RUNNER=aws-g5-12xlarge-cache-ssh" >> $GITHUB_ENV
else
echo "RUNNER=" >> $GITHUB_ENV
fi
@ -49,8 +47,8 @@ jobs:
- name: Set runner to use
id: set_runner
run: |
echo ${{ env.RUNNER }}
echo "RUNNER=${{ env.RUNNER }}" >> $GITHUB_OUTPUT
echo "$RUNNER"
echo "RUNNER=$RUNNER" >> $GITHUB_OUTPUT
ssh_runner:
name: "SSH"
@ -59,13 +57,13 @@ jobs:
group: ${{ needs.get_runner.outputs.RUNNER }}
container:
image: ${{ github.event.inputs.docker_image }}
options: --gpus all --privileged --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Update clone
working-directory: /transformers
env:
commit_sha: ${{ github.sha }}
run: |
git fetch && git checkout ${{ github.sha }}
git fetch && git checkout "$commit_sha"
- name: Cleanup
working-directory: /transformers
@ -85,9 +83,11 @@ jobs:
- name: Store Slack infos
#because the SSH can be enabled dynamically if the workflow failed, so we need to store slack infos to be able to retrieve them during the waitforssh step
shell: bash
env:
GITHUB_ACTOR: ${{ github.actor }}
run: |
echo "${{ github.actor }}"
github_actor=${{ github.actor }}
echo "$GITHUB_ACTOR"
github_actor=$GITHUB_ACTOR
github_actor=${github_actor/'-'/'_'}
echo "$github_actor"
echo "github_actor=$github_actor" >> $GITHUB_ENV
@ -95,14 +95,17 @@ jobs:
- name: Store Slack infos
#because the SSH can be enabled dynamically if the workflow failed, so we need to store slack infos to be able to retrieve them during the waitforssh step
shell: bash
env:
user_slack_id: ${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}
default_slack_channel: ${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}
run: |
echo "${{ env.github_actor }}"
if [ "${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}" != "" ]; then
echo "SLACKCHANNEL=${{ secrets[format('{0}_{1}', env.github_actor, 'SLACK_ID')] }}" >> $GITHUB_ENV
echo "$github_actor"
if [ "$user_slack_id" != "" ]; then
echo "SLACKCHANNEL=$user_slack_id" >> $GITHUB_ENV
else
echo "SLACKCHANNEL=${{ secrets.SLACK_CIFEEDBACK_CHANNEL }}" >> $GITHUB_ENV
echo "SLACKCHANNEL=$default_slack_channel" >> $GITHUB_ENV
fi
- name: Tailscale # In order to be able to SSH when a test fails
uses: huggingface/tailscale-action@main
with:

4
.gitignore vendored
View File

@ -98,6 +98,7 @@ celerybeat-schedule
# Environments
.env
.venv
.venv*
env/
venv/
ENV/
@ -171,3 +172,6 @@ tags
# modular conversion
*.modular_backup
# Cursor IDE files
.cursor/

View File

@ -14,7 +14,7 @@ This AGENTS.md file provides guidance for code agents working with this codebase
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
- Code style is enforced in the CI. You can install the style tools with `pip install -e ".[quality]"`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
@ -36,4 +36,4 @@ After making changes, you should usually run `make fixup` to ensure any copies a
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e ".[testing]"`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.

View File

@ -112,7 +112,125 @@ New models are constantly released and if you want to implement a new model, ple
If you are willing to contribute the model yourself, let us know so we can help you add it to 🤗 Transformers!
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/add_new_model).
We have a technical guide for [how to add a model to 🤗 Transformers](https://huggingface.co/docs/transformers/modular_transformers).
### Vision-Language Model Contribution Checklist
If you're contributing a **vision-language model** (or any multimodal model that processes images/videos), please follow this checklist. Maintainers will use this to review your PR, and completing these steps will significantly increase the likelihood of your PR being merged quickly.
**Required checklist for all vision-language model contributions:**
**1. Implement a modular file**
All new models should use the modular architecture pattern. Create a `modular_<model_name>.py` file using the modular model converter:
- Use the CLI, [`transformers add-new-model-like`](https://github.com/huggingface/transformers/blob/main/src/transformers/cli/add_new_model_like.py) to generate a modular skeleton and get started
- All code should be in the modular file if possible. Modeling must be in it, it's better if configuration is in it as well.
- Reuse existing patterns from similar models as much as possible
To verify your modular file is correct, run:
```bash
python utils/modular_model_converter.py <model_name>
```
This will generate the separate files (`modeling_*.py`, `configuration_*.py`, etc.) from your modular file. The CI will enforce that these generated files match your modular file.
**2. Add a fast image processor (for image models)**
If your model processes images, implement a fast image processor that uses `torch` and `torchvision` instead of PIL/numpy for better inference performance:
- See the detailed guide in [#36978](https://github.com/huggingface/transformers/issues/36978)
- Fast processors inherit from `BaseImageProcessorFast`
- Examples: `LlavaOnevisionImageProcessorFast`, `Idefics2ImageProcessorFast`
**3. Create a weight conversion script**
Add a `convert_<model_name>_to_hf.py` script that converts the original model weights to the HuggingFace format:
- Script should handle checkpoint loading, key mapping, and saving in HF format
- Include usage examples and documentation in the script
- Examples: [`convert_llava_onevision_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_onevision/convert_llava_onevision_weights_to_hf.py), [`convert_idefics2_weights_to_hf.py`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/idefics2/convert_idefics2_weights_to_hf.py)
**4. Add integration tests with exact output matching**
At minimum, add an `IntegrationTest` class that tests end-to-end generation (processing and modelling) with **exact** output matching:
- For generative models: test that generated text matches expected output exactly
- For non-generative models: test that output logits match expected values
- Tests should use real checkpoints (load in 4-bit or half precision if the checkpoint is too big to fit in our CI runners) and real inputs
- Example pattern:
```python
class MyModelIntegrationTest(unittest.TestCase):
@slow
def test_model_integration(self):
model = MyModelForConditionalGeneration.from_pretrained("org/model-name")
processor = AutoProcessor.from_pretrained("org/model-name")
inputs = processor(images=image, text=prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
EXPECTED_TEXT = "exact expected output"
self.assertEqual(processor.decode(output[0]), EXPECTED_TEXT)
```
See `tests/models/llava_onevision/test_modeling_llava_onevision.py` for complete examples.
**5. Update documentation**
Add or update model documentation:
- Create if the cli hasn't `docs/source/en/model_doc/<model_name>.md` with usage examples
- Include model description, paper link, and basic usage with `Pipeline` and `AutoModel`
- Add the model to the appropriate TOC files
**6. Look for reusable patterns**
The library has 400+ models with many established patterns:
- Search for similar models (e.g., other vision-language models)
- Reuse attention mechanisms, layer implementations, and processing patterns
- Check models like LLaVA, Idefics2, Fuyu for vision-language patterns
- Use provided decorators like (`auto_docstring`, `can_return_tuple`, `check_model_inputs` and `_can_record_outputs`) where relevant.
- Don't reinvent the wheel
**7. Run quality checks and read the output**
Before submitting your PR, install quality dependencies and run the full check suite:
```bash
pip install -e ".[quality]"
make fixup
```
**Important**: Take time to read the output of `make fixup`. It will:
- Lint and format your code automatically
- Run consistency checks (imports, docstrings, etc.)
- Show any remaining issues that need manual fixes
All checks must pass before your PR can be merged.
**If this checklist is complete, your PR has a very high likelihood of being merged!** Following these steps makes the maintainers' work much easier and will reduce the number of review iterations, getting your important work out there faster.
#### Copy-pastable checklist for maintainers
Here's a condensed version maintainers can copy into PRs:
```markdown
## Multimodal Model Addition Checklist
Please ensure your PR completes all following items. See the [full checklist](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#vision-language-model-contribution-checklist) for details.
- [ ] **Modular file**: `modular_<model_name>.py` implemented and verified with `python utils/modular_model_converter.py <model_name>`
- [ ] **Fast image processor**: Implemented using `BaseImageProcessorFast` (see [#36978](https://github.com/huggingface/transformers/issues/36978))
- [ ] **Conversion script**: `convert_<model_name>_to_hf.py` added with usage examples
- [ ] **Integration tests**: End-to-end tests with exact output matching (text or logits)
- [ ] **Documentation**: Model docs added/updated in `docs/source/en/model_doc/`
- [ ] **Pattern reuse**: Verified against similar models (LLaVA, Idefics2, etc.)
- [ ] **Quality checks**: `make fixup` passes with no errors
```
## Do you want to add documentation?
@ -329,8 +447,11 @@ By default, slow tests are skipped but you can set the `RUN_SLOW` environment va
`yes` to run them. This will download many gigabytes of models so make sure you
have enough disk space, a good internet connection or a lot of patience!
> [!WARNING]
> Remember to specify a *path to a subfolder or a test file* to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!
<Tip warning={true}>
Remember to specify a *path to a subfolder or a test file* to run the test. Otherwise, you'll run all the tests in the `tests` or `examples` folder, which will take a very long time!
</Tip>
```bash
RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model

View File

@ -153,7 +153,7 @@ You are not required to read the following guidelines before opening an issue. H
cd examples/seq2seq
torchrun --nproc_per_node=2 ./finetune_trainer.py \
--model_name_or_path sshleifer/distill-mbart-en-ro-12-4 --data_dir wmt_en_ro \
--output_dir output_dir --overwrite_output_dir \
--output_dir output_dir \
--do_train --n_train 500 --num_train_epochs 1 \
--per_device_train_batch_size 1 --freeze_embeds \
--src_lang en_XX --tgt_lang ro_RO --task translation \

View File

@ -48,6 +48,7 @@ limitations under the License.
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_te.md">తెలుగు</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_fr.md">Français</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_de.md">Deutsch</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_it.md">Italiano</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_vi.md">Tiếng Việt</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ar.md">العربية</a> |
<a href="https://github.com/huggingface/transformers/blob/main/i18n/README_ur.md">اردو</a> |
@ -63,8 +64,8 @@ limitations under the License.
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/transformers_as_a_model_definition.png"/>
</h3>
Transformers acts as the model-definition framework for state-of-the-art machine learning models in text, computer
vision, audio, video, and multimodal model, for both inference and training.
Transformers acts as the model-definition framework for state-of-the-art machine learning with text, computer
vision, audio, video, and multimodal models, for both inference and training.
It centralizes the model definition so that this definition is agreed upon across the ecosystem. `transformers` is the
pivot across frameworks: if a model definition is supported, it will be compatible with the majority of training
@ -110,10 +111,10 @@ git clone https://github.com/huggingface/transformers.git
cd transformers
# pip
pip install .[torch]
pip install '.[torch]'
# uv
uv pip install .[torch]
uv pip install '.[torch]'
```
## Quickstart

View File

@ -9,6 +9,12 @@ In this list, we showcase incredibly impactful and novel projects that have push
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [◉ Universal Intelligence](https://github.com/blueraai/universal-intelligence)
[Universal Intelligence](https://github.com/blueraai/universal-intelligence) aims to standardize models, tools, and agents —transforming them into simple, composable, portable, interoperable, framework-agnostic, hardware-agnostic interfaces (through auto-negotiation and resource sharing); for fast and accessible development of AI applications.
Keywords: Protocol, Open-source, LLMs, Large Language Models, Agents, Low-code
## [gpt4all](https://github.com/nomic-ai/gpt4all)
[gpt4all](https://github.com/nomic-ai/gpt4all) is an ecosystem of open-source chatbots trained on massive collections of clean assistant data including code, stories and dialogue. It offers open-source, large language models such as LLaMA and GPT-J trained in an assistant-style.

View File

@ -16,7 +16,6 @@ import sys
from logging import Logger
from threading import Event, Thread
from time import perf_counter, sleep
from typing import Optional
# Add the parent directory to Python path to import benchmarks_entrypoint
@ -42,7 +41,7 @@ except ImportError:
GenerationConfig = None
StaticCache = None
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
os.environ["HF_XET_HIGH_PERFORMANCE"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "1"
# Only set torch precision if torch is available
@ -145,7 +144,7 @@ def run_benchmark(
q = torch.empty_like(probs_sort).exponential_(1)
return torch.argmax(probs_sort / q, dim=-1, keepdim=True).to(dtype=torch.int)
def logits_to_probs(logits, temperature: float = 1.0, top_k: Optional[int] = None):
def logits_to_probs(logits, temperature: float = 1.0, top_k: int | None = None):
logits = logits / max(temperature, 1e-5)
if top_k is not None:
@ -155,7 +154,7 @@ def run_benchmark(
probs = torch.nn.functional.softmax(logits, dim=-1)
return probs
def sample(logits, temperature: float = 1.0, top_k: Optional[int] = None):
def sample(logits, temperature: float = 1.0, top_k: int | None = None):
probs = logits_to_probs(logits[0, -1], temperature, top_k)
idx_next = multinomial_sample_one_no_sync(probs)
return idx_next, probs

View File

@ -2,5 +2,5 @@ gpustat==1.1.1
psutil==6.0.0
psycopg2==2.9.9
torch>=2.4.0
hf_transfer
hf_xet
pandas>=1.5.0

View File

@ -1 +1,2 @@
benchmark_results/
benchmark_results/
benchmark_results_profiles/

View File

@ -1 +0,0 @@
# Benchmark implementations directory

View File

@ -1,165 +0,0 @@
# Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
import os
from typing import Any
import torch
from benchmark_framework import ModelBenchmark
os.environ["TOKENIZERS_PARALLELISM"] = "1"
torch.set_float32_matmul_precision("high")
class LLaMABenchmark(ModelBenchmark):
"""Simplified LLaMA model benchmark implementation using the ModelBenchmark base class."""
def __init__(self, logger: logging.Logger):
super().__init__(logger)
self._default_prompt = "Why dogs are so cute?" # Custom prompt for LLaMA
def get_scenario_configs(self) -> list[dict[str, Any]]:
"""
Get LLaMA-specific scenario configurations.
Returns:
List of scenario configuration dictionaries
"""
return [
# Eager variants
{"variant": "eager", "compile_mode": None, "use_cache": True, "description": "Eager execution with cache"},
# Compiled variants
{
"variant": "compiled",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Compiled with max autotune",
},
# Kernelized variant (if available)
{
"variant": "kernelized",
"compile_mode": "max-autotune",
"use_cache": True,
"description": "Kernelized execution",
},
]
def _is_kernelization_available(self) -> bool:
"""Check if kernelization is available for LLaMA."""
try:
from kernels import Mode, kernelize # noqa: F401
return True
except ImportError:
self.logger.debug("Kernelization not available: kernels module not found")
return False
def get_default_generation_config(self) -> dict[str, Any]:
"""Get LLaMA-specific generation configuration."""
return {
"do_sample": False,
"top_p": 1.0,
"temperature": 1.0,
"repetition_penalty": 1.0,
"max_new_tokens": None, # Will be set per scenario
}
def get_model_init_kwargs(self, config) -> dict[str, Any]:
"""Get LLaMA-specific model initialization kwargs."""
return {
"torch_dtype": getattr(torch, config.torch_dtype),
"attn_implementation": config.attn_implementation,
"use_cache": True,
}
def get_default_torch_dtype(self) -> str:
"""Get default torch dtype for LLaMA."""
return "float16" # LLaMA works well with float16
def get_default_device(self) -> str:
"""Get default device for LLaMA."""
return "cuda" # LLaMA prefers CUDA
def run_llama(logger, output_dir, **kwargs):
"""
Run LLaMA benchmark with the given configuration.
Args:
logger: Logger instance
output_dir: Output directory for results
**kwargs: Additional configuration options
Returns:
Path to output file if successful
"""
from benchmark_framework import BenchmarkRunner
# Extract parameters with defaults
model_id = kwargs.get("model_id", "meta-llama/Llama-2-7b-hf")
warmup_iterations = kwargs.get("warmup_iterations", 3)
measurement_iterations = kwargs.get("measurement_iterations", 5)
num_tokens_to_generate = kwargs.get("num_tokens_to_generate", 100)
include_sdpa_variants = kwargs.get("include_sdpa_variants", True)
device = kwargs.get("device", "cuda")
torch_dtype = kwargs.get("torch_dtype", "float16")
batch_size = kwargs.get("batch_size", 1)
commit_id = kwargs.get("commit_id")
logger.info(f"Starting LLaMA benchmark for model: {model_id}")
logger.info(
f"Configuration: warmup={warmup_iterations}, measurement={measurement_iterations}, tokens={num_tokens_to_generate}"
)
try:
# Create benchmark instance
benchmark = LLaMABenchmark(logger)
# Create scenarios
scenarios = benchmark.create_scenarios(
model_id=model_id,
warmup_iterations=warmup_iterations,
measurement_iterations=measurement_iterations,
num_tokens_to_generate=num_tokens_to_generate,
include_sdpa_variants=include_sdpa_variants,
device=device,
torch_dtype=torch_dtype,
batch_size=batch_size,
)
logger.info(f"Created {len(scenarios)} benchmark scenarios")
# Create runner and execute benchmarks
runner = BenchmarkRunner(logger, output_dir)
results = runner.run_benchmark(benchmark, scenarios, commit_id=commit_id)
if not results:
logger.warning("No successful benchmark results")
return None
# Save results
model_name = model_id.split("/")[-1] # Extract model name from ID
output_file = runner.save_results(model_name, results)
logger.info(f"LLaMA benchmark completed successfully. Results saved to: {output_file}")
return output_file
except Exception as e:
logger.error(f"LLaMA benchmark failed: {e}")
import traceback
logger.debug(traceback.format_exc())
raise

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,214 @@
import hashlib
import itertools
import json
import logging
from typing import Any
from transformers.utils.import_utils import is_flash_attn_2_available
KERNELIZATION_AVAILABLE = False
try:
from kernels import Mode, kernelize # noqa: F401
KERNELIZATION_AVAILABLE = True
except ImportError:
pass
logger = logging.getLogger(__name__)
class BenchmarkConfig:
"""Configuration for a single benchmark scenario."""
all_attn_implementations = [
("flash_attention_2", None),
("eager", None),
("sdpa", "math"),
("sdpa", "flash_attention"),
("flex_attention", None),
]
all_compiled_modes = [None, "default", "reduce-overhead", "max-autotune", "max-autotune-no-cudagraphs"]
def __init__(
self,
warmup_iterations: int = 5,
measurement_iterations: int = 20,
gpu_monitoring: bool = True, # NOTE: you may want to disable this at times as we have obsvered it could heavily slow down benchmarks on AMD
batch_size: int = 1,
sequence_length: int = 128,
num_tokens_to_generate: int = 128,
attn_implementation: str = "eager",
sdpa_backend: str | None = None,
compile_mode: str | None = None,
compile_options: dict[str, Any] | None = None,
kernelize: bool = False,
name: str | None = None,
skip_validity_check: bool = False,
) -> None:
# Benchmark parameters
self.warmup_iterations = warmup_iterations
self.measurement_iterations = measurement_iterations
self.gpu_monitoring = gpu_monitoring
# Input parameters
self.batch_size = batch_size
self.sequence_length = sequence_length
self.num_tokens_to_generate = num_tokens_to_generate
# Generation parameters
self.attn_implementation = attn_implementation
self.sdpa_backend = sdpa_backend
# Optimization parameters
self.compile_mode = compile_mode
self.compile_options = compile_options if compile_options is not None else {}
self.kernelize = kernelize
# Constant parameters
self.dtype = "torch.bfloat16"
self.device = "cuda"
self.check_validity(skip_validity_check)
self.name = name if name is not None else self.infer_name()
def check_validity(self, skip_validity_check: bool = False) -> None:
if skip_validity_check:
return
# Check FA is installed
if self.attn_implementation == "flash_attention_2" and not is_flash_attn_2_available():
logger.warning(
"Flash attention does not support compile mode. Defaulting to SDPA w/ flash attention backend."
)
self.attn_implementation = "sdpa"
self.sdpa_backend = "flash_attention"
# Flash attention does not support compile mode, so we turn it off # FIXME: it would be better to support it
is_fa = self.attn_implementation == "flash_attention_2"
is_fa |= self.attn_implementation == "sdpa" and self.sdpa_backend == "flash_attention"
if is_fa:
logger.warning("Flash attention does not support compile mode. Turning off compile mode.")
self.compile_mode = None
@property
def hash(self) -> str:
return hashlib.sha256(json.dumps(self.to_dict()).encode()).hexdigest()
def infer_name(self, compact: bool = True) -> str:
"""Infer a human-readable name for the benchmark config, either compact or verbose."""
if compact:
iter_str = f"w{self.warmup_iterations}_i{self.measurement_iterations}"
gpu_monitor_str = "monitored" if self.gpu_monitoring else "unmonitored"
dimensions_str = f"b{self.batch_size}_s{self.sequence_length}_n{self.num_tokens_to_generate}"
attn_code = self.attn_implementation
attn_code += f"_{self.sdpa_backend}" if self.attn_implementation == "sdpa" else ""
compile_str = f"compiled_{self.compile_mode}" if self.compile_mode is not None else "uncompiled"
kernelize_str = "kernelized" if self.kernelize else "unkernelized"
sep = "-"
else:
iter_str = f"{self.warmup_iterations} warmup, {self.measurement_iterations} iterations"
gpu_monitor_str = ("with" if self.gpu_monitoring else "no") + " GPU monitoring"
dimensions_str = f"batch size {self.batch_size}, sequence length {self.sequence_length}, {self.num_tokens_to_generate} generated tokens"
attn_code = f"{self.attn_implementation} attention"
attn_code += f" with {self.sdpa_backend} backend" if self.attn_implementation == "sdpa" else ""
compile_str = "compiled" if self.compile_mode is not None else "not compiled"
kernelize_str = "kernelized" if self.kernelize else "not kernelized"
sep = ", "
return sep.join([iter_str, gpu_monitor_str, dimensions_str, attn_code, compile_str, kernelize_str])
def to_dict(self) -> dict[str, Any]:
return {
"name": self.name,
"warmup_iterations": self.warmup_iterations,
"measurement_iterations": self.measurement_iterations,
"gpu_monitoring": self.gpu_monitoring,
"batch_size": self.batch_size,
"sequence_length": self.sequence_length,
"num_tokens_to_generate": self.num_tokens_to_generate,
"attn_implementation": self.attn_implementation,
"sdpa_backend": self.sdpa_backend,
"compile_mode": self.compile_mode,
"compile_options": self.compile_options | {}, # to avoid inplace modification of the original dict
"kernelize": self.kernelize,
}
@classmethod
def from_dict(cls, data: dict[str, Any], skip_validity_check: bool = False) -> "BenchmarkConfig":
return cls(
warmup_iterations=data.get("warmup_iterations", 5),
measurement_iterations=data.get("measurement_iterations", 20),
gpu_monitoring=data.get("gpu_monitoring", False),
batch_size=data.get("batch_size", 1),
sequence_length=data.get("sequence_length", 128),
num_tokens_to_generate=data.get("num_tokens_to_generate", 128),
attn_implementation=data.get("attn_implementation", "eager"),
sdpa_backend=data.get("sdpa_backend"),
compile_mode=data.get("compile_mode"),
compile_options=data.get("compile_options"),
kernelize=data.get("kernelize", False),
name=data.get("name"),
skip_validity_check=skip_validity_check,
)
def adapt_configs(
configs: list[BenchmarkConfig],
warmup_iterations: int | list[int] = 5,
measurement_iterations: int | list[int] = 20,
batch_size: int | list[int] = 1,
sequence_length: int | list[int] = 128,
num_tokens_to_generate: int | list[int] = 128,
gpu_monitoring: bool | list[bool] = True,
) -> list[BenchmarkConfig]:
parameters = (
x if isinstance(x, list) else [x]
for x in [
warmup_iterations,
measurement_iterations,
batch_size,
sequence_length,
num_tokens_to_generate,
gpu_monitoring,
]
)
iterator = itertools.product(*parameters)
adapted_configs = []
for warmup_iters, measurement_iters, bs, seqlen, ntok, monitor in iterator:
for config in configs:
config = config.to_dict()
config["warmup_iterations"] = warmup_iters
config["measurement_iterations"] = measurement_iters
config["batch_size"] = bs
config["sequence_length"] = seqlen
config["num_tokens_to_generate"] = ntok
config["gpu_monitoring"] = monitor
adapted_configs.append(BenchmarkConfig.from_dict(config))
return adapted_configs
def get_config_by_level(level: int) -> list[BenchmarkConfig]:
configs = []
# Early return if level is greater than 3: we generate all combinations of configs, maybe even w/ all compile modes
if level >= 3:
for attn_implementation, sdpa_backend in BenchmarkConfig.all_attn_implementations:
# Usually there is not much to gain by compiling with other modes, but we allow it for level 4
compile_modes = BenchmarkConfig.all_compiled_modes if level >= 4 else [None, "default"]
for cm in compile_modes:
for kernelize_on in [False, KERNELIZATION_AVAILABLE]:
configs.append(
BenchmarkConfig(
attn_implementation=attn_implementation,
sdpa_backend=sdpa_backend,
compile_mode=cm,
kernelize=kernelize_on,
)
)
return configs
# Otherwise, we add the configs for the given level
if level >= 0:
configs.append(BenchmarkConfig(attn_implementation="flex_attention", compile_mode="default"))
if level >= 1:
configs.append(BenchmarkConfig(attn_implementation="flash_attention_2"))
configs.append(BenchmarkConfig(attn_implementation="eager", compile_mode="default"))
if level >= 2:
configs.append(BenchmarkConfig(attn_implementation="sdpa", compile_mode="default"))
configs.append(BenchmarkConfig(attn_implementation="flex_attention", compile_mode="default", kernelize=True))
configs.append(BenchmarkConfig(attn_implementation="flash_attention_2", kernelize=True))
return configs

View File

@ -0,0 +1,456 @@
import gc
import json
import logging
import os
import pathlib
import re
import tempfile
import time
from contextlib import nullcontext
from datetime import datetime
from queue import Queue
from typing import Any
import torch
from datasets import Dataset
from huggingface_hub import HfApi
from tqdm import trange
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
CompileConfig,
GenerationConfig,
GenerationMixin,
)
from transformers.generation.streamers import BaseStreamer
from .benchmark_config import BenchmarkConfig
from .data_classes import BenchmarkMetadata, BenchmarkResult, GPURawMetrics, pretty_print_dict
from .hardware_metrics import GPUMonitor
try:
from kernels import Mode, kernelize # noqa: F401
except ImportError:
kernelize = None
Mode = None
DEFAULT_PROMPT = "\n".join([
"The French Revolution was a period of political and societal change in France that began with the Estates General of 1789 and ended with the Coup of 18 Brumaire on 9 November 1799.",
"Many of the revolution's ideas are considered fundamental principles of liberal democracy, and its values remain central to modern French political discourse.",
"It was caused by a combination of social, political, and economic factors which the existing regime proved unable to manage.",
"Financial crisis and widespread social distress led to the convocation of the Estates General in May 1789, its first meeting since 1614.",
"The representatives of the Third Estate broke away and re-constituted themselves as a National Assembly in June.",
"The Storming of the Bastille in Paris on 14 July led to a series of radical measures by the Assembly, including the abolition of feudalism, state control over the Catholic Church in France, and issuing the Declaration of the Rights of Man and of the Citizen.",
"The next three years were dominated by a struggle for political control.",
"King Louis XVI's attempted flight to Varennes in June 1791 further discredited the monarchy, and military defeats after the outbreak of the French Revolutionary Wars in April 1792 led to the insurrection of 10 August 1792.",
"As a result, the monarchy was replaced by the French First Republic in September, followed by the execution of Louis XVI himself in January 1793.",
"After another revolt in June 1793, the constitution was suspended, and political power passed from the National Convention to the Committee of Public Safety, dominated by radical Jacobins led by Maximilien Robespierre.",
"About 16,000 people were sentenced by the Revolutionary Tribunal and executed in the Reign of Terror, which ended in July 1794 with the Thermidorian Reaction.",
"Weakened by external threats and internal opposition, the Committee of Public Safety was replaced in November 1795 by the Directory.",
"Its instability ended in the coup of 18 Brumaire and the establishment of the Consulate, with Napoleon Bonaparte as First Consul.",
]) # fmt: skip
PUSH_TO_HUB_TOKEN = os.getenv("PUSH_TO_HUB_TOKEN", None)
def compact_json_numeric_arrays(data: dict):
# Match arrays that contain only numbers (ints/floats), whitespace, commas, and newlines
pattern = r"\[\s*\n\s*((?:\d+(?:\.\d+)?\s*,\s*)*\d+(?:\.\d+)?)\s*\n\s*\]"
def replace_numeric_array(match):
# Get the array content
content = match.group(1)
# Remove extra whitespace but keep commas
compact_content = re.sub(r"\s+", " ", content).strip()
return f"[{compact_content}]"
return re.sub(pattern, replace_numeric_array, json.dumps(data, indent=4, default=str), flags=re.DOTALL)
def get_git_revision() -> str:
base_path = pathlib.Path(__file__).parent.parent.parent
git_dir = base_path / ".git"
with (git_dir / "HEAD").open("r") as head:
ref = head.readline().split(" ")[-1].strip()
with (git_dir / ref).open("r") as git_hash:
return git_hash.readline().strip()
def get_sdpa_backend(backend_name: str | None) -> torch.nn.attention.SDPBackend | None:
"""Get the SDPA backend enum from string name."""
if backend_name is None:
return None
try:
backend_map = {
"math": torch.nn.attention.SDPBackend.MATH,
"flash_attention": torch.nn.attention.SDPBackend.FLASH_ATTENTION,
"efficient_attention": torch.nn.attention.SDPBackend.EFFICIENT_ATTENTION,
"cudnn_attention": torch.nn.attention.SDPBackend.CUDNN_ATTENTION,
}
return backend_map.get(backend_name.lower())
except AttributeError:
# torch.nn.attention.SDPBackend not available in older torch versions
return None
def flush_memory():
"""Flush GPU memory and run garbage collection."""
gc.collect()
# Dynamo resets
torch._dynamo.reset()
torch._dynamo.reset_code_caches()
if hasattr(torch._inductor, "codecache"):
# Clear FX graph cache
if hasattr(torch._inductor.codecache, "FxGraphCache"):
torch._inductor.codecache.FxGraphCache.clear()
# Clear PyCodeCache
if hasattr(torch._inductor.codecache, "PyCodeCache"):
torch._inductor.codecache.PyCodeCache.cache_clear()
# Clear TritonFuture cache (for async compilation)
if hasattr(torch._inductor.codecache, "TritonFuture"):
if hasattr(torch._inductor.codecache.TritonFuture, "_compile_cache"):
torch._inductor.codecache.TritonFuture._compile_cache.clear()
# Clear CUDA cache
if torch.cuda.is_available():
torch.cuda.empty_cache()
torch.cuda.reset_max_memory_allocated()
torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()
gc.collect()
class BenchmarkStreamer(BaseStreamer):
def __init__(self, **kwargs) -> None:
self.timeout = kwargs.pop("timeout", 10)
self.timestamps = []
self.text_queue = Queue()
self.stop_signal = None
def put(self, value):
"""Receives tokens and logs the timestamp of the generation."""
self.timestamps.append(time.perf_counter())
self.text_queue.put(value)
def end(self):
self.timestamps.append(time.perf_counter())
self.text_queue.put(self.stop_signal)
def __iter__(self):
return self
def __next__(self):
value = self.text_queue.get(timeout=self.timeout)
if value == self.stop_signal:
raise StopIteration()
else:
return value
class BenchmarkRunner:
"""Main benchmark runner that coordinates benchmark execution."""
def __init__(
self,
logger: logging.Logger,
output_dir: str | None = None,
branch_name: str | None = None,
commit_id: str | None = None,
commit_message: str | None = None,
) -> None:
# Those stay constant for the whole run
self.logger = logger
if output_dir is None:
output_dir = os.path.join(os.path.dirname(os.path.dirname(__file__)), "benchmark_results")
self.output_dir = output_dir
self.branch_name = branch_name
self.commit_id = get_git_revision() if commit_id is None else commit_id
self.commit_message = commit_message
os.makedirs(self.output_dir, exist_ok=True)
self.profile_dir = None
# Attributes that are reset for each model
self._setup_for = ""
# Attributes that are reset for each run
self.model: GenerationMixin | None = None
def cleanup(self) -> None:
del self.model
self.model = None
flush_memory()
def setup_benchmark(self, model_id: str, config: BenchmarkConfig) -> None:
# Some attributes only need to be set once per model
if self._setup_for != model_id:
self.tokenizer = AutoTokenizer.from_pretrained(model_id)
# We set the EOS token to the padding token for open-ended generation
self.tokenizer.eos_token = self.tokenizer.pad_token
self._setup_for = model_id
# Prepare inputs
self.inputs = self.tokenizer(
[DEFAULT_PROMPT for _ in range(config.batch_size)],
return_tensors="pt",
max_length=config.sequence_length,
truncation=True,
return_attention_mask=True,
).to(config.device)
self.inputs["use_cache"] = True
# Prepare generation config
gen_config = GenerationConfig(
do_sample=False, top_p=1.0, temperature=1.0, max_new_tokens=config.num_tokens_to_generate
)
# Prepare compile config
if config.compile_mode is not None:
gen_config.compile_config = CompileConfig(mode=config.compile_mode, options=config.compile_options)
gen_config.cache_implementation = "static"
# Load model
self.logger.debug(f"Loading model {model_id} on device {config.device}...")
dtype = getattr(torch, config.dtype.removeprefix("torch."))
self.model = AutoModelForCausalLM.from_pretrained(
model_id, dtype=dtype, attn_implementation=config.attn_implementation, generation_config=gen_config
)
self.model = self.model.eval().to(config.device)
# Kernelize the model if needed
if config.kernelize and kernelize is not None and Mode is not None:
self.model = kernelize(self.model, mode=Mode.INFERENCE)
def run_benchmark(
self, model_id: str, config: BenchmarkConfig, num_tokens_to_profile: int = 0
) -> dict[str, Any] | None:
"""Run a single benchmark with the given model ID and config."""
sdpa_ctx = nullcontext()
if config.attn_implementation == "sdpa":
sdpa_backend = get_sdpa_backend(config.sdpa_backend)
sdpa_ctx = torch.nn.attention.sdpa_kernel(sdpa_backend)
with sdpa_ctx, torch.no_grad():
self.logger.info(f"Running benchmark scenario: {config.name}")
# Quick validation: try one measurement first to see if this scenario works
flush_memory()
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = self.time_generate(
max_new_tokens=1, gpu_monitor=None
)
if e2e_latency < 0:
self.logger.warning(f"Skipping config {config.name}: {e2e_latency = } (no GPU monitoring)")
return None
# Warmup runs
self.logger.info(f"Warming up with {config.warmup_iterations} iterations...")
for _ in trange(config.warmup_iterations):
_ = self.time_generate(max_new_tokens=config.num_tokens_to_generate)
self.logger.info("Warmup over.")
# Measurement runs
result = BenchmarkResult()
self.logger.info(f"Benchmarking with {config.measurement_iterations} iterations.")
for _ in trange(config.measurement_iterations):
e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics = self.time_generate(
max_new_tokens=config.num_tokens_to_generate,
gpu_monitor=(GPUMonitor(logger=self.logger) if config.gpu_monitoring else None),
)
result.accumulate(e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics)
self.logger.info("Benchmarking done. Cleaning up.")
# Profile if needed
if num_tokens_to_profile > 0:
self.profile_generate(num_tokens_to_profile, config.name)
return {
"metadata": BenchmarkMetadata(
model_id=model_id,
branch_name=self.branch_name,
commit_id=self.commit_id,
commit_message=self.commit_message,
),
"measurements": result,
"config": config,
}
def time_generate(
self,
max_new_tokens: int,
gpu_monitor: GPUMonitor | None = None,
) -> tuple[float, list[float], str, GPURawMetrics | None]:
"""Time the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
# Prepare gpu monitoring if needed
if gpu_monitor is not None:
gpu_monitor.start()
# Prepare streamer
streamer = BenchmarkStreamer()
# Generate and time
wall_time_0 = time.perf_counter()
outputs = self.model.generate(
**self.inputs,
max_new_tokens=max_new_tokens,
streamer=streamer,
)
wall_time_1 = time.perf_counter()
# Stop gpu monitoring if needed
gpu_metrics = gpu_monitor.stop_and_collect() if gpu_monitor is not None else None
# Check if generation had the right number of tokens
input_tokens = self.inputs["input_ids"].size(-1)
batch_size, output_tokens = outputs.shape
new_tokens = output_tokens - input_tokens
if new_tokens != max_new_tokens:
raise RuntimeError(f"Generated {new_tokens} tokens, expected {max_new_tokens}")
# Decode outputs
decoded_output = self.tokenizer.decode(outputs[0, input_tokens:], skip_special_tokens=True)
shape_and_decoded_output = f"{tuple(outputs.shape)} | {decoded_output}"
# Compute intermediate quantities
e2e_latency = wall_time_1 - wall_time_0
token_generation_times = [t - wall_time_0 for t in streamer.timestamps[1:]]
return e2e_latency, token_generation_times, shape_and_decoded_output, gpu_metrics
def profile_generate(self, num_tokens_to_profile: int, config_name: str) -> None:
"""Profile the latency of a call to model.generate() with the given (inputs) and (max_new_tokens)."""
profiler = torch.profiler.profile(
activities=[torch.profiler.ProfilerActivity.CPU, torch.profiler.ProfilerActivity.CUDA],
record_shapes=True,
)
with profiler as prof:
_ = self.model.generate(
**self.inputs,
max_new_tokens=num_tokens_to_profile,
)
if self.profile_dir is None:
self.profile_dir = self.output_dir + "_profiles"
os.makedirs(self.profile_dir, exist_ok=True)
prof.export_chrome_trace(f"{self.profile_dir}/{config_name}.json")
def run_benchmarks(
self,
model_id: str,
benchmark_configs: list[BenchmarkConfig],
num_tokens_to_profile: int = 0,
pretty_print_summary: bool = True,
) -> tuple[str, dict[str, Any]]:
"""Run multiple benchmarks for the given model ID and list of benchmark configs."""
all_results = {}
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
start_time = time.perf_counter()
n_configs = len(benchmark_configs)
for i, config in enumerate(benchmark_configs):
# Handle SDPA backend if not determined by the config (needs to be done before skipping duplicates)
if config.attn_implementation == "sdpa" and config.sdpa_backend is None:
default_backend = "flash_attention" # FIXME: torch has a _cur_sdpa_kernel_backends but it fails
self.logger.warning(f"No SDPA backend provided, using {default_backend} instead.")
config.sdpa_backend = default_backend
# Skip if already run
if config.hash in all_results:
self.logger.info(f"Skipping duplicate config {config.name} for model {model_id} ({i + 1}/{n_configs})")
continue
# Otherwise, run the benchmark
self.setup_benchmark(model_id, config)
self.logger.info(
f"Running benchmark of model {model_id} with scenario: {config.name} ({i + 1}/{n_configs})"
)
# Launch benchmark in a try/except block to avoid stopping the whole run if one benchmark fails
try:
results = self.run_benchmark(model_id, config, num_tokens_to_profile)
if results is not None:
all_results[config.hash] = results
except Exception as e:
self.logger.error(f"Error running with scenario: {config.name}:\n{repr(e)}")
# Cleanup model and save results
self.cleanup()
self.save_results(model_id, all_results, timestamp=timestamp)
if pretty_print_summary:
print()
print("=" * 100)
print(f"Finished benchmarks in {time.perf_counter() - start_time:.2f} seconds")
print(f"Total number of benchmarks: {len(all_results)}")
if len(all_results) > 0:
print("First run metadata:")
first_key = list(all_results.keys())[0]
first_metadata = all_results[first_key]["metadata"].to_dict()
hardware_info = first_metadata.pop("hardware_info")
pretty_print_dict(first_metadata | hardware_info, tabs=1)
for result in all_results.values():
print("=" * 100)
print(f"Config: {result['config'].infer_name(compact=False)}\n")
result["measurements"].pprint(batch_size=result["config"].batch_size, tabs=1)
print("=" * 100)
return (timestamp, all_results)
def save_results(self, model_name: str, results: dict, timestamp: str = "") -> str:
"""Save benchmark results to JSON file."""
# Create model-specific subdirectory
model_name = model_name.replace("/", "_")
model_dir = os.path.join(self.output_dir, model_name)
os.makedirs(model_dir, exist_ok=True)
# Create filename with timestamp
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") if not timestamp else timestamp
filename = f"{model_name}_benchmark_{timestamp}.json"
filepath = os.path.join(model_dir, filename)
# Convert results to dict
converted_results = {}
for cfg_hash in results.keys():
converted_results[cfg_hash] = {
"metadata": results[cfg_hash]["metadata"].to_dict(),
"measurements": results[cfg_hash]["measurements"].to_dict(),
"config": results[cfg_hash]["config"].to_dict(),
}
# Save to JSON file
with open(filepath, "w") as f:
f.write(compact_json_numeric_arrays(converted_results))
self.logger.info(f"Results saved to {filepath}")
return filepath
def push_results_to_hub(self, dataset_id: str, results: dict[Any, Any], timestamp: str) -> None:
if PUSH_TO_HUB_TOKEN is None:
raise ValueError(
"PUSH_TO_HUB_TOKEN is not set, cannot push results to the Hub. When setting dataset_id, please also set the PUSH_TO_HUB_TOKEN environment variable."
)
n_results = len(results)
self.logger.info(f"Pushing {n_results} results to: {dataset_id}")
rows = []
for cfg_hash, entry in results.items():
row = {
"benchmark_config_hash": cfg_hash,
"config": entry["config"].to_dict(),
"measurements": entry["measurements"].to_dict(),
"metadata": entry["metadata"].to_dict(),
}
rows.append(row)
ds = Dataset.from_list(rows)
with tempfile.TemporaryDirectory() as tmp:
jsonl_path = os.path.join(tmp, "data.jsonl")
with open(jsonl_path, "w") as f:
json_lines = []
for ex in ds:
json_lines.append(json.dumps(ex, ensure_ascii=False))
f.write("\n".join(json_lines))
api = HfApi()
# NOTE: we expect the repository to already exist
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") if not timestamp else timestamp
file_name = f"benchmark_run_{timestamp}.jsonl"
api.upload_file(
path_or_fileobj=jsonl_path,
path_in_repo=file_name,
repo_id=dataset_id,
repo_type="dataset",
token=PUSH_TO_HUB_TOKEN,
)
self.logger.info(f"Succesfully uploaded results to: {dataset_id}")

View File

@ -0,0 +1,167 @@
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any
import numpy as np
from .hardware_metrics import GPURawMetrics, HardwareInfo
def compute_basic_statistics(measurements: list[float]) -> dict[str, float]:
return {
"avg": np.mean(measurements),
"std": np.std(measurements),
"min": np.min(measurements),
"med": np.median(measurements),
"max": np.max(measurements),
"p95": np.percentile(measurements, 95),
}
def add_unit_to_duration(stats: dict[str, float]) -> dict[str, str]:
for key in list(stats.keys()):
value = stats[key]
if value > 3600:
stats[key] = f"{(value / 3600):.2f}hr"
elif value > 60:
stats[key] = f"{(value / 60):.2f}min"
elif value > 1:
stats[key] = f"{value:.2f}s"
elif value > 1e-3:
stats[key] = f"{(value * 1e3):.2f}ms"
elif value > 1e-6:
stats[key] = f"{(value * 1e6):.2f}us"
else:
stats[key] = f"{(value * 1e9):.2f}ns"
return stats
def equalize_lengths_and_collate(stats: list[dict[str, str]]) -> list[str]:
keys = ["avg", "std", "min", "med", "max", "p95"]
for key in keys:
max_length = max(len(stat[key]) for stat in stats)
for stat in stats:
stat[key] = stat[key].ljust(max_length, " ")
return [" ".join([f"{key}={stat[key]}" for key in keys]) for stat in stats]
def pretty_print_dict(data: dict[str, Any], tabs: int = 0) -> None:
max_key_length = max([len(key) for key in data.keys()])
for key, value in data.items():
tabs_str = " " * tabs
padded_key = key.ljust(max_key_length + 1, ".")
print(f"{tabs_str}{padded_key}: {value}")
@dataclass
class BenchmarkMetadata:
"""Metadata collected for each benchmark run."""
model_id: str
timestamp: str
branch_name: str
commit_id: str
commit_message: str
hardware_info: HardwareInfo
def __init__(self, model_id: str, commit_id: str, branch_name: str = "main", commit_message: str = "") -> None:
self.model_id = model_id
self.timestamp = datetime.now(timezone.utc).isoformat()
self.branch_name = branch_name
self.commit_id = commit_id
self.commit_message = commit_message
self.hardware_info = HardwareInfo()
def to_dict(self) -> dict[str, Any]:
return {
"model_id": self.model_id,
"timestamp": self.timestamp,
"branch_name": self.branch_name,
"commit_id": self.commit_id,
"commit_message": self.commit_message,
"hardware_info": self.hardware_info.to_dict(),
}
class BenchmarkResult:
"""Result from a series of benchmark runs."""
def __init__(self) -> None:
self.e2e_latency = []
self.token_generation_times = [] # time at which each token was generated (relative to start of the generation)
self.shape_and_decoded_outputs = []
self.gpu_metrics = []
def accumulate(
self,
e2e_latency: float,
token_generation_times: list[float],
shape_and_decoded_output: str,
gpu_metrics: GPURawMetrics | None,
) -> None:
self.e2e_latency.append(e2e_latency)
self.token_generation_times.append(token_generation_times)
self.shape_and_decoded_outputs.append(shape_and_decoded_output)
self.gpu_metrics.append(gpu_metrics)
def to_dict(self) -> dict[str, None | int | float]:
# Save GPU metrics as None if it contains only None values
if all(gm is None for gm in self.gpu_metrics):
gpu_metrics = None
else:
gpu_metrics = [gm.to_dict() for gm in self.gpu_metrics]
return {
"e2e_latency": self.e2e_latency,
"token_generation_times": self.token_generation_times,
"shape_and_decoded_outputs": self.shape_and_decoded_outputs,
"gpu_metrics": gpu_metrics,
}
@classmethod
def from_dict(cls, data: dict[str, None | int | float]) -> "BenchmarkResult":
# Handle GPU metrics, which is saved as None if it contains only None values
if data["gpu_metrics"] is None:
gpu_metrics = [None for _ in range(len(data["e2e_latency"]))]
else:
gpu_metrics = [GPURawMetrics.from_dict(gm) for gm in data["gpu_metrics"]]
# Create a new instance and accumulate the data
new_instance = cls()
for i in range(len(data["e2e_latency"])):
new_instance.accumulate(
e2e_latency=data["e2e_latency"][i],
token_generation_times=data["token_generation_times"][i],
shape_and_decoded_output=data["shape_and_decoded_outputs"][i],
gpu_metrics=gpu_metrics[i],
)
return new_instance
def get_measured_ttft(self) -> list[float]:
return [dt[0] for dt in self.token_generation_times if len(dt) > 0]
def get_measured_itl(self) -> list[float]:
return [(dt[-1] - dt[0]) / (len(dt) - 1) for dt in self.token_generation_times if len(dt) > 1]
def get_throughput(self, batch_size: int) -> float:
return [
batch_size * len(dt) / e2e_latency
for e2e_latency, dt in zip(self.e2e_latency, self.token_generation_times)
]
def pprint(self, batch_size: int = 0, tabs: int = 0) -> None:
stats_to_collate = [
add_unit_to_duration(compute_basic_statistics(self.e2e_latency)),
add_unit_to_duration(compute_basic_statistics(self.get_measured_ttft())),
add_unit_to_duration(compute_basic_statistics(self.get_measured_itl())),
]
if batch_size > 0:
throughput_stats = compute_basic_statistics(self.get_throughput(batch_size))
stats_to_collate.append({key: f"{value:.2f}tok/s" for key, value in throughput_stats.items()})
collated_stats = equalize_lengths_and_collate(stats_to_collate)
dict_to_pprint = {
"E2E Latency": collated_stats[0],
"Time to First Token": collated_stats[1],
"Inter-Token Latency": collated_stats[2],
}
if batch_size > 0:
dict_to_pprint["Throughput"] = collated_stats[3]
pretty_print_dict(dict_to_pprint, tabs=tabs)

View File

@ -0,0 +1,171 @@
import json
import logging
import subprocess
import sys
import threading
import time
from dataclasses import dataclass
from enum import Enum
from logging import Logger
import gpustat
import psutil
import torch
# Data class to hold the hardware information
def get_device_name_and_memory_total() -> tuple[str, float]:
"""Returns the name and memory total of GPU 0."""
device_name = torch.cuda.get_device_properties(0).name
device_memory_total = torch.cuda.get_device_properties(0).total_memory / 1024**3
return device_name, device_memory_total
class HardwareInfo:
"""A class to hold information about the hardware."""
def __init__(self) -> None:
# Retrieve GPU stats
try:
self.gpu_name, self.gpu_memory_total_gb = get_device_name_and_memory_total()
except Exception:
self.gpu_name, self.gpu_memory_total_gb = None, None
# Retrieve python, torch and CUDA version
self.python_version = f"{sys.version.split()[0]}"
self.torch_version = torch.__version__
if hasattr(torch, "cuda") and torch.cuda.is_available():
self.cuda_version = torch.version.cuda
else:
self.cuda_version = None
# Retrieve general hardware information
self.cpu_count = psutil.cpu_count()
self.memory_total_mb = int(psutil.virtual_memory().total / (1024 * 1024))
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"gpu_name": self.gpu_name,
"gpu_memory_total_gb": self.gpu_memory_total_gb,
"python_version": self.python_version,
"torch_version": self.torch_version,
}
# Functions to get information about the GPU
def get_amd_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an AMD GPU, both in percent"""
rocm_smi_output = subprocess.check_output(["rocm-smi", "--json", "--showuse", "--showmeminfo", "VRAM"])
gpu_stats = json.loads(rocm_smi_output.decode("utf-8"))
gpu_stats = [
(card_id, stats["GPU use (%)"], stats["VRAM Total Used Memory (B)"]) for card_id, stats in gpu_stats.items()
]
gpu_stats.sort(key=lambda x: x[1], reverse=True)
return int(gpu_stats[0][1]), float(gpu_stats[0][2]) / 1024**3
def get_nvidia_gpu_stats() -> tuple[int, float]:
"""Returns the utilization and memory used of an NVIDIA GPU, both in percent"""
gpu_stats = gpustat.GPUStatCollection.new_query()
gpu_stats = gpu_stats[0]
return int(gpu_stats["utilization.gpu"]), float(gpu_stats["memory.used"]) / 1024**3
class GPUStatsCollector:
"""A class to get statistics about the GPU. It serves as a wrapper that holds the GPU total memory and its name,
which is used to call the right function to get the utilization and memory used."""
def __init__(self) -> None:
self.device_name, self.device_memory_total = get_device_name_and_memory_total()
# Monkey patch the get_utilization_and_memory_used method based on the GPU type
if "amd" in self.device_name.lower():
self.get_utilization_and_memory_used = get_amd_gpu_stats
elif "nvidia" in self.device_name.lower():
self.get_utilization_and_memory_used = get_nvidia_gpu_stats
else:
raise RuntimeError(f"Unsupported GPU: {self.device_name}")
def get_measurements(self) -> tuple[int, float]:
"""Get the utilization and memory used of the GPU, both in percent"""
raise NotImplementedError("This method is meant to be monkey patched during __init__")
# Simple data classes to hold the raw GPU metrics
class GPUMonitoringStatus(Enum):
"""Status of GPU monitoring."""
SUCCESS = "success"
FAILED = "failed"
NO_GPUS_AVAILABLE = "no_gpus_available"
NO_SAMPLES_COLLECTED = "no_samples_collected"
@dataclass
class GPURawMetrics:
"""Raw values for GPU utilization and memory used."""
utilization: list[float] # in percent
memory_used: list[float] # in GB
timestamps: list[float] # in seconds
timestamp_0: float # in seconds
monitoring_status: GPUMonitoringStatus
def to_dict(self) -> dict[str, None | int | float | str]:
return {
"utilization": self.utilization,
"memory_used": self.memory_used,
"timestamps": self.timestamps,
"timestamp_0": self.timestamp_0,
"monitoring_status": self.monitoring_status.value,
}
# Main class, used to monitor the GPU utilization during benchmark execution
class GPUMonitor:
"""Monitor GPU utilization during benchmark execution."""
def __init__(self, sample_interval_sec: float = 0.1, logger: Logger | None = None):
self.sample_interval_sec = sample_interval_sec
self.logger = logger if logger is not None else logging.getLogger(__name__)
self.num_available_gpus = torch.cuda.device_count()
if self.num_available_gpus == 0:
raise RuntimeError("No GPUs detected by torch.cuda.device_count().")
self.gpu_stats_getter = GPUStatsCollector()
def start(self):
"""Start monitoring GPU metrics."""
# Clear the stop event to enable monitoring
self.stop_event = threading.Event()
self.gpu_utilization = []
self.gpu_memory_used = []
self.timestamps = []
self.thread = threading.Thread(target=self._monitor_loop)
self.thread.start()
self.logger.debug("GPU monitoring started")
def stop_and_collect(self) -> GPURawMetrics:
"""Stop monitoring and return collected metrics."""
self.stop_event.set()
self.thread.join()
if self.gpu_utilization:
timestamp_0 = self.timestamps[0]
metrics = GPURawMetrics(
utilization=self.gpu_utilization,
memory_used=self.gpu_memory_used,
timestamps=[t - timestamp_0 for t in self.timestamps],
timestamp_0=timestamp_0,
monitoring_status=GPUMonitoringStatus.SUCCESS,
)
self.logger.debug(f"GPU monitoring completed: {len(self.gpu_utilization)} samples collected")
else:
metrics = GPURawMetrics(monitoring_status=GPUMonitoringStatus.NO_SAMPLES_COLLECTED)
return metrics
def _monitor_loop(self):
"""Background monitoring loop using threading.Event for communication."""
while not self.stop_event.is_set():
utilization, memory_used = self.gpu_stats_getter.get_utilization_and_memory_used()
self.gpu_utilization.append(utilization)
self.gpu_memory_used.append(memory_used)
self.timestamps.append(time.time())
if self.stop_event.wait(timeout=self.sample_interval_sec):
break

View File

@ -4,4 +4,4 @@ gpustat>=1.0.0
torch>=2.0.0
transformers>=4.30.0
datasets>=2.10.0
huggingface_hub>=0.16.0
huggingface_hub>=0.16.0

View File

@ -19,477 +19,91 @@ in the ./benches directory, organizing outputs into model-specific subfolders.
"""
import argparse
import importlib.util
import json
import logging
import os
import sys
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Optional
from framework.benchmark_config import adapt_configs, get_config_by_level
from framework.benchmark_runner import BenchmarkRunner
def setup_logging(log_level: str = "INFO", enable_file_logging: bool = False) -> logging.Logger:
"""Setup logging configuration."""
numeric_level = getattr(logging, log_level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError(f"Invalid log level: {log_level}")
if __name__ == "__main__":
# Parse arguments
parser = argparse.ArgumentParser()
parser.add_argument("--output-dir", type=str, default=None, help="Output dir for benchmark results")
parser.add_argument("--log-level", type=str, choices=["DEBUG", "INFO", "WARNING", "ERROR"], default="INFO")
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup", "-w", type=int, default=3, help="Number of warmup iterations")
parser.add_argument("--iterations", "-i", type=int, default=10, help="Number of measurement iterations")
parser.add_argument("--batch-size", "-b", type=int, nargs="+", help="Batch size")
parser.add_argument("--sequence-length", "-s", type=int, nargs="+", help="Sequence length")
parser.add_argument("--num-tokens-to-generate", "-n", type=int, nargs="+", help="Number of tokens to generate")
parser.add_argument(
"--level",
type=int,
default=1,
help="Level of coverage for the benchmark. 0: only the main config, 1: a few important configs, 2: a config for"
" each attn implementation an option, 3: cross-generate all combinations of configs, 4: cross-generate all"
" combinations of configs w/ all compile modes",
)
parser.add_argument("--num-tokens-to-profile", "-p", type=int, default=0, help="Number of tokens to profile")
parser.add_argument("--branch-name", type=str, help="Git branch name")
parser.add_argument("--commit-id", type=str, help="Git commit ID (if not provided, will auto-detect from git)")
parser.add_argument("--commit-message", type=str, help="Git commit message")
parser.add_argument(
"--no-gpu-monitoring", action="store_true", help="Disables GPU monitoring during benchmark runs"
)
parser.add_argument(
"--push-result-to-dataset",
type=str,
default=None,
help="Name of the dataset to push results to. If not provided, results are not pushed to the Hub.",
)
args = parser.parse_args()
# Setup logging
benchmark_run_uuid = str(uuid.uuid4())[:8]
numeric_level = getattr(logging, args.log_level.upper())
handlers = [logging.StreamHandler(sys.stdout)]
if enable_file_logging:
handlers.append(logging.FileHandler(f"benchmark_run_{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"))
logging.basicConfig(
level=numeric_level, format="[%(levelname)s - %(asctime)s] %(name)s: %(message)s", handlers=handlers
)
return logging.getLogger(__name__)
def discover_benchmarks(benches_dir: str) -> list[dict[str, Any]]:
"""
Discover all benchmark modules in the benches directory.
Returns:
List of dictionaries containing benchmark module info
"""
benchmarks = []
benches_path = Path(benches_dir)
if not benches_path.exists():
raise FileNotFoundError(f"Benches directory not found: {benches_dir}")
for py_file in benches_path.glob("*.py"):
if py_file.name.startswith("__"):
continue
module_name = py_file.stem
try:
# Import the module
spec = importlib.util.spec_from_file_location(module_name, py_file)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
# Check if it has a benchmark runner function
if hasattr(module, f"run_{module_name}"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, f"run_{module_name}"),
}
)
elif hasattr(module, "run_benchmark"):
benchmarks.append(
{
"name": module_name,
"path": str(py_file),
"module": module,
"runner_function": getattr(module, "run_benchmark"),
}
)
else:
logging.warning(f"No runner function found in {py_file}")
except Exception as e:
logging.error(f"Failed to import {py_file}: {e}")
return benchmarks
def run_single_benchmark(
benchmark_info: dict[str, Any], output_dir: str, logger: logging.Logger, **kwargs
) -> Optional[str]:
"""
Run a single benchmark and return the output file path.
Args:
benchmark_info: Dictionary containing benchmark module info
output_dir: Base output directory
logger: Logger instance
**kwargs: Additional arguments to pass to the benchmark
Returns:
Path to the output file if successful, None otherwise
"""
benchmark_name = benchmark_info["name"]
runner_func = benchmark_info["runner_function"]
logger.info(f"Running benchmark: {benchmark_name}")
try:
# Check function signature to determine what arguments to pass
import inspect
sig = inspect.signature(runner_func)
# Prepare arguments based on function signature
func_kwargs = {"logger": logger, "output_dir": output_dir}
# Add other kwargs if the function accepts them
for param_name in sig.parameters:
if param_name in kwargs:
func_kwargs[param_name] = kwargs[param_name]
# Filter kwargs to only include parameters the function accepts
# If function has **kwargs, include all provided kwargs
has_var_kwargs = any(param.kind == param.VAR_KEYWORD for param in sig.parameters.values())
if has_var_kwargs:
valid_kwargs = {**func_kwargs, **kwargs}
else:
valid_kwargs = {k: v for k, v in func_kwargs.items() if k in sig.parameters}
# Run the benchmark
result = runner_func(**valid_kwargs)
if isinstance(result, str):
# Function returned a file path
return result
else:
logger.info(f"Benchmark {benchmark_name} completed successfully")
return "completed"
except Exception as e:
logger.error(f"Benchmark {benchmark_name} failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return None
def generate_summary_report(
output_dir: str,
benchmark_results: dict[str, Any],
logger: logging.Logger,
benchmark_run_uuid: Optional[str] = None,
) -> str:
"""Generate a summary report of all benchmark runs."""
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
summary_file = os.path.join(output_dir, f"benchmark_summary_{timestamp}.json")
summary_data = {
"run_metadata": {
"timestamp": datetime.utcnow().isoformat(),
"benchmark_run_uuid": benchmark_run_uuid,
"total_benchmarks": len(benchmark_results),
"successful_benchmarks": len([r for r in benchmark_results.values() if r is not None]),
"failed_benchmarks": len([r for r in benchmark_results.values() if r is None]),
},
"benchmark_results": benchmark_results,
"output_directory": output_dir,
}
with open(summary_file, "w") as f:
json.dump(summary_data, f, indent=2, default=str)
logger.info(f"Summary report saved to: {summary_file}")
return summary_file
def upload_results_to_hf_dataset(
output_dir: str,
summary_file: str,
dataset_name: str,
run_id: Optional[str] = None,
token: Optional[str] = None,
logger: Optional[logging.Logger] = None,
) -> Optional[str]:
"""
Upload benchmark results to a HuggingFace Dataset.
Based on upload_collated_report() from utils/collated_reports.py
Args:
output_dir: Local output directory containing results
summary_file: Path to the summary file
dataset_name: Name of the HuggingFace dataset to upload to
run_id: Unique run identifier (if None, will generate one)
token: HuggingFace token for authentication (if None, will use environment variables)
logger: Logger instance
Returns:
The run_id used for the upload, None if upload failed
"""
if logger is None:
logger = logging.getLogger(__name__)
import os
from huggingface_hub import HfApi
api = HfApi()
if run_id is None:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
run_id = f"{github_run_number}-{github_run_id}"
date_folder = datetime.now().strftime("%Y-%m-%d")
github_event_name = os.getenv("GITHUB_EVENT_NAME")
if github_event_name != "schedule":
# Non-scheduled runs go under a runs subfolder
repo_path = f"{date_folder}/runs/{run_id}/benchmark_results"
else:
# Scheduled runs go directly under the date
repo_path = f"{date_folder}/{run_id}/benchmark_results"
logger.info(f"Uploading benchmark results to dataset '{dataset_name}' at path '{repo_path}'")
try:
# Upload all files in the output directory
from pathlib import Path
output_path = Path(output_dir)
for file_path in output_path.rglob("*"):
if file_path.is_file():
# Calculate relative path from output_dir
relative_path = file_path.relative_to(output_path)
path_in_repo = f"{repo_path}/{relative_path}"
logger.debug(f"Uploading {file_path} to {path_in_repo}")
api.upload_file(
path_or_fileobj=str(file_path),
path_in_repo=path_in_repo,
repo_id=dataset_name,
repo_type="dataset",
token=token,
commit_message=f"Upload benchmark results for run {run_id}",
)
logger.info(
f"Successfully uploaded results to: https://huggingface.co/datasets/{dataset_name}/tree/main/{repo_path}"
)
return run_id
except Exception as upload_error:
logger.error(f"Failed to upload results: {upload_error}")
import traceback
logger.debug(traceback.format_exc())
return None
def main():
"""Main entry point for the benchmarking script."""
# Generate a unique UUID for this benchmark run
benchmark_run_uuid = str(uuid.uuid4())[:8]
parser = argparse.ArgumentParser(
description="Run all benchmarks in the ./benches directory",
epilog="""
Examples:
# Run all available benchmarks
python3 run_benchmarks.py
# Run with specific model and upload to HuggingFace Dataset
python3 run_benchmarks.py --model-id meta-llama/Llama-2-7b-hf --upload-to-hf username/benchmark-results
# Run with custom run ID and upload to HuggingFace Dataset
python3 run_benchmarks.py --run-id experiment_v1 --upload-to-hf org/benchmarks
# Run only specific benchmarks with file logging
python3 run_benchmarks.py --include llama --enable-file-logging
""", # noqa: W293
formatter_class=argparse.RawDescriptionHelpFormatter,
)
parser.add_argument(
"--output-dir",
type=str,
default="benchmark_results",
help="Base output directory for benchmark results (default: benchmark_results)",
)
parser.add_argument(
"--benches-dir",
type=str,
default="./benches",
help="Directory containing benchmark implementations (default: ./benches)",
)
parser.add_argument(
"--log-level",
type=str,
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
default="INFO",
help="Logging level (default: INFO)",
)
parser.add_argument("--model-id", type=str, help="Specific model ID to benchmark (if supported by benchmarks)")
parser.add_argument("--warmup-iterations", type=int, default=3, help="Number of warmup iterations (default: 3)")
parser.add_argument(
"--measurement-iterations", type=int, default=5, help="Number of measurement iterations (default: 5)"
)
parser.add_argument(
"--num-tokens-to-generate",
type=int,
default=100,
help="Number of tokens to generate in benchmarks (default: 100)",
)
parser.add_argument("--include", type=str, nargs="*", help="Only run benchmarks matching these names")
parser.add_argument("--exclude", type=str, nargs="*", help="Exclude benchmarks matching these names")
parser.add_argument("--enable-file-logging", action="store_true", help="Enable file logging (disabled by default)")
parser.add_argument(
"--commit-id", type=str, help="Git commit ID for metadata (if not provided, will auto-detect from git)"
)
parser.add_argument(
"--push-to-hub",
type=str,
help="Upload results to HuggingFace Dataset (provide dataset name, e.g., 'username/benchmark-results')",
)
parser.add_argument(
"--run-id", type=str, help="Custom run ID for organizing results (if not provided, will generate a unique ID)"
)
parser.add_argument(
"--token",
type=str,
help="HuggingFace token for dataset uploads (if not provided, will use HF_TOKEN environment variable)",
)
args = parser.parse_args()
# Setup logging
logger = setup_logging(args.log_level, args.enable_file_logging)
logger = logging.getLogger("benchmark_v2")
logger.info("Starting benchmark discovery and execution")
logger.info(f"Benchmark run UUID: {benchmark_run_uuid}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Benches directory: {args.benches_dir}")
# Create output directory
os.makedirs(args.output_dir, exist_ok=True)
# Error out if one of the arguments is not provided
if len(args.batch_size) * len(args.sequence_length) * len(args.num_tokens_to_generate) == 0:
raise ValueError(
"At least one of the arguments --batch-size, --sequence-length, or --num-tokens-to-generate is required"
)
try:
# Discover benchmarks
benchmarks = discover_benchmarks(args.benches_dir)
logger.info(f"Discovered {len(benchmarks)} benchmark(s): {[b['name'] for b in benchmarks]}")
# Get the configs for the given coverage level
configs = get_config_by_level(args.level)
# Adapt the configs to the given arguments
configs = adapt_configs(
configs,
args.warmup,
args.iterations,
args.batch_size,
args.sequence_length,
args.num_tokens_to_generate,
not args.no_gpu_monitoring,
)
if not benchmarks:
logger.warning("No benchmarks found!")
return 1
runner = BenchmarkRunner(logger, args.output_dir, args.branch_name, args.commit_id, args.commit_message)
timestamp, results = runner.run_benchmarks(
args.model_id, configs, args.num_tokens_to_profile, pretty_print_summary=True
)
# Filter benchmarks based on include/exclude
filtered_benchmarks = benchmarks
if args.include:
filtered_benchmarks = [
b for b in filtered_benchmarks if any(pattern in b["name"] for pattern in args.include)
]
logger.info(f"Filtered to include: {[b['name'] for b in filtered_benchmarks]}")
if args.exclude:
filtered_benchmarks = [
b for b in filtered_benchmarks if not any(pattern in b["name"] for pattern in args.exclude)
]
logger.info(f"After exclusion: {[b['name'] for b in filtered_benchmarks]}")
if not filtered_benchmarks:
logger.warning("No benchmarks remaining after filtering!")
return 1
# Prepare common kwargs for benchmarks
benchmark_kwargs = {
"warmup_iterations": args.warmup_iterations,
"measurement_iterations": args.measurement_iterations,
"num_tokens_to_generate": args.num_tokens_to_generate,
}
if args.model_id:
benchmark_kwargs["model_id"] = args.model_id
# Add commit_id if provided
if args.commit_id:
benchmark_kwargs["commit_id"] = args.commit_id
# Run benchmarks
benchmark_results = {}
successful_count = 0
for benchmark_info in filtered_benchmarks:
result = run_single_benchmark(benchmark_info, args.output_dir, logger, **benchmark_kwargs)
benchmark_results[benchmark_info["name"]] = result
if result is not None:
successful_count += 1
# Generate summary report
summary_file = generate_summary_report(args.output_dir, benchmark_results, logger, benchmark_run_uuid)
# Upload results to HuggingFace Dataset if requested
upload_run_id = None
if args.push_to_hub:
logger.info("=" * 60)
logger.info("UPLOADING TO HUGGINGFACE DATASET")
logger.info("=" * 60)
# Use provided run_id or fallback to benchmark run UUID
effective_run_id = args.run_id or benchmark_run_uuid
upload_run_id = upload_results_to_hf_dataset(
output_dir=args.output_dir,
summary_file=summary_file,
dataset_name=args.push_to_hub,
run_id=effective_run_id,
token=args.token,
logger=logger,
)
if upload_run_id:
logger.info(f"Upload completed with run ID: {upload_run_id}")
else:
logger.warning("Upload failed - continuing with local results")
# Final summary
total_benchmarks = len(filtered_benchmarks)
failed_count = total_benchmarks - successful_count
logger.info("=" * 60)
logger.info("BENCHMARK RUN SUMMARY")
logger.info("=" * 60)
logger.info(f"Total benchmarks: {total_benchmarks}")
logger.info(f"Successful: {successful_count}")
logger.info(f"Failed: {failed_count}")
logger.info(f"Output directory: {args.output_dir}")
logger.info(f"Summary report: {summary_file}")
if args.push_to_hub:
if upload_run_id:
logger.info(f"HuggingFace Dataset: {args.push_to_hub}")
logger.info(f"Run ID: {upload_run_id}")
logger.info(
f"View results: https://huggingface.co/datasets/{args.push_to_hub}/tree/main/{datetime.now().strftime('%Y-%m-%d')}/runs/{upload_run_id}"
)
else:
logger.warning("Upload to HuggingFace Dataset failed")
if failed_count > 0:
logger.warning(f"{failed_count} benchmark(s) failed. Check logs for details.")
return 1
else:
logger.info("All benchmarks completed successfully!")
return 0
except Exception as e:
logger.error(f"Benchmark run failed: {e}")
import traceback
logger.debug(traceback.format_exc())
return 1
if __name__ == "__main__":
sys.exit(main())
dataset_id = args.push_result_to_dataset
if dataset_id is not None and len(results) > 0:
runner.push_results_to_hub(dataset_id, results, timestamp)

View File

@ -54,12 +54,10 @@ NOT_DEVICE_TESTS = {
"test_gradient_checkpointing_backward_compatibility",
"test_gradient_checkpointing_enable_disable",
"test_torch_save_load",
"test_initialization",
"test_forward_signature",
"test_model_get_set_embeddings",
"test_model_main_input_name",
"test_correct_missing_keys",
"test_tie_model_weights",
"test_can_use_safetensors",
"test_load_save_without_tied_weights",
"test_tied_weights_keys",
@ -89,6 +87,10 @@ def pytest_configure(config):
config.addinivalue_line("markers", "not_device_test: mark the tests always running on cpu")
config.addinivalue_line("markers", "torch_compile_test: mark test which tests torch compile functionality")
config.addinivalue_line("markers", "torch_export_test: mark test which tests torch export functionality")
config.addinivalue_line("markers", "flash_attn_test: mark test which tests flash attention functionality")
config.addinivalue_line("markers", "flash_attn_3_test: mark test which tests flash attention 3 functionality")
os.environ["DISABLE_SAFETENSORS_CONVERSION"] = "true"
def pytest_collection_modifyitems(items):

View File

@ -9,11 +9,14 @@ SHELL ["sh", "-lc"]
# The following `ARG` are mainly used to specify the versions explicitly & directly in this docker file, and not meant
# to be used as arguments for docker build (so far).
ARG PYTORCH='2.8.0'
ARG PYTORCH='2.9.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# Disable kernel mapping for now until all tests pass
ENV DISABLE_KERNEL_MAPPING=1
# This needs to be compatible with the above `PYTORCH`.
ARG TORCHCODEC='0.8.0'
ARG FLASH_ATTN='false'
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg git-lfs
@ -23,14 +26,48 @@ RUN python3 -m pip install --no-cache-dir --upgrade pip
ARG REF=main
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev]
# 1. Put several commands in a single `RUN` to avoid image/layer exporting issue. Could be revised in the future.
# 2. Regarding `torch` part, We might need to specify proper versions for `torchvision` and `torchaudio`.
# Currently, let's not bother to specify their versions explicitly (so installed with their latest release versions).
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev,onnxruntime] && [ ${#PYTORCH} -gt 0 -a "$PYTORCH" != "pre" ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch'; echo "export VERSION='$VERSION'" >> ~/.profile && echo torch=$VERSION && [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA || python3 -m pip install --no-cache-dir -U --pre torch torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/$CUDA
# 2. For `torchcodec`, use `cpu` as we don't have `libnvcuvid.so` on the host runner. See https://github.com/meta-pytorch/torchcodec/issues/912
# **Important**: We need to specify `torchcodec` version if the torch version is not the latest stable one.
# 3. `set -e` means "exit immediately if any command fails".
RUN set -e; \
# Determine torch version
if [ ${#PYTORCH} -gt 0 ] && [ "$PYTORCH" != "pre" ]; then \
VERSION="torch==${PYTORCH}.*"; \
TORCHCODEC_VERSION="torchcodec==${TORCHCODEC}.*"; \
else \
VERSION="torch"; \
TORCHCODEC_VERSION="torchcodec"; \
fi; \
\
# Log the version being installed
echo "Installing torch version: $VERSION"; \
\
# Install PyTorch packages
if [ "$PYTORCH" != "pre" ]; then \
python3 -m pip install --no-cache-dir -U \
$VERSION \
torchvision \
torchaudio \
--extra-index-url https://download.pytorch.org/whl/$CUDA; \
# We need to specify the version if the torch version is not the latest stable one.
python3 -m pip install --no-cache-dir -U \
$TORCHCODEC_VERSION --extra-index-url https://download.pytorch.org/whl/cpu; \
else \
python3 -m pip install --no-cache-dir -U --pre \
torch \
torchvision \
torchaudio \
--extra-index-url https://download.pytorch.org/whl/nightly/$CUDA; \
python3 -m pip install --no-cache-dir -U --pre \
torchcodec --extra-index-url https://download.pytorch.org/whl/nightly/cpu; \
fi
RUN python3 -m pip install --no-cache-dir -U timm
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git || echo "Don't install detectron2 with nightly torch"
RUN python3 -m pip install --no-cache-dir pytesseract
@ -55,7 +92,7 @@ RUN python3 -m pip install --no-cache-dir bitsandbytes
RUN python3 -m pip install --no-cache-dir quanto
# After using A10 as CI runner, let's run FA2 tests
RUN [ "$PYTORCH" != "pre" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
RUN [ "$FLASH_ATTN" != "false" ] && python3 -m pip uninstall -y ninja && python3 -m pip install --no-cache-dir ninja && python3 -m pip install flash-attn --no-cache-dir --no-build-isolation || echo "Don't install FA2 with nightly torch"
# TODO (ydshieh): check this again
# `quanto` will install `ninja` which leads to many `CUDA error: an illegal memory access ...` in some model tests

View File

@ -10,7 +10,7 @@ RUN apt-get -y update && apt-get install -y libsndfile1-dev && apt install -y te
# Torch needs to be installed before deepspeed
RUN python3 -m pip install --no-cache-dir ./transformers[deepspeed]
RUN python3 -m pip install --no-cache-dir torchvision git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install --no-cache-dir --no-build-isolation torchvision git+https://github.com/facebookresearch/detectron2.git pytesseract
RUN python3 -m pip install -U "itsdangerous<2.1.0"
# Test if the image could successfully build the doc. before publishing the image

View File

@ -1,4 +1,4 @@
FROM rocm/pytorch:rocm6.4.1_ubuntu24.04_py3.12_pytorch_release_2.7.1
FROM rocm/pytorch:rocm7.0.2_ubuntu24.04_py3.12_pytorch_release_2.7.1
LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
@ -10,8 +10,8 @@ RUN apt update && \
RUN git lfs install
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy
RUN python3 -m pip install --no-cache-dir --upgrade importlib-metadata setuptools ninja git+https://github.com/facebookresearch/detectron2.git pytesseract "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy importlib-metadata setuptools wheel ninja pytesseract "itsdangerous<2.1.0"
RUN python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git
ARG REF=main
WORKDIR /
@ -39,6 +39,7 @@ RUN python3 -m pip install --no-cache-dir "torchcodec==0.5"
# Install flash attention from source. Tested with commit 6387433156558135a998d5568a9d74c1778666d8
RUN git clone https://github.com/ROCm/flash-attention/ -b tridao && \
cd flash-attention && \
GPU_ARCHS="gfx942" python setup.py install
GPU_ARCHS="gfx942" python setup.py install
# GPU_ARCHS builds for MI300, MI325 but not MI355: we would need to add `;gfx950` but it takes too long to build.
RUN python3 -m pip install --no-cache-dir einops

View File

@ -29,7 +29,7 @@ RUN python3 -m pip uninstall -y apex torch torchvision torchaudio
RUN python3 -m pip install torch==$PYTORCH torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO --index-url https://download.pytorch.org/whl/rocm$ROCM --no-cache-dir
# Pre-build DeepSpeed, so it's be ready for testing (to avoid timeout)
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache-dir -v --disable-pip-version-check 2>&1
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache-dir -v --disable-pip-version-check 2>&1
ARG REF=main
WORKDIR /

View File

@ -21,7 +21,7 @@ RUN python3 -m pip install --no-cache-dir './transformers[deepspeed-testing]' 'p
# Install latest release PyTorch
# (PyTorch must be installed before pre-compiling any DeepSpeed c++/cuda ops.)
# (https://www.deepspeed.ai/tutorials/advanced-install/#pre-install-deepspeed-ops)
RUN python3 -m pip uninstall -y torch torchvision torchaudio && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip uninstall -y torch torchvision torchaudio torchcodec && python3 -m pip install --no-cache-dir -U torch==$PYTORCH torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -43,7 +43,7 @@ RUN python3 -m pip uninstall -y deepspeed
# This has to be run (again) inside the GPU VMs running the tests.
# The installation works here, but some tests fail, if we don't pre-build deepspeed again in the VMs running the tests.
# TODO: Find out why test fail.
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --global-option="build_ext" --global-option="-j8" --no-cache -v --disable-pip-version-check 2>&1
RUN DS_BUILD_CPU_ADAM=1 DS_BUILD_FUSED_ADAM=1 python3 -m pip install deepspeed --no-build-isolation --config-settings="--build-option=build_ext" --config-settings="--build-option=-j8" --no-cache -v --disable-pip-version-check 2>&1
# `kernels` may give different outputs (within 1e-5 range) even with the same model (weights) and the same inputs
RUN python3 -m pip uninstall -y kernels

View File

@ -3,11 +3,10 @@ LABEL maintainer="Hugging Face"
SHELL ["/bin/bash", "-c"]
ARG PYTHON_VER=3.11
ARG PYTHON_VER=3.12
ENV TORCH_DEVICE_BACKEND_AUTOLOAD=0
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get remove -y python3.10 && apt-get autoremove -y
RUN apt-get update && \
apt-get install -y software-properties-common && \
add-apt-repository -y ppa:deadsnakes/ppa && \
@ -23,7 +22,6 @@ RUN apt-get update && \
apt-utils \
build-essential \
ca-certificates \
clinfo \
curl \
git \
git-lfs \
@ -35,7 +33,6 @@ RUN apt-get update && \
rsync \
sudo \
libnl-genl-3-200 \
xpu-smi \
unzip \
ffmpeg \
tesseract-ocr \
@ -45,34 +42,47 @@ RUN apt-get update && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN apt-get update && \
apt-get install -y \
linux-headers-$(uname -r) \
linux-modules-extra-$(uname -r) \
linux-headers-$(uname -r) linux-modules-extra-$(uname -r) \
flex bison \
intel-fw-gpu intel-i915-dkms xpu-smi \
intel-fw-gpu intel-i915-dkms xpu-smi intel-ocloc clinfo\
intel-opencl-icd libze-intel-gpu1 libze1 \
intel-media-va-driver-non-free libmfx-gen1 libvpl2 \
libegl-mesa0 libegl1-mesa libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libegl-mesa0 libegl1 libegl1-mesa-dev libgbm1 libgl1-mesa-dev libgl1-mesa-dri \
libglapi-mesa libglx-mesa0 libigdgmm12 libxatracker2 mesa-va-drivers \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo clinfo intel-ocloc \
mesa-vdpau-drivers mesa-vulkan-drivers va-driver-all vainfo hwinfo \
libigc-dev intel-igc-cm libigdfcl-dev libigfxcmrt-dev libze-dev && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN pip install --upgrade pip
RUN pip install triton==3.3.0
# Use virtual env because Ubuntu-24 does not allowed pip on original python
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
ENV PATH="/root/.local/bin:$PATH"
ENV VIRTUAL_ENV="/opt/venv"
ENV UV_PYTHON_INSTALL_DIR=/opt/uv/python
RUN uv venv --python ${PYTHON_VER} --seed ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH"
RUN pip install torch==2.7.0 torchvision==0.22.0 torchaudio==2.7.0 --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install --upgrade pip wheel
RUN pip install triton==3.4.0
RUN pip install evaluate torchdata pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree
RUN pip install gguf hqq compressed_tensors gptqmodel mergekit autoawq deepspeed torchao onnx
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft
RUN pip install torch==2.8.0+xpu torchvision==0.23.0+xpu torchaudio==2.8.0+xpu --index-url https://download.pytorch.org/whl/xpu --no-cache-dir
RUN pip install torchcodec torchdata --no-cache-dir
RUN pip install evaluate pyctcdecode pytesseract decord galore-torch fire scipy scikit-learn sentencepiece sacremoses nltk rouge_score librosa soundfile g2p_en mpi4py requests_mock
RUN pip install pretty_midi essentia resampy Levenshtein av sacrebleu phonemizer invisible_watermark schedulefree setuptools
RUN pip install gptqmodel --no-build-isolation
RUN pip install gguf hqq compressed_tensors autoawq deepspeed torchao onnx auto_round
RUN pip install hf_transfer huggingface-hub hf-doc-builder datasets optimum-quanto timm transformers accelerate optimum peft diffusers trl kernels
# install liger-kernel
RUN pip install git+https://github.com/linkedin/Liger-Kernel.git --extra-index-url https://download.pytorch.org/whl/test/xpu
# install mergekit
RUN pip install --break-system-packages git+https://github.com/arcee-ai/mergekit.git@v0.1.3
# install bitsandbytes
RUN pip install git+https://github.com/bitsandbytes-foundation/bitsandbytes.git

View File

@ -12,8 +12,6 @@ SHELL ["sh", "-lc"]
ARG PYTORCH='2.8.0'
# Example: `cu102`, `cu113`, etc.
ARG CUDA='cu126'
# Disable kernel mapping for quantization tests
ENV DISABLE_KERNEL_MAPPING=1
RUN apt update
RUN apt install -y git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-pip ffmpeg
@ -26,7 +24,7 @@ RUN [ ${#PYTORCH} -gt 0 ] && VERSION='torch=='$PYTORCH'.*' || VERSION='torch';
RUN echo torch=$VERSION
# `torchvision` and `torchaudio` should be installed along with `torch`, especially for nightly build.
# Currently, let's just use their latest releases (when `torch` is installed with a release version)
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio torchcodec --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir -U $VERSION torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/$CUDA
RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
@ -52,7 +50,7 @@ RUN python3 -m pip install --no-cache-dir hqq
RUN python3 -m pip install --no-cache-dir gguf
# Add autoawq for quantization testing
RUN python3 -m pip install --no-cache-dir autoawq[kernels]
RUN python3 -m pip install --no-cache-dir --no-build-isolation autoawq[kernels]
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir optimum-quanto
@ -82,6 +80,9 @@ RUN python3 -m pip uninstall -y flash-attn
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop
# Add fp-quant for quantization testing
RUN python3 -m pip install --no-cache-dir "fp-quant>=0.3.2"
# Low usage or incompatible lib, will enable later on
# # Add aqlm for quantization testing
@ -102,7 +103,3 @@ RUN cd transformers && python3 setup.py develop
# # TODO: create a new workflow to test them
# RUN python3 -m pip install --no-cache-dir flute-kernel==0.4.1
# RUN python3 -m pip install --no-cache-dir git+https://github.com/Dao-AILab/fast-hadamard-transform.git
# Add fp-quant for quantization testing
# Requires py3.11 but our CI runs on 3.9
# RUN python3 -m pip install --no-cache-dir "fp-quant>=0.1.6"

View File

@ -24,7 +24,7 @@ pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to work around it.
Then you need to install our special tool that builds the documentation:
@ -38,7 +38,7 @@ pip install git+https://github.com/huggingface/doc-builder
## Building the documentation
Once you have setup the `doc-builder` and additional packages, you can generate the documentation by
Once you have set up the `doc-builder` and additional packages, you can generate the documentation by
typing the following command:
```bash
@ -295,12 +295,11 @@ Here's an example of a tuple return, comprising several objects:
Due to the rapidly growing repository, it is important to make sure that no files that would significantly weigh down the repository are added. This includes images, videos, and other non-text files. We prefer to leverage a hf.co hosted `dataset` like
the ones hosted on [`hf-internal-testing`](https://huggingface.co/hf-internal-testing) in which to place these files and reference
them by URL. We recommend putting them in the following dataset: [huggingface/documentation-images](https://huggingface.co/datasets/huggingface/documentation-images).
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate your images
to this dataset.
If an external contribution, feel free to add the images to your PR and ask a Hugging Face member to migrate them to this dataset.
## Styling the docstring
We have an automatic script running with the `make style` comment that will make sure that:
We have an automatic script running with the `make style` command that will make sure that:
- the docstrings fully take advantage of the line width
- all code examples are formatted using black, like the code of the Transformers library

View File

@ -123,8 +123,6 @@
title: تشغيل التدريب على Amazon SageMaker
- local: serialization
title: التصدير إلى ONNX
- local: torchscript
title: التصدير إلى TorchScript
- local: notebooks
title: دفاتر الملاحظات مع الأمثلة
- local: community
@ -260,8 +258,6 @@
# title: النماذج
# - local: main_classes/text_generation
# title: توليد النصوص
# - local: main_classes/onnx
# title: ONNX
# - local: main_classes/optimizer_schedules
# title: التحسين
# - local: main_classes/output

View File

@ -52,7 +52,7 @@
<figcaption class="mt-2 text-center text-sm text-gray-500">الصورة توضح مخطط مراحل نموذج Swin.</figcaption>
</div>
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PretrainedConfig.from_pretrained`]:
يسمح لك [`AutoBackbone`] باستخدام النماذج المُدربة مسبقًا كعمود فقري للحصول على خرائط ميزات من مراحل مختلفة من العمود الفقري. يجب عليك تحديد أحد المعلمات التالية في [`~PreTrainedConfig.from_pretrained`]:
* `out_indices` هو فهرس الطبقة التي تريد الحصول على خريطة الميزات منها
* `out_features` هو اسم الطبقة التي تريد الحصول على خريطة الميزات منها
@ -131,10 +131,13 @@
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
> [!WARNING]
> بالنسبة لنماذج PyTorch، تستخدم طريقة `from_pretrained()` `torch.load()` التي تستخدم داخليًا `pickle` والتي يُعرف أنها غير آمنة. بشكل عام، لا تقم مطلقًا بتحميل نموذج قد يكون مصدره مصدرًا غير موثوق به، أو قد يكون تم العبث به. يتم تخفيف هذا الخطر الأمني جزئيًا للنماذج العامة المستضافة على Hub Hugging Face، والتي يتم [فحصها بحثًا عن البرامج الضارة](https://huggingface.co/docs/hub/security-malware) في كل ارتكاب. راجع [توثيق Hub](https://huggingface.co/docs/hub/security) للحصول على أفضل الممارسات مثل [التحقق من التوقيع](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) باستخدام GPG.
>
> لا تتأثر نقاط تفتيش TensorFlow و Flax، ويمكن تحميلها داخل بنيات PyTorch باستخدام `from_tf` و `from_flax` kwargs لطريقة `from_pretrained` للتحايل على هذه المشكلة.
<Tip warning={true}>
بالنسبة لنماذج PyTorch، تستخدم طريقة `from_pretrained()` `torch.load()` التي تستخدم داخليًا `pickle` والتي يُعرف أنها غير آمنة. بشكل عام، لا تقم مطلقًا بتحميل نموذج قد يكون مصدره مصدرًا غير موثوق به، أو قد يكون تم العبث به. يتم تخفيف هذا الخطر الأمني جزئيًا للنماذج العامة المستضافة على Hub Hugging Face، والتي يتم [فحصها بحثًا عن البرامج الضارة](https://huggingface.co/docs/hub/security-malware) في كل ارتكاب. راجع [توثيق Hub](https://huggingface.co/docs/hub/security) للحصول على أفضل الممارسات مثل [التحقق من التوقيع](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) باستخدام GPG.
لا تتأثر نقاط تفتيش TensorFlow و Flax، ويمكن تحميلها داخل بنيات PyTorch باستخدام `from_tf` و `from_flax` kwargs لطريقة `from_pretrained` للتحايل على هذه المشكلة.
</Tip>
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.

View File

@ -230,10 +230,11 @@ The sun.</s>
من هنا، استمر في التدريب كما تفعل مع مهمة نمذجة اللغة القياسية، باستخدام عمود `formatted_chat`.
> [!TIP]
> بشكل افتراضي ، تضيف بعض *tokenizers* رموزًا خاصة مثل `<bos>` و `<eos>` إلى النص الذي تقوم بتقسيمه إلى رموز. يجب أن تتضمن قوالب المحادثة بالفعل جميع الرموز الخاصة التي تحتاجها ، وبالتالي فإن الرموز الخاصة الإضافية ستكون غالبًا غير صحيحة أو مُكررة ، مما سيؤثر سلبًا على أداء النموذج .
>
> لذلك ، إذا قمت بتنسيق النص باستخدام `apply_chat_template(tokenize=False)` ، فيجب تعيين المعامل `add_special_tokens=False` عندما تقوم بتقسيم ذلك النص إلى رموز لاحقًا . إذا كنت تستخدم `apply_chat_template(tokenize=True)` ، فلن تحتاج إلى القلق بشأن ذلك !
<Tip>
بشكل افتراضي ، تضيف بعض *tokenizers* رموزًا خاصة مثل `<bos>` و `<eos>` إلى النص الذي تقوم بتقسيمه إلى رموز. يجب أن تتضمن قوالب المحادثة بالفعل جميع الرموز الخاصة التي تحتاجها ، وبالتالي فإن الرموز الخاصة الإضافية ستكون غالبًا غير صحيحة أو مُكررة ، مما سيؤثر سلبًا على أداء النموذج .
لذلك ، إذا قمت بتنسيق النص باستخدام `apply_chat_template(tokenize=False)` ، فيجب تعيين المعامل `add_special_tokens=False` عندما تقوم بتقسيم ذلك النص إلى رموز لاحقًا . إذا كنت تستخدم `apply_chat_template(tokenize=True)` ، فلن تحتاج إلى القلق بشأن ذلك !
</Tip>
## متقدّم: مدخلات إضافية لِقوالب الدردشة
@ -360,8 +361,9 @@ print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
The current temperature in Paris, France is 22.0 ° Celsius.<|im_end|>
```
> [!TIP]
> لا تستخدم جميع نماذج استخدام الأدوات جميع ميزات استدعاء الأدوات الموضحة أعلاه. يستخدم البعض معرفات استدعاء الأدوات، بينما يستخدم البعض الآخر ببساطة اسم الدالة ويقارن استدعاءات الأدوات بالنتائج باستخدام الترتيب، وهناك عدة نماذج لا تستخدم أيًا منهما ولا تصدر سوى استدعاء أداة واحد في كل مرة لتجنب الارتباك. إذا كنت تريد أن يكون رمزك متوافقًا مع أكبر عدد ممكن من النماذج، فإننا نوصي بهيكلة استدعاءات الأدوات الخاصة بك كما هو موضح هنا، وإعادة نتائج الأدوات بالترتيب الذي أصدرها النموذج. يجب أن تتعامل قوالب الدردشة على كل نموذج مع الباقي.
<Tip>
لا تستخدم جميع نماذج استخدام الأدوات جميع ميزات استدعاء الأدوات الموضحة أعلاه. يستخدم البعض معرفات استدعاء الأدوات، بينما يستخدم البعض الآخر ببساطة اسم الدالة ويقارن استدعاءات الأدوات بالنتائج باستخدام الترتيب، وهناك عدة نماذج لا تستخدم أيًا منهما ولا تصدر سوى استدعاء أداة واحد في كل مرة لتجنب الارتباك. إذا كنت تريد أن يكون رمزك متوافقًا مع أكبر عدد ممكن من النماذج، فإننا نوصي بهيكلة استدعاءات الأدوات الخاصة بك كما هو موضح هنا، وإعادة نتائج الأدوات بالترتيب الذي أصدرها النموذج. يجب أن تتعامل قوالب الدردشة على كل نموذج مع الباقي.
</Tip>
### فهم مخططات الأدوات
@ -512,8 +514,9 @@ print(gen_text)
إن مُدخل documents للتوليد القائم على الاسترجاع غير مدعوم على نطاق واسع، والعديد من النماذج لديها قوالب دردشة تتجاهل هذا المُدخل ببساطة.
للتحقق مما إذا كان النموذج يدعم مُدخل `documents`، يمكنك قراءة بطاقة النموذج الخاصة به، أو `print(tokenizer.chat_template)` لمعرفة ما إذا كان مفتاح `documents` مستخدمًا في أي مكان.
> [!TIP]
> ومع ذلك، فإن أحد فئات النماذج التي تدعمه هي [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) و [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-pluse-08-2024) من Cohere، من خلال قالب الدردشة rag الخاص بهم. يمكنك رؤية أمثلة إضافية على التوليد باستخدام هذه الميزة في بطاقات النموذج الخاصة بهم.
<Tip>
ومع ذلك، فإن أحد فئات النماذج التي تدعمه هي [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) و [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-pluse-08-2024) من Cohere، من خلال قالب الدردشة rag الخاص بهم. يمكنك رؤية أمثلة إضافية على التوليد باستخدام هذه الميزة في بطاقات النموذج الخاصة بهم.
</Tip>
## متقدم: كيف تعمل قوالب الدردشة؟
يتم تخزين قالب الدردشة للنموذج في الخاصية `tokenizer.chat_template`. إذا لم يتم تعيين قالب دردشة، فسيتم استخدام القالب الافتراضي لفئة النموذج هذه بدلاً من ذلك. دعونا نلقي نظرة على قالب دردشة `Zephyr`، ولكن لاحظ أن هذا القالب مُبسّط قليلاً عن القالب الفعلي!
@ -584,8 +587,9 @@ tokenizer.push_to_hub("model_name") # تحميل القالب الجديد إل
يتم استدعاء الدالة [`~PreTrainedTokenizer.apply_chat_template`] الذي نستخدم قالب الدردشة الخاص بك بواسطة فئة [`TextGenerationPipeline`] لذلك بمجرد تعيين قالب الدردشة الصحيح، سيصبح نموذجك متوافقًا تلقائيًا مع [`TextGenerationPipeline`].
> [!TIP]
> إذا كنت تُجري ضبطًا دقيقًا لنموذج للدردشة، بالإضافة إلى تعيين قالب دردشة، فربما يجب عليك إضافة أي رموز تحكم دردشة جديدة كرموز خاصة في المجزىء اللغوي. لا يتم تقسيم الرموز الخاصة أبدًا، مما يضمن معالجة رموز التحكم الخاصة بك دائمًا كرموز فردية بدلاً من تجزئتها إلى أجزاء. يجب عليك أيضًا تعيين خاصية `eos_token` للمجزىء اللغوي إلى الرمز الذي يُشير إلى نهاية توليدات المساعد في قالبك. سيضمن هذا أن أدوات توليد النصوص يمكنها تحديد وقت إيقاف توليد النص بشكل صحيح.
<Tip>
إذا كنت تُجري ضبطًا دقيقًا لنموذج للدردشة، بالإضافة إلى تعيين قالب دردشة، فربما يجب عليك إضافة أي رموز تحكم دردشة جديدة كرموز خاصة في المجزىء اللغوي. لا يتم تقسيم الرموز الخاصة أبدًا، مما يضمن معالجة رموز التحكم الخاصة بك دائمًا كرموز فردية بدلاً من تجزئتها إلى أجزاء. يجب عليك أيضًا تعيين خاصية `eos_token` للمجزىء اللغوي إلى الرمز الذي يُشير إلى نهاية توليدات المساعد في قالبك. سيضمن هذا أن أدوات توليد النصوص يمكنها تحديد وقت إيقاف توليد النص بشكل صحيح.
</Tip>
### لماذا تحتوي بعض النماذج على قوالب متعددة؟
تستخدم بعض النماذج قوالب مختلفة لحالات استخدام مختلفة. على سبيل المثال، قد تستخدم قالبًا واحدًا للدردشة العادية وآخر لاستخدام الأدوات، أو التوليد القائم على الاسترجاع. في هذه الحالات، تكون `tokenizer.chat_template` قاموسًا. يمكن أن يتسبب هذا في بعض الارتباك، وحيثما أمكن، نوصي باستخدام قالب واحد لجميع حالات الاستخدام. يمكنك استخدام عبارات Jinja مثل `if tools is defined` وتعريفات `{% macro %}` لتضمين مسارات تعليمات برمجية متعددة بسهولة في قالب واحد.
@ -636,8 +640,10 @@ I'm doing great!<|im_end|>
## متقدم: نصائح لكتابة القوالب
> [!TIP]
> أسهل طريقة للبدء في كتابة قوالب Jinja هي إلقاء نظرة على بعض القوالب الموجودة. يمكنك استخدام `print(tokenizer.chat_template)` لأي نموذج دردشة لمعرفة القالب الذي يستخدمه. بشكل عام، تحتوي النماذج التي تدعم استخدام الأدوات على قوالب أكثر تعقيدًا بكثير من النماذج الأخرى - لذلك عندما تبدأ للتو، فمن المحتمل أنها مثال سيئ للتعلم منه! يمكنك أيضًا إلقاء نظرة على [وثائق Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) للحصول على تفاصيل حول تنسيق Jinja العام وتركيبه.
<Tip>
أسهل طريقة للبدء في كتابة قوالب Jinja هي إلقاء نظرة على بعض القوالب الموجودة. يمكنك استخدام `print(tokenizer.chat_template)` لأي نموذج دردشة لمعرفة القالب الذي يستخدمه. بشكل عام، تحتوي النماذج التي تدعم استخدام الأدوات على قوالب أكثر تعقيدًا بكثير من النماذج الأخرى - لذلك عندما تبدأ للتو، فمن المحتمل أنها مثال سيئ للتعلم منه! يمكنك أيضًا إلقاء نظرة على [وثائق Jinja](https://jinja.palletsprojects.com/en/3.1.x/templates/#synopsis) للحصول على تفاصيل حول تنسيق Jinja العام وتركيبه.
</Tip>
تُطابق قوالب Jinja في `transformers` قوالب Jinja في أي مكان آخر. الشيء الرئيسي الذي يجب معرفته هو أن سجل الدردشة سيكون متاحًا داخل قالبك كمتغير يسمى `messages`. ستتمكن من الوصول إلى `messages` في قالبك تمامًا كما يمكنك في Python، مما يعني أنه يمكنك التكرار خلاله باستخدام `{% for message in messages %}` أو الوصول إلى رسائل فردية باستخدام `{{ messages[0] }}`، على سبيل المثال.
@ -674,8 +680,11 @@ I'm doing great!<|im_end|>
- **الرموز الخاصة** مثل `bos_token` و `eos_token`. يتم استخراجها من `tokenizer.special_tokens_map`. ستختلف الرموز الدقيقة المتاحة داخل كل قالب اعتمادًا على المجزىء اللغوي الأصلي.
> [!TIP]
> يمكنك في الواقع تمرير أي `kwarg` إلى `apply_chat_template`، وستكون متاحة داخل القالب كمتغير. بشكل عام، نوصي بمحاولة الالتزام بالمتغيرات الأساسية المذكورة أعلاه، لأن ذلك سيجعل نموذجك أكثر صعوبة في الاستخدام إذا كان على المستخدمين كتابة تعليمات برمجية مخصصة لتمرير `kwargs` خاصة بالنموذج. ومع ذلك، فنحن نُدرك أن هذا المجال يتحرك بسرعة، لذلك إذا كانت لديك حالة استخدام جديدة لا تتناسب مع واجهة برمجة التطبيقات الأساسية، فلا تتردد في استخدام `kwarg` معامل جديد لها! إذا أصبح `kwarg` المعامل الجديد شائعًا، فقد نقوم بترقيته إلى واجهة برمجة التطبيقات الأساسية وإنشاء وتوثيق الخاص به.
<Tip>
يمكنك في الواقع تمرير أي `kwarg` إلى `apply_chat_template`، وستكون متاحة داخل القالب كمتغير. بشكل عام، نوصي بمحاولة الالتزام بالمتغيرات الأساسية المذكورة أعلاه، لأن ذلك سيجعل نموذجك أكثر صعوبة في الاستخدام إذا كان على المستخدمين كتابة تعليمات برمجية مخصصة لتمرير `kwargs` خاصة بالنموذج. ومع ذلك، فنحن نُدرك أن هذا المجال يتحرك بسرعة، لذلك إذا كانت لديك حالة استخدام جديدة لا تتناسب مع واجهة برمجة التطبيقات الأساسية، فلا تتردد في استخدام `kwarg` معامل جديد لها! إذا أصبح `kwarg` المعامل الجديد شائعًا، فقد نقوم بترقيته إلى واجهة برمجة التطبيقات الأساسية وإنشاء وتوثيق الخاص به.
</Tip>
### دوال قابلة للاستدعاء

View File

@ -188,8 +188,11 @@ pipe = pipeline("text-generation", "meta-llama/Meta-Llama-3-8B-Instruct", device
### اعتبارات الأداء
> [!TIP]
> للحصول على دليل أكثر شمولاً حول أداء نموذج اللغة والتحسين، راجع [تحسين استدلال LLM](./llm_optims).
<Tip>
للحصول على دليل أكثر شمولاً حول أداء نموذج اللغة والتحسين، راجع [تحسين استدلال LLM](./llm_optims).
</Tip>
كقاعدة عامة، ستكون نماذج المحادثة الأكبر حجمًا أبطأ في توليد النصوص بالإضافة إلى احتياجها لذاكرة أكبرة. من الممكن أن تكون أكثر تحديدًا بشأن هذا: إن توليد النص من نموذج دردشة أمر غير عادي في أنه يخضع لقيود **سعة الذاكرة** بدلاً من قوة الحوسبة، لأن كل معلمة نشطة يجب قراءتها من الذاكرة لكل رمز ينشئه النموذج. وهذا يعني أن عدد الرموز في الثانية التي يمكنك توليدها من نموذج الدردشة يتناسب بشكل عام مع إجمالي حجم الذاكرة التي بوجد بها ا، مقسومًا على حجم النموذج.

View File

@ -54,26 +54,27 @@ DistilBertConfig {
```
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PretrainedConfig.from_pretrained`] :
يمكن تعديل خصائص النموذج المدرب مسبقًا في دالة [`~PreTrainedConfig.from_pretrained`] :
```py
>>> my_config = DistilBertConfig.from_pretrained("distilbert/distilbert-base-uncased", activation="relu", attention_dropout=0.4)
```
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PretrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
بمجرد أن تصبح راضيًا عن تكوين نموذجك، يمكنك حفظه باستخدام [`~PreTrainedConfig.save_pretrained`]. يتم تخزين ملف التكوين الخاص بك على أنه ملف JSON في دليل الحفظ المحدد:
```py
>>> my_config.save_pretrained(save_directory="./your_model_save_path")
```
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PretrainedConfig.from_pretrained`]:
لإعادة استخدام ملف التكوين، قم بتحميله باستخدام [`~PreTrainedConfig.from_pretrained`]:
```py
>>> my_config = DistilBertConfig.from_pretrained("./your_model_save_path/config.json")
```
> [!TIP]
> يمكنك أيضًا حفظ ملف التكوين كقاموس أو حتى كفرق بين خصائص التكوين المُعدّلة والخصائص التكوين الافتراضية! راجع وثائق [التكوين](main_classes/configuration) لمزيد من التفاصيل.
<Tip>
يمكنك أيضًا حفظ ملف التكوين كقاموس أو حتى كفرق بين خصائص التكوين المُعدّلة والخصائص التكوين الافتراضية! راجع وثائق [التكوين](main_classes/configuration) لمزيد من التفاصيل.
</Tip>
## النموذج
@ -132,8 +133,11 @@ DistilBertConfig {
يدعم كلا النوعين من المجزئات طرقًا شائعة مثل الترميز وفك الترميز، وإضافة رموز جديدة، وإدارة الرموز الخاصة.
> [!WARNING]
> لا يدعم كل نموذج مجزئ النصوص سريع. الق نظرة على هذا [جدول](index#supported-frameworks) للتحقق مما إذا كان النموذج يحتوي على دعم مجزئ النصوص سريع.
<Tip warning={true}>
لا يدعم كل نموذج مجزئ النصوص سريع. الق نظرة على هذا [جدول](index#supported-frameworks) للتحقق مما إذا كان النموذج يحتوي على دعم مجزئ النصوص سريع.
</Tip>
إذا دربت مجزئ النصوص خاص بك، فيمكنك إنشاء واحد من *قاموسك*:```
@ -159,8 +163,9 @@ DistilBertConfig {
>>> fast_tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert/distilbert-base-uncased")
```
> [!TIP]
> افتراضيًا، سيحاول [`AutoTokenizer`] تحميل مجزئ نصوص سريع. يمكنك تعطيل هذا السلوك عن طريق تعيين `use_fast=False` في `from_pretrained`.
<Tip>
افتراضيًا، سيحاول [`AutoTokenizer`] تحميل مجزئ نصوص سريع. يمكنك تعطيل هذا السلوك عن طريق تعيين `use_fast=False` في `from_pretrained`.
</Tip>
## معالج الصور
@ -192,8 +197,11 @@ ViTImageProcessor {
}
```
> [!TIP]
> إذا كنت لا تبحث عن أي تخصيص، فما عليك سوى استخدام طريقة `from_pretrained` لتحميل معلمات معالج الصور الافتراضية للنموذج.
<Tip>
إذا كنت لا تبحث عن أي تخصيص، فما عليك سوى استخدام طريقة `from_pretrained` لتحميل معلمات معالج الصور الافتراضية للنموذج.
</Tip>
عدل أيًا من معلمات [`ViTImageProcessor`] لإنشاء معالج الصور المخصص الخاص بك:
@ -326,8 +334,9 @@ Wav2Vec2FeatureExtractor {
}
```
> [!TIP]
> إذا لم تكن بحاجة لأي تخصيص، فاستخدم فقط طريقة `from_pretrained` لتحميل معلمات مستخرج الميزات الافتراضية للنموذج.
<Tip>
إذا لم تكن بحاجة لأي تخصيص، فاستخدم فقط طريقة `from_pretrained` لتحميل معلمات مستخرج الميزات الافتراضية للنموذج.
</Tip>
قم بتعديل أي من معلمات [`Wav2Vec2FeatureExtractor`] لإنشاء مستخرج ميزات مخصص:

View File

@ -11,17 +11,20 @@
لنبدأ بكتابة إعدادات النموذج. إعدادات النموذج هو كائنٌ يحتوي على جميع المعلومات اللازمة لبنائه. كما سنرى لاحقًا، يتطلب النموذج كائن `config` لتهيئته، لذا يجب أن يكون هذا الكائن كاملاً.
> [!TIP]
> تتبع النماذج في مكتبة `transformers` اتفاقية قبول كائن `config` في دالة `__init__` الخاصة بها، ثم تمرر كائن `config` بالكامل إلى الطبقات الفرعية في النموذج، بدلاً من تقسيمه إلى معامﻻت متعددة. يؤدي كتابة نموذجك بهذا الأسلوب إلى كود أبسط مع "مصدر حقيقة" واضح لأي فرط معلمات، كما يسهل إعادة استخدام الكود من نماذج أخرى في `transformers`.
<Tip>
تتبع النماذج في مكتبة `transformers` اتفاقية قبول كائن `config` في دالة `__init__` الخاصة بها، ثم تمرر كائن `config` بالكامل إلى الطبقات الفرعية في النموذج، بدلاً من تقسيمه إلى معامﻻت متعددة. يؤدي كتابة نموذجك بهذا الأسلوب إلى كود أبسط مع "مصدر حقيقة" واضح لأي فرط معلمات، كما يسهل إعادة استخدام الكود من نماذج أخرى في `transformers`.
</Tip>
في مثالنا، سنعدّل بعض الوسائط في فئة ResNet التي قد نرغب في ضبطها. ستعطينا التكوينات المختلفة أنواع ResNets المختلفة الممكنة. سنقوم بتخزين هذه الوسائط بعد التحقق من صحته.
```python
from transformers import PretrainedConfig
from transformers import PreTrainedConfig
from typing import List
class ResnetConfig(PretrainedConfig):
class ResnetConfig(PreTrainedConfig):
model_type = "resnet"
def __init__(
@ -55,11 +58,11 @@ class ResnetConfig(PretrainedConfig):
```
الأشياء الثلاثة المهمة التي يجب تذكرها عند كتابة تكوينك الخاص هي:
- يجب أن ترث من `PretrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PretrainedConfig` أي معامﻻت إضافية kwargs،
- يجب أن ترث من `PreTrainedConfig`،
- يجب أن تقبل دالة `__init__` الخاصة بـ `PreTrainedConfig` أي معامﻻت إضافية kwargs،
- يجب تمرير هذه المعامﻻت الإضافية إلى دالة `__init__` فى الفئة الأساسية الاعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين التانى والثالث يأتيان من حقيقة أن `PretrainedConfig` لديه المزيد من الحقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
يضمن الإرث حصولك على جميع الوظائف من مكتبة 🤗 Transformers، في حين أن القيدين التانى والثالث يأتيان من حقيقة أن `PreTrainedConfig` لديه المزيد من الحقول أكثر من تلك التي تقوم بتعيينها. عند إعادة تحميل تكوين باستخدام طريقة `from_pretrained`، يجب أن يقبل تكوينك هذه الحقول ثم إرسالها إلى الفئة الأساسية الأعلى.
تحديد `model_type` لتكوينك (هنا `model_type="resnet"`) ليس إلزاميًا، ما لم ترغب في
تسجيل نموذجك باستخدام الفئات التلقائية (راجع القسم الأخير).
@ -79,7 +82,7 @@ resnet50d_config.save_pretrained("custom-resnet")
resnet50d_config = ResnetConfig.from_pretrained("custom-resnet")
```
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PretrainedConfig`]، مثل [`~PretrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
يمكنك أيضًا استخدام أي طريقة أخرى من فئة [`PreTrainedConfig`]، مثل [`~PreTrainedConfig.push_to_hub`] لتحميل تكوينك مباشرة إلى Hub.
## كتابة نموذج مخصص
@ -151,8 +154,11 @@ class ResnetModelForImageClassification(PreTrainedModel):
```
في كلتا الحالتين، لاحظ كيف نرث من `PreTrainedModel` ونستدعي مُهيئ الفئة الرئيسية باستخدام `config` (كما تفعل عند إنشاء وحدة `torch.nn.Module` عادية). ليس من الضروري تعريف `config_class` إلا إذا كنت ترغب في تسجيل نموذجك مع الفئات التلقائية (راجع القسم الأخير).
> [!TIP]
> إذا كان نموذجك مشابهًا جدًا لنموذج داخل المكتبة، فيمكنك إعادة استخدام نفس التكوين مثل هذا النموذج.
<Tip>
إذا كان نموذجك مشابهًا جدًا لنموذج داخل المكتبة، فيمكنك إعادة استخدام نفس التكوين مثل هذا النموذج.
</Tip>
يمكن لنموذجك أن يعيد أي شيء تريده، ولكن إعادة قاموس مثلما فعلنا لـ
`ResnetModelForImageClassification`، مع تضمين الخسارة عند تمرير العلامات، سيجعل نموذجك قابلًا للاستخدام مباشرة داخل فئة [`Trainer`]. يعد استخدام تنسيق إخراج آخر أمرًا جيدًا طالما أنك تخطط لاستخدام حلقة تدريب خاصة بك أو مكتبة أخرى للتدريب.
@ -198,8 +204,11 @@ AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassi
## إرسال الكود إلى Hub
> [!WARNING]
> هذا API تجريبي وقد يكون له بعض التغييرات الطفيفة في الإصدارات القادمة.
<Tip warning={true}>
هذا API تجريبي وقد يكون له بعض التغييرات الطفيفة في الإصدارات القادمة.
</Tip>
أولاً، تأكد من تعريف نموذجك بالكامل في ملف `.py`. يمكن أن يعتمد على الاستيراد النسبي لملفات أخرى طالما أن جميع الملفات موجودة في نفس الدليل (لا ندعم الوحدات الفرعية لهذه الميزة حتى الآن). في مثالنا، سنحدد ملف `modeling_resnet.py` وملف `configuration_resnet.py` في مجلد باسم "resnet_model" في دليل العمل الحالي. يحتوي ملف التكوين على كود لـ `ResnetConfig` ويحتوي ملف النمذجة على كود لـ `ResnetModel` و`ResnetModelForImageClassification`.
@ -213,9 +222,12 @@ AutoModelForImageClassification.register(ResnetConfig, ResnetModelForImageClassi
يمكن أن يكون ملف `__init__.py` فارغًا، فهو موجود فقط حتى يتمكن Python من اكتشاف أن `resnet_model` يمكن استخدامه كموديل.
> [!WARNING]
> إذا كنت تقوم بنسخ ملفات النمذجة من المكتبة، فسوف تحتاج إلى استبدال جميع الواردات النسبية في أعلى الملف
> لاستيرادها من حزمة `transformers`.
<Tip warning={true}>
إذا كنت تقوم بنسخ ملفات النمذجة من المكتبة، فسوف تحتاج إلى استبدال جميع الواردات النسبية في أعلى الملف
لاستيرادها من حزمة `transformers`.
</Tip>
لاحظ أنه يمكنك إعادة استخدام (أو توسيع) تكوين/نموذج موجود.
@ -239,18 +251,21 @@ ResnetModelForImageClassification.register_for_auto_class("AutoModelForImageClas
[`AutoConfig`]) ولكن الأمر يختلف بالنسبة للنماذج. قد يكون نموذجك المخصص مناسبًا للعديد من المهام المختلفة، لذلك يجب
تحديد أي من الفئات التلقائية هو الصحيح لنموذجك.
> [!TIP]
> استخدم `register_for_auto_class()` إذا كنت تريد نسخ ملفات الكود. إذا كنت تفضل استخدام الكود على Hub من مستودع آخر،
> فلا تحتاج إلى استدعائه. في الحالات التي يوجد فيها أكثر من فئة تلقائية واحدة، يمكنك تعديل ملف `config.json` مباشرة باستخدام
> الهيكل التالي:
>
> ```json
> "auto_map": {
> "AutoConfig": "<your-repo-name>--<config-name>",
> "AutoModel": "<your-repo-name>--<config-name>",
> "AutoModelFor<Task>": "<your-repo-name>--<config-name>",
> },
> ```
<Tip>
استخدم `register_for_auto_class()` إذا كنت تريد نسخ ملفات الكود. إذا كنت تفضل استخدام الكود على Hub من مستودع آخر،
فلا تحتاج إلى استدعائه. في الحالات التي يوجد فيها أكثر من فئة تلقائية واحدة، يمكنك تعديل ملف `config.json` مباشرة باستخدام
الهيكل التالي:
```json
"auto_map": {
"AutoConfig": "<your-repo-name>--<config-name>",
"AutoModel": "<your-repo-name>--<config-name>",
"AutoModelFor<Task>": "<your-repo-name>--<config-name>",
},
```
</Tip>
بعد ذلك، دعنا نقوم بإنشاء التكوين والنماذج كما فعلنا من قبل:

View File

@ -245,8 +245,11 @@
نماذج اكتشاف الأجسام: ([DetrForObjectDetection]) يتوقع النموذج قائمة من القواميس تحتوي على مفتاح class_labels و boxes حيث تتوافق كل قيمة من المجموعة مع الملصق المتوقع وعدد المربعات المحيطة بكل صورة فردية.
نماذج التعرف التلقائي على الكلام: ([Wav2Vec2ForCTC]) يتوقع النموذج مصفوفة ذات بعد (batch_size, target_length) حيث تتوافق كل قيمة مع الملصق المتوقع لكل رمز فردي.
> [!TIP]
> قد تختلف تسميات كل نموذج، لذا تأكد دائمًا من مراجعة وثائق كل نموذج للحصول على معلومات حول التسميات الخاصة به.
<Tip>
قد تختلف تسميات كل نموذج، لذا تأكد دائمًا من مراجعة وثائق كل نموذج للحصول على معلومات حول التسميات الخاصة به.
</Tip>
لا تقبل النماذج الأساسية ([`BertModel`]) الملصقات ، لأنها نماذج المحول الأساسية، والتي تقوم ببساطة بإخراج الميزات.
### نماذج اللغة الكبيرة large language models (LLM)

View File

@ -48,14 +48,17 @@ pip install 'transformers[torch]'
pip install 'transformers[tf-cpu]'
```
> [!WARNING]
> لمستخدمي M1 / ARM
>
> ستحتاج إلى تثبيت ما يلي قبل تثبيت TensorFLow 2.0
> ```bash
> brew install cmake
> brew install pkg-config
> ```
<Tip warning={true}>
لمستخدمي M1 / ARM
ستحتاج إلى تثبيت ما يلي قبل تثبيت TensorFLow 2.0
```bash
brew install cmake
brew install pkg-config
```
</Tip>
🤗 Transformers وFlax:
@ -114,8 +117,11 @@ pip install -e .
ستقوم هذه الأوامر بربط المجلد الذي قمت باستنساخ المستودع فيه بمسارات مكتبة Python. بمعنى آخر، سيبحث Python داخل المجلد الذي قمت باستنساخه بالإضافة إلى المسارات المعتادة للمكتبات. على سبيل المثال، إذا تم تثبيت حزم Python الخاصة بك عادةً في `~/anaconda3/envs/main/lib/python3.7/site-packages/`, فسيقوم Python أيضًا بالبحث في المجلد الذي قمت باستنساخه: `~/transformers/`.
> [!WARNING]
> يجب عليك الاحتفاظ بمجلد `transformers` إذا كنت تريد الاستمرار في استخدام المكتبة.
<Tip warning={true}>
يجب عليك الاحتفاظ بمجلد `transformers` إذا كنت تريد الاستمرار في استخدام المكتبة.
</Tip>
الآن يمكنك تحديث المستنسخ الخاص بك بسهولة إلى أحدث إصدار من 🤗 Transformers باستخدام الأمر التالي:
@ -142,15 +148,21 @@ conda install conda-forge::transformers
2. متغير البيئة: `HF_HOME`.
3. متغير البيئة: `XDG_CACHE_HOME` + `/huggingface`.
> [!TIP]
> سيستخدم 🤗 Transformers متغيرات البيئة `PYTORCH_TRANSFORMERS_CACHE` أو `PYTORCH_PRETRAINED_BERT_CACHE` إذا كنت قادمًا من إصدار سابق من هذه المكتبة وقمت بتعيين متغيرات البيئة هذه، ما لم تحدد متغير البيئة `TRANSFORMERS_CACHE`.
<Tip>
سيستخدم 🤗 Transformers متغيرات البيئة `PYTORCH_TRANSFORMERS_CACHE` أو `PYTORCH_PRETRAINED_BERT_CACHE` إذا كنت قادمًا من إصدار سابق من هذه المكتبة وقمت بتعيين متغيرات البيئة هذه، ما لم تحدد متغير البيئة `TRANSFORMERS_CACHE`.
</Tip>
## الوضع دون اتصال بالإنترنت
قم بتشغيل 🤗 Transformers في بيئة محمية بجدار حماية أو غير متصلة باستخدام الملفات المخزنة مؤقتًا محليًا عن طريق تعيين متغير البيئة `HF_HUB_OFFLINE=1`.
> [!TIP]
> أضف [🤗 Datasets](https://huggingface.co/docs/datasets/) إلى سير عمل التدريب غير المتصل باستخدام متغير البيئة `HF_DATASETS_OFFLINE=1`.
<Tip>
أضف [🤗 Datasets](https://huggingface.co/docs/datasets/) إلى سير عمل التدريب غير المتصل باستخدام متغير البيئة `HF_DATASETS_OFFLINE=1`.
</Tip>
```bash
HF_DATASETS_OFFLINE=1 HF_HUB_OFFLINE=1 \
@ -227,5 +239,8 @@ model = T5Model.from_pretrained("./path/to/local/directory", local_files_only=Tr
>>> config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```
> [!TIP]
> راجع قسم [كيفية تنزيل الملفات من Hub](https://huggingface.co/docs/hub/how-to-downstream) لمزيد من التفاصيل حول تنزيل الملفات المخزنة على Hub.
<Tip>
راجع قسم [كيفية تنزيل الملفات من Hub](https://huggingface.co/docs/hub/how-to-downstream) لمزيد من التفاصيل حول تنزيل الملفات المخزنة على Hub.
</Tip>

View File

@ -51,16 +51,19 @@ pip install transformers bitsandbytes>=0.39.0 -q
دعنا نتحدث عن الكود!
> [!TIP]
> إذا كنت مهتمًا بالاستخدام الأساسي لـ LLM، فإن واجهة [`Pipeline`](pipeline_tutorial) عالية المستوى هي نقطة انطلاق رائعة. ومع ذلك، غالبًا ما تتطلب LLMs ميزات متقدمة مثل التكميم والتحكم الدقيق في خطوة اختيار الرمز، والتي يتم تنفيذها بشكل أفضل من خلال [`~generation.GenerationMixin.generate`]. التوليد التلقائي باستخدام LLMs يستهلك الكثير من المواردد ويجب تنفيذه على وحدة معالجة الرسومات للحصول على أداء كافٍ.
<Tip>
إذا كنت مهتمًا بالاستخدام الأساسي لـ LLM، فإن واجهة [`Pipeline`](pipeline_tutorial) عالية المستوى هي نقطة انطلاق رائعة. ومع ذلك، غالبًا ما تتطلب LLMs ميزات متقدمة مثل التكميم والتحكم الدقيق في خطوة اختيار الرمز، والتي يتم تنفيذها بشكل أفضل من خلال [`~generation.GenerationMixin.generate`]. التوليد التلقائي باستخدام LLMs يستهلك الكثير من المواردد ويجب تنفيذه على وحدة معالجة الرسومات للحصول على أداء كافٍ.
</Tip>
أولاً، تحتاج إلى تحميل النموذج.
```py
>>> from transformers import AutoModelForCausalLM
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@ -110,12 +113,12 @@ pip install transformers bitsandbytes>=0.39.0 -q
هناك العديد من [استراتيجيات التوليد](generation_strategies)، وفي بعض الأحيان قد لا تكون القيم الافتراضية مناسبة لحالتك الاستخدام. إذا لم تكن الإخراج الخاصة بك متوافقة مع ما تتوقعه، فقد قمنا بإنشاء قائمة بأكثر الأخطاء الشائعة وكيفية تجنبها.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")
>>> tokenizer.pad_token = tokenizer.eos_token # Most LLMs don't have a pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "mistralai/Mistral-7B-v0.1", device_map="auto", load_in_4bit=True
... "mistralai/Mistral-7B-v0.1", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@ -189,7 +192,7 @@ LLMs هي [معماريات فك التشفير فقط](https://huggingface.co/l
```python
>>> tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-alpha")
>>> model = AutoModelForCausalLM.from_pretrained(
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", load_in_4bit=True
... "HuggingFaceH4/zephyr-7b-alpha", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
>>> set_seed(0)
>>> prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""

View File

@ -231,7 +231,7 @@ flush()
دعنا نرى ما هو استهلاك ذاكرة GPU الذروة الذي يوفره تكميم 4 بت. يمكن تكميم النموذج إلى 4 بت باستخدام نفس واجهة برمجة التطبيقات كما في السابق - هذه المرة عن طريق تمرير `load_in_4bit=True` بدلاً من `load_in_8bit=True`.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", load_in_4bit=True, pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", quantization_config=BitsAndBytesConfig(load_in_4bit=True), pad_token_id=0)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
@ -329,174 +329,6 @@ $$ \textbf{O}_i \leftarrow s^a_{ij} * \textbf{O}_i + s^b_{ij} * \mathbf{V}_{j} \
لنلقِ نظرة على مثال عملي.
يحصل نموذج OctoCoder الخاص بنا الآن على موجه إدخال أطول بشكل كبير يتضمن ما يسمى *موجه النظام*. تُستخدم موجهات النظام لتوجيه LLM إلى مساعد أفضل مصمم لمهام المستخدمين.
فيما يلي، نستخدم موجه النظام الذي سيجعل OctoCoder مساعد ترميز أفضل.
```python
system_prompt = """Below are a series of dialogues between various people and an AI technical assistant.
The assistant tries to be helpful, polite, honest, sophisticated, emotionally aware, and humble but knowledgeable.
The assistant is happy to help with code questions and will do their best to understand exactly what is needed.
It also tries to avoid giving false or misleading information, and it caveats when it isn't entirely sure about the right answer.
That said, the assistant is practical really does its best, and doesn't let caution get too much in the way of being useful.
The Starcoder models are a series of 15.5B parameter models trained on 80+ programming languages from The Stack (v1.2) (excluding opt-out requests).
The model uses Multi Query Attention, was trained using the Fill-in-the-Middle objective, and with 8,192 tokens context window for a trillion tokens of heavily deduplicated data.
-----
Question: Write a function that takes two lists and returns a list that has alternating elements from each input list.
Answer: Sure. Here is a function that does that.
def alternating(list1, list2):
results = []
for i in range(len(list1)):
results.append(list1[i])
results.append(list2[i])
return results
Question: Can you write some test cases for this function?
Answer: Sure, here are some tests.
assert alternating([10, 20, 30], [1, 2, 3]) == [10, 1, 20, 2, 30, 3]
assert alternating([True, False], [4, 5]) == [True, 4, False, 5]
assert alternating([], []) == []
Question: Modify the function so that it returns all input elements when the lists have uneven length. The elements from the longer list should be at the end.
Answer: Here is the modified function.
def alternating(list1, list2):
results = []
for i in range(min(len(list1), len(list2))):
results.append(list1[i])
results.append(list2[i])
if len(list1) > len(list2):
results.extend(list1[i+1:])
else:
results.extend(list2[i+1:])
return results
-----
"""
```
لأغراض التوضيح، سنكرر موجه النظام عشر مرات بحيث يكون طول الإدخال طويلاً بما يكفي لملاحظة وفورات ذاكرة Flash Attention.
نضيف موجه النص الأصلي "سؤال: يرجى كتابة وظيفة في Python تقوم بتحويل البايتات إلى جيجا بايت.
```python
long_prompt = 10 * system_prompt + prompt
```
نقوم بتنفيذ نموذجنا مرة أخرى بدقة bfloat16.
```python
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("bigcode/octocoder")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
```
دعنا الآن نقوم بتشغيل النموذج تمامًا مثلما كان من قبل *بدون اهتمام فلاشي* وقياس متطلبات ذاكرة GPU وقت الذروة ووقت الاستدلال.
```python
import time
start_time = time.time()
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
تم التوليد في 10.96854019165039 ثانية.
بالتأكيد. إليك وظيفة للقيام بذلك.
def bytes_to_giga(bytes):
return bytes / 1024 / 1024 / 1024
الإجابة: بالتأكيد. إليك وظيفة للقيام بذلك.
ديف
```
نحصل على نفس الإخراج كما كان من قبل، ولكن هذه المرة، يقوم النموذج بتكرار الإجابة عدة مرات حتى يتم قطعها عند 60 رمزًا. ليس من المستغرب أننا كررنا موجه النظام عشر مرات لأغراض التوضيح وبالتالي قمنا بتشغيل النموذج لتكرار نفسه.
**ملاحظة** لا ينبغي تكرار موجه النظام عشر مرات في التطبيقات الواقعية - مرة واحدة كافية!
دعنا نقيس متطلبات ذاكرة GPU وقت الذروة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
37.668193340301514
```
كما نرى، فإن متطلبات ذاكرة GPU وقت الذروة أعلى بكثير مما كانت عليه في البداية، وهو ما يرجع إلى حد كبير إلى تسلسل الإدخال الأطول. أيضًا، يستغرق التوليد أكثر من دقيقة بقليل الآن.
نستدعي `flush()` لتحرير ذاكرة GPU لتجربتنا التالية.
```python
flush()
```
لمقارنة، دعونا نقوم بتشغيل نفس الدالة، ولكن تمكين الاهتمام فلاش بدلا من ذلك.
للقيام بذلك، نقوم بتحويل النموذج إلى [BetterTransformer](Https://huggingface.co/docs/optimum/bettertransformer/overview) ومن خلال القيام بذلك تمكين PyTorch's [SDPA self-attention](Https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention) والتي بدورها قادرة على استخدام الاهتمام فلاش.
```python
model.to_bettertransformer()
```
الآن نقوم بتشغيل نفس مقتطف التعليمات البرمجية بالضبط كما كان من قبل وتحت الغطاء سوف تستخدم المحولات الاهتمام فلاش.
```py
start_time = time.time()
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
result = pipe(long_prompt, max_new_tokens=60)[0]["generated_text"][len(long_prompt):]
print(f"Generated in {time.time() - start_time} seconds.")
result
```
**الإخراج**:
```
تم التوليد في 3.0211617946624756 ثانية.
بالتأكيد. إليك وظيفة للقيام بذلك.
def bytes_to_giga(bytes):
return bytes / 1024 / 1024 / 1024
الإجابة: بالتأكيد. إليك وظيفة للقيام بذلك.
ديف
```
نحصل على نفس النتيجة بالضبط كما كان من قبل، ولكن يمكننا ملاحظة تسريع كبير بفضل الاهتمام فلاش.
دعنا نقيس استهلاك الذاكرة لآخر مرة.
```python
bytes_to_giga_bytes(torch.cuda.max_memory_allocated())
```
**الإخراج**:
```
32.617331981658936
```
ونحن تقريبا مرة أخرى إلى ذاكرة GPU الذروة الأصلية لدينا 29GB.
يمكننا أن نلاحظ أننا نستخدم فقط حوالي 100 ميجابايت إضافية من ذاكرة GPU عند تمرير تسلسل إدخال طويل جدًا مع الاهتمام فلاش مقارنة بتمرير تسلسل إدخال قصير كما فعلنا في البداية.
```py
flush()
```
لمزيد من المعلومات حول كيفية استخدام Flash Attention، يرجى الاطلاع على [صفحة doc هذه](Https://huggingface.co/docs/transformers/en/perf_infer_gpu_one#flashattention-2).
## 3. الابتكارات المعمارية
حتى الآن، نظرنا في تحسين الكفاءة الحسابية والذاكرة من خلال:
@ -640,7 +472,7 @@ for _ in range(5):
next_token_id = torch.argmax(next_logits, dim=-1)
print("shape of input_ids", next_token_id.shape)
print("length of key-value cache", len(past_key_values[0][0])) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
print("length of key-value cache", past_key_values.get_seq_length()) # past_key_values are of shape [num_layers, 0 for k, 1 for v, batch_size, length, hidden_dim]
generated_tokens.append(next_token_id.item())
generated_text = tokenizer.batch_decode(generated_tokens)
@ -673,8 +505,11 @@ length of key-value cache 24
> يجب *دائمًا* استخدام ذاكرة التخزين المؤقت للمفاتيح والقيم حيث يؤدي ذلك إلى نتائج متطابقة وزيادة كبيرة في السرعة لتسلسلات الإدخال الأطول. ذاكرة التخزين المؤقت للمفاتيح والقيم ممكّنة بشكل افتراضي في Transformers عند استخدام خط أنابيب النص أو طريقة [`generate`](https://huggingface.co/docs/transformers/main_classes/text_generation).
> [!WARNING]
> لاحظ أنه على الرغم من نصيحتنا باستخدام ذاكرة التخزين المؤقت للمفاتيح والقيم، فقد يكون إخراج نموذج اللغة الكبيرة مختلفًا قليلاً عند استخدامها. هذه خاصية نوى ضرب المصفوفة نفسها - يمكنك قراءة المزيد عنها [هنا](https://github.com/huggingface/transformers/issues/25420#issuecomment-1775317535).
<Tip warning={true}>
لاحظ أنه على الرغم من نصيحتنا باستخدام ذاكرة التخزين المؤقت للمفاتيح والقيم، فقد يكون إخراج نموذج اللغة الكبيرة مختلفًا قليلاً عند استخدامها. هذه خاصية نوى ضرب المصفوفة نفسها - يمكنك قراءة المزيد عنها [هنا](https://github.com/huggingface/transformers/issues/25420#issuecomment-1775317535).
</Tip>
#### 3.2.1 محادثة متعددة الجولات

View File

@ -116,8 +116,11 @@ default_args = {
}
```
> [!TIP]
> إذا كنت تخطط لتشغيل عدة تجارب، من أجل مسح الذاكرة بشكل صحيح بين التجارب، قم بإعادة تشغيل نواة Python بين التجارب.
<Tip>
إذا كنت تخطط لتشغيل عدة تجارب، من أجل مسح الذاكرة بشكل صحيح بين التجارب، قم بإعادة تشغيل نواة Python بين التجارب.
</Tip>
## استخدام الذاكرة في التدريب الأساسي

View File

@ -12,8 +12,11 @@
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
> [!TIP]
> لمشاركة نموذج مع المجتمع، تحتاج إلى حساب على [huggingface.co](https://huggingface.co/join). يمكنك أيضًا الانضمام إلى منظمة موجودة أو إنشاء منظمة جديدة.
<Tip>
لمشاركة نموذج مع المجتمع، تحتاج إلى حساب على [huggingface.co](https://huggingface.co/join). يمكنك أيضًا الانضمام إلى منظمة موجودة أو إنشاء منظمة جديدة.
</Tip>
## ميزات المستودع

View File

@ -51,8 +51,11 @@ peft_model_id = "ybelkada/opt-350m-lora"
model = AutoModelForCausalLM.from_pretrained(peft_model_id)
```
> [!TIP]
> يمكنك تحميل محول PEFT باستخدام فئة `AutoModelFor` أو فئة النموذج الأساسي مثل `OPTForCausalLM` أو `LlamaForCausalLM`.
<Tip>
يمكنك تحميل محول PEFT باستخدام فئة `AutoModelFor` أو فئة النموذج الأساسي مثل `OPTForCausalLM` أو `LlamaForCausalLM`.
</Tip>
يمكنك أيضًا تحميل محول PEFT عن طريق استدعاء طريقة `load_adapter`:
@ -159,8 +162,11 @@ output = model.generate(**inputs)
يدعم محول PEFT فئة [`Trainer`] بحيث يمكنك تدريب محول لحالتك الاستخدام المحددة. فهو يتطلب فقط إضافة بضع سطور أخرى من التعليمات البرمجية. على سبيل المثال، لتدريب محول LoRA:
> [!TIP]
> إذا لم تكن معتادًا على ضبط نموذج دقيق باستخدام [`Trainer`، فراجع البرنامج التعليمي](training) لضبط نموذج مُدرب مسبقًا.
<Tip>
إذا لم تكن معتادًا على ضبط نموذج دقيق باستخدام [`Trainer`، فراجع البرنامج التعليمي](training) لضبط نموذج مُدرب مسبقًا.
</Tip>
1. حدد تكوين المحول باستخدام نوع المهمة والمعاملات الزائدة (راجع [`~peft.LoraConfig`] لمزيد من التفاصيل حول وظيفة هذه المعلمات).

View File

@ -6,8 +6,11 @@
* استخدم مُجزّئ أو نموذجًا محددًا.
* استخدم [`pipeline`] للمهام الصوتية والبصرية والمتعددة الوسائط.
> [!TIP]
> اطلع على وثائق [`pipeline`] للحصول على القائمة كاملة بالمهام المدعومة والمعلمات المتاحة.
<Tip>
اطلع على وثائق [`pipeline`] للحصول على القائمة كاملة بالمهام المدعومة والمعلمات المتاحة.
</Tip>
## استخدام الأنابيب
@ -186,8 +189,9 @@ for out in pipe(KeyDataset(dataset, "audio")):
## استخدام خطوط الأنابيب لخادم ويب
> [!TIP]
> إن إنشاء محرك استدلال هو موضوع معقد يستحق صفحته الخاصة.
<Tip>
إن إنشاء محرك استدلال هو موضوع معقد يستحق صفحته الخاصة.
</Tip>
[Link](./pipeline_webserver)
@ -247,13 +251,16 @@ for out in pipe(KeyDataset(dataset, "audio")):
[{'score': 0.425, 'answer': 'us-001', 'start': 16, 'end': 16}]
```
> [!TIP]
> لتشغيل المثال أعلاه، تحتاج إلى تثبيت [`pytesseract`](https://pypi.org/project/pytesseract/) بالإضافة إلى 🤗 Transformers:
>
> ```bash
> sudo apt install -y tesseract-ocr
> pip install pytesseract
> ```
<Tip>
لتشغيل المثال أعلاه، تحتاج إلى تثبيت [`pytesseract`](https://pypi.org/project/pytesseract/) بالإضافة إلى 🤗 Transformers:
```bash
sudo apt install -y tesseract-ocr
pip install pytesseract
```
</Tip>
## استخدام `pipeline` على نماذج كبيرة مع 🤗 `accelerate`:

View File

@ -1,8 +1,11 @@
# استخدام قنوات المعالجة لخادم ويب
> [!TIP]
> يُعدّ إنشاء محرك استدلال أمرًا معقدًا، ويعتمد الحل "الأفضل" على مساحة مشكلتك. هل تستخدم وحدة المعالجة المركزية أم وحدة معالجة الرسومات؟ هل تريد أقل زمن وصول، أم أعلى معدل نقل، أم دعمًا للعديد من النماذج، أم مجرد تحقيق أقصى تحسين نموذج محدد؟
> توجد طرق عديدة لمعالجة هذا الموضوع، لذلك ما سنقدمه هو إعداد افتراضي جيد للبدء به قد لا يكون بالضرورة هو الحل الأمثل لك.```
<Tip>
يُعدّ إنشاء محرك استدلال أمرًا معقدًا، ويعتمد الحل "الأفضل" على مساحة مشكلتك. هل تستخدم وحدة المعالجة المركزية أم وحدة معالجة الرسومات؟ هل تريد أقل زمن وصول، أم أعلى معدل نقل، أم دعمًا للعديد من النماذج، أم مجرد تحقيق أقصى تحسين نموذج محدد؟
توجد طرق عديدة لمعالجة هذا الموضوع، لذلك ما سنقدمه هو إعداد افتراضي جيد للبدء به قد لا يكون بالضرورة هو الحل الأمثل لك.```
</Tip>
الشيء الرئيسي الذي يجب فهمه هو أننا يمكن أن نستخدم مؤشرًا، تمامًا كما تفعل [على مجموعة بيانات](pipeline_tutorial#using-pipelines-on-a-dataset)، نظرًا لأن خادم الويب هو أساسًا نظام ينتظر الطلبات ويعالجها عند استلامها.
@ -68,8 +71,11 @@ curl -X POST -d "test [MASK]" http://localhost:8000/
المهم حقًا هو أننا نقوم بتحميل النموذج **مرة واحدة** فقط، لذلك لا توجد نسخ من النموذج على خادم الويب. بهذه الطريقة، لا يتم استخدام ذاكرة الوصول العشوائي غير الضرورية. تسمح آلية وضع قائمة الانتظار بالقيام بأشياء متقدمة مثل تجميع بعض العناصر قبل الاستدلال لاستخدام معالجة الدفعات الديناميكية:
> [!WARNING]
> تم كتابة نموذج الكود البرمجى أدناه بشكل مقصود مثل كود وهمي للقراءة. لا تقم بتشغيله دون التحقق مما إذا كان منطقيًا لموارد النظام الخاص بك!
<Tip warning={true}>
تم كتابة نموذج الكود البرمجى أدناه بشكل مقصود مثل كود وهمي للقراءة. لا تقم بتشغيله دون التحقق مما إذا كان منطقيًا لموارد النظام الخاص بك!
</Tip>
```py
(string, rq) = await q.get()

View File

@ -9,8 +9,11 @@
* تستخدم مدخلات الصورة [ImageProcessor](./main_classes/image_processor) لتحويل الصور إلى موترات.
* تستخدم مدخلات متعددة الوسائط [معالجًا](./main_classes/processors) لدمج مُجزّئ الرموز ومستخرج الميزات أو معالج الصور.
> [!TIP]
> `AutoProcessor` **يعمل دائمًا** ويختار تلقائيًا الفئة الصحيحة للنموذج الذي تستخدمه، سواء كنت تستخدم مُجزّئ رموز أو معالج صور أو مستخرج ميزات أو معالجًا.
<Tip>
`AutoProcessor` **يعمل دائمًا** ويختار تلقائيًا الفئة الصحيحة للنموذج الذي تستخدمه، سواء كنت تستخدم مُجزّئ رموز أو معالج صور أو مستخرج ميزات أو معالجًا.
</Tip>
قبل البدء، قم بتثبيت 🤗 Datasets حتى تتمكن من تحميل بعض مجموعات البيانات لتجربتها:
@ -24,8 +27,11 @@ pip install datasets
أداة المعالجة المسبقة الرئيسية للبيانات النصية هي [مُجزّئ اللغوي](main_classes/tokenizer). يقوم مُجزّئ اللغوي بتقسيم النص إلى "أجزاء لغوية" (tokens) وفقًا لمجموعة من القواعد. يتم تحويل الأجزاء اللغوية إلى أرقام ثم إلى منسوجات، والتي تصبح مدخلات للنموذج. يقوم المجزئ اللغوي بإضافة أي مدخلات إضافية يحتاجها النموذج.
> [!TIP]
> إذا كنت تخطط لاستخدام نموذج مُدرب مسبقًا، فمن المهم استخدامالمجزئ اللغوي المقترن بنفس ذلك النموذج. يضمن ذلك تقسيم النص بنفس الطريقة التي تم بها تقسيم النصوص ما قبل التدريب، واستخدام نفس القاموس الذي يربط بين الأجزاء اللغوية وأرقامها ( يُشار إليها عادةً باسم المفردات *vocab*) أثناء التدريب المسبق.
<Tip>
إذا كنت تخطط لاستخدام نموذج مُدرب مسبقًا، فمن المهم استخدامالمجزئ اللغوي المقترن بنفس ذلك النموذج. يضمن ذلك تقسيم النص بنفس الطريقة التي تم بها تقسيم النصوص ما قبل التدريب، واستخدام نفس القاموس الذي يربط بين الأجزاء اللغوية وأرقامها ( يُشار إليها عادةً باسم المفردات *vocab*) أثناء التدريب المسبق.
</Tip>
ابدأ بتحميل المُجزّئ اللغوي مُدرب مسبقًا باستخدام طريقة [`AutoTokenizer.from_pretrained`]. يقوم هذا بتنزيل المفردات *vocab* الذي تم تدريب النموذج عليه:
@ -134,8 +140,11 @@ pip install datasets
[1، 1، 1، 1، 1، 1، 1، 0، 0، 0، 0، 0، 0، 0، 0، 0]]}
```
> [!TIP]
> تحقق من دليل المفاهيم [Padding and truncation](./pad_truncation) لمعرفة المزيد حول معامﻻت الحشو و البتر المختلفة.
<Tip>
تحقق من دليل المفاهيم [Padding and truncation](./pad_truncation) لمعرفة المزيد حول معامﻻت الحشو و البتر المختلفة.
</Tip>
### بناء الموترات Build tensors
@ -163,11 +172,14 @@ pip install datasets
[1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])}
```
> [!TIP]
> تدعم خطوط الأنابيب المختلفة معامل مُجزِّئ الرموز(tokenizer) بشكل مختلف في طريقة `()__call__` الخاصة بها.
> و خطوط الأنابيب `text-2-text-generation` تدعم فقط `truncation`.
> و خطوط الأنابيب `text-generation` تدعم `max_length` و`truncation` و`padding` و`add_special_tokens`.
> أما في خطوط الأنابيب `fill-mask`، يمكن تمرير معامل مُجزِّئ الرموز (tokenizer) في المتغير `tokenizer_kwargs` (قاموس).
<Tip>
تدعم خطوط الأنابيب المختلفة معامل مُجزِّئ الرموز(tokenizer) بشكل مختلف في طريقة `()__call__` الخاصة بها.
و خطوط الأنابيب `text-2-text-generation` تدعم فقط `truncation`.
و خطوط الأنابيب `text-generation` تدعم `max_length` و`truncation` و`padding` و`add_special_tokens`.
أما في خطوط الأنابيب `fill-mask`، يمكن تمرير معامل مُجزِّئ الرموز (tokenizer) في المتغير `tokenizer_kwargs` (قاموس).
</Tip>
## الصوت Audio
@ -279,18 +291,24 @@ pip install datasets
بالنسبة لمهام رؤية الحاسوبية، ستحتاج إلى معالج صور [image processor](main_classes/image_processor) لإعداد مجموعة البيانات الخاصة بك لتناسب النموذج. تتكون معالجة الصور المسبقة من عدة خطوات لتحويل الصور إلى الشكل الذي يتوقعه النموذج. وتشمل هذه الخطوات، على سبيل المثال لا الحصر، تغيير الحجم والتطبيع وتصحيح قناة الألوان وتحويل الصور إلى موترات(tensors).
> [!TIP]
> عادة ما تتبع معالجة الصور المسبقة شكلاً من أشكال زيادة البيانات (التضخيم). كلا العمليتين، معالجة الصور المسبقة وزيادة الصور تغيران بيانات الصورة، ولكنها تخدم أغراضًا مختلفة:
>
> *زيادة البيانات: تغيير الصور عن طريق زيادة الصور بطريقة يمكن أن تساعد في منع الإفراط في التعميم وزيادة متانة النموذج. يمكنك أن تكون مبدعًا في كيفية زيادة بياناتك - ضبط السطوع والألوان، واالقص، والدوران، تغيير الحجم، التكبير، إلخ. ومع ذلك، كن حذرًا من عدم تغيير معنى الصور بزياداتك.
> *معالجة الصور المسبقة: تضمن معالجة الصور اتتطابق الصور مع تنسيق الإدخال المتوقع للنموذج. عند ضبط نموذج رؤية حاسوبية بدقة، يجب معالجة الصور بالضبط كما كانت عند تدريب النموذج في البداية.
>
> يمكنك استخدام أي مكتبة تريدها لزيادة بيانات الصور. لمعالجة الصور المسبقة، استخدم `ImageProcessor` المرتبط بالنموذج.
<Tip>
عادة ما تتبع معالجة الصور المسبقة شكلاً من أشكال زيادة البيانات (التضخيم). كلا العمليتين، معالجة الصور المسبقة وزيادة الصور تغيران بيانات الصورة، ولكنها تخدم أغراضًا مختلفة:
*زيادة البيانات: تغيير الصور عن طريق زيادة الصور بطريقة يمكن أن تساعد في منع الإفراط في التعميم وزيادة متانة النموذج. يمكنك أن تكون مبدعًا في كيفية زيادة بياناتك - ضبط السطوع والألوان، واالقص، والدوران، تغيير الحجم، التكبير، إلخ. ومع ذلك، كن حذرًا من عدم تغيير معنى الصور بزياداتك.
*معالجة الصور المسبقة: تضمن معالجة الصور اتتطابق الصور مع تنسيق الإدخال المتوقع للنموذج. عند ضبط نموذج رؤية حاسوبية بدقة، يجب معالجة الصور بالضبط كما كانت عند تدريب النموذج في البداية.
يمكنك استخدام أي مكتبة تريدها لزيادة بيانات الصور. لمعالجة الصور المسبقة، استخدم `ImageProcessor` المرتبط بالنموذج.
</Tip>
قم بتحميل مجموعة بيانات [food101](https://huggingface.co/datasets/food101) (راجع دليل 🤗 [Datasets tutorial](https://huggingface.co/docs/datasets/load_hub) لمزيد من التفاصيل حول كيفية تحميل مجموعة بيانات) لمعرفة كيف يمكنك استخدام معالج الصور مع مجموعات بيانات رؤية الحاسب:
> [!TIP]
> استخدم معامل `split` من 🤗 Datasets لتحميل عينة صغيرة فقط من مجموعة التدريب نظرًا لحجم البيانات كبيرة جدًا!
<Tip>
استخدم معامل `split` من 🤗 Datasets لتحميل عينة صغيرة فقط من مجموعة التدريب نظرًا لحجم البيانات كبيرة جدًا!
</Tip>
```py
>>> from datasets import load_dataset
@ -344,13 +362,15 @@ pip install datasets
... return examples
```
> [!TIP]
> في المثال أعلاه، قمنا بتعيين `do_resize=False` لأننا قمنا بالفعل بتغيير حجم الصور في تحويل زيادة الصور،
> واستفدنا من خاصية `size` من `image_processor` المناسب. إذا لم تقم بتغيير حجم الصور أثناء زيادة الصور،
> فاترك هذا المعلمة. بشكل افتراضي، ستتعامل `ImageProcessor` مع تغيير الحجم.
>
> إذا كنت ترغب في تطبيع الصور كجزء من تحويل زيادة الصور، فاستخدم قيم `image_processor.image_mean`،
> و `image_processor.image_std`.
<Tip>
في المثال أعلاه، قمنا بتعيين `do_resize=False` لأننا قمنا بالفعل بتغيير حجم الصور في تحويل زيادة الصور،
واستفدنا من خاصية `size` من `image_processor` المناسب. إذا لم تقم بتغيير حجم الصور أثناء زيادة الصور،
فاترك هذا المعلمة. بشكل افتراضي، ستتعامل `ImageProcessor` مع تغيير الحجم.
إذا كنت ترغب في تطبيع الصور كجزء من تحويل زيادة الصور، فاستخدم قيم `image_processor.image_mean`،
و `image_processor.image_std`.
</Tip>
3. ثم استخدم 🤗 Datasets[`~datasets.Dataset.set_transform`] لتطبيق التحولات أثناء التنقل:
```py
@ -377,10 +397,13 @@ pip install datasets
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/preprocessed_image.png"/>
</div>
> [!TIP]
> بالنسبة للمهام مثل الكشف عن الأشياء، والتجزئة الدلالية، والتجزئة المثالية، والتجزئة الشاملة، يوفر `ImageProcessor`
> تقوم هذه الطرق بتحويل النواتج الأولية للنموذج إلى تنبؤات ذات معنى مثل مربعات الحدود،
> أو خرائط التجزئة.
<Tip>
بالنسبة للمهام مثل الكشف عن الأشياء، والتجزئة الدلالية، والتجزئة المثالية، والتجزئة الشاملة، يوفر `ImageProcessor`
تقوم هذه الطرق بتحويل النواتج الأولية للنموذج إلى تنبؤات ذات معنى مثل مربعات الحدود،
أو خرائط التجزئة.
</Tip>
### الحشو Pad

View File

@ -23,8 +23,11 @@ pip install torch
يمثل [`pipeline`] أسهل وأسرع طريقة لاستخدام نموذج مُدرب مسبقًا للاستنتاج. يمكنك استخدام [`pipeline`] جاهزًا للعديد من المهام عبر طرق مختلفة، والتي يظهر بعضها في الجدول أدناه:
> [!TIP]
> للاطلاع على القائمة الكاملة للمهام المتاحة، راجع [مرجع واجهة برمجة التطبيقات الخاصة بخط الأنابيب](./main_classes/pipelines).
<Tip>
للاطلاع على القائمة الكاملة للمهام المتاحة، راجع [مرجع واجهة برمجة التطبيقات الخاصة بخط الأنابيب](./main_classes/pipelines).
</Tip>
<div dir="rtl">
@ -176,8 +179,11 @@ label: NEGATIVE, with score: 0.5309
... )
```
> [!TIP]
> اطلع على [الدليل التمهيدي للمعالجة المسبقة](./preprocessing) للحصول على مزيد من التفاصيل حول المعالجة، وكيفية استخدام [`AutoImageProcessor`] و [`AutoFeatureExtractor`] و [`AutoProcessor`] لمعالجة الصور والصوت والإدخالات متعددة الوسائط.
<Tip>
اطلع على [الدليل التمهيدي للمعالجة المسبقة](./preprocessing) للحصول على مزيد من التفاصيل حول المعالجة، وكيفية استخدام [`AutoImageProcessor`] و [`AutoFeatureExtractor`] و [`AutoProcessor`] لمعالجة الصور والصوت والإدخالات متعددة الوسائط.
</Tip>
### AutoModel
@ -190,8 +196,11 @@ label: NEGATIVE, with score: 0.5309
>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
```
> [!TIP]
> راجع [ملخص المهمة](./task_summary) للاطلاع على المهام التي تدعمها فئة [`AutoModel`].
<Tip>
راجع [ملخص المهمة](./task_summary) للاطلاع على المهام التي تدعمها فئة [`AutoModel`].
</Tip>
الآن قم بتمرير دفعة المدخلات المُعالجة مسبقًا مباشرة إلى النموذج. عليك فقط فك تعبئة القاموس عن طريق إضافة `**`:
@ -214,8 +223,11 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
[0.2084, 0.1826, 0.1969, 0.1755, 0.2365]], grad_fn=<SoftmaxBackward0>)
```
> [!TIP]
> تخرج جميع نماذج 🤗 Transformers (PyTorch أو TensorFlow) المصفوفات *قبل* دالة التنشيط النهائية (مثل softmax) لأن دالة التنشيط النهائية غالبًا ما تكون مدمجة مع دالة الخسارة. نواتج النموذج عبارة عن فئات بيانات خاصة، لذلك يتم استكمال سماتها تلقائيًا في IDE. وتتصرف مخرجات النموذج مثل زوج مرتب أو قاموس (يمكنك الفهرسة باستخدام عدد صحيح ، شريحة، أو سلسلة)، وفي هذه الحالة، يتم تجاهل السمات التي تساوي None.
<Tip>
تخرج جميع نماذج 🤗 Transformers (PyTorch أو TensorFlow) المصفوفات *قبل* دالة التنشيط النهائية (مثل softmax) لأن دالة التنشيط النهائية غالبًا ما تكون مدمجة مع دالة الخسارة. نواتج النموذج عبارة عن فئات بيانات خاصة، لذلك يتم استكمال سماتها تلقائيًا في IDE. وتتصرف مخرجات النموذج مثل زوج مرتب أو قاموس (يمكنك الفهرسة باستخدام عدد صحيح ، شريحة، أو سلسلة)، وفي هذه الحالة، يتم تجاهل السمات التي تساوي None.
</Tip>
### حفظ النموذج
@ -351,8 +363,11 @@ tensor([[0.0021, 0.0018, 0.0115, 0.2121, 0.7725],
>>> trainer.train() # doctest: +SKIP
```
> [!TIP]
> بالنسبة للمهام - مثل الترجمة أو التلخيص - التي تستخدم نموذج تسلسل إلى تسلسل، استخدم فئات [`Seq2SeqTrainer`] و [`Seq2SeqTrainingArguments`] بدلاً من ذلك.
<Tip>
بالنسبة للمهام - مثل الترجمة أو التلخيص - التي تستخدم نموذج تسلسل إلى تسلسل، استخدم فئات [`Seq2SeqTrainer`] و [`Seq2SeqTrainingArguments`] بدلاً من ذلك.
</Tip>
يمكنك تخصيص سلوك حلقة التدريب عن طريق إنشاء فئة فرعية من الطرق داخل [`Trainer`]. يسمح لك ذلك بتخصيص ميزات مثل دالة الخسارة، والمحسن، والمجدول. راجع مرجع [`Trainer`] للتعرف على الطرق التي يمكن إنشاء فئات فرعية منها.

View File

@ -93,7 +93,6 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -117,7 +116,6 @@ torchrun \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -140,7 +138,6 @@ python xla_spawn.py --num_cores 8 \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -197,7 +194,6 @@ python examples/pytorch/summarization/run_summarization.py \
--summary_column summary_column_name \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--overwrite_output_dir \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--predict_with_generate
@ -225,7 +221,6 @@ python examples/pytorch/summarization/run_summarization.py \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```
@ -239,8 +234,6 @@ examples/pytorch/summarization/run_summarization.py -h
خيار آخر مفيد لتمكينه هو استئناف التدريب من نقطة تفتيش سابقة. سيضمن ذلك أنك تستطيع الاستمرار من حيث توقفت دون البدء من جديد إذا تم مقاطعة تدريبك. هناك طريقتان لاستئناف التدريب من نقطة تفتيش.
تستخدم الطريقة الأولى المعلمة `output_dir previous_output_dir` لاستئناف التدريب من أحدث نقطة تفتيش مخزنة في `output_dir`. في هذه الحالة، يجب عليك إزالة `overwrite_output_dir`:
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
@ -252,24 +245,6 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--output_dir previous_output_dir \
--predict_with_generate
```
تستخدم الطريقة الثانية معلمة `resume_from_checkpoint path_to_specific_checkpoint` لاستئناف التدريب من مجلد نقطة تفتيش محددة.
```bash
python examples/pytorch/summarization/run_summarization.py
--model_name_or_path google-t5/t5-small \
--do_train \
--do_eval \
--dataset_name cnn_dailymail \
--dataset_config "3.0.0" \
--source_prefix "summarize: " \
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--resume_from_checkpoint path_to_specific_checkpoint \
--predict_with_generate
```
@ -301,6 +276,5 @@ python examples/pytorch/summarization/run_summarization.py
--output_dir /tmp/tst-summarization \
--per_device_train_batch_size=4 \
--per_device_eval_batch_size=4 \
--overwrite_output_dir \
--predict_with_generate
```

View File

@ -32,7 +32,7 @@
لتصدير نموذج 🤗 Transformers إلى ONNX، قم أولاً بتثبيت اعتماد إضافي:
```bash
pip install optimum[exporters]
pip install optimum-onnx
```
للاطلاع على جميع المعامﻻت المتاحة، يرجى الرجوع إلى [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model#exporting-a-model-to-onnx-using-the-cli)، أو عرض المساعدة في سطر الأوامر:
@ -111,57 +111,3 @@ optimum-cli export onnx --model keras-io/transformers-qa distilbert_base_cased_s
### تصدير نموذج لهندسة غير مدعومة
إذا كنت ترغب في المساهمة من خلال إضافة دعم لنموذج لا يُمكن تصديره حاليًا، فيجب عليك أولاً التحقق مما إذا كان مدعومًا في [`optimum.exporters.onnx`](https://huggingface.co/docs/optimum/exporters/onnx/overview)، وإذا لم يكن مدعومًا، [فيمكنك المساهمة في 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/contribute) مُباشرةً.
### تصدير نموذج باستخدام `transformers.onnx`
> [!WARNING]
> لم يعد يتم دعم `transformers.onnx` يُرجى تصدير النماذج باستخدام 🤗 Optimum كما هو موضح أعلاه. سيتم إزالة هذا القسم في الإصدارات القادمة.
لتصدير نموذج 🤗 Transformers إلى ONNX باستخدام `transformers.onnx`، ثبّت التبعيات الإضافية:
```bash
pip install transformers[onnx]
```
استخدم حزمة `transformers.onnx` كنموذج Python لتصدير نقطة حفظ باستخدام تكوين جاهز:
```bash
python -m transformers.onnx --model=distilbert/distilbert-base-uncased onnx/
```
يُصدّر هذا رسمًا بيانيًا ONNX لنقطة الحفظ المُحددة بواسطة وسيطة `--model`. مرر أي نقطة حفظ على 🤗 Hub أو نقطة حفظ مُخزنة محليًا.
يُمكن بعد ذلك تشغيل ملف `model.onnx` الناتج على أحد المُسرعات العديدة التي تدعم معيار ONNX. على سبيل المثال، قم بتحميل وتشغيل النموذج باستخدام ONNX Runtime كما يلي:
```python
>>> from transformers import AutoTokenizer
>>> from onnxruntime import InferenceSession
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased")
>>> session = InferenceSession("onnx/model.onnx")
>>> # يتوقع ONNX Runtime مصفوفات NumPy كمدخلات
>>> inputs = tokenizer("Using DistilBERT with ONNX Runtime!", return_tensors="np")
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
```
يُمكن الحصول على أسماء المخرجات المطلوبة (مثل `["last_hidden_state"]`) من خلال إلقاء نظرة على تكوين ONNX لكل نموذج. على سبيل المثال، بالنسبة لـ DistilBERT، لدينا:
```python
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
>>> config = DistilBertConfig()
>>> onnx_config = DistilBertOnnxConfig(config)
>>> print(list(onnx_config.outputs.keys()))
["last_hidden_state"]
```
العمليات مُتطابقة لنقاط الحفظ TensorFlow على Hub. على سبيل المثال، صدّر نقطة حفظ TensorFlow خالصة كما يلي:
```bash
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
```
لتصدير نموذج مُخزن محليًا، احفظ أوزان النموذج ومجزىء اللغوى في نفس الدليل (على سبيل المثال `local-pt-checkpoint`)، ثم قم بتصديره إلى ONNX عن طريق توجيه وسيط `--model` لحزمة `transformers.onnx` إلى الدليل المطلوب:
```bash
python -m transformers.onnx --model=local-pt-checkpoint onnx/
```

View File

@ -27,8 +27,11 @@ rendered properly in your Markdown viewer.
1. ضبط دقيق [DistilRoBERTa](https://huggingface.co/distilbert/distilroberta-base) على مجموعة فرعية [r/askscience](https://www.reddit.com/r/askscience/) من مجموعة بيانات [ELI5](https://huggingface.co/datasets/eli5).
2. استخدام النموذج المدرب الخاص بك للاستنتاج.
> [!TIP]
> لرؤية جميع العمارات ونقاط التحقق المتوافقة مع هذه المهمة، نوصي بالتحقق من [task-page](https://huggingface.co/tasks/text-generation)
<Tip>
لرؤية جميع العمارات ونقاط التحقق المتوافقة مع هذه المهمة، نوصي بالتحقق من [task-page](https://huggingface.co/tasks/text-generation)
</Tip>
قبل أن تبدأ، تأكد من تثبيت جميع المكتبات الضرورية:
@ -192,8 +195,11 @@ pip install transformers datasets evaluate
## التدريب (Train)
> [!TIP]
> إذا لم تكن على دراية بتدريب نموذج باستخدام [`Trainer`], اطلع على [البرنامج التعليمي الأساسي](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن على دراية بتدريب نموذج باستخدام [`Trainer`], اطلع على [البرنامج التعليمي الأساسي](../training#train-with-pytorch-trainer)!
</Tip>
أنت جاهز الآن لبدء تدريب نموذجك! قم بتحميل DistilGPT2 باستخدام [`AutoModelForCausalLM`]:
@ -246,10 +252,13 @@ Perplexity: 49.61
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر تعمقًا حول كيفية تدريب نموذج للنمذجة اللغوية السببية، اطلع على الدفتر المقابل
> [دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
> أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
<Tip>
للحصول على مثال أكثر تعمقًا حول كيفية تدريب نموذج للنمذجة اللغوية السببية، اطلع على الدفتر المقابل
[دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
</Tip>
## الاستدلال (Inference)

View File

@ -24,8 +24,11 @@ rendered properly in your Markdown viewer.
1. تكييف [DistilRoBERTa](https://huggingface.co/distilbert/distilroberta-base) على مجموعة فرعية [r/askscience](https://www.reddit.com/r/askscience/) من مجموعة بيانات [ELI5](https://huggingface.co/datasets/eli5).
2. استخدام نموذج المدرب الخاص بك للاستدلال.
> [!TIP]
> لمعرفة جميع البنى والنسخ المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/fill-mask)
<Tip>
لمعرفة جميع البنى والنسخ المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/fill-mask)
</Tip>
قبل أن تبدأ، تأكد من تثبيت جميع المكتبات الضرورية:
@ -186,8 +189,11 @@ pip install transformers datasets evaluate
## التدريب (Train)
> [!TIP]
> إذا لم تكن على دراية بتعديل نموذج باستخدام [`Trainer`], ألق نظرة على الدليل الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن على دراية بتعديل نموذج باستخدام [`Trainer`], ألق نظرة على الدليل الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت مستعد الآن لبدء تدريب نموذجك! قم بتحميل DistilRoBERTa باستخدام [`AutoModelForMaskedLM`]:
@ -242,10 +248,13 @@ Perplexity: 8.76
>>> trainer.push_to_hub()
```
> [!TIP]
> لمثال أكثر تفصيلاً حول كيفية تعديل نموذج للنمذجة اللغوية المقنعة، ألق نظرة على الدفتر المقابل
> [دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
> أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
<Tip>
لمثال أكثر تفصيلاً حول كيفية تعديل نموذج للنمذجة اللغوية المقنعة، ألق نظرة على الدفتر المقابل
[دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling.ipynb)
أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/language_modeling-tf.ipynb).
</Tip>
## الاستدلال

View File

@ -183,8 +183,11 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
## التدريب (Train)
> [!TIP]
> إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`], فراجع الدرس الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`], فراجع الدرس الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت جاهز لبدء تدريب نموذجك الآن! قم بتحميل BERT باستخدام [`AutoModelForMultipleChoice`]:
@ -233,9 +236,12 @@ tokenized_swag = swag.map(preprocess_function, batched=True)
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للاختيار من متعدد، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)
> أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb) المقابل.
<Tip>
للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للاختيار من متعدد، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice.ipynb)
أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/multiple_choice-tf.ipynb) المقابل.
</Tip>
## الاستدلال (Inference)

View File

@ -30,8 +30,11 @@ rendered properly in your Markdown viewer.
1. ضبط [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) على مجموعة بيانات [SQuAD](https://huggingface.co/datasets/squad) للإجابة على الأسئلة الاستخراجية.
2. استخدام النموذج المضبوط للاستدلال.
> [!TIP]
> لمشاهدة جميع الهياكل والنسخ المتوافقة مع هذه المهمة، نوصي بالرجوع إلى [صفحة المهمة](https://huggingface.co/tasks/question-answering)
<Tip>
لمشاهدة جميع الهياكل والنسخ المتوافقة مع هذه المهمة، نوصي بالرجوع إلى [صفحة المهمة](https://huggingface.co/tasks/question-answering)
</Tip>
قبل البدء، تأكد من تثبيت جميع المكتبات الضرورية:
@ -174,8 +177,11 @@ pip install transformers datasets evaluate
## التدريب (Train)
> [!TIP]
> إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`], ألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`], ألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت جاهز لبدء تدريب نموذجك الآن! قم بتحميل DistilBERT باستخدام [`AutoModelForQuestionAnswering`]:
@ -222,9 +228,12 @@ pip install transformers datasets evaluate
```
> [!TIP]
> للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للإجابة على الأسئلة، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb) المقابل
> أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
<Tip>
للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للإجابة على الأسئلة، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering.ipynb) المقابل
أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/question_answering-tf.ipynb).
</Tip>
## التقييم (Evaluate)

View File

@ -22,8 +22,11 @@ rendered properly in your Markdown viewer.
1. ضبط [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) على مجموعة بيانات [IMDb](https://huggingface.co/datasets/imdb) لتحديد ما إذا كانت مراجعة الفيلم إيجابية أو سلبية.
2. استخدام نموذج الضبط الدقيق للتنبؤ.
> [!TIP]
> لرؤية جميع البنى ونقاط التحقق المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/text-classification).
<Tip>
لرؤية جميع البنى ونقاط التحقق المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/text-classification).
</Tip>
قبل أن تبدأ، تأكد من تثبيت جميع المكتبات الضرورية:
@ -128,8 +131,11 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> label2id = {"NEGATIVE": 0, "POSITIVE": 1}
```
> [!TIP]
> إذا لم تكن على دراية بضبط نموذج دقيق باستخدام [`Trainer`], فالق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن على دراية بضبط نموذج دقيق باستخدام [`Trainer`], فالق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت مستعد الآن لبدء تدريب نموذجك! قم بتحميل DistilBERT مع [`AutoModelForSequenceClassification`] جنبًا إلى جنب مع عدد التصنيفات المتوقعة، وتصنيفات الخرائط:
@ -174,8 +180,11 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> trainer.train()
```
> [!TIP]
> يستخدم [`Trainer`] الحشو الديناميكي افتراضيًا عند تمرير `tokenizer` إليه. في هذه الحالة، لا تحتاج لتحديد مُجمِّع البيانات صراحةً.
<Tip>
يستخدم [`Trainer`] الحشو الديناميكي افتراضيًا عند تمرير `tokenizer` إليه. في هذه الحالة، لا تحتاج لتحديد مُجمِّع البيانات صراحةً.
</Tip>
بعد اكتمال التدريب، شارك نموذجك على Hub باستخدام الطريقة [`~transformers.Trainer.push_to_hub`] ليستخدمه الجميع:
@ -183,10 +192,13 @@ tokenized_imdb = imdb.map(preprocess_function, batched=True)
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر عمقًا حول كيفية ضبط نموذج لتصنيف النصوص، قم بالاطلاع على الدفتر المقابل
> [دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)
> أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
<Tip>
للحصول على مثال أكثر عمقًا حول كيفية ضبط نموذج لتصنيف النصوص، قم بالاطلاع على الدفتر المقابل
[دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification.ipynb)
أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/text_classification-tf.ipynb).
</Tip>
## الاستدلال(Inference)

View File

@ -30,8 +30,11 @@ rendered properly in your Markdown viewer.
1. ضبط دقيق [T5](https://huggingface.co/google-t5/t5-small) على مجموعة فرعية من مشاريع قوانين ولاية كاليفورنيا من مجموعة بيانات [BillSum](https://huggingface.co/datasets/billsum) للتلخيص التجريدي.
2. استخدام النموذج المضبوط بدقة للتنبؤ.
> [!TIP]
> لمشاهدة جميع البنى ونقاط التفتيش المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/summarization)
<Tip>
لمشاهدة جميع البنى ونقاط التفتيش المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/summarization)
</Tip>
قبل البدء، تأكد من تثبيت جميع المكتبات الضرورية:
@ -156,8 +159,11 @@ pip install transformers datasets evaluate rouge_score
## التدريب (Train)
> [!TIP]
> إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`]، فألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن معتادًا على ضبط نموذج باستخدام [`Trainer`]، فألق نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت جاهز لبدء تدريب نموذجك الآن! قم بتحميل T5 باستخدام [`AutoModelForSeq2SeqLM`]:
@ -207,9 +213,12 @@ pip install transformers datasets evaluate rouge_score
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للتجميع، ألقِ نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)
> أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb) المقابل.
<Tip>
للحصول على مثال أكثر تعمقًا حول كيفية ضبط نموذج للتجميع، ألقِ نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)
أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization-tf.ipynb) المقابل.
</Tip>
## الاستدلال (Inference)

View File

@ -22,8 +22,11 @@
1. ضبط [DistilBERT](https://huggingface.co/distilbert/distilbert-base-uncased) على مجموعة بيانات [WNUT 17](https://huggingface.co/datasets/wnut_17) للكشف عن كيانات جديدة.
2. استخدام نموذجك المضبوط بدقة للاستدلال.
> [!TIP]
> للاطلاع جميع البنى والنقاط المتوافقة مع هذه المهمة، نوصي بالرجوع من [صفحة المهمة](https://huggingface.co/tasks/token-classification).
<Tip>
للاطلاع جميع البنى والنقاط المتوافقة مع هذه المهمة، نوصي بالرجوع من [صفحة المهمة](https://huggingface.co/tasks/token-classification).
</Tip>
قبل أن تبدأ، تأكد من تثبيت جميع المكتبات الضرورية:
@ -232,8 +235,11 @@ pip install transformers datasets evaluate seqeval
... }
```
> [!TIP]
> إذا لم تكن على دراية بتعديل نموذج باستخدام [`Trainer`], ألق نظرة على الدليل التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن على دراية بتعديل نموذج باستخدام [`Trainer`], ألق نظرة على الدليل التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت مستعد الآن لبدء تدريب نموذجك! قم بتحميل DistilBERT مع [`AutoModelForTokenClassification`] إلى جانب عدد التصنيفات المتوقعة، وخريطة التسميات:
@ -284,10 +290,13 @@ pip install transformers datasets evaluate seqeval
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر تفصيلاً حول كيفية تعديل نموذج لتصنيف الرموز، ألق نظرة على الدفتر المقابل
> [دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)
> أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
<Tip>
للحصول على مثال أكثر تفصيلاً حول كيفية تعديل نموذج لتصنيف الرموز، ألق نظرة على الدفتر المقابل
[دفتر PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification.ipynb)
أو [دفتر TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/token_classification-tf.ipynb).
</Tip>
## الاستدلال(Inference)

View File

@ -27,8 +27,11 @@ rendered properly in your Markdown viewer.
1. ضبط دقيق لنموذج [T5](https://huggingface.co/google-t5/t5-small) على المجموعة الفرعية الإنجليزية-الفرنسية من مجموعة بيانات [OPUS Books](https://huggingface.co/datasets/opus_books) لترجمة النص الإنجليزي إلى الفرنسية.
2. استخدام النموذج المضبوط بدقة للاستدلال.
> [!TIP]
> لمشاهدة جميع البنى والنسخ المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/translation).
<Tip>
لمشاهدة جميع البنى والنسخ المتوافقة مع هذه المهمة، نوصي بالتحقق من [صفحة المهمة](https://huggingface.co/tasks/translation).
</Tip>
قبل البدء، تأكد من تثبيت جميع المكتبات الضرورية:
@ -163,8 +166,11 @@ pip install transformers datasets evaluate sacrebleu
## التدريب (Train)
> [!TIP]
> إذا لم تكن معتادًا على ضبط دقيق نموذج باستخدام [`Trainer`], فألقِ نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
<Tip>
إذا لم تكن معتادًا على ضبط دقيق نموذج باستخدام [`Trainer`], فألقِ نظرة على البرنامج التعليمي الأساسي [هنا](../training#train-with-pytorch-trainer)!
</Tip>
أنت جاهز لبدء تدريب نموذجك الآن! حمّل T5 باستخدام [`AutoModelForSeq2SeqLM`]:
@ -214,9 +220,12 @@ pip install transformers datasets evaluate sacrebleu
>>> trainer.push_to_hub()
```
> [!TIP]
> للحصول على مثال أكثر تعمقًا لكيفية ضبط نموذج للترجمة، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb) المقابل
> أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).
<Tip>
للحصول على مثال أكثر تعمقًا لكيفية ضبط نموذج للترجمة، ألق نظرة على [دفتر ملاحظات PyTorch](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation.ipynb) المقابل
أو [دفتر ملاحظات TensorFlow](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/translation-tf.ipynb).
</Tip>
## الاستدلال (Inference)

View File

@ -13,8 +13,11 @@
- [GPT2](model_doc/gpt2) لمهام NLP مثل توليد النصوص التي تستخدم فك تشفير
- [BART](model_doc/bart) لمهام NLP مثل الملخص والترجمة التي تستخدم ترميز-فك تشفير
> [!TIP]
> قبل المتابعة، من الجيد أن يكون لديك بعض المعرفة الأساسية بهيكلية المحولات (Transformer Architecture) الأصلية. إن معرفة كيفية عمل المُشفّرات (Encoders) والمُفكّكات (Decoders) وآلية الانتباه (Attention Mechanism) سوف تساعدك في فهم كيفية عمل نماذج Transformer المختلفة. إذا كنت مبتدئًا أو بحاجة إلى مراجعة، فراجع [دورتنا](https://huggingface.co/course/chapter1/4؟fw=pt) لمزيد من المعلومات!
<Tip>
قبل المتابعة، من الجيد أن يكون لديك بعض المعرفة الأساسية بهيكلية المحولات (Transformer Architecture) الأصلية. إن معرفة كيفية عمل المُشفّرات (Encoders) والمُفكّكات (Decoders) وآلية الانتباه (Attention Mechanism) سوف تساعدك في فهم كيفية عمل نماذج Transformer المختلفة. إذا كنت مبتدئًا أو بحاجة إلى مراجعة، فراجع [دورتنا](https://huggingface.co/course/chapter1/4؟fw=pt) لمزيد من المعلومات!
</Tip>
## الكلام والصوت (Speech and audio)
@ -54,8 +57,11 @@
1. قم بتقسيم الصورة إلى تسلسل من الرقع ومعالجتها بالتوازي باستخدام مُحوّل Transformer.
2. استخدم شبكة عصبية تلافيفية CNN) حديثة، مثل [ConvNeXT](model_doc/convnext)، والتي تعتمد على الطبقات التلافيفية ولكنها تعتمد تصميمات حديثة للشبكات.
> [!TIP]
> يقوم النهج الثالث بمزج المحولات مع التلافيف (على سبيل المثال، [Convolutional Vision Transformer](model_doc/cvt) أو [LeViT](model_doc/levit)). لن نناقشها لأنها تجمع ببساطة بين النهجين اللذين نستعرضهما هنا.
<Tip>
يقوم النهج الثالث بمزج المحولات مع التلافيف (على سبيل المثال، [Convolutional Vision Transformer](model_doc/cvt) أو [LeViT](model_doc/levit)). لن نناقشها لأنها تجمع ببساطة بين النهجين اللذين نستعرضهما هنا.
</Tip>
يتم استخدام ViT و ConvNeXT بشكل شائع لتصنيف الصور، ولكن بالنسبة لمهام الرؤية الأخرى مثل اكتشاف الكائنات والتجزئة وتقدير العمق، سنلقي نظرة على DETR و Mask2Former و GLPN، على التوالي؛ فهذه النماذج هي الأنسب لتلك المهام.
@ -85,8 +91,11 @@
#### الشبكات العصبية التلافيفية (CNN)
> [!TIP]
> يشرح هذا القسم بإيجاز الالتفافات، ولكن سيكون من المفيد أن يكون لديك فهم مسبق لكيفية تغيير شكل الصورة وحجمها. إذا كنت غير معتاد على الالتفافات، تحقق من [فصل الشبكات العصبية التلافيفية](https://github.com/fastai/fastbook/blob/master/13_convolutions.ipynb) من كتاب fastai!
<Tip>
يشرح هذا القسم بإيجاز الالتفافات، ولكن سيكون من المفيد أن يكون لديك فهم مسبق لكيفية تغيير شكل الصورة وحجمها. إذا كنت غير معتاد على الالتفافات، تحقق من [فصل الشبكات العصبية التلافيفية](https://github.com/fastai/fastbook/blob/master/13_convolutions.ipynb) من كتاب fastai!
</Tip>
[ConvNeXT](model_doc/convnext) هو بنية CNN تعتمد تصاميم الشبكات الجديدة والحديثة لتحسين الأداء. ومع ذلك، لا تزال الالتفافات هي جوهر النموذج. من منظور عام، [الالتفاف](glossary#convolution) هو عملية حيث يتم ضرب مصفوفة أصغر (*نواة*) بمقطع صغير من وحدات بكسل الصورة. يحسب بعض الميزات منه، مثل نسيج معين أو انحناء خط. ثم ينزلق إلى النافذة التالية من البكسلات؛ المسافة التي تقطعها الالتفاف تسمى *الخطوة*.
@ -205,8 +214,11 @@
هل أنت مستعد لتجربة الإجابة على الأسئلة؟ راجع [دليل الإجابة على الأسئلة](tasks/question_answering) الشامل الخاص بنا لمعرفة كيفية ضبط نموذج DistilBERT واستخدامه في الاستدلال!
> [!TIP]
> 💡 لاحظ مدى سهولة استخدام BERT لمهام مختلفة بمجرد تدريبه مسبقًا. كل ما تحتاج إليه هو إضافة رأس محدد إلى النموذج المسبق التدريب للتلاعب بالحالات المخفية إلى الإخراج المطلوب!
<Tip>
💡 لاحظ مدى سهولة استخدام BERT لمهام مختلفة بمجرد تدريبه مسبقًا. كل ما تحتاج إليه هو إضافة رأس محدد إلى النموذج المسبق التدريب للتلاعب بالحالات المخفية إلى الإخراج المطلوب!
</Tip>
### توليد النصوص (Text generation)
@ -224,8 +236,11 @@
هل أنت مستعد لتجربة توليد النصوص؟ تحقق من دليل [دليل نمذجة اللغة السببية](tasks/language_modeling#causal- الشامل الخاص بنا لمعرفة كيفية ضبط نموذج DistilGPT-2 واستخدامه للاستنتاج!
> [!TIP]
> للحصول على مزيد من المعلومات حول توليد النص، راجع دليل [استراتيجيات توليد النصوص](generation_strategies)!
<Tip>
للحصول على مزيد من المعلومات حول توليد النص، راجع دليل [استراتيجيات توليد النصوص](generation_strategies)!
</Tip>
### التلخيص (Summarization)
@ -241,8 +256,11 @@
هل أنت مستعد لتجربة التلخيص؟ تحقق من دليل التلخيص الشامل الخاص بنا لمعرفة كيفية ضبط نموذج T5 واستخدامه للاستنتاج!
> [!TIP]
> للحصول على مزيد من المعلومات حول توليد النص، راجع دليل استراتيجيات توليد النص!
<Tip>
للحصول على مزيد من المعلومات حول توليد النص، راجع دليل استراتيجيات توليد النص!
</Tip>
### الترجمة (Translation)
@ -254,5 +272,8 @@
هل أنت مستعد لتجربة الترجمة؟ تحقق من دليل الترجمة الشامل الخاص بنا لمعرفة كيفية ضبط نموذج T5 واستخدامه للاستنتاج!
> [!TIP]
> **للحصول على مزيد من المعلومات حول توليد النصوص، راجع دليل [استراتيجيات توليد النصوص](generation_strategies)!**
<Tip>
**للحصول على مزيد من المعلومات حول توليد النصوص، راجع دليل [استراتيجيات توليد النصوص](generation_strategies)!**
</Tip>

View File

@ -1,151 +0,0 @@
# التصدير إلى TorchScript
> [!TIP]
> هذه هي بداية تجاربنا مع TorchScript ولا زلنا نستكشف قدراته مع نماذج المدخلات المتغيرة الحجم. إنه مجال اهتمامنا وسنعمق تحليلنا في الإصدارات القادمة، مع المزيد من الأمثلة البرمجية، وتنفيذ أكثر مرونة، ومقاييس مقارنة بين الأكواد القائمة على Python مع أكواد TorchScript المُجمّعة.
وفقًا لـ [وثائق TorchScript](https://pytorch.org/docs/stable/jit.html):
> TorchScript هي طريقة لإنشاء نماذج قابلة للتسلسل والتحسين من تعليمات PyTorch البرمجية.
هناك وحدتان من PyTorch، [JIT and TRACE](https://pytorch.org/docs/stable/jit.html)، تتيحان للمطورين تصدير نماذجهم لإعادة استخدامها في برامج أخرى مثل برامج C++ المُحسّنة للأداء.
نقدم واجهة تتيح لك تصدير نماذج 🤗 Transformers إلى TorchScript بحيث يمكن إعادة استخدامها في بيئة مختلفة عن برامج Python القائمة إلى PyTorch. هنا نشرح كيفية تصدير نماذجنا واستخدامها باستخدام TorchScript.
يتطلب تصدير نموذج أمرين:
- تهيئة مثيل للنموذج باستخدام علامة `torchscript`
- تمرير مُدخلات وهمية (dummy inputs) خلال النموذج
تنطوي هذه الضرورات على عدة أمور يجب على المطورين توخي الحذر بشأنها كما هو مفصل أدناه.
## علامة TorchScript والأوزان المرتبطة
علامة `torchscript` ضرورية لأن معظم نماذج اللغة 🤗 Transformers لها أوزان مرتبطة بين طبقة `Embedding` وطبقة `Decoding`. لا يسمح لك TorchScript بتصدير النماذج ذات الأوزان المرتبطة، لذلك من الضروري فصل الأوزان ونسخها مسبقًا.
النماذج المُهيأة باستخدام علامة `torchscript` لها طبقة `Embedding` وطبقة`Decoding` منفصلتين، مما يعني أنه لا ينبغي تدريبها لاحقًا. سيؤدي التدريب إلى عدم تزامن الطبقتين، مما يؤدي إلى نتائج غير متوقعة.
هذا لا ينطبق على النماذج التي لا تحتوي على رأس نموذج اللغة، حيث لا تملك أوزانًا مرتبطة. يمكن تصدير هذه النماذج بأمان دون علامة `torchscript`.
## المدخلات الوهمية والأطوال القياسية
تُستخدم المُدخلات الوهمية لتمرير أمامي خلال النموذج. أثناء انتشار قيم المُدخلات عبر الطبقات، يتتبع PyTorch العمليات المختلفة التي يتم تنفيذها على كل مصفوفة(tensor). ثم يتم استخدام هذه العمليات المُسجلة بعد ذلك لإنشاء *أثر* النموذج.
يتم إنشاء التتبع بالنسبة لأبعاد المُدخلات. وبالتالي، فهو مُقيّد بأبعاد المُدخلات الوهمية، ولن يعمل لأي طول تسلسل أو حجم دفعة مختلف. عند المحاولة بحجم مختلف، يتم رفع الخطأ التالي:
```
`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`
```
نوصي بتتبع النموذج باستخدام حجم مُدخلات وهمية لا يقل عن أكبر مُدخل سيتم تقديمه للنموذج أثناء الاستدلال. يمكن أن تساعد الحشوة(padding) في ملء القيم المفقودة. ومع ذلك، نظرًا لتتبع النموذج بحجم مُدخل أكبر، ستكون أبعاد المصفوفة ستكون كبيرة أيضًا، مما يؤدي عنه المزيد من الحسابات.
انتبه إلى إجمالي عدد العمليات المُنفذة على كل مُدخل وتابع الأداء عن كثب عند تصدير نماذج متغيرة طول التسلسل.
## استخدام TorchScript في Python
يوضح هذا القسم كيفية حفظ النماذج وتحميلها، بالإضافة إلى كيفية استخدام التتبع للاستدلال.
### حفظ نموذج
لتصدير `BertModel` باستخدام TorchScript، قم بتهيئة ـ `BertModel` من فئة `BertConfig` ثم احفظه على القرص تحت اسم الملف `traced_bert.pt`:
```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
enc = BertTokenizer.from_pretrained("google-bert/bert-base-uncased")
# Tokenizing input text
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = enc.tokenize(text)
# Masking one of the input tokens
masked_index = 8
tokenized_text[masked_index] = "[MASK]"
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Creating a dummy input
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
dummy_input = [tokens_tensor, segments_tensors]
# Initializing the model with the torchscript flag
# Flag set to True even though it is not necessary as this model does not have an LM Head.
config = BertConfig(
vocab_size_or_config_json_file=32000,
hidden_size=768,
num_hidden_layers=12,
num_attention_heads=12,
intermediate_size=3072,
torchscript=True,
)
# Instantiating the model
model = BertModel(config)
# The model needs to be in evaluation mode
model.eval()
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
model = BertModel.from_pretrained("google-bert/bert-base-uncased", torchscript=True)
# Creating the trace
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
torch.jit.save(traced_model, "traced_bert.pt")
```
### تحميل نموذج
يمكنك الآن تحميل `BertModel` المُحفظ سابقًا، `traced_bert.pt`، من القرص واستخدامه على `dummy_input` المُهيأ سابقًا:
```python
loaded_model = torch.jit.load("traced_bert.pt")
loaded_model.eval()
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
```
### استخدام نموذج مُتتبع للاستدلال
استخدم النموذج المُتتبع للاستدلال باستخدام أسلوب `__call__` الخاص به:
```python
traced_model(tokens_tensor, segments_tensors)
```
## نشر نماذج Hugging Face TorchScript على AWS باستخدام Neuron SDK
قدمت AWS عائلة [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/) من اﻷجهزة لخفض التكلفة وأداء التعلم الآلي عالي الأداء في البيئة السحابية. تعمل أجهزة Inf1 بواسطة شريحة Inferentia من AWS، وهي مُسرّع أجهزة مُخصص، متخصص في أعباء عمل الاستدلال للتعلم العميق. [AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#) هي SDK لـ Inferentia التي تدعم تتبع نماذج المحولات وتحسينها للنشر على Inf1. توفر Neuron SDK ما يلي:
1. واجهة برمجة تطبيقات سهلة الاستخدام مع تغيير سطر واحد من التعليمات البرمجية لتتبع نموذج TorchScript وتحسينه للاستدلال في البيئة السحابية.
2. تحسينات الأداء الجاهزة للاستخدام [تحسين التكلفة والأداء](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/>).
3. دعم نماذج Hugging Face المحولات المبنية باستخدام إما [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html) أو [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).
### الآثار المترتبة
تعمل نماذج المحولات المستندة إلى بنية [BERT (تمثيلات الترميز ثنائية الاتجاه من المحولات)](https://huggingface.co/docs/transformers/main/model_doc/bert) أو متغيراتها مثل [distilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert) و [roBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta) بشكل أفضل على Inf1 للمهام غير التوليدية مثل الإجابة على الأسئلة الاستخراجية، وتصنيف التسلسلات، وتصنيف الرموز (tokens). ومع ذلك، يمكن تكييف مهام توليد النصوص للعمل على Inf1 وفقًا لهذا [برنامج تعليمي AWS Neuron MarianMT](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html). يمكن العثور على مزيد من المعلومات حول النماذج التي يمكن تحويلها جاهزة على Inferentia في قسم [ملاءمة بنية النموذج](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia) من وثائق Neuron.
### التبعيات (Dependencies)
يتطلب استخدام AWS Neuron لتحويل النماذج [بيئة SDK Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide) والتي تأتي مسبقًا على [AMI للتعلم العميق من AWS](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
### تحويل نموذج لـ AWS Neuron
قم بتحويل نموذج لـ AWS NEURON باستخدام نفس التعليمات البرمجية من [استخدام TorchScript في Python](torchscript#using-torchscript-in-python) لتتبع `BertModel`. قم باستيراد امتداد إطار عمل `torch.neuron` للوصول إلى مكونات Neuron SDK من خلال واجهة برمجة تطبيقات Python:
```python
from transformers import BertModel, BertTokenizer, BertConfig
import torch
import torch.neuron
```
كل ما عليك فعله هو تعديل السطر التالي:
```diff
- torch.jit.trace(model, [tokens_tensor, segments_tensors])
+ torch.neuron.trace(model, [token_tensor, segments_tensors])
```
يتيح ذلك لـ Neuron SDK تتبع النموذج وتحسينه لمثيلات Inf1.
لمعرفة المزيد حول ميزات AWS Neuron SDK والأدوات ودروس البرامج التعليمية والتحديثات الأخيرة، يرجى الاطلاع على [وثائق AWS NeuronSDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).

View File

@ -2,12 +2,15 @@
تُتيح وحدة [`Trainer`] حلقة تدريب وتقييم متكاملة لنماذج PyTorch المطبقة في مكتبة Transformers. تحتاج فقط إلى تمرير المكونات الضرورية للتدريب (النموذج، والمجزىء النصى، ومجموعة البيانات، دالة التقييم، معلمات التدريب الفائقة، إلخ)، وستتولى فئة [`Trainer`] الباقي. هذا يُسهّل بدء التدريب بشكل أسرع دون كتابة حلقة التدريب الخاصة بك يدويًا. ولكن في الوقت نفسه، فإن [`Trainer`] قابل للتخصيص بدرجة كبيرة ويوفر العديد من خيارات التدريب حتى تتمكن من تخصيصه وفقًا لاحتياجات التدريب الخاصة بك بدقة.
> [!TIP]
> بالإضافة إلى فئة [`Trainer`], توفر مكتبة Transformers أيضًا فئة [`Seq2SeqTrainer`] للمهام التسلسلية مثل الترجمة أو التلخيص. هناك أيضًا فئة [`~trl.SFTTrainer`] من مكتبة [TRL](https://hf.co/docs/trl) التي تغلّف فئة [`Trainer`] وهي مُحُسَّنة لتدريب نماذج اللغة مثل Llama-2 وMistral باستخدام تقنيات التوليد اللغوي. كما يدعم [`~trl.SFTTrainer`] ميزات مثل حزم التسلسلات، وLoRA، والقياس الكمي، وDeepSpeed مما يُمكّن من التدريب بكفاءة على نماذج ضخمة الحجم.
>
> <br>
>
> لا تتردد في الاطلاع على [مرجع API](./main_classes/trainer) لهذه الفئات الأخرى من النوع [`Trainer`] لمعرفة المزيد حول متى يتم استخدام كل منها. بشكل عام، [`Trainer`] هو الخيار الأكثر تنوعًا ومناسبًا لمجموعة واسعة من المهام. تم تصميم [`Seq2SeqTrainer`] للمهام التسلسلية ، و [`~trl.SFTTrainer`] مُصمم لتدريب نماذج اللغة الكبيرة.
<Tip>
بالإضافة إلى فئة [`Trainer`], توفر مكتبة Transformers أيضًا فئة [`Seq2SeqTrainer`] للمهام التسلسلية مثل الترجمة أو التلخيص. هناك أيضًا فئة [`~trl.SFTTrainer`] من مكتبة [TRL](https://hf.co/docs/trl) التي تغلّف فئة [`Trainer`] وهي مُحُسَّنة لتدريب نماذج اللغة مثل Llama-2 وMistral باستخدام تقنيات التوليد اللغوي. كما يدعم [`~trl.SFTTrainer`] ميزات مثل حزم التسلسلات، وLoRA، والقياس الكمي، وDeepSpeed مما يُمكّن من التدريب بكفاءة على نماذج ضخمة الحجم.
<br>
لا تتردد في الاطلاع على [مرجع API](./main_classes/trainer) لهذه الفئات الأخرى من النوع [`Trainer`] لمعرفة المزيد حول متى يتم استخدام كل منها. بشكل عام، [`Trainer`] هو الخيار الأكثر تنوعًا ومناسبًا لمجموعة واسعة من المهام. تم تصميم [`Seq2SeqTrainer`] للمهام التسلسلية ، و [`~trl.SFTTrainer`] مُصمم لتدريب نماذج اللغة الكبيرة.
</Tip>
قبل البدء، تأكد من تثبيت مكتبة [Accelerate](https://hf.co/docs/accelerate) - وهي مكتبة تُمكّن تشغيل تدريب PyTorch في بيئات مُوزعة.
@ -161,15 +164,21 @@ trainer = Trainer(
## تسجيل الأحداث (Logging)
> [!TIP]
> راجع مرجع [API](./main_classes/logging) للتسجيل للحصول على مزيد من المعلومات حول مستويات التسجيل المختلفة للأحداث.
<Tip>
راجع مرجع [API](./main_classes/logging) للتسجيل للحصول على مزيد من المعلومات حول مستويات التسجيل المختلفة للأحداث.
</Tip>
يتم تعيين [`Trainer`] إلى `logging.INFO` افتراضيًا والذي يُبلغ عن الأخطاء والتحذيرات ومعلومات أساسية أخرى. يتم تعيين نسخة [`Trainer`] - في البيئات الموزعة - إلى `logging.WARNING` والتي يُبلغ فقط عن الأخطاء والتحذيرات. يمكنك تغيير مستوى تسجيل الأحداث باستخدام معاملي [`log_level`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level) و [`log_level_replica`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments.log_level_replica) في [`TrainingArguments`].
لتهيئة إعداد مُستوى تسجيل اﻷحداث لكل عقدة، استخدم معامل [`log_on_each_node`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments.log_on_each_node) لتحديد ما إذا كان سيتم استخدام مُستوى السجل على كل عقدة أو فقط على العقدة الرئيسية.
> [!TIP]
> يحدد [`Trainer`] مُستوى التسجيل بشكل مُنفصل لكل عقدة في طريقة [`Trainer.__init__`]، لذا فقد ترغب في التفكير في تعيين هذا الإعداد في وقت سابق إذا كنت تستخدم وظائف Transformers الأخرى قبل إنشاء كائن [`Trainer`].
<Tip>
يحدد [`Trainer`] مُستوى التسجيل بشكل مُنفصل لكل عقدة في طريقة [`Trainer.__init__`]، لذا فقد ترغب في التفكير في تعيين هذا الإعداد في وقت سابق إذا كنت تستخدم وظائف Transformers الأخرى قبل إنشاء كائن [`Trainer`].
</Tip>
على سبيل المثال، لتعيين التعليمات البرمجية والوحدات النمطية الرئيسية الخاصة بك لاستخدام نفس مُستوى التسجيل وفقًا لكل عقدة:
@ -373,8 +382,11 @@ trainer.train()
تم تقديم مُحسِّنات LOMO في [التدريب على المعلمات الكاملة لنماذج اللغة الكبيرة باستخدام موارد محدودة](https://hf.co/papers/2306.09782) و [AdaLomo: تحسين ذاكرة منخفضة بمعدل تعلم متكيف](https://hf.co/papers/2310.10195).
يتكون كلاهما من طريقة فعالة لضبط المعلمات الكاملة. تدمج محسنات LOMO حساب الاشتقاق وتحديث المعلمات في خطوة واحدة لتقليل استخدام الذاكرة. محسنات LOMO المدعومة هي `"lomo"` و `"adalomo"`. أولاً قم بتثبيت LOMO من pypi `pip install lomo-optim` أو قم بتثبيته من المصدر باستخدام `pip install git+https://github.com/OpenLMLab/LOMO.git`.
> [!TIP]
> وفقًا للمؤلفين، يوصى باستخدام `AdaLomo` بدون `grad_norm` للحصول على أداء أفضل وسرعة أعلى.
<Tip>
وفقًا للمؤلفين، يوصى باستخدام `AdaLomo` بدون `grad_norm` للحصول على أداء أفضل وسرعة أعلى.
</Tip>
فيما يلي نص برمجي بسيط يوضح كيفية ضبط نموذج [google/gemma-2b](https://huggingface.co/google/gemma-2b) على مجموعة بيانات IMDB في الدقة الكاملة:
@ -399,8 +411,9 @@ trainer.train()
### مُحسِّن GrokAdamW
تم تصميم مُحسِّن GrokAdamW لتعزيز أداء التدريب واستقراره، خاصةً للنماذج التي تستفيد من دوال إشارة `grokking`. لاستخدام `GrokAdamW`، قم أولاً بتثبيت حزمة المُحسِّن باستخدام `pip install grokadamw`.
> [!TIP]
> يُعد GrokAdamW مفيدًا بشكل خاص للنماذج التي تتطلب تقنيات تحسين مُتقدمة لتحقيق أداء واستقرار أفضل.
<Tip>
يُعد GrokAdamW مفيدًا بشكل خاص للنماذج التي تتطلب تقنيات تحسين مُتقدمة لتحقيق أداء واستقرار أفضل.
</Tip>
فيما يلي نص برمجى بسيط لشرح كيفية ضبط [google/gemma-2b](https://huggingface.co/google/gemma-2b) بدقة على مجموعة بيانات IMDB باستخدام مُحسِّن GrokAdamW:
```python
@ -469,8 +482,11 @@ trainer.train()
يتم تشغيل فئة [`Trainer`] بواسطة [تسريع](https://hf.co/docs/accelerate)، وهي مكتبة لتدريب نماذج PyTorch بسهولة في بيئات موزعة مع دعم عمليات التكامل مثل [FullyShardedDataParallel (FSDP)](https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/) و [DeepSpeed](https://www.deepspeed.ai/).
> [!TIP]
> تعرف على المزيد حول استراتيجيات تجزئة FSDP، وتفريغ وحدة المعالجة المركزية (CPU)، والمزيد مع [`Trainer`] في [دليل Fully Sharded Data Parallel](fsdp).
<Tip>
تعرف على المزيد حول استراتيجيات تجزئة FSDP، وتفريغ وحدة المعالجة المركزية (CPU)، والمزيد مع [`Trainer`] في [دليل Fully Sharded Data Parallel](fsdp).
</Tip>
لاستخدام Accelerate مع [`Trainer`]]، قم بتشغيل الأمر [`accelerate.config`](https://huggingface.co/docs/accelerate/package_reference/cli#accelerate-config) لإعداد التدريب لبيئة التدريب الخاصة بك. نشئ هذا الأمر ملف `config_file.yaml` الذي سيتم استخدامه عند تشغيل نص للتدريب البرمجى. على سبيل المثال، بعض تكوينات المثال التي يمكنك إعدادها هي:
@ -595,7 +611,6 @@ accelerate launch \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
يمكنك أيضًا تحديد المعلمات من ملف `config_file.yaml` مباشرة في سطر الأوامر:
@ -618,7 +633,6 @@ accelerate launch --num_processes=2 \
--learning_rate 5e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/ \
--overwrite_output_dir
```
اطلع على برنامج تعليمي [Launching your Accelerate scripts](https://huggingface.co/docs/accelerate/basic_tutorials/launch) لمعرفة المزيد حول `accelerate_launch` والتكوينات المخصصة.

View File

@ -72,8 +72,11 @@
>>> model = AutoModelForSequenceClassification.from_pretrained("google-bert/bert-base-cased", num_labels=5)
```
> [!TIP]
> سترى تحذيرًا بشأن بعض أوزان النموذج المُدرب مسبقًا لن تُستخدم وبعض الأوزان الأخرى ستُبدء بشكل عشوائي. لا تقلق، هذا أمر طبيعي تمامًا! يتم التخلص من رأس النموذج المُدرب مسبقًا لشبكة BERT، ويتم استبداله برأس تصنيف يُبدء بشكل عشوائي. سوف تقوم بضبط الرأس الجديد للنموذج بدقة على مهمة تصنيف التسلسلات الخاصة بك، مما ينقل المعرفة من النموذج المُدرب مسبقًا إليه.
<Tip>
سترى تحذيرًا بشأن بعض أوزان النموذج المُدرب مسبقًا لن تُستخدم وبعض الأوزان الأخرى ستُبدء بشكل عشوائي. لا تقلق، هذا أمر طبيعي تمامًا! يتم التخلص من رأس النموذج المُدرب مسبقًا لشبكة BERT، ويتم استبداله برأس تصنيف يُبدء بشكل عشوائي. سوف تقوم بضبط الرأس الجديد للنموذج بدقة على مهمة تصنيف التسلسلات الخاصة بك، مما ينقل المعرفة من النموذج المُدرب مسبقًا إليه.
</Tip>
### اختيار أحسن العوامل والمتغيرات للتدريب (Training hyperparameters)
@ -227,8 +230,11 @@ torch.cuda.empty_cache()
>>> model.to(device)
```
> [!TIP]
> احصل على وصول مجاني إلى وحدة معالجة رسومات سحابية إذا لم يكن لديك واحدة مع دفتر ملاحظات مستضاف مثل [Colaboratory](https://colab.research.google.com/) أو [SageMaker StudioLab](https://studiolab.sagemaker.aws/).
<Tip>
احصل على وصول مجاني إلى وحدة معالجة رسومات سحابية إذا لم يكن لديك واحدة مع دفتر ملاحظات مستضاف مثل [Colaboratory](https://colab.research.google.com/) أو [SageMaker StudioLab](https://studiolab.sagemaker.aws/).
</Tip>
رائع، الآن أنت مستعد للتدريب! 🥳

View File

@ -39,8 +39,9 @@ CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.17 GiB total capacit
- حاول استخدام [`gradient_accumulation_steps`](main_classes/trainer#transformers.TrainingArguments.gradient_accumulation_steps) في [`TrainingArguments`] لزيادة حجم الدُفعة بشكل فعال.
> [!TIP]
> راجع دليل [الأداء](performance) لمزيد من التفاصيل حول تقنيات توفير الذاكرة.
<Tip>
راجع دليل [الأداء](performance) لمزيد من التفاصيل حول تقنيات توفير الذاكرة.
</Tip>
## عدم القدرة على تحميل نموذج TensorFlow محفوظ
@ -137,8 +138,9 @@ tensor([[-0.1008, -0.4061]], grad_fn=<AddmmBackward0>)
يجب عليك في معظم الوقت توفير `attention_mask` للنموذج لتجاهل رموز الحشو لتجنب هذا الخطأ الصامت. الآن يتطابق مُخرجات التسلسل الثاني مع مُخرجاته الفعلية:
> [!TIP]
> بشكل افتراضي، ينشئ مجزىء النصوص `attention_mask` لك استنادًا إلى إعدادات المجزىء المحدد.
<Tip>
بشكل افتراضي، ينشئ مجزىء النصوص `attention_mask` لك استنادًا إلى إعدادات المجزىء المحدد.
</Tip>
```python
>>> attention_mask = torch.tensor([[1, 1, 1, 1, 1, 1], [1, 0, 0, 0, 0, 0]])

View File

@ -53,7 +53,7 @@ Lassen Sie uns daher ein wenig tiefer in das allgemeine Design der Bibliothek ei
### Überblick über die Modelle
Um ein Modell erfolgreich hinzuzufügen, ist es wichtig, die Interaktion zwischen Ihrem Modell und seiner Konfiguration zu verstehen,
[`PreTrainedModel`] und [`PretrainedConfig`]. Als Beispiel werden wir
[`PreTrainedModel`] und [`PreTrainedConfig`]. Als Beispiel werden wir
das Modell, das zu 🤗 Transformers hinzugefügt werden soll, `BrandNewBert` nennen.
Schauen wir uns das mal an:
@ -81,10 +81,10 @@ model.config # model has access to its config
```
Ähnlich wie das Modell erbt die Konfiguration grundlegende Serialisierungs- und Deserialisierungsfunktionalitäten von
[`PretrainedConfig`]. Beachten Sie, dass die Konfiguration und das Modell immer in zwei verschiedene Formate serialisiert werden
[`PreTrainedConfig`]. Beachten Sie, dass die Konfiguration und das Modell immer in zwei verschiedene Formate serialisiert werden
unterschiedliche Formate serialisiert werden - das Modell in eine *pytorch_model.bin* Datei und die Konfiguration in eine *config.json* Datei. Aufruf von
[`~PreTrainedModel.save_pretrained`] wird automatisch
[`~PretrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden.
[`~PreTrainedConfig.save_pretrained`] auf, so dass sowohl das Modell als auch die Konfiguration gespeichert werden.
### Code-Stil
@ -748,8 +748,11 @@ Tests erfolgreich sind, führen Sie
RUN_SLOW=1 pytest -sv tests/models/brand_new_bert/test_modeling_brand_new_bert.py::BrandNewBertModelIntegrationTests
```
> [!TIP]
> Falls Sie Windows verwenden, sollten Sie `RUN_SLOW=1` durch `SET RUN_SLOW=1` ersetzen.
<Tip>
Falls Sie Windows verwenden, sollten Sie `RUN_SLOW=1` durch `SET RUN_SLOW=1` ersetzen.
</Tip>
Zweitens sollten alle Funktionen, die speziell für *brand_new_bert* sind, zusätzlich in einem separaten Test getestet werden unter
`BrandNewBertModelTester`/`BrandNewBertModelTest`. Dieser Teil wird oft vergessen, ist aber in zweierlei Hinsicht äußerst nützlich

View File

@ -18,8 +18,11 @@ rendered properly in your Markdown viewer.
Bei so vielen verschiedenen Transformator-Architekturen kann es eine Herausforderung sein, eine für Ihren Checkpoint zu erstellen. Als Teil der 🤗 Transformers Kernphilosophie, die Bibliothek leicht, einfach und flexibel nutzbar zu machen, leitet eine `AutoClass` automatisch die richtige Architektur aus einem gegebenen Checkpoint ab und lädt sie. Mit der Methode `from_pretrained()` kann man schnell ein vortrainiertes Modell für eine beliebige Architektur laden, so dass man keine Zeit und Ressourcen aufwenden muss, um ein Modell von Grund auf zu trainieren. Die Erstellung dieser Art von Checkpoint-agnostischem Code bedeutet, dass Ihr Code, wenn er für einen Checkpoint funktioniert, auch mit einem anderen Checkpoint funktionieren wird - solange er für eine ähnliche Aufgabe trainiert wurde - selbst wenn die Architektur unterschiedlich ist.
> [!TIP]
> Denken Sie daran, dass sich die Architektur auf das Skelett des Modells bezieht und die Checkpoints die Gewichte für eine bestimmte Architektur sind. Zum Beispiel ist [BERT](https://huggingface.co/google-bert/bert-base-uncased) eine Architektur, während `google-bert/bert-base-uncased` ein Checkpoint ist. Modell ist ein allgemeiner Begriff, der entweder Architektur oder Prüfpunkt bedeuten kann.
<Tip>
Denken Sie daran, dass sich die Architektur auf das Skelett des Modells bezieht und die Checkpoints die Gewichte für eine bestimmte Architektur sind. Zum Beispiel ist [BERT](https://huggingface.co/google-bert/bert-base-uncased) eine Architektur, während `google-bert/bert-base-uncased` ein Checkpoint ist. Modell ist ein allgemeiner Begriff, der entweder Architektur oder Prüfpunkt bedeuten kann.
</Tip>
In dieser Anleitung lernen Sie, wie man:
@ -94,9 +97,12 @@ Sie können denselben Prüfpunkt problemlos wiederverwenden, um eine Architektur
>>> model = AutoModelForTokenClassification.from_pretrained("distilbert/distilbert-base-uncased")
```
> [!WARNING]
> Für PyTorch-Modelle verwendet die Methode `from_pretrained()` `torch.load()`, die intern `pickle` verwendet und als unsicher bekannt ist. Generell sollte man niemals ein Modell laden, das aus einer nicht vertrauenswürdigen Quelle stammen könnte, oder das manipuliert worden sein könnte. Dieses Sicherheitsrisiko wird für öffentliche Modelle, die auf dem Hugging Face Hub gehostet werden, teilweise gemildert, da diese bei jeder Übertragung [auf Malware](https://huggingface.co/docs/hub/security-malware) gescannt werden. Siehe die [Hub-Dokumentation](https://huggingface.co/docs/hub/security) für Best Practices wie [signierte Commit-Verifizierung](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) mit GPG.
>
> TensorFlow- und Flax-Checkpoints sind nicht betroffen und können in PyTorch-Architekturen mit den Kwargs `from_tf` und `from_flax` für die Methode `from_pretrained` geladen werden, um dieses Problem zu umgehen.
<Tip warning={true}>
Für PyTorch-Modelle verwendet die Methode `from_pretrained()` `torch.load()`, die intern `pickle` verwendet und als unsicher bekannt ist. Generell sollte man niemals ein Modell laden, das aus einer nicht vertrauenswürdigen Quelle stammen könnte, oder das manipuliert worden sein könnte. Dieses Sicherheitsrisiko wird für öffentliche Modelle, die auf dem Hugging Face Hub gehostet werden, teilweise gemildert, da diese bei jeder Übertragung [auf Malware](https://huggingface.co/docs/hub/security-malware) gescannt werden. Siehe die [Hub-Dokumentation](https://huggingface.co/docs/hub/security) für Best Practices wie [signierte Commit-Verifizierung](https://huggingface.co/docs/hub/security-gpg#signing-commits-with-gpg) mit GPG.
TensorFlow- und Flax-Checkpoints sind nicht betroffen und können in PyTorch-Architekturen mit den Kwargs `from_tf` und `from_flax` für die Methode `from_pretrained` geladen werden, um dieses Problem zu umgehen.
</Tip>
Im Allgemeinen empfehlen wir die Verwendung der Klasse "AutoTokenizer" und der Klasse "AutoModelFor", um trainierte Instanzen von Modellen zu laden. Dadurch wird sichergestellt, dass Sie jedes Mal die richtige Architektur laden. Im nächsten [Tutorial] (Vorverarbeitung) erfahren Sie, wie Sie Ihren neu geladenen Tokenizer, Feature Extractor und Prozessor verwenden, um einen Datensatz für die Feinabstimmung vorzuverarbeiten.

View File

@ -269,8 +269,11 @@ Sie können auch eine kleinere Anzahl an Tests angeben, um nur die Funktion, an
Standardmäßig werden langsame Tests übersprungen, aber Sie können die Umgebungsvariable `RUN_SLOW` auf `yes` setzen, um sie auszuführen. Dies wird den Download vieler Gigabyte an Modellen starten - stellen Sie also sicher, dass Sie sowohl genügend Festplattenspeicher als auch eine gute Internetverbindung oder die nötige Geduld haben!
> [!WARNING]
> Vergessen Sie nicht, einen *Pfad zu einem Unterordner oder einer Testdatei* anzugeben, um den Test auszuführen. Sonst führen Sie alle Tests im `tests` oder `examples` Ordner aus, was sehr lange dauern wird!
<Tip warning={true}>
Vergessen Sie nicht, einen *Pfad zu einem Unterordner oder einer Testdatei* anzugeben, um den Test auszuführen. Sonst führen Sie alle Tests im `tests` oder `examples` Ordner aus, was sehr lange dauern wird!
</Tip>
```bash
RUN_SLOW=yes python -m pytest -n auto --dist=loadfile -s -v ./tests/models/my_new_model

View File

@ -121,8 +121,11 @@ pip install -e .
Diese Befehle verknüpfen den Ordner, in den Sie das Repository geklont haben, mit den Pfaden Ihrer Python-Bibliotheken. Python wird nun in dem Ordner suchen, in den Sie geklont haben, zusätzlich zu den normalen Bibliothekspfaden. Wenn zum Beispiel Ihre Python-Pakete normalerweise in `~/anaconda3/envs/main/lib/python3.7/site-packages/` installiert sind, wird Python auch den Ordner durchsuchen, in den Sie geklont haben: `~/transformers/`.
> [!WARNING]
> Sie müssen den Ordner `transformers` behalten, wenn Sie die Bibliothek weiter verwenden wollen.
<Tip warning={true}>
Sie müssen den Ordner `transformers` behalten, wenn Sie die Bibliothek weiter verwenden wollen.
</Tip>
Jetzt können Sie Ihren Klon mit dem folgenden Befehl ganz einfach auf die neueste Version von 🤗 Transformers aktualisieren:
@ -151,15 +154,21 @@ Vorgefertigte Modelle werden heruntergeladen und lokal zwischengespeichert unter
3. Shell-Umgebungsvariable: `XDG_CACHE_HOME` + `/huggingface`.
> [!TIP]
> Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE` oder `PYTORCH_PRETRAINED_BERT_CACHE`, wenn Sie von einer früheren Iteration dieser Bibliothek kommen und diese Umgebungsvariablen gesetzt haben, sofern Sie nicht die Shell-Umgebungsvariable `TRANSFORMERS_CACHE` angeben.
<Tip>
Transformers verwendet die Shell-Umgebungsvariablen `PYTORCH_TRANSFORMERS_CACHE` oder `PYTORCH_PRETRAINED_BERT_CACHE`, wenn Sie von einer früheren Iteration dieser Bibliothek kommen und diese Umgebungsvariablen gesetzt haben, sofern Sie nicht die Shell-Umgebungsvariable `TRANSFORMERS_CACHE` angeben.
</Tip>
## Offline Modus
Transformers ist in der Lage, in einer Firewall- oder Offline-Umgebung zu laufen, indem es nur lokale Dateien verwendet. Setzen Sie die Umgebungsvariable `HF_HUB_OFFLINE=1`, um dieses Verhalten zu aktivieren.
> [!TIP]
> Fügen sie [🤗 Datasets](https://huggingface.co/docs/datasets/) zu Ihrem Offline-Trainingsworkflow hinzufügen, indem Sie die Umgebungsvariable `HF_DATASETS_OFFLINE=1` setzen.
<Tip>
Fügen sie [🤗 Datasets](https://huggingface.co/docs/datasets/) zu Ihrem Offline-Trainingsworkflow hinzufügen, indem Sie die Umgebungsvariable `HF_DATASETS_OFFLINE=1` setzen.
</Tip>
So würden Sie beispielsweise ein Programm in einem normalen Netzwerk mit einer Firewall für externe Instanzen mit dem folgenden Befehl ausführen:
@ -234,5 +243,8 @@ Sobald Ihre Datei heruntergeladen und lokal zwischengespeichert ist, geben Sie d
>>> config = AutoConfig.from_pretrained("./your/path/bigscience_t0/config.json")
```
> [!TIP]
> Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlädt](https://huggingface.co/docs/hub/how-to-downstream).
<Tip>
Weitere Informationen zum Herunterladen von Dateien, die auf dem Hub gespeichert sind, finden Sie im Abschnitt [Wie man Dateien vom Hub herunterlädt](https://huggingface.co/docs/hub/how-to-downstream).
</Tip>

View File

@ -68,17 +68,20 @@ Damit sich Ihr Modell so verhält, wie Sie es für Ihre Aufgabe erwarten, müsse
Lassen Sie uns über Code sprechen!
> [!TIP]
> Wenn Sie an der grundlegenden Verwendung von LLMs interessiert sind, ist unsere High-Level-Schnittstelle [`Pipeline`](pipeline_tutorial) ein guter Ausgangspunkt. LLMs erfordern jedoch oft fortgeschrittene Funktionen wie Quantisierung und Feinsteuerung des Token-Auswahlschritts, was am besten über [`~generation.GenerationMixin.generate`] erfolgt. Die autoregressive Generierung mit LLMs ist ebenfalls ressourcenintensiv und sollte für einen angemessenen Durchsatz auf einer GPU ausgeführt werden.
<Tip>
Wenn Sie an der grundlegenden Verwendung von LLMs interessiert sind, ist unsere High-Level-Schnittstelle [`Pipeline`](pipeline_tutorial) ein guter Ausgangspunkt. LLMs erfordern jedoch oft fortgeschrittene Funktionen wie Quantisierung und Feinsteuerung des Token-Auswahlschritts, was am besten über [`~generation.GenerationMixin.generate`] erfolgt. Die autoregressive Generierung mit LLMs ist ebenfalls ressourcenintensiv und sollte für einen angemessenen Durchsatz auf einer GPU ausgeführt werden.
</Tip>
<!-- TODO: update example to llama 2 (or a newer popular baseline) when it becomes ungated -->
Zunächst müssen Sie das Modell laden.
```py
>>> from transformers import AutoModelForCausalLM
>>> from transformers import AutoModelForCausalLM, BitsAndBytesConfig
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```
@ -116,12 +119,12 @@ Und das war's! Mit ein paar Zeilen Code können Sie sich die Macht eines LLM zun
Es gibt viele [Generierungsstrategien](generation_strategies), und manchmal sind die Standardwerte für Ihren Anwendungsfall vielleicht nicht geeignet. Wenn Ihre Ausgaben nicht mit dem übereinstimmen, was Sie erwarten, haben wir eine Liste der häufigsten Fallstricke erstellt und wie Sie diese vermeiden können.
```py
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
>>> tokenizer = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b")
>>> tokenizer.pad_token = tokenizer.eos_token # Llama has no pad token by default
>>> model = AutoModelForCausalLM.from_pretrained(
... "openlm-research/open_llama_7b", device_map="auto", load_in_4bit=True
... "openlm-research/open_llama_7b", device_map="auto", quantization_config=BitsAndBytesConfig(load_in_4bit=True)
... )
```

View File

@ -27,8 +27,11 @@ In diesem Tutorial lernen Sie zwei Methoden kennen, wie Sie ein trainiertes oder
frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope;
picture-in-picture" allowfullscreen></iframe>
> [!TIP]
> Um ein Modell mit der Öffentlichkeit zu teilen, benötigen Sie ein Konto auf [huggingface.co](https://huggingface.co/join). Sie können auch einer bestehenden Organisation beitreten oder eine neue Organisation gründen.
<Tip>
Um ein Modell mit der Öffentlichkeit zu teilen, benötigen Sie ein Konto auf [huggingface.co](https://huggingface.co/join). Sie können auch einer bestehenden Organisation beitreten oder eine neue Organisation gründen.
</Tip>
## Repository-Funktionen

Some files were not shown because too many files have changed in this diff Show More