* Update expected values for one more `test_speculative_generation` after #40949 (#40967)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* FIX(trainer): ensure final checkpoint is saved when resuming training (#40347)
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add new model LFM2-VL (#40624)
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unsuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
* Fix outdated version checks of accelerator (#40969)
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix outdated version checks of accelerator
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use `skip_predictor=True` in vjepa2 `get_vision_features` (#40966)
use skip_predictor in vjepa2 `get_vision_features`
* [Trainer] Fix DP loss (#40799)
* fix
* style
* Fix fp16
* style
---------
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
* [timm_wrapper] better handling of "Unknown model" exception in timm (#40951)
* fix(timm): Add exception handling for unknown Gemma3n model
* nit: Let’s cater to this specific issue
* nit: Simplify error handling
* Fix Issue #39030: AutoTokenizer.from_pretrained does not propagate token (#40956)
* fix merge conflicts
* change token typing
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
* [tests] Really use small models in all fast tests (#40945)
* start
* xcodec
* chameleon
* start
* layoutlm2
* layoutlm
* remove skip
* oups
* timm_wrapper
* add default
* doc
* consistency
* Add captured actual outputs to CI artifacts (#40965)
* fix
* fix
* Remove `# TODO: ???` as it makes me `???`
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Revert change in `compile_friendly_resize` (#40645)
fix
* Track the CI (model) jobs that don't produce test output files (process being killed etc.) (#40981)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove `set_model_tester_for_less_flaky_tests` (#40982)
remove
* Benchmarking v2 GH workflows (#40716)
* WIP benchmark v2 workflow
* Container was missing
* Change to sandbox branch name
* Wrong place for image name
* Variable declarations
* Remove references to file logging
* Remove unnecessary step
* Fix deps install
* Syntax
* Add workdir
* Add upload feature
* typo
* No need for hf_transfer
* Pass in runner
* Runner config
* Runner config
* Runner config
* Runner config
* Runner config
* mi325 caller
* Name workflow runs properly
* Copy-paste error
* Add final repo IDs and schedule
* Review comments
* Remove wf params
* Remove parametrization from worfkflow files
* Fix callers
* Change push trigger to pull_request + label
* Add back schedule event
* Push to the same dataset
* Simplify parameter description
* ENH: Enable readline support for transformers chat (#40911)
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl + a, ctrl + e, alt + b, alt + f,
ctrl + k, alt + d, etc.
- navigate and search history: arrow up/down, ctrl + p, ctrl + n, ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
Readline should work on Linux, macOS, and WSL; I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
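A minimal sketch of the guarded import described above, with an illustrative input loop around it (the loop itself is not part of the transformers chat command):

```python
# Just importing readline enables GNU readline line editing and history for input()
# (see https://docs.python.org/3/library/functions.html#input). The import is guarded
# because readline is unavailable on some platforms, e.g. vanilla Windows.
try:
    import readline  # noqa: F401  # imported for its side effect only
except ImportError:
    pass  # fall back to plain input() without history or line editing

while True:
    try:
        user_input = input(">>> ")  # now supports ctrl+a/ctrl+e, history search, etc.
    except EOFError:
        break
    if user_input.strip() in {"exit", "quit"}:
        break
    print(f"echo: {user_input}")
```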
* [testing] test `num_hidden_layers` being small in model tester (#40992)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* blt wip (#38579)
* blt wip
* cpu version
* cpu friendly with full entropy model (real time patching)
* adding config file instead of args file
* enable MPS
* refactoring unused code
* single config class in config file
* inherit from PreTrainedModel
* refactor LMTransformer --> BLTPatcher
* add conversion script
* load from new checkpoint with from_pretrained
* fixed demo from_pretrained
* clean up
* clean a few comments
* cleanup folder
* clean up dir
* cleaned up modeling further
* rename classes
* adding transformers Attention class and RotaryEmbedding class
* exchanged blt modules for transformers modules: attention, rotary_emb, create_causal_mask, etc
* separate out patcher config, update modeling and conversion script
* rename vars to be more transformers-like
* rm unused functions
* adding cross attention from transformers
* pass arg
* rename weights
* updated conversion script
* overwritten commit! fixing PR
* apply feedback
* adding BLTRMSNorm like Llama
* add repeat_kv and eager_attention_forward copied from
* BLTMLP identical to MllamTextMLP
* clean up some args
* more like mllama, but busier inits
* BLTTransformerLayer config
* decoder, encoder, global configs
* wip working on modular file
* cleaning up patch and configs
* clean up patcher helpers
* clean up patcher helpers further
* clean up
* some config renaming
* clean up unused configs
* clean up configs
* clean up configs
* update modular
* clean
* update demo
* config more like mllama, separated subconfigs from subdicts
* read from config instead of self args
* update demo file
* model weights to causal lm weights
* missed file
* added tied weights keys
* BLTForCausalLM
* adding files after add-new-model-like
* update demo
* working on tests
* first running integration tests
* added integration tests
* adding tokenization tests, integration tests, and cleaned up tokenization file, + ruff
* tokenizer clean up
* modular file
* fixing rebase
* ruff
* adding correct basemodel output and updating config with checkpoint vals (for testing)
* BLTModelTests git status
* enabling inputs_embeds, although it won't be equal to input_ids since ids are needed for the patching logic
* fix sdpa == causal tests
* fix small model test and some gradient checkpointing
* skip training GC tests
* fix test
* updated modular
* update modular
* ruff
* adding modular + modeling
* modular
* more modern is_causal check
* cleaning up modular
* more modular reduction
* ruff
* modular fix
* fix styling
* return 2
* return 2
* fix some tests
* fix bltcrossattention after modular break
* some fixes / feedback
* try cache generate fix
* try cache generate fix
* fix generate tests
* attn_impl workaround
* refactoring to use recent TransformersKwargs changes
* fix hidden_states shape test
* refactor to new outputs
* simplify outputs a bit
* rm unneeded decoderlayer overwriting
* rename blt
* forgot tokenizer test renamed
* Reorder
* Reorder
* working on modular
* updates from modular
* new modular
* ruff and such
* update pretrainedmodel modular
* using cohere2 apply_rotary_pos_emb
* small changes
* apply feedback r2
* fix cross_attention
* apply more feedback
* update modeling fix
* load submodules from pretrainedmodel
* set initializer_range to subconfigs
* rm cross_attention_states pass when not needed
* add 7b projection layer support
* check repo
* make copies
* lost cohere2 rotate_half
* ruff
* copies?
* don't tie weights for submodules
* tie weights setting
* check docstrings
* apply feedback
* rebase
* rebased modeling
* update docs
* applying feedback
* few more fixes
* fix can_record_outputs
* fast tokenizer
* no more modulelist
* tok auto
* rm tokenizersss
* fix docs
* ruff
* fix after rebase
* fix test, configs are not subscriptable
---------
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-103.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-36.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-45.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-121.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-103.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-178.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-79.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-169-239.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-111.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-100.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-153.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-15.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-131.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-138.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-215.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-142.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-147.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-0.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-58.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-165-202.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-244.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-174-186.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-192.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-14.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-249.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-164-75.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-161-78.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-134.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-162-180.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-175-241.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-160-225.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-9.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-34.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-68.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-167-175.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-170-160.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-172-73.ec2.internal>
* [`RMSNorm`] Fix rms norm init for models that center around 1 (#40796)
* fix
* fixup inits
* oops
* fixup gemma
* fixup modular order
* how does this keep happening lol
* vaultgemma is new i forgot
* remove init check
* Make `EfficientLoFTRModelTest` faster (#41000)
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typos in src and tests (#40845)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix more dates in model cards and wrong modalities in _toctree.yml (#40955)
* Fix model cards and modalities in toctree
* fix new models
* RUFF fix on CI scripts (#40805)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix dict like init for ModelOutput (#41002)
* fix dict like init
* style
* [tests] update `test_left_padding_compatibility` (and minimize overwrites) (#40980)
* update test (and overwrites)
* better test comment
* 0 as a default for
* Patch more `unittest.case.TestCase.assertXXX` methods (#41008)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* 🚨 [lightglue] fix: matches order changed because of early stopped indices (#40859)
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Fix `PhimoeIntegrationTest` (#41007)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix Glm4v test (#41011)
fix
* Update after #41007 (#41014)
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix benchmark runner argument name (#41012)
* Adding support for Qwen3Omni (#41025)
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed ?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* Making compute_loss_func always take priority in Trainer (#40632)
* logger warn, if-else logic improved
* redundant if condition fix
* Modify Qwen3Omni parameter name since VL changed it (#41045)
Modify parameter name since VL changed it
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
* Fix Qwen video tests (#41049)
fix test
* [testing] Fix `qwen2_audio` (#41018)
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix typing of tuples (#41028)
* Fix tuple typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove optax (#41030)
Remove optax dep
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos in English/Chinese documentation (#41031)
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use torch.autocast (#40975)
* Use torch.autocast
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* docs: improved RoPE function Docstrings (#41004)
* docs: improved RoPE function docstrings
* Update src/transformers/modeling_rope_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Fix condition for emitting warning when generation exceeds max model length (#40775)
correct warning when generation exceeds max model length
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
* Fix outdated torch version check (#40925)
Update torch minimum version check to 2.2
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add Whole Word Masking and Padding Strategy to DataCollatorForLanguageModeling (#39485)
* Add whole word masking
* Vectorize whole word masking functions
* Unit test whole word masking
* Remove support for TF in whole word masking
* [testing] Fix `seed_oss` (#41052)
* fix
* fix
* fix
* fix
* fix
* fix
* Update tests/models/seed_oss/test_modeling_seed_oss.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Remove repeated import (#40937)
* Remove repeated import
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix conflict
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Simplify unnecessary Optional typing (#40839)
Remove Optional
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add write token for uploading benchmark results to the Hub (#41047)
* Separate write token for Hub upload
* Address review comments
* Address review comments
* Ci utils (#40978)
* Add CI reports dir to gitignore
* Add utils to run local CI
* Review compliance
* Style
* License
* Fix CI jobs being all red 🔴 (false positive) (#41059)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Update quantization CI (#41068)
* fix
* new everything
* fix
* [i18n-bn] Add Bengali language README file (#40935)
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Improve documentation and errors in Mamba2-based models (#41063)
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* Update team member list for some CI workflows (#41094)
* update list
* update list
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix crash when using chat to send 2+ request to gptoss (#40536)
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
* Minor addition, no split modules for VideoMAE (#41051)
* added no split modules
* fixed typo
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
* Switch to `python:3.10-slim` for CircleCI docker images (#41067)
fix
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix argument name in benchmarking script (#41086)
* Fix argument name in benchmarking script
* Adjust vars
* Fix typos in documentation (#41087)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing (#40788)
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix optional typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix schema typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing
* Fix typing
* Fix typing
* Fix typing
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Improve typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix quote string of np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix code
* Format
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove unused arguments (#40916)
* Fix unused arguments
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix wrong height and width when read video use torchvision (#41091)
* docs: Fix Tool Use links and remove dead RAG links (#41104)
docs: Fix tool use links. Remove dead RAG links. Fix style
* [tests] gpt2 + `CausalLMModelTester` (#41003)
* tmp commit
* tmp commit
* tmp commit
* rm old GPT2ModelTester
* nit bug
* add facilities for encoder-decoder tests; add comments on ALL overwrites/extra fns
* vision_encoder_decoder
* Fix `_get_test_info` for inherited tests (#41106)
* fix _get_test_info
* fix patched
* add comment
* ruff
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Remove bad test skips (#41109)
* remove bad skips
* remove more
* fix inits
* Format empty lines and white space in markdown files. (#41100)
* Remove additional white space and empty lines from markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add empty lines around code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update ruff to 0.13.1 + target Python 3.10 + apply fixes (#37809)
Update ruff to 0.13.1, target it to Python 3.10, and apply its fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Support loading LFM2 GGUF (#41111)
* add gguf config mapping for lfm2
* add lfm2 tensor process to unsqueeze conv weights
* adjust values from gguf config to HF config
* add test for lfm2 gguf
* ruff
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* [torchao safetensors] integrate torchao safetensors support with transformers (#40735)
* enable torchao safetensors
* enable torchao safetensors support
* add more version checking
* [Qwen3-next] Fix dimension mismatch in torch_chunk_gated_delta_rule and torch_recurrent_gated_delta_rule (#40963) (#41036)
* fix mismatched dims for qwen3 next
* propagate changes
* chore: renamed tot_heads to total_sequence_length
* Apply suggestion from @vasqu
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* minor fix to modular qwen3 next file
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Fix the error where a keyword argument appearing before *args (#41099)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix broken `` expressions in markdown files (#41113)
Fix broken expressions in markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove self-assignment (#41062)
* Remove self-assignment
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Clear pass
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* Fixed MXFP4 model storage issue (#41118)
* Fixed loading LongT5 from legacy checkpoints (#40724)
* Fixed loading LongT5 from legacy checkpoints
* Adapted the fix to work with missing lm_head
* dummy commit (#41133)
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
* dummy commit, nothing interesting
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix loading logic flaw with regards to unexpected and missing keys (#40850)
* Unexpected keys should be ignored at load with device map
* remove them all
* fix logic flaw
* fix
* simplify
* style
* fix
* revert caching allocator change
* add other test
* add nice doc
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Fix: align Qwen2.5-VL inference rope index with training by passing s… (#41153)
Fix: align Qwen2.5-VL inference rope index with training by passing second_per_grid_ts
* Fix single quotes in markdown (#41154)
Fix typos
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* extend gemma3n integration ut cases on XPU (#41071)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* Add Parakeet (#39062)
* first commit
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update to handle masking for bs>1
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Add tests and docs
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update model ids
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update docs and improve style
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update librosa location
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* import guard torch too
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* ruff code checks fix
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* ruff format check
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* updated to parakeet names
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update script
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Add tokenizer decoding
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Remove other model dependency
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* clean tests
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix tests
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* linting
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix ruff lint warnings
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* move to separate folders
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* add parakeet ctc model code
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* simplify encoder structure
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* update documentation
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* add parakeet to toctree
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix tests
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* add parakeet doc
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Address comments
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* Update featurizer to compute lens directly
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix ruff tests
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix encoding format
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* fix minor ctc decoding
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
* revert modular_model_converter.py changes
* revert check_config_attributes.py changes
* refactor: fastconformer & parakeet_ctc -> parakeet
* modeling update
* test update
* propagate feature extractor updates
* propagate doc changes
* propagate doc changes
* propagate tokenization changes
* propagate conversion changes
* remove fastconformer tests
* remove modular
* update processor
* update processor
* test update
* diverse fixes
* 100% matching greedy batched
* Update conversion script.
* Refactor docs.
* Refactor auto loading.
* Refactor and fix tokenization and processing.
* Update integration test.
* Modeling fixes:
- ensure correct attention mask shape
- ensure layer drop returns valid output
- correct blank token ID when computing CTC loss
* Format and repo consistency.
* Update model doc.
* Fix feature extraction tests.
* Fix (most) tokenizer tests.
* Add pipeline example.
* Fixes
* Use eager_attention_forward from Llama.
* Small tweaks.
* Replace Sequential with ModuleList
* Add check if not all layers copied
* Clean tokenizer.
* Standardize FastSpeech2ConformerConvolutionModule for Parakeet.
* Switch to modular for modeling and processing.
* Add processor tests.
* Fix modeling tests.
* Formatting and docstrings.
* Add `return_attention_mask` like other feature extractors.
* clean up after merging main.
* nits on modeling
* configuration update
* nit
* simplification: use PreTrainedTokenizerFast, simplify processor
* add dtype arg to mel_filter_bank
* feature extraction: simplify!
* modeling update
* change to ParakeetTokenizerFast
* correct attention mask handling
* auto update
* proc update
* test update
* feature extraction fixes
* modeling update
* conversion script update
* update tests feature integration
* update tokenization and tests
* processor tests
* revert audio_utils
* config docstring update
* blank_token -> pad_token
* modeling update
* doc update
* fix tests
* fix test
* fix tests
* address review comments
* add comment
* add comment
* explicitly not support flash
* attention straightforward masking
* fix
* tokenizer update: skipping blank tokens by default
* doc update
* fix max_position_embeddings handling
* nits
* change atol feature extraction integration tests
* doc update + fix loss
* doc update
* nit
* update integration test for A10
* repo id name
* nit
---------
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
* Fix format of compressed_tensors.md (#41155)
* Fix table format
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix format
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Simplify and improve model loading logic (#41103)
* remove unexpected keys from inputs (they have nothing to do there)
* remove input
* simplify a lot init
* fix
* fix check for non-persistent buffer
* revert because too many old and bad models...
* remove comment
* type hint
* make it a real test
* remove model_to_load -> always use the same model
* typo
* remove legacy offload_folder (we never waste that memory anymore)
* do not change prefix anymore
* change very bad function name
* create adjust method
* remove useless method
* restrict
* BC
* remove unused method
* CI
* remove unused args
* small fix
* fix
* CI
* CI
* avoid too many loops
* fix regex
* cleaner
* typo
* fix
* fix
* Force new vision models addition to include a fast image processor (#40802)
* add test
* fix test and change cutoff date
* Add documentation to test
* Add language specifiers to code blocks of markdown files (#41114)
* Add language specifiers to code blocks of markdown files
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update docs/source/en/model_doc/qwen3_omni_moe.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/chat_templating_writing.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/chat_templating_writing.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/en/chat_templating_writing.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Update nemotron.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update phimoe.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update README.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix syntax error
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Improve `add_dates` script (#41167)
* utils/add_dates.py
* put lfm2-vl in correct category
* Fix flash-attn for paged_attention when no kernels (#41078)
* Fix non-kernels flash attention paged implementation
* Cover all cases
* Style
* Update src/transformers/integrations/flash_paged.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Apply style fixes
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Remove data from examples (#41168)
Remove telemetry
* Enable fa in amd docker (#41069)
* Add FA to docker
* Use caching mechanism for qwen2_5
* Fix a typo in important models list
* Partial fixes for gemma3
* Added a commit ID for FA repo
* Detailed the expectation storage format
* Rebase fix
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* handle flash slow tests (#41072)
* handle flash slow tests
* update patch mask to 1/0 for flash
* don't skip flash
* flash
* raise tols
* rm flash support :(
* nits
---------
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-173-7.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-171-230.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-95.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-166-214.ec2.internal>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-163-147.ec2.internal>
* Modernbert fix (#41056)
* Add FA to docker
* Fixed padding for modernbert
* Fixed logits and hidden states extraction in ModernBertForMultipleChoice
* Added a test for ModernBertForMultipleChoice
* fixes
* More fixes and GREEN CI
* consistency
* moar consistency
* CI Runners - move amd runners mi355 and 325 to runner group (#41193)
* Update CI workflows to use devmi355 branch
* Add workflow trigger for AMD scheduled CI caller
* Remove unnecessary blank line in workflow YAML
* Add trigger for workflow_run on main branch
* Update workflow references from devmi355 to main
* Change runner_scale_set to runner_group in CI config
* [XPU] Add MXFP4 support for XPU (#41117)
* XPU supports gpt-oss MXFP4
* Complete MXFP4 UT file and comment information
* Complete MXFP4 UT file and comment information
* Fix code style
* Fix code style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* [tests] `CausalLMTester` automatically infers other test classes from `base_model_class` 🐛🔫 (#41066)
* halfway through the models
* update test checks
* refactor all
* another one
* use tuples
* more deletions
* solve bad inheritance patterns
* type
* PR ready?
* automatic model class inference from the base class
* vaultgemma
* make fixup
* make fixup
* rebase with gpt2
* make fixup :'(
* gpt2 is special
* More typing fixes (#41102)
* Fix noqa
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use np.ndarray
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* remove noqa
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix chars
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* enable flex attention ut cases on XPU (#40989)
* enable flex attention ut cases on XPU
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix(trainer): Avoid moving model with device_map (#41032)
* fix(trainer): Avoid moving model with device_map
When a model is loaded with `device_map="auto"` and is too large to fit on a single GPU, `accelerate` will offload some layers to the CPU or disk. The `Trainer` would previously attempt to move the entire model to the specified device, causing a `RuntimeError` because a model dispatched with `accelerate` hooks cannot be moved.
This commit fixes the issue by adding a check in `_move_model_to_device` to see if the model has an `hf_device_map` attribute. If it does, the device placement is assumed to be handled by `accelerate`, and the `model.to(device)` call is skipped (a minimal sketch of this check follows this entry).
A regression test is added to ensure the `Trainer` can be initialized with a model that has a `hf_device_map` that simulates offloading without raising an error.
* Added the logger warning for the move model
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
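A minimal sketch of the check described in the trainer/device_map entry above, assuming a `Trainer`-style `_move_model_to_device(model, device)` helper; the method body here is illustrative, not the merged implementation:

```python
import logging

logger = logging.getLogger(__name__)


def _move_model_to_device(model, device):
    # Models dispatched by accelerate (e.g. device_map="auto") carry an
    # hf_device_map attribute and may have layers offloaded to CPU or disk;
    # calling .to(device) on such a model raises a RuntimeError, so device
    # placement is left to accelerate's hooks instead.
    if getattr(model, "hf_device_map", None) is not None:
        logger.warning(
            "Model has an `hf_device_map`, skipping `model.to(device)`; "
            "device placement is handled by accelerate."
        )
        return model
    return model.to(device)
```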
* Fix attention sink implementation in flex attention (#41083)
* Fix attention sink implementation in flex attention
* fix dim
* fix
* Remove print
* raise error when return_lse is False yet s_aux is provided
* Clean test files for merge
* Update src/transformers/integrations/flex_attention.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* force return lse
* Add to doc
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Separate docker images for Nvidia and AMD in benchmarking (#41119)
Separate docker images for Nvidia and AMD
* Make quantizers good citizens loading-wise (#41138)
* fix param_needs_quantization
* rewrite most hqq
* clean
* fix
* comment
* remove it from exception of safetensors
* start on bnb 4bits
* post-rebase fix
* make bnb4 bit a good citizen
* remove forgotten print
* make bnb 8bits a good citizen
* better hqq
* fix
* clean
* remove state dict from signature
* switch method
* make torchao a good citizen
* fixes
* fix torchao
* add check
* typo
* [`Kernels Attention`] Change fallback logic to error out on explicit kernels request and include FA3 (#41010)
* fix
* be more strict
* change logic to include fa3
* fix the case where nothing is requested
* modify old tests + add kernels related tests
* style
* Add EdgeTAM (#39800)
* initial comment
* test
* initial conversion for outline
* intermediate commit for configuration
* chore:init files for sam2
* adding arbitrary undefined config
* check
* add vision
* make style
* init sam2 base model
* Fix imports
* Linting
* chore:sam to sam2 classes
* Linting
* Add sam2 to models.__init__
* chore:match prompt encoder with sam2 code
* chore:prepare kwargs for mask decoder
* Add image/video predictors
* Add CUDA kernel
* Add output classes
* linting
* Add logging info
* tmp commit
* docs for sam2
* enable image processing
* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
* enable promptencoder of sam2
* fix promptencoder
* Confirmed that PromptEncoder is exactly the same (Be aware of bfloat16 and float32 difference)
* Confirmed that ImageEncoder is exactly the same (Be aware of the linting of init)
* Confirmed that MaskDecoder is exactly the same (TO DO: lint variable name)
* SamModel is now available (Need more chore for name)
* make fix-copies
* make style
* make CI happy
* Refactor VisionEncoder and PositionEmbedding
* TO DO : fix the image_embeddings and sparse_embeddings part
* pure image inference done
* reusable features fix and make style
* styling
* refactor memoryattention
* tmp
* tmp
* refactor memoryencoder
TO DO : convert and inference the video pipeline
* TO DO : fix the image_encoder shape
* conversion finish
TO DO: need to check video inference
* make style
* remove video model
* lint
* change
* python utils/check_docstrings.py --check_all
* python utils/check_config_attributes.py
* remove copies for sam2promptencoder due to configuration
* change __init__.py
* remove tensorflow version
* fix that to not use direct comparison
* make style
* add missing import
* fix image_embedding_size
* refactor Sam2 Attention
* add fully working video inference (refactoring todo)
* clarify _prepare_memory_conditioned_features
* simplify modeling code, remove unused paths
* use one model
* use auto_docstring
* refactor rope embeddings
* nit
* not using multimask when several points given
* add all sam2.1
* add video tmp
* add Sam2VideoSessionState + fast image proc + video proc
* remove init_states from model
* fix batch inference
* add image integration tests
* uniformize modeling code with other sam models and use modular
* pass vision tests an most model tests
* All tests passing
* add offloading inference state and video to cpu
* fix inference from image embedding and existing mask
* fix multi_boxes mask inference
* Fix batch images + batch boxes inference
* improve processing for image inference
* add support for mask generation pipeline
* add support for get_connected_components post processing in mask generation
* add fast image processor sam, image processor tests and use modular for sam2 image processor
* fix mistake in sam after #39120
* fix init weights
* refactor convert
* add integration tests for video + other improvements
* add needed missing docstrings
* Improve docstrings and
* improve inference speed by avoiding cuda sync
* add test
* skip test for vision_model
* minor fix for vision_model
* fix vision_model by adding sam2model and change the torch dependencies
* remove patch_size
* remove image_embedding_size
* fix patch_size
* fix test
* make style
* Separate hieradet and vision encoder in sam2
* fixup
* review changes part 1
* remove MemoryEncoderConfig and MemoryAttentionConfig
* pass q_stride instead of q_pool module
* add inference on streamed videos
* explicitely process streamed frames
* nit
* Improve docstrings in Sam2Model
* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel
* improve video inference api
* change inference_state to inference_session
* use modular for Sam2Model
* fix convert sam2 hf
* modular
* Update src/transformers/models/sam2/video_processing_sam2.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix minor config
* fix attention loading error
* update modeling tests to use hub checkpoints
* Use CI A10 runner for integration tests values + higher tolerance for video integration tests
* PR review part 1
* fix doc
* nit improvements
* enforce one input format for points, labels and boxes
* nit
* last few nits from PR review
* fix style
* fix the input type
* fix docs
* add sam2 model as conversion script
* improve sam2 doc
* add rough necessarry changes
* first working edgetam
* fix issue with object pointers
* Use modular as much as possible
* nit fixes + optimization
* refactor spatial perceiver
* cleanup after merge
* add working edgetam
* improve perceiver resampler code
* simplify/unify rope attention logic
* Improve comments in apply_rotary_pos_emb_2d
* add working tests
* fix test timmwrapper
* add docs
* make fixup
* nits
* fix modular
* fix modular
* PR review part 1
* split apply_rotary_pos_emb_2d
* add granularity to _prepare_memory_conditioned_features
* add dates to doc
* add separate mlp for memory attention
* Fix memory on wrong device
* store processed frames in dict
* update checkpoints in tests
* update dates
---------
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Fix EXAONE-4.0 dummy id (#41089)
* Fix EXAONE-4.0 dummy id
* Fix exaone4 dummy (#1)
* fix
* fix
* fix
* fix
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix 8bit bnb loading (#41200)
* Fix 8bit
* oups forgot the case where it is not prequantized
* Fix docker quantization (#41201)
* launch docker
* remove gptq for now
* run tests
* Revert "run tests"
This reverts commit f85718ce3a21d5937bf7405b8925c125c67d1a3e.
* revert
* Embed interactive timeline in docs (#41015)
* embed timeline in docs (test web componentand Iframe)
* test scaling
* test multiple scales
* compensate scale in width
* set correct syle and scale
* remove bottom space created by scale
* add timeline as a separate page
* reformulate docs after review
* [docs] Fix links (#41110)
fix
* Remove unnecessary Optional typing (#41198)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* docs/examples(speech): pin CTC commands to Hub datasets; add Windows notes (#41027)
* examples(speech): load Common Voice from Hub; remove deprecated dataset-script references (Windows-friendly notes)
* docs/examples(speech): pin CTC streaming & other CTC commands to Hub datasets; add Windows notes
* make style
* examples(speech): align DataTrainingArguments help with datasets docs; minor wording fixes
* docs/examples(speech): address review remove Hub subsection & Whisper tip; align dataset help text
* style: apply ruff/black/usort/codespell on examples/speech-recognition
* Apply style fixes
* Update examples/pytorch/speech-recognition/README.md
* update doc to match load_dataset
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix Qwen3-Omni audio_token_id serialization issue (#41192)
Fix Qwen3-Omni audio_token_id serialization by overriding parent's attribute_map
- Override attribute_map in Qwen3OmniMoeThinkerConfig to prevent inheritance of incorrect mapping
- Parent class maps audio_token_id -> audio_token_index, but implementation uses audio_token_id directly
- Fixes issue where custom audio_token_id values were not preserved during save_pretrained/from_pretrained cycles
Fixes #41191
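A hedged sketch of the override described above; the `attribute_map` mechanism and the `audio_token_id` -> `audio_token_index` mapping come from the commit description, while the class names and the round-trip below are illustrative:

```python
from transformers import PretrainedConfig


class ParentThinkerConfig(PretrainedConfig):
    # stand-in for the parent config: maps audio_token_id -> audio_token_index
    attribute_map = {"audio_token_id": "audio_token_index"}


class ThinkerConfigSketch(ParentThinkerConfig):
    # Empty override: audio_token_id is stored and serialized under its own name,
    # so custom values survive save_pretrained / from_pretrained.
    attribute_map = {}

    def __init__(self, audio_token_id=None, **kwargs):
        super().__init__(**kwargs)
        self.audio_token_id = audio_token_id


cfg = ThinkerConfigSketch(audio_token_id=12345)
cfg.save_pretrained("./thinker-config-sketch")
reloaded = ThinkerConfigSketch.from_pretrained("./thinker-config-sketch")
assert reloaded.audio_token_id == 12345
```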
* Wait for main process in _save_checkpoint to ensure best checkpoint exists (#40923)
* Update trainer.py
* fix
* fix format
* move barrier, delete redundant
* Avoid assumption that model has config attribute in deepspeed (#41207)
Avoid assumption that model has config in deepspeed
* Trainer: Pass `num_items_in_batch` to `compute_loss` in `prediction_step` (#41183)
* Add num_items_in_batch computation to predict_step.
* address comments.
* Fix test cases.
* fixup
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* [ESM] add accepts_loss_kwargs=False to EsmPreTrainedModel (#41006)
add accepts_loss_kwargs=False to EsmPreTrainedModel
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Align pull request template to bug report template (#41220)
The only difference is that I don't point users to https://discuss.huggingface.co/ for Hub issues.
* [generate] cache missing custom generate file (#41216)
* cache missing custom generate file
* make fixup
* Remove old Python code (#41226)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore (#41143)
Adapt to the SDPA interface to enable the NPU to call FlashAttentionScore.
Co-authored-by: frozenleaves <frozen@Mac.local>
* update code owners (#41221)
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Unify is_torchvision_v2_available with is_torchvision_available (#41227)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typing of train_args (#41142)
* Fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix fsdp typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix sliding window attn mask (#41228)
* Fix sliding window attn mask
* Clearer test
* Apply style fixes
* If Picasso made ascii drawings he would have made this
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults" (#41124)
* Revert "Fix DeepSpeed mixed precision precedence over Accelerate defaults (#3…"
This reverts commit df67cd35f0ca1a1cbf7147b2576db31b16200cf4.
* fix
* [docs] Fix tp_plan (#41205)
remove manual
* Fix white space in documentation (#41157)
* Fix white space
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Revert changes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix autodoc
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix qwen text config (#41158)
* fix qwen text config
* fix tests
* fix one more test
* address comments
* Video processor accepts single frames on cuda (#41218)
* fix
* why was it np if input is in torch
* Use math.log2 (#41241)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* fix TrainerIntegrationDeepSpeed UT failures (#41236)
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* [repo utils] Update `models_to_deprecate.py` (#41231)
* update models_to_deprecate
* exclude this file
* handle typos and aliases
* don't commit files
* PR suggestions; make fixup
* Use removeprefix and removesuffix (#41240)
* Use removeprefix and removesuffix
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* More fixes
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix pylint warnings (#41222)
* Remove unused variables
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove reimported packages
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix pylint warnings
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Simplify
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Remove all instances of `is_safetensors_available` (#41233)
* safetensors is a core dep
* fix
* ok
* simplify branching
* keep it for now
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* FP-Quant NVFP4 and Python 3.9 support (#39876)
* quartet
* quartet qat -> quartet
* format
* bf16 backward
* interfaces
* forward_method
* quartet -> fp_quant
* style
* List -> list
* list typing
* fixed format and annotations
* test_fp_quant
* docstrings and default dtypes
* better docstring and removed noop checks
* docs
* pseudoquantization support to test on non-blackwell
* pseudoquant
* Pseudoquant docs
* Update docs/source/en/quantization/fp_quant.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update docs/source/en/quantization/fp_quant.md
* Update docs/source/en/quantization/fp_quant.md
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update tests/quantization/fp_quant_integration/test_fp_quant.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update tests/quantization/fp_quant_integration/test_fp_quant.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* small test fixes
* dockerfile update
* spec link
* removed `_process_model_after_weight_loading`
* toctree
* nvfp4
* nvfp4 tests
* FP-Quant version bumped
* nvfp4 default and docs update
* trainable
* cpu if pseudoquant
* proper group size selection
* gsr
* qutlass requirement version bump
* Upstream docker copy
* docs update
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* [`FA3`] Fix masking and loading logic in same process (#41217)
fix loading and fa3 masking
* [t5gemma] fix `get_text_config` and related fixes (#40939)
* tmp commit
* t5gemma fixes
* Don't convert to `safetensors` on the fly if the call is from testing (#41194)
* don't convert
* disable
* Update src/transformers/modeling_utils.py
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* fix
* disable
* disable
* disable
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* Resolve remote custom module path warnings (#41243)
* add peft team members to issue/pr template (#41262)
* add
* Update .github/PULL_REQUEST_TEMPLATE.md
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* docs: update bitsandbytes platform support (#41266)
* add more activation kernels, follow up (#40944)
* add more activation kernels
* fixing style
* fix version
* fix asr pipeline ut failures (#41275)
* fix asr pipeline ut failures
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* make style
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
* Use regex detailed flags (#41264)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix multi-video timestamp bug in Qwen-3-VL and GLM4V (#41229)
* fix multi-video timestamp bug in qwen3vl,glm4v
* run make fix-copies to sync modular files
* run make fix-copies to sync modular files
---------
Co-authored-by: UBT <daqin.luo@ubtrobot.com>
* Fix binding of video frames to video placeholder in `InternVL` model (#41237)
* Fix binding video frames to video placeholder in prompt
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
* Add test on binding video frames to prompt
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
* Fix code style issues
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
* Fix broken tests on `InternVLProcessor`
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
* Add `return_tensors` to video processor defaults
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
---------
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
* Deprecate Trackio environment variables and deploy to Spaces by default (#40950)
* allow private space id for trackio
* complete docstring
* Deprecate environment variables for Trackio integration; use TrainingArguments instead and deploy by default
* style
* Enhance documentation for Trackio Space ID in TrainingArguments
* Allow private Space id for Trackio (#40948)
* allow private space id for trackio
* complete docstring
* fix async client for transformers chat (#41255)
* fix-client
* fix
* Unify is_torchvision_v2_available with is_torchvision_available (#41259)
Fix is_torchvision_v2_available
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Use max/min (#41280)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Biogptlogits (#41270)
added logits slicing to BioGpt for seq classifier
Signed-off-by: Aviral <aviralkamaljain@gmail.com>
* Fix unnecessary single-item container checks (#41279)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix pylint generator warnings (#41258)
Fix pylint generator warnings
Signed-off-by: cyy <cyyever@outlook.com>
* feat: use `aws-highcpu-32-priv` for amd docker img build (#41285)
* feat: use `aws-highcpu-32-priv` for amd docker img build
* feat: add `workflow_dispatch` event to docker build CI
* Add processor and integration test for qwen3vl (#41277)
* support aux loss in qwen3vlmoe
* update qwen3vl processor test!
* add integration tests for qwen3vl-30a3
* remove duplicated decorator
* code clean
* fix consistency
* do not inherit from nn.Linear for better quantization
* pass check
* Remove `test_initialization` (#41261)
remove it
* Remove some previous team members from allow list of triggering Github Actions (#41263)
* delete
* delete
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Build doc in 2 jobs: `en` and `other languages` (#41290)
* separate
* separate
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Fix mxfp4 dequantization (#41292)
fix
* [`Flex Attn`] Fix lse x attention sinks logic (#41249)
fix
* FIX: Bug in PEFT integration delete_adapter method (#41252)
The main content of this PR is to fix a bug in the delete_adapter method
of the PeftAdapterMixin. Previously, it did not take into account
auxiliary modules from PEFT, e.g. those added by modules_to_save. This
PR fixes this oversight.
Note that the PR uses a new functionality from PEFT that exposes
integration functions like delete_adapter. Those will be contained in
the next PEFT release, 0.18.0 (yet unreleased). Therefore, the bug is
only fixed when users have a PEFT version fulfilling this requirement.
I ensured that with old PEFT versions, the integration still works the
same as previously. The newly added test for this is skipped if the PEFT
version is too low.
(Note: I tested locally that the test will pass with PEFT 0.18.0)
While working on this, I also cleaned up the following:
- The active_adapter property has been deprecated for more than 2 years
(#26407). It is safe to remove it now.
- There were numerous small errors or outdated pieces of information in
the docstrings, which have been addressed.
When PEFT < 0.18.0 is used, although we cannot delete modules_to_save,
we can still detect them and warn about it.
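A hedged sketch of how the fixed integration is exercised from the user side; the adapter repo id below is a placeholder, and running it requires `peft` to be installed.

```python
# Illustrative usage of the PEFT integration's delete_adapter (adapter id is a placeholder).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.load_adapter("your-username/your-lora-adapter", adapter_name="my_adapter")

# With PEFT >= 0.18.0 this also removes auxiliary modules such as those added by
# modules_to_save; with older PEFT versions the integration warns that those
# modules cannot be deleted and otherwise behaves as before.
model.delete_adapter("my_adapter")
```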
* Italian translation for README.md (#41269)
chore: add Italian translation for README.md
* Fix README.md error when installing from source (#41303)
* download and use HF Hub Cache (#41181)
use hub cache
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix some merge issues
* [test_all]
* [test-all]
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
Signed-off-by: Wang, Yi <yi.a.wang@intel.com>
Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: Daniel Bershatsky <daniel.bershatsky@gmail.com>
Signed-off-by: Aviral <aviralkamaljain@gmail.com>
Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Rangehow <88258534+rangehow@users.noreply.github.com>
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
Co-authored-by: Yuanyuan Chen <cyyever@outlook.com>
Co-authored-by: Hamish Scott <41787553+hamishs@users.noreply.github.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Harshal Janjani <75426551+harshaljanjani@users.noreply.github.com>
Co-authored-by: Branden <brandenkmurray@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-27-253.ec2.internal>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: Ákos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Ita Zaporozhets <31893021+itazap@users.noreply.github.com>
Co-authored-by: ita.zaporozhets@huggingface.co <ita_zaporozhets@ip-26-0-168-30.ec2.internal>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: StevenBucaille <steven.bucaille@gmail.com>
Co-authored-by: BakerBunker <17872844+BakerBunker@users.noreply.github.com>
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Ayush <ayushtanwar1729@gmail.com>
Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Yannick Schnider <Yannick.Schnider1@ibm.com>
Co-authored-by: Ralph Gleaton <70818603+rjgleaton@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Saidur Rahman Pulok <59414463+saidurpulok@users.noreply.github.com>
Co-authored-by: Nick Doiron <ndoiron@mapmeld.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
Co-authored-by: Duygu Altinok <duygu.altinok12@gmail.com>
Co-authored-by: Jinde.Song <juude.song@gmail.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: hbenoit <60629420+HaroldBenoit@users.noreply.github.com>
Co-authored-by: liangel-02 <liangel@meta.com>
Co-authored-by: nnul <107971634+notkisk@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: YangKai0616 <kai.yang@intel.com>
Co-authored-by: Karol Szustakowski <61427290+Szustarol@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Qile Xu <87457840+Xqle@users.noreply.github.com>
Co-authored-by: Yao Matrix <matrix.yao@intel.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Eric B <ebezzam@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Pk Patel <46714886+The5cheduler@users.noreply.github.com>
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: Samuel Barry <127697809+SamuelBarryCS@users.noreply.github.com>
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Kyungmin Lee <30465912+lkm2835@users.noreply.github.com>
Co-authored-by: OMOTAYO OMOYEMI <58476114+tayo4christ@users.noreply.github.com>
Co-authored-by: eun2ce <joeun2ce@gmail.com>
Co-authored-by: Sam Sharpe <ssharpe42y@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>
Co-authored-by: Pramodith Ballapuram <16939722+pramodith@users.noreply.github.com>
Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Co-authored-by: 魅影 <46097299+frozenleaves@users.noreply.github.com>
Co-authored-by: frozenleaves <frozen@Mac.local>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Xuehai Pan <XuehaiPan@pku.edu.cn>
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
Co-authored-by: tim120526 <43242086+tim120526@users.noreply.github.com>
Co-authored-by: UBT <daqin.luo@ubtrobot.com>
Co-authored-by: Daniel Bershatsky <daskol@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: 0xAvi <aviralkamaljain@gmail.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Federico Moretti <hello@federicomoretti.it>
Co-authored-by: Yangshen⚡Deng <yangshen.d@outlook.com>
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fix copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix #40067: add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slice inputs is more understandable
* Style
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Fix edge case for tokenize (#36277)
* Fix tokenizing dtype for float input cases
* add test for empty input string
* deal empty list of list like [[]]
* add tests for tokenizer for models with input that is not plain text
* created robust token counting by using the existing include_num_input_tokens_seen variable; kept bool for backward compatibility, also accepted a string to cover all cases, and kept the default as is. Robust test cases were added as well
* my local and remote codebases were mismatched; committing to resolve it, and also fixed a code quality issue
* ci: retrigger tests
* another attempt to trigger CI for checks
* Fix DeepSpeed mixed precision precedence over Accelerate defaults
Resolves issue where Accelerate would default to bf16 mixed precision
when a DeepSpeed config specifies fp16, causing a ValueError. The fix
ensures DeepSpeed config takes precedence over TrainingArguments defaults
while preserving explicit user settings.
Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices
Fixes #39849
* Add tests for DeepSpeed mixed precision precedence fix
- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR #39856 mixed precision precedence fix
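A hypothetical sketch of the precedence rule described above, written as a standalone helper; the function name and argument names are illustrative, not the actual implementation.

```python
from typing import Optional

# Sketch: the DeepSpeed config overrides TrainingArguments *defaults*,
# but never an explicit user choice.
def resolve_mixed_precision(
    ds_config: dict,
    user_fp16: Optional[bool] = None,
    user_bf16: Optional[bool] = None,
) -> str:
    if user_fp16 is not None or user_bf16 is not None:
        # Explicit user settings always win.
        if user_fp16:
            return "fp16"
        if user_bf16:
            return "bf16"
        return "no"
    if ds_config.get("fp16", {}).get("enabled", False):
        return "fp16"
    if ds_config.get("bf16", {}).get("enabled", False):
        return "bf16"
    return "no"

# e.g. resolve_mixed_precision({"fp16": {"enabled": True}}) -> "fp16"
```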
* Fix module loading for models with dots in names
* quality check
* added test
* wrong import
* Trigger CI rerun after making test model public
* Update src/transformers/dynamic_module_utils.py
* Update tests/utils/test_dynamic_module_utils.py
* Update tests/utils/test_dynamic_module_utils.py
* Move test
* make fixup
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
* CB example: better compare feature
* Cache managers, still issue w/ effective length
* WIP -- fix for effective length
* Renames
* Working, need better parity checks, we might be missing 1 token
* Small fixes
* Fixed wrong attn mask and broke cache into pieces
* Warmup is slowing down things, disabling it
* Cache was too big, fixed
* Simplified index objects
* Added a profile option to the example
* Avoid calls to memory reporting tools
* Restore full attention read indices for better latency
* Addressed some TODOs and style
* Docstrings for cache managers
* Docstrings for Schedulers
* Refactor schedulers
* [Important] Cache fix for sliding window, check with small sw size
* Updated doc for cache memory compute and cache as a whole
* Moved a todo
* Nits and style
* Fix for when sliding window is smaller than max batch per token
* Paged interface update
* Support for FLash in new API
* Fix example CB
* Fix bug in CB for paged
* Revert example
* Style
* Review compliance
* Style
* Styleeeee
* Removed NO_SLIDING_WINDOW
* Review #2 compliance
* Better art
* Turn cum_seqlens_k in a dict
* Attn mask is now a dict
* Update examples/pytorch/continuous_batching.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Addressed McPatate's PR review
* Style and fix
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing
* Fix fast processor output format and add comprehensive tests
* Fix trailing whitespace in test file
* Apply ruff formatting to test file
* simplify pair validation logic
* add superglue tests to fast image processor
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Fix continue_final_message parameter in apply_chat_template
* after run fixup
* Handle trim in the template
* after fixup
* Update src/transformers/utils/chat_template_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* feat: err when unsupported attn impl is set w/ `--continuous_batching`
* refactor: move defaults and support list to CB code
* feat: add action item in error msg
* fix(serve): add default attn implementation
* feat(serve): add log when `attn_implementation` is `None`
* feat: raise Exception when attn_implementation is not supported by CB
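A hypothetical sketch of the validation these commits describe; the supported set and function name are assumptions, not the exact serve/CB code.

```python
from typing import Optional

# Assumed supported implementations for continuous batching (illustrative only).
SUPPORTED_CB_ATTN_IMPLEMENTATIONS = {"sdpa_paged", "paged_attention", "flash_attention_2"}

def validate_cb_attn_implementation(attn_implementation: Optional[str]) -> str:
    if attn_implementation is None:
        # Log and fall back to a default implementation supported by continuous batching.
        return "sdpa_paged"
    if attn_implementation not in SUPPORTED_CB_ATTN_IMPLEMENTATIONS:
        raise ValueError(
            f"attn_implementation={attn_implementation!r} is not supported with --continuous_batching. "
            f"Use one of {sorted(SUPPORTED_CB_ATTN_IMPLEMENTATIONS)} or drop the flag."
        )
    return attn_implementation
```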
* change |= operator to use torch logical or for friendly export to different backends
* change |= operator to use torch logical or for friendly export to different backends in grounding dino model
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
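A minimal sketch of the operator change these commits describe: replacing the in-place `|=` on boolean tensors with an explicit functional op that exports more cleanly to other backends.

```python
import torch

mask = torch.zeros(4, dtype=torch.bool)
update = torch.tensor([True, False, True, False])

# In-place bitwise-or, which can be unfriendly to some export backends:
# mask |= update

# Export-friendly equivalent using an explicit functional op:
mask = torch.logical_or(mask, update)
```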
* initial commit
* initial setup
* Overriding ImageGPT-specific functions
* imported is_torch_available and utilized it for importing torch in imageGPT fast
* Created init and ImageGPTFastImageProcessorKwargs
* added return_tensors, data_format, and input_data_format to ImageGPTFastImageProcessorKwargs
* set up arguments and process and _preprocess definitions
* Added arguments to _preprocess
* Added additional optional arguments
* Copied logic over from base imageGPT processor
* Implemented 2nd draft of fast imageGPT preprocess using batch processing
* Implemented 3rd draft of imageGPT fast _preprocessor. Pulled logic from BaseImageProcessorFast
* modified imageGPT test file to properly run fast processor tests
* converts images to torch.float32 from torch.uint8
* fixed a typo with self.image_processor_list in the imagegpt test file
* updated more instances of image_processing = self.image_processing_class in the test file to test fast processor
* standardized normalization to not use image mean or std
* Merged changes from solution2 branch
* Merged changes from solution2 test file
* fixed testing through baseImageGPT processor file
* Fixed check_code_quality test. Removed unnecessary list comprehension.
* reorganized imports in image_processing_imagegpt_fast
* formatted image_processing_imagegpt_fast.py
* Added arg documentation
* Added FastImageProcessorKwargs class + Docs for new kwargs
* Reformatted previous
* Added F to normalization
* fixed ruff linting and cleaned up fast processor file
* implemented requested changes
* fixed ruff checks
* fixed formatting issues
* fix(ruff after merging main)
* simplify logic and reuse standard equivalence tests
---------
Co-authored-by: Ethan Ayaay <ayaayethan@gmail.com>
Co-authored-by: chris <christine05789@gmail.com>
Co-authored-by: Ethan Ayaay <98191976+ayaayethan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Squashed previous branch
* unify assisted generate to common decoding method signature
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Fix broken Llama4 accuracy in MoE part
Llama4 accuracy is broken by a bug in
https://github.com/huggingface/transformers/pull/39501 . It forgot to
transpose the router_scores before applying it to routed_in, causing
Llama4 to generate garbage output.
This PR fixes that issue by adding back the transpose() and adding some
comments explaining why the transpose() is needed.
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
* remove comment
---------
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
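A minimal sketch of why the transpose matters; the shapes below are assumed for illustration and are not the exact Llama4 MoE shapes.

```python
import torch

# Assumed shapes: router_scores is (top_k, num_tokens) while the routed hidden
# states are laid out as (num_tokens * top_k, hidden_dim), so the scores must be
# transposed before they can broadcast over the routed tokens.
num_tokens, hidden_dim, top_k = 8, 16, 2
routed_in = torch.randn(num_tokens * top_k, hidden_dim)
router_scores = torch.rand(top_k, num_tokens)

# Without .transpose(0, 1) the scores would be applied to the wrong tokens,
# which is exactly the accuracy regression described above.
routed_in = routed_in * router_scores.transpose(0, 1).reshape(-1, 1)
```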
* feat: support request cancellation
* test: add cancellation test
* refactor: use existing fn to check req cancellation
* feat(cb): make cancellation thread safe
* refactor(serve): update test to use `requests` instead of `httpx`
* Add instance attribute to DacVectorQuantize for use in DacResidualVectorQuantize.from_latents
* add from_latent tests
* style fix
* Fix style for test_modeling_dac.py
* add seq class for gemma3 text model
* add Gemma3TextForSequenceClassification to modeling file
* After run make fixup
* let's just check
* this is why it was crashing, tests were just failing...
* skip it, tested only for seq clf
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
* fix MetaCLIP 2 wrong link & wrong model names in the documentation and docstrings
* ruff reformatted
* update files generated by modular
* update meta_clip2 to metaclip_2 to match the original
* _supports_flash_attn = False
---------
Co-authored-by: Yung-Sung Chuang <yungsung@meta.com>
* Support MUSA (Moore Threads GPU) backend in transformers
Add accelerate version check, needs accelerate>=0.33.0
* Support TF32 flag for MUSA backend
* fix typo
* fix: continuous batching in `transformers serve`
* fix: short circuit inner gen loop when prepare_next_batch prepared nothing
* docs: add comment explaining FastAPI lifespan
* test: add CB serving tests
* refactor: remove gen cfg max new tokens override bc unnecessary
* docs: add docstring for `ServeCommand::run`
* feat: use new `DecodeStream` API
* Expectations for gemma3
* Fixes for Qwen2_5_VL tests
* Added expectation but underlying problem is still there
* Better handling of mrope section for Qwen2_5_vl
* Fixes for FA2 tests and reformat batch test for Qwen2_5_Omni
* Fix multi-device error in qwen2_5_omni
* Styel and repo-consistency
* Removed inherited test because fix in common
* slow tests fixes
* Style
* Fixes for qwen2_5_vl or omni for FA test
* update make nested image list
* fix make flat list of images
* update type anno
* fix image_processing_smolvlm
* use first image
* add verbose comment
* fix images
* rollback
* fix ut
* Update image_processing_smolvlm.py
* Update image_processing_idefics3.py
* add tests and fix some processors
* fix copies
* fix after rebase
* make the test cover chat templates
* skip udop, no point in fixing it
* fix after rebase
* fix a few more tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: raushan <raushan@huggingface.co>
* porting not maintained jieba to rjieba
* Fix format
* replaced the line with rjieba instead of removing it
* cut_all is not included as a parameter; it is a separate function in rjieba
* rev
* jieba remove installation
* Trigger tests
* Update tokenization_cpm.py
* Update tokenization_cpm_fast.py
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Add bfloat16 support detection for MPS (Apple Silicon) in is_torch_bf16_gpu_available
bfloat16 seems to have been supported for a few years now in Metal and torch.mps.
Make sure to allow it and not throw on bf16 usage with "Your setup doesn't support bf16/gpu." from TrainingArguments.
* Check bf16 support for MPS using torch method
The method actually seems to exist: 5859edf113/torch/_dynamo/device_interface.py (L519)
It simply checks whether you are on macOS 14 or higher.
* Document Metal emulation for bf16 support
Add note about Metal emulation for bf16 support on M1/M2.
* Update bf16 support check for MPS backend
is_bf16_supported() is not exposed even though it is defined on MPSInterface; use the same approach as in the accelerate PR.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
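A hedged fallback check for bf16 on MPS, shown only as an empirical probe; the PR itself relies on a torch/accelerate helper rather than this approach.

```python
import torch

def mps_supports_bf16() -> bool:
    """Empirical bf16-on-MPS probe (illustrative; not the method used in the PR)."""
    if not (torch.backends.mps.is_available() and torch.backends.mps.is_built()):
        return False
    try:
        # Allocate and use a tiny bf16 tensor on the MPS device.
        torch.zeros(1, dtype=torch.bfloat16, device="mps") + 1
        return True
    except RuntimeError:
        return False
```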
* first step if flash not installed but you set to use it
* try importing
* now default to using it
* update our tests as well
* wow yesterday I was not awake
* fixup
* style
* lol the fix was very very simple
* `RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels` for updated dockers
* push review comments
* fix
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* dump ugly option to check again tomorrow
* tiny update
* do not save as nested dict yet!
* fix and add tests
* fix dia audio tokenizers
* rename the flag and fix new model Evolla
* fix style
* address comments
* broken from a different PR
* fix saving layoutLM
* delete print
* delete!
* init swissai model
* AutoModelForCausalLM
* AutoModelForCausalLM mapping
* qk norm and post ln optional
* fix wrong shape of qk norm: megatron uses head_dim
* automodel fixes
* minor fix in forward
* fix rope validation to accept llama3 scaling
* `SwissAIForTokenClassification` support
* Align `SwissAI` to v4.52.4
* Align `SwissAI` to v4.53.1
* Init CUDA xIELU
* `SwissAI*`->`Apertus*`
* ci fix
* check_docstring ignore ApertusConfig
* Licensing and placeholder tests
* Placeholder doc
* XIELU syntax
* `_xielu_python` optimization
* Fix xIELU
* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint
* Modular `Apertus`
* CUDA xIELU logging
* ci fix
* ci fix
* ci fix
* Update license
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update tests/models/apertus/test_modeling_apertus.py
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* `.utils.import_utils.is_torchdynamo_compiling`
* `Apertus` class ordering
* `past_key_value{->s}`, `make fix-copies`
* ci fix
* Remove unused configuration parameters
* `{beta,eps}` saved in checkpoint
* `{beta,eps}` Temporarily on CPU
* Suggestions
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* ci fix
* remove fx_compatible (deprecated)
* remove `rotary_embedding_layer`
As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.
* fully removing `Mask4DTestHard` class
Not needed (for now)
* switch to `dtype` instead of `torch_dtype`
Following this:
https://github.com/huggingface/transformers/pull/39782
* remove unused imports
* remove `cache_implementation="static"`
* +Apertus to `docs/source/en/_toctree.yml` for the doc builder
---------
Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
* docs(pixtral): Update Pixtral model card to new format
* docs(pixtral): Change cuda into auto for device_map
* docs(pixtral): Apply suggestions from review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Apply suggestions from review, changing mistral-community into Mistral AI
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Apply suggestions from review [!TIP] part
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Finalize model card with tested code examples
This commit finalizes the update for the Pixtral model card.
* Fix the hfoption by the right one
* @BryanBradfo docs(pixtral): Changing the redirection of bitsandbytes
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Add backticks to highlight the tokens
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Move image block per final review
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix in modular
* remove leftover print
* fix everything except when it's in assignment
* fix assignment as well
* more general
* better
* better
* better comment
* docstring
* cleaner
* remove base
* doc
* Rework of the CB example
* Further rework of CB example
* Refactor PA cache, slice on tokens, add debug prints -- WIP
* Slice cache -- WIP
* Added a mechanism to check batched outputs in CB script
* Less logging, debug flag for slice, !better reset! -- WIP
* QOL and safety margins
* Refactor and style
* Better saving of cb example
* Fix
* Fixes and QOL
* More information about metrics
* Further logging
* Style
* Licenses
* Removed some comments
* Add a slice input flag
* Fix in example
* Added back some open-telemetry deps
* Removed some aux function
* Added FA2 option to example script
* Fixed math (all of it)
* Added a simple example
* Renamed core to classes
* Made allocation of attention mask optional
* Style
* Relaxed assumptions on cache_config
* Review compliance
* Style
* Styyyle
* Removed default and added args
* Rebase mishap fix
* Propagate args to TorchExportableModuleForDecoderOnlyLM
* Fix the test I wanted fixed in this PR
* Added some AMD expectation related to cache tests
* draft update two models for now
* batch update all VLMs first
* update some more image processors
* update
* fix a few tests
* just make CI green for now
* fix copies
* update once more
* update
* unskip the test
* fix these two
* fix torchcodec audio loading
* maybe
* yay, i fixed torchcodec installation and now can actually test it
* fix copies deepseek
* make sure the metadata is returned when users request it
* add docs
* update
* fixup
* Update src/transformers/audio_utils.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/glm4v/video_processing_glm4v.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* update
* what if we set some metadata attr to `None`
* fix CI
* fix one test
* fix 4 channel test
* fix glm timestamps
* rebase gone wrong
* raise warning once
* fixup
* typo
* fix copies
* fix smolvlm test
* this is why torch's official benchmark was faster, set threads to `0`
* Apply style fixes
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* initial context_parallel_size support in trainer
* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens
* use parallelism_config.cp_enabled
* add parallelism_config to trainer state
* warn when auto-enabling FSDP
* fix some reviews
* WIP: somewhat matching loss
* Feat: add back nested_gather
* Feat: cleanup
* Fix: raise on non-sdpa attn
* remove context_parallel_size from TrainingArguments
* if we have parallelism_config, we defer to get_state_dict from accelerate
* fix form review
* Feat: add parallelism config support
* Chore: revert some unwanted formatting changes
* Fix: check None
* Check none 2
* Fix: remove duplicate import
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fin
* require accelerate 1.10.1 and higher
---------
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add `tokenizer_kwargs` arg to text generation pipeline.
* chore: re-run CI
* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline
* Fix `tokenizer_encode_kwargs` doc string.
* Fix note related to `tokenizer_kwargs` in text generation pipeline
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add a test
* tempdir
* fix import issue
* wow I am tired
* properly init
* i am not super familiar with quantizer api :|
* set to TRUE for now
* full support
* push current changes
* will clean this later but the imports are a shitshow here
* this correctly saves the block and scales but forward seems broken
* quantize was not correct
* fix storage
* why were bias even included
* finally!
* style
* fix style
* remove print
* lazy import
* up
* not sure what happens this works now?
* holy molly it was not so far
* okay this seems to work!
* workings!!!
* allow save_pretrained to create PR
* Apply suggestions from code review
* fixup
* add dequantize false as well
* working new
* fix
* rm swizzle and unswizzle during saving
* rm print
* Update src/transformers/modeling_utils.py
* fix
* style
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
* Fix label smoothing incompatibility with multi-label classification (#40258)
* Improve label smoothing multi-label check based on reviewer feedback
- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
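A hypothetical sketch of the check described in these bullets; the helper name is illustrative, not the actual Trainer code.

```python
import warnings

def maybe_disable_label_smoothing(model, label_smoothing_factor: float) -> float:
    """Sketch: disable label smoothing for multi-label classification and warn."""
    problem_type = getattr(model.config, "problem_type", None)
    if label_smoothing_factor > 0 and problem_type == "multi_label_classification":
        warnings.warn(
            "Label smoothing is not supported for multi-label classification; disabling it.",
            UserWarning,
        )
        return 0.0
    return label_smoothing_factor
```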
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.
Co-authored-by: pranam-gf <pranam@goodfin.com>
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py
These typos were in variable names used for parsing path components in weight conversion scripts.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
* fix to the typings which are unmatched to FA function signature
cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching
It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`
* format changes by ruff
* Update src/transformers/integrations/flash_paged.py
unused function arg in `PagedAttentionCache.update`
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* revert continuous_batching signature, which is more meaningful
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
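An illustrative sketch of the renamed keyword fields described above; this TypedDict only mirrors the shape of the kwargs after the rename and is not the library's exact class.

```python
from typing import Optional, TypedDict

import torch

class FlashAttentionKwargsSketch(TypedDict, total=False):
    """Illustrative shape of the kwargs after the rename (not the exact library class)."""
    cu_seq_lens_q: Optional[torch.LongTensor]  # cumulative sequence lengths for queries
    cu_seq_lens_k: Optional[torch.LongTensor]  # cumulative sequence lengths for keys
    max_length_q: Optional[int]
    max_length_k: Optional[int]
```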
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* this should be explicit
* fix Nth ESM exception
* tryout decoder
* this as well
* revert again
* 🧠
* aaah ESM has two modelings aaah
* broom broom
* format
* wrong copies
* copies
* modular cleanups
* format
* modularities
* wrong mergefix
* seriously
* align with new model
* new model
* update everywhere
* style
* pipelines
* switch it everywhere in tests
* switch it everywhere in docs
* switch in converters everywhere
* update in examples
* update in model docstrings
* style
* warnings
* style
* Update configuration_utils.py
* fix
* Update configuration_utils.py
* fixes and add first test
* add pipeline tests
* Update test_pipelines_common.py
* add config test
* Update test_modeling_common.py
* add new ones
* post rebase
* add new
* post rebase adds
* Update trainer.md
* Update trainer.md
Removed the detail about label_names argument usage from the tip/warning section
* Update training_args.py
Added the label_names usage clarification in the docstring
* Update trainer.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* handle support for cache classes when num enc layers != num dec layers
* handle overwrites
* one more corner case
* Update src/transformers/generation/utils.py
* Update src/transformers/generation/utils.py
* Apply suggestions from code review
* handle corner case :o
* fix
* cleanup, revert aimv2 fa changes
* fix aria
* i searched a long time but the cross dependency is for the recent models so...
* this was something... evolla
* fix modernbert decoder + make fa test more robust
* nit
* Clean up xcodec addition.
* Clean up config.
* Switch to fixtures test.
* Small stuff.
* Polish XCodec and standardize across codecs.
* Update src/transformers/models/xcodec/modeling_xcodec.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Format and fix test.
* Update tol.
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* make visualizer rely on create causal mask
* format
* fixup
* fixup
* read token
* read token, duh
* what is up with that token
* small tests?
* adjust
* try with flush
* normalize for ANSI
* buffer shenanigans
* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
* add basic type hints to import module
* run make fixup
* remove optional
* fixes
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* it was long overdue!
* use the official kernel
* more permissive
* update the kernel as well
* mmm should it be this?
* up pu
* fixup
* Update test_modeling_gpt_oss.py
* style
* start with 20b
* Update modeling_utils.py
* make sure we update with the module's plan
* use public api
* oups
* update
* fix failing test
* Update src/transformers/integrations/tensor_parallel.py
* Update src/transformers/integrations/tensor_parallel.py
* fix
* make the API more friendly!
* fix tests
* fix styling
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* init
* add modular
* fixup
* update configuration
* add processing file
* update auto files
* update
* update modular
* green setup_and_quality ci
* it works
* fix some tests
* commit florence2
* update test
* make test cases done - 16 left
* style
* fix few test cases
* fix some tests
* fix init test
* update florence2 vision style
* hope is green
* fix init test
* fix init
* update modular
* refactor vision module
* fix: channel attention use dynamic scale
* update modular
* update
* update attention mask
* update
* fix naming
* Update src/transformers/models/florence2/processing_florence2.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* spatial block works
* more beautiful
* more more beautiful
* merge main
* merge main and fixup
* fix typing hint
* update modeling
* fix eager matches sdpa
* fix style
* fix compile test - all green
* remove florence2 language
* remove Florence2LanguageModel things
* fix style
* update florence2 model
* override prepare encoder_decoder for generation
* add weight conversion script
* rewrite channel attention to use sdpa
* eliminate 1 transpose op
* support fa2
* fix quality check
* chore: reformat `test_modeling_florence2.py`
* some refactor for processor
* some refactor for processor
* update naming convention and remove BC
* make it pass the test
* fix: correct Embedding Cosine
* update comments and docstring
* support input_embeds
* support input embeds ideally
* fix style
* fix style
* fix style again :D
* add test processor
* refactor processor and add test for processor
* reformat test processor
* make fixup
* fix schema check
* remove image_token
* ensure image token in tokenizer and fix integration tests
* fix processor test
* add more integration tests for large model and rename test_processor to test_processing
* test_assisted_decoding_sample should pass
* update doc and make model work with image text to text pipeline
* docs: add sdpa bagde
* resolve cyril's comments
* fix import torch error
* add helper get_placeholder_mask
* inherit from llava
* florence2 may not _supports_attention_backend because of bart ...
* move florence2 model card to multimodal
* let base model always return_dict
* fix style
* tiny update doc
* set _checkpoint_conversion_mapping = {}
* fix code quality
* support flex and compile graph and move external func to internal func
* remove condition because it always true
* remove window funcs
* move post processor config out
* fix ci
* new intro to trigger test
* remove `kernel_size` argument
---------
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix: pass adamw optimizer parameters to StableAdamW
* add test for stable_adamw initialization with trainer arguments
* address copilot suggestion
* fix: update weight_decay handling in stable_adamw kwargs
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update GPT-NeoX-Japanese model card
* Apply suggestions from code review
* Update gpt_neox_japanese.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Standardize RAG model card
Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge
* Standardize RAG model card
Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig
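An example of the kind of quantization snippet the card item refers to, kept generic on purpose: the checkpoint and model class are placeholders rather than what the card itself uses, and running it requires `bitsandbytes` and a CUDA GPU.

```python
from transformers import AutoModelForSeq2SeqLM, BitsAndBytesConfig

# Illustrative 4-bit quantization config of the kind added to the model card.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype="bfloat16")

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/bart-large",  # placeholder checkpoint for illustration
    quantization_config=bnb_config,
    device_map="auto",
)
```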
* Fix chat CLI GPU loading and request_id validation issues (#40230)
This commit addresses two critical bugs in the transformers chat CLI:
1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
- Chat CLI now automatically uses GPU when available instead of defaulting to CPU
- Matches the behavior of the underlying serving infrastructure
2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
- Fixes "Unexpected keys in the request: {'request_id'}" error on second message
- Allows request_id to be properly sent and validated by the server
Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message
* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming
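A hypothetical sketch of the device-default change described above; the class and field names are illustrative, not the actual CLI argument class.

```python
from dataclasses import dataclass, field

@dataclass
class ChatArgumentsSketch:
    # "auto" lets the model land on a GPU when one is available instead of always
    # defaulting to CPU, matching the behavior of the serving infrastructure.
    device: str = field(default="auto", metadata={"help": "Device to load the model on."})
```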
* Update image_processing_perception_lm_fast.py
Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.
* Update processing_perception_lm.py to match kwargs vision input type
* Update image_processing_perception_lm_fast.py kwargs to signature args
* Skipping pytree registration in case fsdp is enabled
* Beauty changes
* Beauty changes
* Moved the is_fsdp_available function to import utils
* Moved is_fsdp_available to integrations.fsdp
* Added pytree registration inside dynamic cache class
* Making ci/cd lords happy
* Adding a check if DynamicCache is already a leaf
* Adding try/catch for multiple initializations of DynamicCache in test suites
* Moving dynamic cache pytree registration to executorch
* Adding try catch back
* set inputs_embeds to None while generate to avoid audio encoder forward in generation process
* set input_features to none instead
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
* Add expectation to t5 for rocm 9.4
* Made EncoderDecoderCache compatible with nn.DataParallel
* Fixed t5gemma EncoderDecoderCache
* Added todos in autoformer
* Ruff
* Init is self-contained
* Review compliance
* Fixed kwargs init of EncoderDecoderCache
* add jinja2 as a dependency
* Make jinja2 a core dependency in install_requires
- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Make jinja2 a core dependency in install_requires
* Make jinja2 an extra dependency instead of adding a core dep
---------
Co-authored-by: Claude <noreply@anthropic.com>
* remove transpose_for_scores call
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* fix copied evolla code
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* fix similar error at qwen2_vl and do make fix-copies
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* Apply style fixes
---------
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"
This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* docs: Update LayoutLM model card with standardized format
* Apply suggestions from code review
This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Address remaining review comments
* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header
* Update layoutlm.md
* Update layoutlm.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update check_tokenizers.py
chore(typing): add type hints to check_tokenizers script
- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)
* Update check_tokenizers.py
chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py
- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic
* [MINOR:TYPO] Update base.py
All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)
Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')`
It might be a good idea to allow for uppercase, but that's for another issue.
* [MINOR:TYPO] Update __init__.py
* update
* fix the test for DepthPro
* PR comments
* wait, I didn't delete this in prev commit?
* fix
* better way
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* added dates to the models with a single hf papers link
* added the dates for models with multiple papers
* half of no_papers models done
* rest of no_papers models also done, only the exceptions left
* added copyright disclaimer to sam_hw, cohere, cohere2 + dates
* some more fixes, hf links + typo
* some new models + a rough script
* the script looks robust, changed all paper links to hf
* minor change to handle technical reports along with blogs
* ran make fixup to remove the white space
* refactor
* build: add TvpImageProcessorFast
- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.
* fix: TvpImageProcessorFast with new resize method and update processing logic
* build: add TvpImageProcessorFast
* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests
- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.
* fix: Enhance TvpFastImageProcessorKwargs and update documentation
- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.
* fix: tested now with python 3.9
* fix: remove tvp kwargs from docs
* simplify processing
* remove import and fix tests
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* fix: changed is_causal to be False
* fix: Added original cross attention bug
* fix: fixed the way border removal is computed
* fix: added missing normalization on coarse features
* test: fixed integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* initial comment
* test
* initial conversion for outline
* intermediate commit for configuration
* chore:init files for sam2
* adding arbitrary undefined config
* check
* add vision
* make style
* init sam2 base model
* Fix imports
* Linting
* chore:sam to sam2 classes
* Linting
* Add sam2 to models.__init__
* chore:match prompt encoder with sam2 code
* chore:prepare kwargs for mask decoder
* Add image/video predictors
* Add CUDA kernel
* Add output classes
* linting
* Add logging info
* tmp commit
* docs for sam2
* enable image processing
* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
* enable promptencoder of sam2
* fix prompt encoder
* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)
* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)
* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)
* SamModel is now available (Need more chore for name)
* make fix-copies
* make style
* make CI happy
* Refactor VisionEncoder and PositionEmbedding
* TO DO : fix the image_embeddings and sparse_embeddings part
* pure image inference done
* reusable features fix and make style
* styling
* refactor memoryattention
* tmp
* tmp
* refactor memoryencoder
TO DO : convert and inference the video pipeline
* TO DO : fix the image_encoder shape
* conversion finish
TO DO: need to check video inference
* make style
* remove video model
* lint
* change
* python utils/check_docstrings.py --check_all
* python utils/check_config_attributes.py
* remove copies for sam2promptencoder due to configuration
* change __init__.py
* remove tensorflow version
* fix that to not use direct comparison
* make style
* add missing import
* fix image_embedding_size
* refactor Sam2 Attention
* add fully working video inference (refactoring todo)
* clarify _prepare_memory_conditioned_features
* simplify modeling code, remove unused paths
* use one model
* use auto_docstring
* refactor rope embeddings
* nit
* not using multimask when several points given
* add all sam2.1
* add video tmp
* add Sam2VideoSessionState + fast image proc + video proc
* remove init_states from model
* fix batch inference
* add image integration tests
* uniformize modeling code with other sam models and use modular
* pass vision tests an most model tests
* All tests passing
* add offloading inference state and video to cpu
* fix inference from image embedding and existing mask
* fix multi_boxes mask inference
* Fix batch images + batch boxes inference
* improve processing for image inference
* add support for mask generation pipeline
* add support for get_connected_components post processing in mask generation
* add fast image processor sam, image processor tests and use modular for sam2 image processor
* fix mistake in sam after #39120
* fix init weights
* refactor convert
* add integration tests for video + other improvements
* add needed missing docstrings
* Improve docstrings and
* improve inference speed by avoiding cuda sync
* add test
* skip test for vision_model
* minor fix for vision_model
* fix vision_model by adding sam2model and change the torch dependencies
* remove patch_size
* remove image_embedding_size
* fix patch_size
* fix test
* make style
* Separate hieradet and vision encoder in sam2
* fixup
* review changes part 1
* remove MemoryEncoderConfig and MemoryAttentionConfig
* pass q_stride instead of q_pool module
* add inference on streamed videos
* explicitely process streamed frames
* nit
* Improve docstrings in Sam2Model
* update sam2 modeling with better handling of inference state and cache, and separate Sam2Model and Sam2VideoModel
* improve video inference api
* change inference_state to inference_session
* use modular for Sam2Model
* fix convert sam2 hf
* modular
* Update src/transformers/models/sam2/video_processing_sam2.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix minor config
* fix attention loading error
* update modeling tests to use hub checkpoints
* Use CI A10 runner for integration tests values + higher tolerance for video integration tests
* PR review part 1
* fix doc
* nit improvements
* enforce one input format for points, labels and boxes
* nit
* last few nits from PR review
* fix style
* fix the input type
* fix docs
* add sam2 model as conversion script
* improve sam2 doc
* nit fixes + optimization
* split sam2 and sam2_video in two models
* PR review part 1
* fix None for default slow processor of sam2
* remove unnecessary code path in sam2_video
* refactor/simplify RoPE
* replace embedding module list with embedding matrix
* fix tests
* remove kernel
* nit
* use lru_cache for sine_pos_embeddings
* reorder sam2_video methods
* simplify sam2_video
* PR review part 1
* simplify sam2 video a lot
* more simplification
* update integration tests with updated conftest
* more explicit config for hieradet
* do post_processing outside of sam2 video model
* Improve Sam2VideoVisionRotaryEmbedding
* fix tests
* update docs and fix mask2former/oneformer
* avoid unnecessary reshapes/permute
* fix device concatenating points
* small dtype fix
* PR review
* nit
* fix style and finish up doc
* fix style
* fix docstrings
* fix modular
---------
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* docs: ko: main_classes/optimizer_schedules
* feat: nmt draft
* fix: improve TOC anchors and expressions in optimizer_schedules
- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions
* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'
Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.
* fix: Use more natural Korean inheritance expression
Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.
* fix: Use consistent '미세 조정' translation for 'finetuned models'
Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT
* fix min torchvision version
* use InterpolationMode directly
* remove unused is_torchvision_greater_or_equal,
* nit
* Add initial collated reports script and job definition
* provide commit hash for this run. Also use hash in generated artifact name. Json formatting
* tidy
* Add option to upload collated reports to hf hub
* Add glob pattern for test report folders
* Fix glob
* Use machine_type as path filter instead of glob. Include machine_type in collated report
* fix flash attention
* i got a stroke reading that comment
* change dropout kwarg back to before
* rename _fa3... as it's used for multiple variants and should work as fallback instead
* simplify imports and support kwargs for fa
* style
* fix comments order
* small fix
* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart
* style
* allow fullgraph by preloading on init
* make globals "private"
* ci pls be happy
* change skip conditions based on backend flag (indicating missing mask interface)
* move globals support to a function to prepare kwargs
* style
* generalize supported kwargs
* small change to doc
* fix
* add comments
* style
* revert prep during generate
* style
* revert weird style changes
* add fa kwarg prep during generate with fixes back
* how did this even happen
* how
* add comment
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
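A minimal sketch of the guard described above, assuming the check simply gates the import on distributed support (the exact condition used in `model_debugging_utils.py` may differ):
```python
import torch.distributed

# Only pull in the DTensor module when the distributed backend is compiled in;
# importing torch.distributed.tensor unconditionally can fail on builds without it.
if torch.distributed.is_available():
    import torch.distributed.tensor  # noqa: F401
```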
* Fix PerceptionLM image preprocessing for non-tiled image input.
* Add test for single tile vanilla image processing.
* ruff format
* recover missing test skip
* Simplify test.
* minor test name fix
* Update HuBERT model card according to template
Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section.
* Address review comments and changes requested to hubert.md
* Update hubert.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* init
* update
* update
* ruff
* t patch default is 2, not 1
* draft
* back
* back1
* update
* config update
* update using glm-41 format
* add self.rope_scaling = config.rope_scaling
* update config
* update
* remove the processor
* update
* fix tests
* update
* for test
* update
* update 2126
* self.rope_scaling is missing in GLM4MOE, let's add it
* update
* update
* Update modular_glm4v_moe.py
* change config
* update apply_multimodal_rotary_pos_emb
* format
* update
* Delete 3-rollout_qas_thinking_answers.py
* use right name
* update with place holder
* update
* use right rotary
* Update image_processing_glm4v_fast.py
* rope_config_validation needs to rewrite the entire config file in modular
* update
* changed name
* update
* Update modeling_glm4v_moe.py
* _init_weights should be added in Glm4vMoePreTrainedModel
* remove use_qk_norm
* Update modular_glm4v_moe.py
* remove use_qk_norm as it is not used
* fix style
* deprecations are not needed on new models
* fix merge issues
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* all modulars and llama
* apply modular
* bert and gpt2 copies
* fix imports
* do it everywhere
* fix import
* finalize it
* fix
* oups set it in modular
* style
* fix
* Add 1 version to deprecation cycle
* Update modeling_layers.py
* Fix missing video inputs for PerceptionLM.
* Minor fix for vanilla input image (only C,H,W, no tiles dim).
* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)."
This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.
* Add amd expectation in internvl
* Add amd expectation to llama
* Added bnb decorator for a llava test that requires bnb
* Added amd expectation for mistral3
* Style
* Support input_embeds in torch exportable decoders
* Hybrid cache update
* Manually change some callsites
* AI changes the rest of the call sites
* Make either input_ids/inputs_embeds mandatory
* Clean up
* Ruff check --fix
* Fix test
* pr review
* Revert config/generation_config changes
* Ruff check
* chore: update Deformable_Detr model card
* fix: added pipeline, automodel examples and checkpoints link
* Update deformable_detr.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix MXFP4 quantizer validation to enable CPU dequantization
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
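Illustrative usage under the behavior described above, assuming `Mxfp4Config` exposes the `dequantize` flag mentioned here; the checkpoint name is a placeholder:
```python
from transformers import AutoModelForCausalLM, Mxfp4Config

# dequantize=True unpacks the MXFP4 weights to BF16, so the CUDA availability
# check is no longer a hard requirement and the model can be loaded on CPU.
model = AutoModelForCausalLM.from_pretrained(
    "org/some-mxfp4-model",  # placeholder checkpoint name
    quantization_config=Mxfp4Config(dequantize=True),
)
```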
* Add tests for MXFP4 quantizer CPU dequantization validation
* fix: format mxfp4 test file with ruff
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat template + special tokens back into the script.
* initial commit
* update
* working with shards
* add model.safetensors.index.json
* fix
* fix
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony.
* add in tokenizer config as well.
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning 4-tuple
```
[rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]: model = AutoModelForCausalLM.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```
* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working !
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non nan, gibberish output
* working EP + quantization finally !
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commit
* more cleaning
* more cleaning
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix !
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores, kernel probably not ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* update pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle !
* update final chat template by Matt.
* fix responses; pin oai
* simplify
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt. x2
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Fix installation
* Update setup.py
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now, it's just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* fix merge
* fix
* Fix line break when custom model identity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration
It was just a leftover.
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (#51)
* Add oai fix fast tests (#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* Revert fixup
* skip 2 test remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt to changes made to modeling
* style
* isort
* fix copies for the loss
* mmmm
---------
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* style?
When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment,
the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb."
This was confusing because wandb was actually installed, just disabled via the environment variable.
The fix detects this specific case and provides a clear, actionable error message explaining
the conflict and how to resolve it.
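A rough sketch of the detection described above, assuming the callback checks the environment variable before concluding that wandb is missing (names here are illustrative, not the exact callback code):
```python
import importlib.util
import os

def _check_wandb_available():
    # Distinguish "wandb is not installed" from "wandb is installed but disabled
    # via WANDB_DISABLED", and give an actionable message for the latter case.
    if os.getenv("WANDB_DISABLED", "").lower() in {"1", "true", "yes"}:
        raise RuntimeError(
            "wandb is disabled through the WANDB_DISABLED environment variable, but "
            "report_to='wandb' was requested. Unset WANDB_DISABLED or drop 'wandb' from report_to."
        )
    if importlib.util.find_spec("wandb") is None:
        raise RuntimeError("WandbCallback requires wandb to be installed. Run `pip install wandb`.")
```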
* Update model card for DETR
* fix: applied suggested changes
* fix: simplified pipeline and modified notes and resources
* Update detr.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* added code for handling video object ,as dictionary of frames and metadata, in chat template
* added new test where videos are passed as objects (dict of frames, metadata) in the chat template
* modified hardcoded video_len check that does not match with increased number of test cases.
* Modify hardcoded video_len check that fails with increased number of tests
* update documentation of multi-modal chat templating with extra information about including video object in chat template.
* add array handling in load_video()
* temporary test video included
* skip testing smolvlm with videos that are list of frames
* update documentation & make fixup
* Address review comments
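Roughly what passing a video object (rather than a path or URL) through the chat template looks like; the exact key names and the commented processor call are assumptions for illustration, not the precise API:
```python
import numpy as np

# Hypothetical video object: decoded frames plus metadata, instead of a file path or URL.
video = {
    "frames": [np.zeros((224, 224, 3), dtype=np.uint8) for _ in range(8)],
    "metadata": {"fps": 2, "duration": 4.0},
}

messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": video},
            {"type": "text", "text": "Describe this clip."},
        ],
    }
]
# inputs = processor.apply_chat_template(messages, tokenize=True, return_dict=True)
```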
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models
* refactor: added copied from
* fix: make style
* fix: repo consistency
* fix: make style
* docs: added missing method in SuperGlue docs
* first commit
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.
TODO: Some tests are failing so that needs to be fixed.
* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model
* cleaned up a hack in the conversion script
* Fixed the expected values in integration tests
Cross att masking and cpu-gpu consistency tests are still failing however.
* changes for make style and quality
* add documentation
* clean up contrastive embedding
* add mm grounding dino to loss mapping
* add model link to config docstring
* hack fix for mm grounding dino consistency tests
* add special cases for unused config attr check
* add all models and update docs
* update model doc to the new style
* Use super_kwargs for modular config
* Move init to the _init_weights function
* Add copied from for tests
* fixup
* update typehints
* Fix-copies for tests
* fix-copies
* Fix init test
* fix snippets in docs
* fix consistency
* fix consistency
* update conversion script
* fix nits in readme and remove old comments from conversion script
* add license
* remove unused config args
* remove unnecessary if/else in model init
* fix quality
* Update references
* fix test
* fixup
---------
Co-authored-by: qubvel <qubvel@gmail.com>
* fix?
* fixme and style
* Update src/transformers/modeling_utils.py
* update
* update
* fix
* small fixees
* nit
* nits
* fix init check?
* fix
* fix default
* or fucks me
* nits
* include a small nit
* does this make it happy?
* fixup
* fix the remaining ones
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025
* update and add modular file
* update processors and check with orig impl later
* delete unused files
* image processor reduce LOC and re-use GotOCR2
* update the config to use modular
* model tests pass
* processor fixes
* check model outputs decorator
* address one more comment
* Update tokens. Temp - need to read from tokenizer
* fix for multi-gpu
* Fix image token handling
* update image token expansion logic
* fix a few issues with remote code loading
* not related but modular forces us to change all files now
* Add overview and code sample to cohere vision docs
* add scripts. TMP.
* Update inference script
* Create script
* set dtype in export script
* TO revert: modular export fix
* Fix scripts
* Revert "TO revert: modular export fix"
This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.
* Use modular weights
* Upload to hub
Removed OOD weights and script
* Updated docs
* fix import error
Update docs
Added pipeline test
* Updated docs
* Run modular script
remove modular for config
Added patch_size
Added docstrings in modular
Fix OOM
Add docs, fixup integration tests. 8-gpu passing
* tiny updates
* address comments + fixup
* add test for chat template
* check model outputs workaround
* aya vision fix check model inputs
* Revert "add test for chat template"
This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.
* reveert more changes
* last revert
* skip and merge
* faulty copy from
---------
Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
* feat(tokenization): add encode_message to tokenize messages one by one
* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.
* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.
* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.
* Docs fix
* Revert changes in docstring of pad()
* Revert changes in docstring
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.
* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.
---------
Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous
* test cache_position
* move test
* propagate changes
---------
Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
* Add callback to monitor progress in whisper transcription
* Added `` around variables, rewording
* Add example of `monitor_progress`.
---------
Co-authored-by: Eric B <ebezzam@gmail.com>
* docs: ko: main_classes/peft.md
* feat: nmt draft
* docs: add missing TOC to documentation for `PeftAdapterMixin` section
Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).
* fix: Improve naturalness of purpose expression in Korean
Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.
* fix: Simplify plural form and make expression more concise
Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.
* fix: Replace technical term '주입' with more natural '적용'
Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:
'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise
'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.
* fix: update toctree path for PEFT to lowercase
Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.
* docs: update as per reviewer feedback after rebase
* Add Fast Segformer Processor
* Modified the params according to segformer model
* modified test_image_processing_Segformer_fast args
- removed redundant params like do_center_crop, center_crop which aren't present in the original segformer class
* added segmentation_maps processing logic from the slow segformer processing module with references from beitimageprocessing fast
* fixed code_quality
* added recommended fixes and tests to make sure everything processes smoothly
* Fixed SegmentationMapsLogic
- modified the preprocessing of segmentation maps to use tensors
- added batch support
* fixed some mismatched files
* modified the tolerance for tests
* use modular
* fix ci
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* feat: superpoint fast image processor
* fix: reran fast cli command to generate fast config
* feat: updated test cases
* fix: removed old model add
* fix: format fix
* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix: ported to torch and made requested changes
* fix: removed changes to init
* fix: init fix
* fix: init format fix
* fixed testcases and ported to torch
* fix: format fixes
* failed test case fix
* fix superpoint fast
* fix docstring
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Add missing cache_position argument.
* Pass cache_position to language model.
* Overwrite prepare_inputs_for_generation.
* Set model to half precision for Flash Attention test.
* Cast model to bfloat16.
* add tests for helpers
* duplicate test for each model
* why llava next video has no helper
* oops must have been in the commit
* fix test after rebase
* add copy from
* support `typing.Literal` as type of tool parameters
* validate the `args` of `typing.Literal` roughly
* add test to get json schema for `typing.Literal` type hint
* fix: add `"type"` attribute to the parsed result of `typing.Literal`
* test: add argument `booleanish` to test multi-type literal
* style: auto fixup
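A small illustration of the feature described above: a tool whose parameter is annotated with `typing.Literal`, run through the JSON-schema helper (the function and its values are made up):
```python
from typing import Literal

from transformers.utils import get_json_schema

def set_precision(mode: Literal["fp16", "bf16", "fp32"]):
    """
    Select the compute precision.

    Args:
        mode: The precision mode to use.
    """

# The "mode" property of the generated schema should now carry a "type"
# alongside the enum of the Literal values.
print(get_json_schema(set_precision))
```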
* EP + updates
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* update
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update
---------
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* update aligner model
* add text model
* Added Image Processor
* Added Image Processor
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor"
This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.
* update image processor
* flatten hierarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update license
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* added test_processor
* fix processor tests
* update docs
* update docs
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update license 2021->2025
* fix type in config docstring
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following #36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weigths
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weigths"
This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.
* run fix-copies
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* init
* Force qwen2VL image proc to fast
* refactor qwen2 vl fast
* fix copies
* Update after PR review and update tests to use return_tensors="pt"
* fix processor tests
* add BC for min pixels/max pixels
* fix most tests
* skip a few more tests
* address comments
* fix chameleon tests
* forgot to uncomment
* qwen has its own tests with images, rename it as well
* add owlv2 fast image processor
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* change references to owlVit to owlv2 in docstrings for post process methods
* change type hints from List, Dict, Tuple to list, dict, tuple
* remove unused typing imports
* add disable grouping argument to group images by shape
* run make quality and repo-consistency
* use modular
* fix auto_docstring
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* docs: Standardize OPT model card with enhanced details
* Remove incorrect link from OPT model card
* Address review feedback on OPT model card
* Update opt.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
* First attempt
* fix
* fix
* Enhance TrackioCallback to log GPU memory usage and allocation
* Enhance Trackio integration in callbacks and training arguments documentation
* re order
* remove unused lines
* fix torch optional
* use partial to wrap around `transformers` utils!
* try to refactor?
* revert one wrong change
* just a nit
* push
* revert whatever was wrong!
* some nits
* fixes when there is no attention mask
* bring the licence back
* some fixes
* nit
* style
* remove prints
* correct dtype
* fa flags for testing
* update
* use paged attention if requested!
* updates
* a clone was needed, not sure why
* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute
* simplify and improve?
* flash attention is kinda broken on recent cuda version so allow the opportunity to use something else
* fix!
* protect kernels import
* update
* properly parse generation config being passed
* revert and update
* add two tests
* some fixes
* fix test FA2
* takes comment into account
* fixup
* revert changes
* revert the clone, it is only needed because the metal kernel is not doing it?
* [docs] update attention implementation and cache docs (#39547)
* update docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix mps on our side for now
* Update src/transformers/integrations/flash_paged.py
* no qa
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification
* ruff fix
* refactor + add test for not supported model
* ruff
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
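Illustrative usage of the capability added above, assuming a timm checkpoint loaded through the wrapper (the checkpoint name is just an example):
```python
from transformers import AutoModelForImageClassification

# TimmWrapper models are loaded through the regular Auto classes.
model = AutoModelForImageClassification.from_pretrained("timm/resnet18.a1_in1k")

# Gradient checkpointing can now be toggled like on any other transformers model;
# architectures that don't support it are expected to raise instead of silently ignoring it.
model.gradient_checkpointing_enable()
```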
* initial commit
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: various typos, typehints, refactors from suggestions
* fix: fine_matching method
* Added EfficientLoFTRModel and AutoModelForKeypointMatching class
* fix: got rid of compilation breaking instructions
* docs: added todo for plot
* fix: used correct hub repo
* docs: added comments
* fix: run modular
* doc: added PyTorch badge
* fix: model repo typo in config
* fix: make modular
* fix: removed mask values from outputs
* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor
* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list
* fix: reformat
* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride
* doc: added q, kv aggregation kernel size and stride doc to config
* refactor: converted efficientloftr implementation from modular to copied from mechanism
* tests: overwrote batching_equivalence for "keypoints" specific tests
* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils
* fix: make fix-copies
* fix: make style
* fix: update rope function to make meta tests pass
* fix: rename plot_keypoint_matching to visualize_output for clarity
* refactor: optimize image pair processing by removing redundant target size calculations
* feat: add EfficientLoFTRImageProcessor to image processor mapping
* refactor: removed logger and updated attention forward
* refactor: added auto_docstring and can_return_tuple decorators
* refactor: update type imports
* refactor: update type hints from List/Dict to list/dict for consistency
* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching
* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]
* fix typing
* fix some typing issues
* nit
* a few more typehint fixes
* Remove output_attentions and output_hidden_states from modeling code
* else -> elif to support efficientloftr
* nit
* tests: added EfficientLoFTR image processor tests
* refactor: reorder functions
* chore: update copyright year in EfficientLoFTR test file
* Use default rope
* Add docs
* Update visualization method
* fix doc order
* remove 2d rope test
* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py
* fix docs
* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py
* update gradient
* refactor: removed unused codepath
* Add motivation to keep postprocessing in modeling code
* refactor: removed unnecessary variable declarations
* docs: use load_image from image_utils
* refactor: moved stage in and out channels computation to configuration
* refactor: set an intermediate_size parameter to be more explicit
* refactor: removed all mentions of attention masks as they are not used
* refactor: moved position_embeddings to be computed once in the model instead of every layer
* refactor: removed unnecessary hidden expansion parameter from config
* refactor: removed completely hidden expansions
* refactor: removed position embeddings slice function
* tests: fixed broken tests because of previous commit
* fix is_grayscale typehint
* not refactoring
* not renaming
* move h/w to embeddings class
* Precompute embeddings in init
* fix: replaced cuda device in convert script to accelerate device
* fix: replaced stevenbucaille repo to zju-community
* Remove accelerator.device from conversion script
* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module
* fix: removed unused attributes in configuration
* fix: missing self
* fix: refactoring and tests
* fix: make style
---------
Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* improve handlike of other image-like inputs in fast image processors
* fix issues with _prepare_images_structure
* update sam image processor fast
* use dict update
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove addition forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as its basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also in remote done what a timing), begin of tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flasky tests
* move repo id, update when merged on the hub
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* revert again
* 🧠
* aaah ESM has two modelings aaah
* add informative but short comment
* add `input_embed_layer` mixin attribute
* style
* walrus has low precedence
* modular fix
* this was breaking parser
Enable average_tokens_across_devices by default in TrainingArguments
Fixes #39392
This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default.
Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
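For reference, the flag in question; a minimal sketch with placeholder values — after this change it defaults to `True`, so it no longer needs to be passed explicitly for multi-GPU runs:
```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    # Now True by default: token counts are aggregated across devices before
    # averaging, giving a correct mean loss for multi-GPU training.
    average_tokens_across_devices=True,
)
```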
* fix qwen2 vl packing in FA2
* why? delete!
* qwen2-5-vl seems to work now
* update
* fix tests
* start by adapting FA2 tests
* add similar tests for sdpa/eager
* address comments
* why is this even in conditional model and not base model?
* fix type order
* change all Union[str, dict] to Union[dict, str]
* add hf_parser test && fix test order
* add deepspeed dependency
* replace deepspeed with accelerator
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
* experiment: base model from typed dict
* audio working
* fix bad rebase
* load audio with librosa
* implement timed models
* almost working
* make fixup
* fix tests
* transcription request type
* tokenizer -> processor
* add example in docs
---------
Co-authored-by: Lysandre <hi@lysand.re>
* Add the `device` option for `generate()`
* Add device for default tensors to avoid tensor mismatch
* [test] Enable test_static_cache_exportability for torch_device
* infer device from the prompt_token_ids
* Add device for generated tensor
* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU
* fix format
* infer device from the model
* wip: adding first version of the IJEPA model card.
* refactor based on the @stevhliu feedbacks
* refactor:
- revert the accidental removal of the autodoc api description and the image reference architecture
- general context update.
* - changes of model for example quantization.
- merging the quantization content.
Fix indentation bug in Idefics3 image processor
- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors
cc @yonigozlan
Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
* Fix typo in generation configuration for Janus model weight conversion
* Fix typo
* Update Janus model generation configuration
* Update Janus model to use generation_kwargs
* dump
* push other models
* fix simple greedy generation
* xmod
* add fmst and clean up some mentions of old cache format
* gpt-bigcode now follows standards
* delete tuple cache reference in generation
* fix some models
* fix some models
* fix mambas and support cache in tapas
* fix some more tests
* fix copies
* delete `_reorder_cache`
* another fix copies
* fix typos and delete unnecessary test
* fix rag generate, needs special cache reordering
* fix tapas and superglue
* reformer create special cache
* recurrent gemma `reorder_cache` was a no-op, delete
* fix-copies
* fix blip and musicgen pipeline tests
* fix reformer
* fix reformer, again...
* delete `_supports_cache_class`
* delete `supports_quantized_cache`
* fix failing tests
* fix copies
* some minor clean up
* style
* style
* fix copies
* fix tests
* fix copies
* create causal mask now needs positions?
* fix copies
* style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* clean-up of non-generative model after merging main
* check `is_decoder` for cache
* delete transpose for scores
* remove tuple cache from docs everywhere
* fix tests
* fix copies
* fix copies once more
* properly deprecate `encoder_attention_mask` in Bert-like models
* import `deprecate_kwarg` where needed
* fix copies again
* fix copies
* delete `nex_decoder_cache`
* fix copies asks to update for PLM
* fix copies
* rebasing had a few new models, fix them and merge asap!
* fix copies once more
* fix slow tests
* fix tests and update PLM checkpoint
* add read token and revert accidentally removed line
* oh come on, style
* just skip it, read token has no access to PLM yet
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.
* Fixed issue with
* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers
---------
Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
* add test scanner
* add doc + license
* refactor for only 1 tree traversal
* add back test of only one method
* document single method scan
* format
* fixup generate tests
* minor fix
* fixup
* fixup doc
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer
* Update src/transformers/optimization.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update optimization.py
fix the error of the unclosed "("
* Update optimization.py
remove whitespace in line 402 in order to pass the quality test
* Update src/transformers/optimization.py
* Update src/transformers/optimization.py
* Apply style fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM
In the original code, we shift the logits and pass shift_logits into self.loss_function, but self.loss_function shifts them again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
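A compressed sketch of the double shift described above (not the actual JetMoe code): the loss helper already shifts labels against logits internally, so shifting beforehand effectively trains on the token two positions ahead:
```python
import torch
import torch.nn.functional as F

def loss_function(logits, labels):
    # The model-level helper already shifts by one position internally.
    shift_logits = logits[:, :-1, :]
    shift_labels = labels[:, 1:]
    return F.cross_entropy(shift_logits.reshape(-1, shift_logits.size(-1)), shift_labels.reshape(-1))

logits = torch.randn(2, 10, 32)
labels = torch.randint(0, 32, (2, 10))

correct = loss_function(logits, labels)                  # shifted once, inside the helper
buggy = loss_function(logits[:, :-1, :], labels[:, 1:])  # pre-shifting adds a second shift -> "next next token"
```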
* fix vlm with retrieval
* we can't use AutoModel because new ColQwen was released after refactor
* no need for colqwen
* tied weight keys are necessary, if using IMageTextToText
* need to apply renaming in tied weights, only for ColPali
* overwrite tied keys in ColPali
* fix copies, modular can't handle if-statements
* working locally; need to style and test
* added docs and initial tests; need to debug and flesh out
* fixed tests
* working long context; batches
* working fa2 and eager
* update tests
* add missing configs
* remove default autoset
* fix spacing
* fix most tests
* fixed tests
* fix to init
* refactor to match new transformers updates
* remove static cache option
* fa2 fix
* fix docs
* in progress
* working on tests
* fixed issue with attn outputs
* remove debug
* fix local config attr
* update doc string
* fix docstring
* add docs to toc
* correct typo in toc
* add new updates from main w.r.t. ModernBERT RoPE
* fix local param
---------
Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
* Update modeling_qwen2_5_vl.py
### 🐛 Bug Description
When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42cef21662c304583368ff6645116a45), the model crashes due to a type mismatch in the attention mask handling.
---
### 🔥 Error Traceback
* Fix dtype compatibility in attention mask processing
Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:
Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
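The gist of the fix as a standalone sketch (variable and function names are illustrative, not the exact ones in the modeling file):
```python
import torch

def mask_min_value(attention_mask_tensor: torch.Tensor):
    # torch.finfo only accepts floating dtypes; integer masks need torch.iinfo,
    # otherwise the call raises a TypeError.
    if attention_mask_tensor.is_floating_point():
        return torch.finfo(attention_mask_tensor.dtype).min
    return torch.iinfo(attention_mask_tensor.dtype).min
```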
* Update modeling_qwen2_5_vl.py
* Update modeling_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* # Fix: Use appropriate function based on dtype
* Update modular_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* Fix: Use appropriate function based on dtype
* Fix: Use appropriate function based on dtype
* Updatet modeling_glm4v.py
* Only apply conversion for floating point tensors (inverted masks)
* corrected the format issue
reformatted modeling_glm4v.py
All done! ✨🍰✨
1 file reformatted
* Fix: Cast to float before applying torch.finfo
Corrected the format issue
* Fix torch.finfo() for integer attention mask
#39333
* Run make fix-copies and make style for CI compliance
- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC
* Fix torch.finfo() TypeError for
Fix torch.finfo() TypeError for integer attention_mask_tensor #39333
* Fix torch.finfo() TypeError for integer
* Updated CamemBERT model card to new standardized format
* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution
* Updated CamemBERT usage examples, quantization, badges, and format
* Updated CamemBERT badges
* Fixed CLI Section
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`
More verbose exceptions in `fix_docstring` on docstring formatting issues.
* plm template
* A working plm with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify conversion.
* First version that works with video input.
* simplifed image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embedding
* Testing with timm PE
* check in missed conversion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching if video and image examples mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.
* Fix for testing_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* integration test GT update after rebasing; probably due to video preprocessing
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring updates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig use AutoModel(TimmWrapper) instead
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency. Improve docstring.
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* dummy size arg to fix for processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Updated Switch Transformers model card with standardized format (Issue #36979)
* Apply reviewer suggestions to the new standardised Switch Transformer's model card
* Update switch_transformers.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint..
* fix test
* new a processor for video_test
* fix test
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
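The two equivalent ways to apply that note (this is standard torch dynamo configuration, not new API in this repo):
```python
# Option 1: environment variable, set before the process starts
#   TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1 python train.py

# Option 2: set it in code before compiling/exporting
import torch

torch._dynamo.config.capture_scalar_outputs = True
```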
* ensure the query is updated during training
avoid unused parameters that DDP does not like
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
* minor
* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors has a `sampling_rate` property
* speedup relative position embeddings
* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.
* minor
* minor
* minor
* fixing a crash without peft active
* add todo to replace einsum
* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.
* support audio.shape=(1,L)
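Point 2 above in isolation — batch-padding variable-length features in one call instead of per-sample padding plus concatenation (shapes here are arbitrary):
```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length feature sequences of shape (length, feature_dim).
feats = [torch.randn(n, 80) for n in (120, 95, 110)]

# One call pads to the longest sequence: the result is (batch, max_length, feature_dim),
# avoiding a Python loop of per-sample padding followed by torch.cat.
batch = pad_sequence(feats, batch_first=True)
assert batch.shape == (3, 120, 80)
```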
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequence doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotaryembd from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except Integration
* fix integration tests
* cleanup BC stuff
* rope
* fix integrations tests based on a10
* style
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Add Doge Model
* Fix code quality
* Rollback an error commit
* Fix config for open-source weights
* Revert "Fix config for open-source weights"
This reverts commit 229cdcac10a6a4274d1dd13b729bc14c98eb0c76.
* Add modular_doge
* Update Doge inherits from Llama
* Fix import bug
* [docs] Add usage of doge model
* Fix Doge import pretrainedconfig from modeling_utils to configuration_utils
* [docs] remove trust remote code from doge
* Fix dynamo bug in doge model
* Update docstrings
* Import apply_rotary_pos_emb and repeat_kv from Llama
* Fix all nits
* Fix code quality
* Fix some bugs
* Fix code quality
* Remove inherited `_update_causal_mask` from Llama
This leads to incorrect weight initialization.
* Fix the wrong tensor orderings in DogeCDMoE
* Fix attention mask bug
We have to provide attention_mask for dynamic mask computation
* Modify most implementations to inherit from Llama
But there are two problems:
1. `flex_attention_forward` is not updated properly
2. `Example` error in the forward method of DogeForCausalLM
* Modify CDMoE for batch efficient implementation
* Uniform MoE configuration names, just like QwenMoE
* Fix code quality
* Fix code quality
* Fix code quality
* Add tp plan of CDMoE Module
* Hybrid DMA with sliding window
* Update valid tokens greater than window size
* Fix code quality
* Add `convert_doge_weights_to_hf`
* Fix STATE_DICT_MAPPING in convert_doge_weights_to_hf.py
* Fix nits in modular_doge
* Fix code quality
* Fix all nits
* Fix all nits
* Make sure the attention function is updated inside the class
* Fix code quality issues in the Doge model and add a test for it
* Fix `test_generate`
* Fix code quality
* Fix nits following suggestions
* Fix code quality
* Fix code quality issues
* Fix nits
* Fix code quality nits
* Fix the missing parameters in the configuration.
* Fix the missing parameters in the configuration.
* Fix nits
* Add initialization of attention
* Fix last nits
* Simplify dynamic mask generation logic
* Rename router_logits to gate_logits for matching latest changes of MixtralModel
* Rename typings for matching latest changes of MixtralModel
* Fixes typo in comment
* Update src/transformers/models/doge/modular_doge.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix code quality issues to match other modular
* Fix code quality issues to match other modular
* Fix the static compilation errors
* Update model weights link
* Fix code quality issues to match other modular
* reapply modular and support for new outputs
* style
* simplify a lot
* fix import location
* reapply modular
* fix
* fix integration test
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Fix errors when use verl to train GLM4.1v model
* Support glm4v load from AutoModelForVision2Seq
* Set glm4v model _checkpoint_conversion_mapping attr from None to {}
* Update modeling_auto.py
* fix(decoding): stop beam search per-instance when heuristic satisfied
Previously, when `early_stopping` was set to `False`, the early-stopping heuristic only halted generation when **all** batch instances reached the criterion. As a result, instances that the heuristic judged impossible to improve kept generating, leading to inconsistent and overlong outputs across the batch.
Now we apply the heuristic **per instance**: once all beams of a given batch instance become impossible to improve, we mark that instance as finished while letting the others continue. This restores the expected behavior and ensures consistency in batched generation.
* Add test case GenerationIntegrationTests.test_beam_search_early_stop_heuristic
* Update naming improvement_possibility -> is_early_stop_heuristic_unsatisfied
* Add comments for early stop heuristic
* Update src/transformers/generation/utils.py
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
- Complete Apache License text in Italian documentation
- Remove duplicate variable assignment in Perceiver converter
- Fix typo in MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES constant
* chameleon xpu bnb groundtruth update on bnb triton backend since we are
deprecating ipex backend
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* enable hqq uts on XPU, all passed
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix comment
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update the glm4 model readme
* update test
* update GLM-4.1V model
* update as format
* update
* fix some tests
* fix the rest
* fix on a10, not t4
* nit: dummy import
---------
Co-authored-by: raushan <raushan@huggingface.co>
* [video processors] Support float fps for precise frame sampling
Enable fractional fps values (e.g., 1.5, 29.97) in video processors
for more precise frame sampling control.
- Change fps type from int to float across all video processors
- Maintain backward compatibility with integer values
Extends: #38105
* [video processors] Refine fps typing to Union[int, float]
Change fps type from Optional[float] to Optional[Union[int, float]]
for more explicit type information about supporting both integer
and floating-point frame rates.
- Update type hints and docstrings across 8 files
- Maintain backward compatibility
- Clarify support for both int and float values
Extends: #38105
* Revert "[video processors] Support float fps for precise frame sampling"
This reverts commit 7360d6e661b413ca0239e5ef61f9b1abbeab8e65.
* just update 2 files
* update other models as well just making fix-copies
* also add the changes needed to modeling utils
* put this on the pretrained model instead
* nits and fixes
* update generic, fix to use config value
* update other modelings
* use transformers kwargs instead
* update
* update
* update other models
* update
* updates
* update
* update
* update
* fix
* finally
* very small nits
* this fixes more tests
* fix other models as well!
* update modularqwen2
* update models based on qwen2
* update
* update
* remove the **flash stuff in favor of normal kwargs
* update
* propagate gemma?
* remove output attentions
* propagate
* support cross attention edge case
* same
* test this
* fixes
* more fix
* update
* update
* fix conflicts
* update
* fix emu3
* fix emu3
* move the fix a bit
* what a nightmare
* some fixes, loss_kwargs should never have been
* finish fixing gemma3n
* fix small lm3
* fix another one
* fix csm now
* fix csm and mistral
* fix mistral now
* small fixes
* fix janus
* only for some models
* fixup
* fix phi3
* more fixes?
* does this fix it?
* update
* turns out it was just graph breaks
* protect torch
* updates
* fix samhq?
* fix moonshine
* more moonshine fixes, 3 failures left!
* nits
* generic needs to support more
* more fixes to moonshine!
* fix cross attention outputs!
* fix csm!
* nits
* fix stupid kosmos2
* current updates
* fixes
* use output recorder?
* nicer!
* a little bit of magic
* update
* fix protect
* fix
* small fixes
* protect import
* fix a bunch of more models
* fix fixups
* fix some of the last ones
* nit
* partly fix phi
* update
* fix import path
* make something that is fullgraph compatible just to be sure
* typing was wrong on llama so the rest was wrong as well
* ugly, but at least it is still exportable
* style
* supposed to fix moonshine, it still breaks
* fix some default
* fix the last bits of sam
* update samhq
* more fixes to sam hq
* nit
* fix all output+hidden states and output_attentions!
* fix?
* fix diffllama
* updates to fix initialization on the sam pips
* ups there was a bug
* fix the last sam hq test
* fix gotocr
* fix gotocr2!
* fixes
* skip stupid tests
* there was one left :)
* fixup
* fix fix copies issues with this test file
* fix copies for sam_hq
* rm some comments
* skip 2 more failing tests
* fix
* fix everything
* Apply suggestions from code review
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* add more doc!
* fix public init
* fix modular qwen3
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
* more torch.hpu patches
* increase top_k because it results in flaky behavior when Temperature, TopP and TopK are used together, which ends up killing beams early.
* remove temporary fix
* fix scatter operation when input and src are the same
* trigger
* fix and reduce
* skip finding batch size as it makes the hpu go loco
* fix fsdp (yay all are passing)
* fix checking equal nan values
* style
* remove models list
* order
* rename to cuda_extensions
* Update src/transformers/trainer.py
* Expectations for llava_next_video
* Updated image src in aria
* Fix test_small_model_integration_test
* Fix small model integration llama
* Fix a bunch of tests
* Style
* Shortened generation in test from 900 to 90
* Fix index out of bounds exception on wrong kv reuse
* Prevent loading same model twice
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Fixed some devices errors
* Fixed other device issues and more expectations
* Reverted support flags
* style
* More granular support
* Fixed some rebase stuff
* add a not None check before .to
* fix FA2
* update is causal flag and remove mask for FA2
* update for FA2 with varlen path
* how were the tests passing with different devices?
* add comment and ref to the PR
* move mask preparation to base pretrained model
* seq len is the first dim, not second
* fix copies to fix GLM4V
* deprecate for 1 version
* style
* fix some tests
* fix esm
* skip for now, GC requires positional args but we have keyword args
* remove transpose for scores in modified models only
* skip fx trace tests
* remove the skips
* fix the epsilon to a small value (does not make sense otherwise)
* safeguard
* overload test_eager_matches_sdpa
* Update test_modeling_common.py
* skip appropriate tests
* correct no_split_layer
* fix all devices issue
* fix backward
* fix
# copilot-instructions.md Guide for Hugging Face Transformers
This copilot-instructions.md file provides guidance for code agents working with this codebase.
## Core Project Structure
- `/src/transformers`: This contains the core source code for the library
  - `/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
- `/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
  - `/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
- `/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
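For illustration, a minimal, hypothetical modular file could look like the sketch below (the model name and classes are made up; real modular files such as `modular_gemma.py` follow the same pattern):

```python
# modular_mymodel.py -- hypothetical sketch only; running `make fixup` would
# expand this into a complete, self-contained modeling_mymodel.py.
from transformers.models.llama.modeling_llama import LlamaForCausalLM, LlamaMLP


class MyModelMLP(LlamaMLP):
    # Reuse the Llama MLP unchanged; only override what actually differs.
    pass


class MyModelForCausalLM(LlamaForCausalLM):
    pass
```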
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`.
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
name: Self-hosted runner scale set (AMD mi300 scheduled CI caller)
name: Self-hosted runner scale set (AMD mi325 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
name: Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu : amd-mi355-ci-1gpu
#              2gpu : amd-mi355-ci-2gpu
on:
  workflow_run:
    workflows: ["Self-hosted runner (AMD scheduled CI caller)"]
Like the slow tests, there are other environment variables available which are not enabled by default during testing:
- `RUN_CUSTOM_TOKENIZERS`: Enables tests for custom tokenizers.
More environment variables and additional information can be found in the [testing_utils.py](https://github.com/huggingface/transformers/blob/main/src/transformers/testing_utils.py).
@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not to use italics and bold text too much, as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened several comments back, it's always good practice to quote just the relevant lines you're replying to. The `>` is used for quoting, or you can always use the menu to do so. For example, your editor box will look like:
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts]((https://github.com/huggingface/transformers/tree/main/examples)) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
## 100 projects using Transformers
@ -280,8 +280,8 @@ Expand each modality below to see a few example models for various use cases.
- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re
models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
To avoid loading models from unsafe formats (e.g. [pickle](https://docs.python.org/3/library/pickle.html)), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
@ -6,7 +6,7 @@ developers, researchers, students, professors, engineers, and anyone else to bui
In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [gpt4all](https://github.com/nomic-ai/gpt4all)
@ -49,7 +49,7 @@ Keywords: LLMs, Large Language Models, Agents, Chains
[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
@ -257,7 +257,7 @@ Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusi
Keywords: Text-to-3D, Stable Diffusion
## [txtai](https://github.com/neuml/txtai)
[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
Keywords: Semantic search, LLM
@ -309,8 +309,8 @@ Keywords: OCR, LaTeX, Math formula
OpenCLIP is an open source implementation of OpenAI's CLIP.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
@ -596,7 +596,7 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active
## [BentoML](https://github.com/bentoml/BentoML)
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
Keywords: BentoML, Framework, Deployment, AI Applications
@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training (fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
For uploading results, you need a Hugging Face token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence; an example invocation follows the list):
1. Command line: `--token hf_your_token_here`
3. Environment variable: `HF_TOKEN`
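For example, assuming the options listed above (the token value is a placeholder):

```bash
# Pass the token on the command line...
python run_benchmarks.py --token hf_your_token_here

# ...or export it as an environment variable first
export HF_TOKEN=hf_your_token_here
python run_benchmarks.py
```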
### Running Specific Benchmarks
```bash
# Include only specific benchmarks
python run_benchmarks.py --include llama
# Exclude specific benchmarks
python run_benchmarks.py --exclude old_benchmark
```
## Output Format
Results are saved as JSON files with the following structure:
```json
{
"model_name": "llama_2_7b",
"benchmark_scenarios": [
{
"scenario_name": "eager_variant",
"metadata": {
"timestamp": "2025-01-XX...",
"commit_id": "abc123...",
"hardware_info": {
"gpu_name": "NVIDIA A100",
"gpu_memory_total": 40960,
"cpu_count": 64
},
"config": {
"variant": "eager",
"warmup_iterations": 3,
"measurement_iterations": 5
}
},
"measurements": {
"latency": {
"mean": 2.45,
"median": 2.43,
"std": 0.12,
"min": 2.31,
"max": 2.67,
"p95": 2.61,
"p99": 2.65
},
"time_to_first_token": {
"mean": 0.15,
"std": 0.02
},
"tokens_per_second": {
"mean": 87.3,
"unit": "tokens/sec"
}
},
"gpu_metrics": {
"gpu_utilization_mean": 85.2,
"gpu_memory_used_mean": 12450
}
}
]
}
```
### Debug Mode
```bash
python run_benchmarks.py --log-level DEBUG
```
## Contributing
To add new benchmarks:
1. Create a new file in `benches/`
2. Implement the `ModelBenchmark` interface
3. Add a runner function (`run_<benchmark_name>` or `run_benchmark`)
@ -20,22 +20,21 @@ To generate the documentation, you first have to build it. Several packages are
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
Then you need to install our special tool that builds the documentation:
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment with a link to where the documentation with your changes lives.
---
**NOTE**
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml`& restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
> [!NOTE]
> The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
## Adding a new element to the navigation bar
@ -164,6 +159,9 @@ These classes should be added using our Markdown syntax. Usually as follows:
[[autodoc]] XXXConfig
```
> [!IMPORTANT]
> Always add a blank line after `[[autodoc]]` to ensure it passes the CI/CD checks.
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:
1. Start with the `_toctree.yml` file that corresponds to your documentation chapter. This file is essential for rendering the table of contents on the website.
- If the `_toctree.yml` file doesn’t exist for your language, create one by copying the English version and removing unrelated sections.
- If the `_toctree.yml` file doesn't exist for your language, create one by copying the English version and removing unrelated sections.
- Ensure it is placed in the `docs/source/LANG-ID/` directory.
Here’s an example structure for the `_toctree.yml` file:
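The example itself is not included in this excerpt; as a rough sketch, the file is a nested list of sections where each entry pairs a `local` doc file with a `title` (the entries below are illustrative):

```yml
- sections:
  - local: index
    title: 🤗 Transformers
  - local: quicktour
    title: Quick tour
  title: Get started
```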
By default, Hugging Face classes such as [`TextGenerationPipeline`] or [`AutoModelForCausalLM`] load the model in "float32" precision. This means it needs 4 bytes (32 bits) per parameter, so an "8B" model with 8 billion parameters will need ~32 GB of memory. However, this can be a waste of resources! Most modern language models are trained in "bfloat16" precision, which uses only 2 bytes per parameter. If your hardware supports it (Nvidia 30xx/Axxx or newer), you can load the model in "bfloat16" precision using the "torch_dtype" parameter as we did above.
By default, Hugging Face classes such as [`TextGenerationPipeline`] or [`AutoModelForCausalLM`] load the model in "float32" precision. This means it needs 4 bytes (32 bits) per parameter, so an "8B" model with 8 billion parameters will need ~32 GB of memory. However, this can be a waste of resources! Most modern language models are trained in "bfloat16" precision, which uses only 2 bytes per parameter. If your hardware supports it (Nvidia 30xx/Axxx or newer), you can load the model in "bfloat16" precision using the "dtype" parameter as we did above.
It is also possible to go below 16 bits using "quantization", a method that compresses the model weights in a lossy way. This allows each parameter to be squeezed down to 8 bits, 4 bits, or even less. Note that, especially at 4 bits, the quality of the model's output may be negatively affected, but this is often a trade-off worth making to fit a larger and more capable chat model in memory. Let's see how we can apply this using the `bitsandbytes` library:
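A minimal sketch, assuming `bitsandbytes` is installed and using an arbitrary chat checkpoint as a stand-in:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config; switch to load_in_4bit=True to go even lower.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
```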
In this guide, we will go over effective techniques for improving the efficiency of deploying large language models:
1. We will cover "lower precision", which research has shown to deliver computational advantages without noticeably affecting model performance by operating at lower numerical precision: [8-bit and 4-bit](/main_classes/quantization.md).
1. We will cover "lower precision", which research has shown to deliver computational advantages without noticeably affecting model performance by operating at lower numerical precision: [8-bit and 4-bit](/main_classes/quantization).
2. **Flash Attention:** Flash Attention is a modified version of the attention algorithm that not only provides a more memory-efficient approach, but also achieves higher efficiency thanks to optimal use of GPU memory.
3. **Architectural innovations:** Since large language models are always deployed the same way during inference, namely autoregressive text generation with a long input context, specialized model architectures have been proposed that allow more efficient inference. The most important advances in model architectures here are [ALiBi](https://huggingface.co/papers/2108.12409), [rotary embeddings](https://huggingface.co/papers/2104.09864), [Multi-Query Attention (MQA)](https://huggingface.co/papers/1911.02150) and [Grouped-Query Attention (GQA)]((https://huggingface.co/papers/2305.13245)).
3. **Architectural innovations:** Since large language models are always deployed the same way during inference, namely autoregressive text generation with a long input context, specialized model architectures have been proposed that allow more efficient inference. The most important advances in model architectures here are [ALiBi](https://huggingface.co/papers/2108.12409), [rotary embeddings](https://huggingface.co/papers/2104.09864), [Multi-Query Attention (MQA)](https://huggingface.co/papers/1911.02150) and [Grouped-Query Attention (GQA)](https://huggingface.co/papers/2305.13245).
Throughout this guide, we provide an analysis of autoregressive generation from a tensor perspective. We dig into the pros and cons of using lower precision, present a thorough exploration of the latest attention algorithms, and discuss improved LLM architectures. We support the explanation with practical examples that highlight each improvement separately.
@ -73,7 +73,7 @@ model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="aut
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", torch_dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
> Almost all models are trained in bfloat16 these days, and there is no reason to run the model in full float32 precision if [your GPU supports bfloat16](https://discuss.pytorch.org/t/bfloat16-native-support/117155/5). Float32 precision will not give better inference results than the precision that was used to train the model.
If you are not sure in which format the model weights are stored on the Hub, you can always look at the checkpoint configuration under `"torch_dtype"`, for example [here](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). It is recommended to load the model in the same precision type as written in the configuration, using `from_pretrained(..., torch_dtype=...)`, unless the original type is float32, in which case `float16` or `bfloat16` can be used for inference.
If you are not sure in which format the model weights are stored on the Hub, you can always look at the checkpoint configuration under `"dtype"`, for example [here](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). It is recommended to load the model in the same precision type as written in the configuration, using `from_pretrained(..., dtype=...)`, unless the original type is float32, in which case `float16` or `bfloat16` can be used for inference.
Let's define a `flush(...)` function to free all allocated memory so that we can accurately measure the peak allocated GPU memory.
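A minimal sketch of such a helper (CUDA-specific; adapt the calls for other accelerators):

```python
import gc

import torch


def flush():
    # Drop Python references, release cached blocks, and reset the peak-memory counter.
    gc.collect()
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
```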
Before sharing a model on the Hub, you will need your Hugging Face account credentials. If you are using a terminal, run the following command in the virtual environment where 🤗 Transformers is installed. This command will store your access token in your Hugging Face cache folder (`~/.cache/` by default):
```bash
huggingface-cli login
hf auth login
```
If you are using a notebook such as Jupyter or Colaboratory, make sure you have the [`huggingface_hub`](https://huggingface.co/docs/hub/adding-a-library) library installed. This library lets you interact programmatically with the Hub.
| [How to fine-tune a model on summarization](https://github.com/huggingface/notebooks/blob/main/examples/summarization.ipynb)| Shows how to preprocess the data and fine-tune a pretrained model on XSUM. | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)|
| [How to train a language model from scratch](https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| Highlights all the steps to effectively train a Transformer model on custom data | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)|
| [How to generate text](https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| How to use different decoding methods for language generation with transformers | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)|
| [How to generate text (with constraints)](https://github.com/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| How to guide language generation with user-provided constraints | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)|
| [Reformer](https://github.com/huggingface/blog/blob/main/notebooks/03_reformer.ipynb)| How Reformer pushes the limits of language modeling | [](https://colab.research.google.com/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [](https://studiolab.sagemaker.aws/import/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)|
If the model is too large for a single GPU and you are using PyTorch, you can set `torch_dtype='float16'` to enable FP16-precision inference. This usually does not cause significant performance drops, but make sure to evaluate it on your models!
If the model is too large for a single GPU and you are using PyTorch, you can set `dtype='float16'` to enable FP16-precision inference. This usually does not cause significant performance drops, but make sure to evaluate it on your models!
Alternatively, you can set `device_map="auto"` to automatically determine how the model weights are loaded and stored. Using the `device_map` parameter requires the 🤗 [Accelerate](https://huggingface.co/docs/accelerate) library:
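For example, reusing the checkpoint from the snippet above (assuming `pip install accelerate` has been run):

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", device_map="auto")
```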
@ -56,7 +56,7 @@ Files are also easy to edit in a repository, and you can
Before you share a model to the Hub, you need your Hugging Face credentials. If you have access to a terminal, run the following command in the virtual environment where 🤗 Transformers is installed. This stores your access credentials in your Hugging Face cache folder (`~/.cache/` by default):
```bash
huggingface-cli login
hf auth login
```
If you are using a notebook such as Jupyter or Colaboratory, make sure you have the [`huggingface_hub`](https://huggingface.co/docs/hub/adding-a-library) library installed. This library allows you to interact with the Hub programmatically.
All scripts can upload your final model to the [Model Hub](https://huggingface.co/models). Make sure you are logged in to Hugging Face before you begin:
```bash
huggingface-cli login
hf auth login
```
Then add the `push_to_hub` argument to the script. This argument creates a repository with your Hugging Face username and the folder name specified in `output_dir`.
You can also control the order of Intel XPUs with:
```bash
@ -120,7 +118,5 @@ For more information about device enumeration and sorting on Intel XPU, please r
</hfoption>
</hfoptions>
> [!WARNING]
> Environment variables can be exported instead of being added to the command line. This is not recommended because it can be confusing if you forget how the environment variable was set up and you end up using the wrong accelerators. Instead, it is common practice to set the environment variable for a specific training run on the same command line.
@ -128,7 +127,7 @@ It's faster to upload your pipeline code to the Hub because it doesn't require a
Add your pipeline code to the Hub in a Python file.
For example, a custom pipeline for sentence pair classification might look like the following code below. The implementation works for PyTorch and TensorFlow models.
For example, a custom pipeline for sentence pair classification might look like the following code below.
```py
import numpy as np
@ -168,13 +167,12 @@ Save the code in a file named `pair_classification.py`, and import and register
@ -187,9 +185,6 @@ The [register_pipeline](https://github.com/huggingface/transformers/blob/9feae5f
"pt":[
"AutoModelForSequenceClassification"
],
"tf":[
"TFAutoModelForSequenceClassification"
],
}
},
```
@ -219,11 +214,11 @@ Add your pipeline code as a new module to the [pipelines](https://github.com/hug
Next, add a new test for the pipeline in [transformers/tests/pipelines](https://github.com/huggingface/transformers/tree/main/tests/pipelines). You can look at the other tests for examples of how to test your pipeline.
The [run_pipeline_test](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L186) function should be very generic and run on the models defined in [model_mapping](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L48) and [tf_model_mapping](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L49). This is important for testing future compatibility with new models.
The [run_pipeline_test](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L186) function should be very generic and run on the models defined in [model_mapping](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L48). This is important for testing future compatibility with new models.
You'll also notice `ANY` is used throughout the [run_pipeline_test](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L186) function. The models are random, so you can't check the actual values. Using `ANY` allows the test to match the output of the pipeline type instead.
Finally, you should also implement the following 4 tests. A rough sketch of the first one is shown after the list.
1. [test_small_model_pt](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L59) and [test_small_model_tf](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L150), use a small model for these pipelines to make sure they return the correct outputs. The results don't have to make sense. Each pipeline should return the same result.
1. [test_large_model_pt](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_zero_shot_image_classification.py#L187) and [test_large_model_tf](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_zero_shot_image_classification.py#L220), use a realistic model for these pipelines to make sure they return meaningful results. These tests are slow and should be marked as slow.
1. [test_small_model_pt](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_text_classification.py#L59), use a small model for these pipelines to make sure they return the correct outputs. The results don't have to make sense. Each pipeline should return the same result.
1. [test_large_model_pt](https://github.com/huggingface/transformers/blob/db70426854fe7850f2c5834d633aff637f14772e/tests/pipelines/test_pipelines_zero_shot_image_classification.py#L187), use a realistic model for these pipelines to make sure they return meaningful results. These tests are slow and should be marked as slow.
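As a rough, hypothetical sketch of the first test for the pair-classification pipeline discussed earlier (the tiny checkpoint and the expected output keys are placeholders, and the custom pipeline must already be registered):

```python
from transformers import pipeline


def test_small_model_pt():
    # A tiny random checkpoint: outputs are meaningless, only the structure is checked.
    pair_classifier = pipeline("pair-classification", model="hf-internal-testing/tiny-random-bert")
    output = pair_classifier("This is a sentence", second_text="And this is another one")
    assert {"label", "score"} <= set(output.keys())
```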
@ -14,5 +14,9 @@ rendered properly in your Markdown viewer.
-->
# Agents
(deprecated)
> [!WARNING]
> Agents and tools were spun out into the standalone [smolagents](https://huggingface.co/docs/smolagents/index) library. They were removed from `transformers` in v4.52.
and it will stop printing the statements, as it now uses the `sdpa` attention.
This allows you to quickly change the attention function without needing to reload the model!
## Different attention per backbone in multimodal models
For multimodal models, different attention functions may work better for each backbone module. For example, some vision backbones perform better in fp32, but are incompatible with FlashAttention. To continue using FlashAttention while keeping the vision encoder in fp32, create a dict and map each config to an attention implementation as shown below.
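A sketch of what this looks like, assuming a vision-language checkpoint whose config exposes `text_config` and `vision_config` sub-configs (Flash Attention additionally requires the `flash-attn` package):

```python
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",  # illustrative checkpoint
    attn_implementation={
        "text_config": "flash_attention_2",  # language backbone
        "vision_config": "sdpa",  # keep the vision backbone on SDPA
    },
)
```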
## What about new args needed in my custom attention function?
But indeed, what if the new function requires a new arg to be properly used? It's no issue! Models supporting the
@ -165,4 +193,4 @@ def custom_attention_mask(
It mostly works thanks to the `mask_function`, which is a `Callable` in the form of [torch's mask_mod functions](https://pytorch.org/blog/flexattention/), taking 4 indices as input and returning a boolean to indicate if this position should take part in the attention computation.
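For instance, a plain causal mask expressed in that form looks like this (a standalone sketch, not tied to any particular model):

```python
def causal_mask_function(batch_idx: int, head_idx: int, q_idx: int, kv_idx: int) -> bool:
    # A query position may only attend to itself and to earlier key/value positions.
    return kv_idx <= q_idx
```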
If you cannot use the `mask_function` to create your mask for some reason, you can try to work around it by doing something similar to our [torch export workaround](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py).
custom_parameter (`type`, *optional*, defaults to `default_value`):
A concise description for custom_parameter if not defined or overriding the description in `args_doc.py`.
A concise description for custom_parameter if not defined or overriding the description in `auto_docstring.py`.
internal_helper_arg (`type`, *optional*, defaults to `default_value`):
A concise description for internal_helper_arg if not defined or overriding the description in `args_doc.py`.
A concise description for internal_helper_arg if not defined or overriding the description in `auto_docstring.py`.
"""
# ...
```
You should also use the `@auto_docstring` decorator for classes that inherit from [`~utils.ModelOutput`].
```python
@dataclass
@auto_docstring(
custom_intro="""
Custom model outputs with additional fields.
"""
)
class MyModelOutput(ImageClassifierOutput):
r"""
loss (`torch.FloatTensor`, *optional*):
The loss of the model.
custom_field (`torch.FloatTensor` of shape `(batch_size, hidden_size)`, *optional*):
A custom output field specific to this model.
"""
# Standard fields like hidden_states, logits, attentions etc. can be automatically documented if the description is the same as the standard arguments.
# However, given that the loss docstring is often different per model, you should document it in the docstring above.
# Custom fields need to be documented in the docstring above
custom_field: Optional[torch.FloatTensor] = None
```
</hfoption>
<hfoption id="functions">
@ -118,7 +145,6 @@ Arguments can also be passed directly to `@auto_docstring` for more control. Use
The `Returns` and `Examples` parts of the docstring can also be manually specified.
```python
MODEL_COMMON_CUSTOM_ARGS = r"""
common_arg_1 (`torch.Tensor`, *optional*, defaults to `default_value`):
@ -171,11 +197,10 @@ class MyModel(PreTrainedModel):
There are some rules for documenting different types of arguments and they're listed below.
- Standard arguments (`input_ids`, `attention_mask`, `pixel_values`, etc.) are defined and retrieved from `args_doc.py`. It is the single source of truth for standard arguments and should not be redefined locally if an argument's description and shape is the same as an argument in `args_doc.py`.
- Standard arguments (`input_ids`, `attention_mask`, `pixel_values`, etc.) are defined and retrieved from `auto_docstring.py`. It is the single source of truth for standard arguments and should not be redefined locally if an argument's description and shape is the same as an argument in `auto_docstring.py`.
If a standard argument behaves differently in your model, then you can override it locally in a `r""" """` block. This local definition has a higher priority. For example, the `labels` argument is often customized per model and typically requires overriding.
- New or custom arguments should be documented within an `r""" """` block after the signature if it is a function or in the `__init__` method's docstring if it is a class.
```py
@ -185,9 +210,9 @@ There are some rules for documenting different types of arguments and they're li
This can span multiple lines.
```
* Include `type` in backticks.
* Add *optional* if the argument is not required or has a default value.
* Add "defaults to X" if it has a default value. You don't need to add "defaults to `None`" if the default value is `None`.
These arguments can also be passed to `@auto_docstring` as a `custom_args` argument. It is used to define the docstring block for new arguments once if they are repeated in multiple places in the modeling file.
@ -245,7 +270,7 @@ When working with modular files (`modular_model.py`), follow the guidelines belo
The `@auto_docstring` decorator automatically generates docstrings by:
1. Inspecting the signature (arguments, types, defaults) of the decorated class' `__init__` method or the decorated function.
2. Retrieving the predefined docstrings for common arguments (`input_ids`, `attention_mask`, etc.) from internal library sources like [`ModelArgs`], [`ImageProcessorArgs`], and the `args_doc.py` file.
2. Retrieving the predefined docstrings for common arguments (`input_ids`, `attention_mask`, etc.) from internal library sources like [`ModelArgs`], [`ImageProcessorArgs`], and the `auto_docstring.py` file.
3. Adding argument descriptions in one of two ways as shown below.
| `r""" """` | add custom docstring content directly to a method signature or within the `__init__` docstring | document new arguments or override standard descriptions |
| `custom_args` | add custom docstrings for specific arguments directly in `@auto_docstring` | define docstring for new arguments once if they're repeated in multiple places in the modeling file |
4. Adding class and function descriptions. For model classes with standard naming patterns, like `ModelForCausalLM`, or if it belongs to a pipeline, `@auto_docstring` automatically generates the appropriate descriptions with `ClassDocstring` from `args_doc.py`.
4. Adding class and function descriptions. For model classes with standard naming patterns, like `ModelForCausalLM`, or if it belongs to a pipeline, `@auto_docstring` automatically generates the appropriate descriptions with `ClassDocstring` from `auto_docstring.py`.
`@auto_docstring` also accepts the `custom_intro` argument to describe a class or function.
7. Adding return values to the docstring. For methods like `forward`, the decorator automatically generates the `Returns` field in the docstring based on the method's return type annotation.
For example, if a method returns a [`~transformers.utils.ModelOutput`] subclass, `@auto_docstring` extracts the field descriptions from the class' docstring to create a comprehensive return value description. You can also manually specifiy a custom `Returns` field in a functions docstring.
For example, if a method returns a [`~transformers.utils.ModelOutput`] subclass, `@auto_docstring` extracts the field descriptions from the class' docstring to create a comprehensive return value description. You can also manually specify a custom `Returns` field in a function's docstring.
8. Unrolling kwargs typed with the unpack operator. For specific methods (defined in `UNROLL_KWARGS_METHODS`) or classes (defined in `UNROLL_KWARGS_CLASSES`), the decorator processes `**kwargs` parameters that are typed with `Unpack[KwargsTypedDict]`. It extracts the documentations from the `TypedDict` and adds each parameter to the function's docstring.
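As a generic illustration of the pattern being unrolled (the `TypedDict` and its single field are made up for this sketch):

```python
from typing import TypedDict

from typing_extensions import Unpack


class ExampleKwargs(TypedDict, total=False):
    r"""
    temperature (`float`, *optional*, defaults to 1.0):
        Softmax temperature used by this hypothetical forward pass.
    """

    temperature: float


def forward(input_ids, **kwargs: Unpack[ExampleKwargs]):
    ...
```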