* tmp
* fix modular inheritance
* nit
* paligemma 1 doesn't have swa
* use same pattern as in models with hybrid layers
* PR comments
* helium also needs layer_typed (bc it relies on gemma)
* paligemma/gemma3: same mask creation fn in fwd and generate
* propagate changes to helium (gemma-based)
* tmp commit
* slow paligemma tests passing, let's see what breaks
* fix test_left_padding_compatibility
* tmp commit
* tmp commit
* rebase error
* docs
* reduce diff
* like this?
* t5gemma
* better comment
* shorter diff
* exception
* ffs type
* optional
* shorter modular_gemma.py
* helium model actually needs no changes -- the tester is the issue
* t5gemma modular config
* a few more modular; paligemma BC
* fix processor issues?
* rm config exception
* lift warning in gemma
* fix bug in Mamba2 docs
* correct 'because on of' issue
* link to other Mamba2 model types
* github URL is not changed
* update error message in generated files
* [i18n-bn] Add Bengali language README file and update links in existing language files
* Update Bengali README for clarity and consistency in model descriptions
* Fix typos and formatting in English docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Fix typos and formatting in Chinese docs
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed ?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* fix: bug that made early stop change order of matches
* fix: applied code suggestion
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: applied code suggestion to modular
* fix: integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
ENH Enable readline support for chat
This small change enables GNU readline support for the transformers chat
command. This includes, among others:
- advanced navigation and editing: ctrl + a ctrl + e alt + b alt + f
ctrl + k alt + d etc.
- navigate and search history: arrow up/down ctrl + p ctrl + n ctrl + r
- undo: ctrl + _
- clear screen: ctrl + l
Implementation
Although it may look strange, just importing readline is enough to
enable it in Python, see:
https://docs.python.org/3/library/functions.html#input
As readline is not available on some
platforms (https://docs.python.org/3/library/readline.html), the import
is guarded.
Readline should work on Linux, MacOS, and with WSL, I'm not sure about
Windows though. Ideally, someone can give it a try. It's possible that
Windows users would have to install
pyreadline (https://pypi.org/project/pyreadline3/).
* clean start to bert refactor
* some test fixes
* style
* fix last tests
* be strict on positional embeddings, fixup according tests
* cache support
* more cache fixes, new causal API
* simplify masks, fix tests for gen
* flex attn, static cache support, round of fixes
* ?
* this time
* style
* fix flash attention tests, flex attention requires torch 2.7.x to work with multiple classes (as recompile strats force a size call which is wrongly interpreted before)
* roberta
* fixup sdpa remains
* attention split, simplify args and kwargs, better typing
* fix encoder decoder
* fix test
* modular roberta
* albert
* data2vectext, making it modular tomorrow
* modular data2vec text
* tmp disable
* xmod + cache position fixes
* whoops
* electra + markuplm, small fixes
* remove wrong copy
* xlm_roberta + some embedding fixes
* roberta prelayernorm
* RemBert: remove copy, maybe doing it later
* ernie
* fix roberta offloading
* camembert
* copy fixes
* bert generation + fixes on eager
* xlm roberta xl
* bridgetower (text) + seamlessv2 copy fixes
* rocbert + small fixes
* whoops
* small round of fixups
* NOTE: kernels didnt load with an earlier version, some fixup (needs another look bc cross deps)
* the end of the tunnel?
* fixup nllbmoe + style
* we dont need this anymore
* megatron bert is barely used, low prio skip for now
* Modernize bert (template for others)
NOTE: trying to push this through, might be overdue if not in time possible
* check inputs for all others (if checkmarked)
* fix bridgetower
* style
* fix encoder decoder (partially but cause found and fix also, just needs to be done for everything else)
* proper fix for bert to force intermediate dict outputs
* propagate to others
* style
* xlm roberta xl investigation, its the layernorm...
* mobile bert
* revert this, might cause issues with composed models
* review
* style
* setup
* start the purge
* continue the purge
* more and more
* more
* continue the quest: remove loading tf/jax checkpoints
* style
* fix configs
* oups forgot conflict
* continue
* still grinding
* always more
* in tje zone
* never stop
* should fix doc
* fic
* fix
* fix
* fix tests
* still tests
* fix non-deterministic
* style
* remove last rebase issues
* onnx configs
* still on the grind
* always more references
* nearly the end
* could it really be the end?
* small fix
* add converters back
* post rebase
* latest qwen
* add back all converters
* explicitly add functions in converters
* re-add
* Add LFM2-VL support
* add tests
* linting, formatting, misc review changes
* add siglip2 to auto config and instantiate it in lfm2-vl configuration
* decouple image processor from processor
* remove torch import from configuration
* replace | with Optional
* remove layer truncation from modeling file
* fix copies
* update everything
* fix test case to use tiny model
* update the test cases
* fix finally the image processor and add slow tests
* fixup
* typo in docs
* fix tests
* the doc name uses underscore
* address comments from Yoni
* delete tests and unsuffling
* relative import
* do we really handle imports better now?
* fix test
* slow tests
* found a bug in ordering + slow tests
* fix copies
* dont run compile test
---------
Co-authored-by: Anna <anna@liquid.ai>
Co-authored-by: Anna Banaszak <48625325+ankke@users.noreply.github.com>
* fix(trainer): ensure final checkpoint is saved when resuming training
* add test
* make style && slight fix of test
* make style again
* move test code to test_trainer
* remove outdated test file
* Apply style fixes
---------
Co-authored-by: rangehow <rangehow@foxmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* use consistent naming for padding
* no validation on pad size
* add warnings
* fix
* fox copies
* another fix
* fix some tests
* fix more tests
* fix lasts tests
* fix copies
* better docstring
* delete print
* working draft for LongCat
* BC changes to deepseek_v3 for modular
* format
* various modularities
* better tp plan
* better init
* minor changes
* make modular better
* clean up patterns
* Revert a couple of modular commits, because we won't convert in the end
* make things explicit.
* draft test
* toctree, tests and imports
* drop
* woops
* make better things
* update test
* update
* fixes
* style and CI
* convert stuff
* up
* ah, yes, that
* enable gen tests
* fix cache shape in test (sum of 2 things)
* fix tests
* comments
* re-Identitise
* minimize changes
* better defaults
* modular betterment
* fix configuration, add documentation
* fix init
* add integration tests
* add info
* simplify
* update slow tests
* fix
* style
* some additional long tests
* cpu-only long test
* fix last tests?
* urg
* cleaner tests why not
* fix
* improve slow tests, no skip
* style
* don't upcast
* one skip
* finally fix parallelism
* Support training florence2
* update doc and testing model to florence-community
* fix florence-2 test, use head dim 16 instead of 8 for fa2
* skip test_sdpa_can_dispatch_on_flash
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Fix#40067 : add UMT5 support in GGUF loader (config, tokenizer, test)
* chore: fix code formatting and linting issues
* refactor: move UMT5 GGUF test to quantization directory and clean up comments
* chore: trigger CI pipeline
* refactor(tests): Move UMT5 Encoder GGUF test to GgufModelTests. This consolidates the new test into the main class for consistency.
* Add regression check to UMT5 encoder GGUF test
Verify encoder output against reference tensor values with appropriate tolerances for stability.
* Update tests/quantization/ggml/test_ggml.py
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Update tests/quantization/ggml/test_ggml.py
remove comments
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
---------
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* Improve module name handling for local custom code
* Use `%lazy` in logging messages
* Revert "Use `%lazy` in logging messages"
This reverts commit 5848755d5805e67177c5218f351c0ac852df9340.
* Add notes for sanitization rule in docstring
* Remove too many underscores
* Update src/transformers/dynamic_module_utils.py
* Update src/transformers/dynamic_module_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Move variable output controls to `prepare_inputs_for_generation`
* fix a bunch of models
* back to basics
* final touches
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictonnary is only removed during kwargs
* Test for supported sample
* Fix a unvoluntary slice
* Fixes for non-sliced inputs and small example improvments
* Slice inputs is more understandabe
* Style
* Update no split modules in T5Gemma model
* Update no_split_modules also for T5Gemma modular
* Remove model_split_percents from test cases
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Fix edge case for tokenize (#36277)
* Fix tokenizing dtype for float input cases
* add test for empty input string
* deal empty list of list like [[]]
* add tests for tokenizer for models with input that is not plain text
* created robust token counting by using existing include_num_input_tokens_seen variable and kept bool for backward compatibility and added string also to ensure everything goes well and kept default as is. also robust test cases are created
* some codebase mismatched in my local and remote, commiting to solve it and also solved code quality issue
* ci: retrigger tests
* another attemp to trigger CI for checks
* Fix DeepSpeed mixed precision precedence over Accelerate defaults
Resolves issue where Accelerate would default to bf16 mixed precision
when a DeepSpeed config specifies fp16, causing a ValueError. The fix
ensures DeepSpeed config takes precedence over TrainingArguments defaults
while preserving explicit user settings.
Changes:
- Add override_training_args_from_deepspeed() method to handle config precedence
- Reorder mixed precision environment variable setting in TrainingArguments
- Ensure DeepSpeed fp16/bf16 settings override defaults but not explicit choices
Fixes#39849
* Add tests for DeepSpeed mixed precision precedence fix
- Add TestDeepSpeedMixedPrecisionPrecedence class with 3 focused tests
- Test DeepSpeed fp16/bf16 config overriding TrainingArguments defaults
- Test user explicit settings being preserved over DeepSpeed config
- Test precedence hierarchy: user settings > DeepSpeed config > defaults
- Replace massive 934-line test bloat with concise 50-line test suite
- Tests cover core functionality of PR #39856 mixed precision precedence fix
* Fix module loading for models with dots in names
* quality check
* added test
* wrong import
* Trigger CI rerun after making test model public
* Update src/transformers/dynamic_module_utils.py
* Update tests/utils/test_dynamic_module_utils.py
* Update tests/utils/test_dynamic_module_utils.py
* Move test
* make fixup
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Matt <rocketknight1@gmail.com>
* CB example: better compare feature
* Cache managers, still issue w/ effective length
* WIP -- fix for effective length
* Renames
* Wroking, need better parity checks, we mind be missing 1 token
* Small fixes
* Fixed wrong attn mask and broke cache into pieces
* Warmup is slowing down things, disabling it
* Cache was too big, fixed
* Simplified index objects
* Added a profile option to the example
* Avoid calls to memory reporing tools
* Restore full attention read indices for better latency
* Adressed some TODOS and style
* Docstrings for cache managers
* Docstrings for Schedulers
* Refactor scheudlers
* [Important] Cache fix for sliding window, check with small sw size
* Updated doc for cache memory compute and cache as a whole
* Moved a todo
* Nits and style
* Fix for when sliding window is smaller than max batch per token
* Paged interface update
* Support for FLash in new API
* Fix example CB
* Fix bug in CB for paged
* Revert example
* Style
* Review compliance
* Style
* Styleeeee
* Removed NO_SLIDING_WINDOW
* Review #2 compliance
* Better art
* Turn cum_seqlens_k in a dict
* Attn mask is now a dict
* Update examples/pytorch/continuous_batching.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Adressed McPatate pro review
* Style and fix
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Add EfficientLoFTRImageProcessorFast for GPU-accelerated image processing
* Fix fast processor output format and add comprehensive tests
* Fix trailing whitespace in test file
* Apply ruff formatting to test file
* simplify pair validation logic
* add superglue tests to fast image processor
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Fix continue_final_message parameter in apply_chat_template
* after run fixup
* Handle trim in the template
* after fixup
* Update src/transformers/utils/chat_template_utils.py
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* feat: err when unsupported attn impl is set w/ `--continuous_batching`
* refactor: move defaults and support list to CB code
* feat: add action item in error msg
* fix(serve): add default attn implementation
* feat(serve): add log when `attn_implementation` is `None`
* feat: raise Exception when attn_implementation is not supported by CB
* change |= operator to use torch logical or for friendly export to different backends
* change |= operator to use torch logical or for friendly export to different backends in grounding dino model
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
* initial commit
* initial setup
* Overiding imageGPT specific functions
* imported is_torch_available and utilized it for importing torch in imageGPT fast
* Created init and ImageGPTFastImageProcessorKwargs
* added return_tensors, data_format, and input_data_format to ImageGPTFastImageProcessorKwargs
* set up arguments and process and _preprocess definitions
* Added arguments to _preprocess
* Added additional optional arguments
* Copied logic over from base imageGPT processor
* Implemented 2nd draft of fast imageGPT preprocess using batch processing
* Implemented 3rd draft of imageGPT fast _preprocessor. Pulled logic from BaseImageProcessorFast
* modified imageGPT test file to properly run fast processor tests
* converts images to torch.float32 from torch.unit8
* fixed a typo with self.image_processor_list in the imagegpt test file
* updated more instances of image_processing = self.image_processing_class in the test file to test fast processor
* standardized normalization to not use image mean or std
* Merged changes from solution2 branch
* Merged changes from solution2 test file
* fixed testing through baseImageGPT processor file
* Fixed check_code_quality test. Removed unncessary list comprehension.
* reorganized imports in image_processing_imagegpt_fast
* formatted image_processing_imagegpt_fast.py
* Added arg documentation
* Added FastImageProcessorKwargs class + Docs for new kwargs
* Reformatted previous
* Added F to normalization
* fixed ruff linting and cleaned up fast processor file
* implemented requested changes
* fixed ruff checks
* fixed formatting issues
* fix(ruff after merging main)
* simplify logic and reuse standard equivalenec tests
---------
Co-authored-by: Ethan Ayaay <ayaayethan@gmail.com>
Co-authored-by: chris <christine05789@gmail.com>
Co-authored-by: Ethan Ayaay <98191976+ayaayethan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Squashed previous branch
* unify assisted generate to common decoding method signature
* move checks to validate steps where possible
* fix csm and other models that override _sample
* ops dia you again
* opsie
* joao review
* Fix broken Llama4 accuracy in MoE part
Llama4 accuracy is broken by a bug in
https://github.com/huggingface/transformers/pull/39501 . It forgot to
transpose the router_scores before applying it to routed_in, causing
Llama4 to generate garbage output.
This PR fixes that issue by adding back the transpose() and adding some
comments explaining why the transpose() is needed.
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
* remove comment
---------
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* feat: support request cancellation
* test: add cancellation test
* refactor: use exisitng fn to check req cancellation
* feat(cb): make cancellation thread safe
* refactor(serve): update test to use `requests` instead of `httpx`
* Add instance attribute to DacVectorQuantize for use in DacResidualVectorQuantize.from_latents
* add from_latent tests
* style fix
* Fix style for test_modeling_dac.py
* add seq class for gemma3 text model
* add Gemma3TextForSequenceClassification to modeling file
* After run make fixup
* let's just check
* thiis is why it was crashing, tests were just failing...
* skip it, tested only for seq clf
---------
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
* fix MetaCLIP 2 wrong link & wrong model names in the documentation and docstrings
* ruff reformatted
* update files generated by modular
* update meta_clip2 to metaclip_2 to match the original
* _supports_flash_attn = False
---------
Co-authored-by: Yung-Sung Chuang <yungsung@meta.com>
* Support MUSA (Moore Threads GPU) backend in transformers
Add accelerate version check, needs accelerate>=0.33.0
* Support TF32 flag for MUSA backend
* fix typo
* fix: continuous batching in `transformers serve`
* fix: short circuit inner gen loop when prepare_next_batch prepared nothing
* docs: add comment explaining FastAPI lifespan
* test: add CB serving tests
* refactor: remove gen cfg max new tokens override bc unnecessary
* docs: add docstring for `ServeCommand::run`
* feat: use new `DecodeStream` API
* Expectations for gemma3
* Fixes for Qwen2_5_VL tests
* Added expectation but underlying pb is still there
* Better handling of mrope section for Qwen2_5_vl
* Fixes for FA2 tests and reformat batch test for Qwen2_5_Omni
* Fix multi-device error in qwen2_5_omni
* Styel and repo-consistency
* Removed inherited test because fix in common
* slow tests fixes
* Style
* Fixes for qwen2_5_vl or omni for FA test
* update make nested image list
* fix make flat list of images
* update type anno
* fix image_processing_smolvlm
* use first image
* add verbose comment
* fix images
* rollback
* fix ut
* Update image_processing_smolvlm.py
* Update image_processing_idefics3.py
* add tests and fix some processors
* fix copies
* fix after rebase
* make the test cover chat templates
* sjip udop, no point in fixing it
* fix after rebase
* fix a few more tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: raushan <raushan@huggingface.co>
* porting not maintained jieba to rjieba
* Fix format
* replaced the line with rjieba instead of removing it
* cut_all is not included as a parameter. cut_all is a seperate function rjieba
* rev
* jieba remove installation
* Trigger tests
* Update tokenization_cpm.py
* Update tokenization_cpm_fast.py
---------
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Add bfloat16 support detection for MPS (Apple Silicon) in is_torch_bf16_gpu_available
bfloat16 seems to have been supported for a few years now in Metal and torch.mps.
Make sure to allow it and not throw on bf16 usage with "Your setup doesn't support bf16/gpu." from TrainingArguments.
* Check bf16 support for MPS using torch method
Actually seems method exists: 5859edf113/torch/_dynamo/device_interface.py (L519)
It simply checks if you are on MacOs 14 or higher.
* Document Metal emulation for bf16 support
Add note about Metal emulation for bf16 support on M1/M2.
* Update bf16 support check for MPS backend
is_bf16_supported() not exposed even if defined on MPSInterface, use same approach as in accelerate pr.
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* first step if flash not installed but you set to use it
* try importing
* now default to using it
* update our tests as well
* wow yesterday I was not awake
* fixup
* style
* lol the fix was very very simple
* `RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/kernels@main#egg=kernels
` for updated dockers
* push review comments
* fix
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* dump ugly option to check again tomorrow
* tiny update
* do not save as nested dict yet!
* fix and add tests
* fix dia audio tokenizers
* rename the flag and fix new model Evolla
* fix style
* address comments
* broken from different PRp
* fix saving layoutLM
* delete print
* delete!
* init swissai model
* AutoModelForCausalLM
* AutoModelForCausalLM mapping
* qk norm and post ln optional
* fix wrong shape of qk norm: megatron uses head_dim
* automodel fixes
* minor fix in forward
* fix rope validation to accept llama3 scaling
* `SwissAIForTokenClassification` support
* Align `SwissAI` to v4.52.4
* Align `SwissAI` to v4.53.1
* Init CUDA xIELU
* `SwissAI*`->`Apertus*`
* ci fix
* check_docstring ignore ApertusConfig
* Licensing and placeholder tests
* Placeholder doc
* XIELU syntax
* `_xielu_python` optimization
* Fix xIELU
* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint
* Modular `Apertus`
* CUDA xIELU logging
* ci fix
* ci fix
* ci fix
* Update license
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update tests/models/apertus/test_modeling_apertus.py
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* `.utils.import_utils.is_torchdynamo_compiling`
* `Apertus` class ordering
* `past_key_value{->s}`, `make fix-copies`
* ci fix
* Remove unused configuration parameters
* `{beta,eps}` saved in checkpoint
* `{beta,eps}` Temporarily on CPU
* Suggestions
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* ci fix
* remove fx_compatible (deprecated)
* remove `rotary_embedding_layer`
As the tests are written for a config without default scaling (which is not the case in Apertus) - besides, rope scaling is tested in other models so it's all safe.
* fully removing `Mask4DTestHard` class
Not needed (for now)
* switch to `dtype` instead of `torch_dtype`
Following this:
https://github.com/huggingface/transformers/pull/39782
* remove unused imports
* remove `cache_implementation="static"`
* +Apertus to `docs/source/en/_toctree.yml` for the doc builder
---------
Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
* docs(pixtral): Update Pixtral model card to new format
* docs(pixtral): Change cuda into auto for device_map
* docs(pixtral): Apply suggestions from review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Apply suggestions from review, changing mistral-community into Mistral AI
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Apply suggestions from review [!TIP] part
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Finalize model card with tested code examples
This commit finalizes the update for the Pixtral model card.
* Fix the hfoption by the right one
* @BryanBradfo docs(pixtral): Changing the redirection of bitsandbytes
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Add of ` to highlight the tokens
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* docs(pixtral): Move image block per final review
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix in modular
* remove leftover print
* fix everything except when it's in assignment
* fix assignment as well
* more general
* better
* better
* better comment
* docstring
* cleaner
* remove base
* doc
* Rework of the CB example
* Further rework of CB example
* Refactor PA cache, slice on tokens, add debug prints -- WIP
* Slice cache -- WIP
* Added a mechanism to check batched outputs in CB script
* Less logging, debug flag for slice, !better reset! -- WIP
* QOL and safety margins
* Refactor and style
* Better saving of cb example
* Fix
* Fixes and QOL
* Mor einformations about metrics
* Further logging
* Style
* Licenses
* Removed some comments
* Add a slice input flag
* Fix in example
* Added back some open-telemetry deps
* Removed some aux function
* Added FA2 option to example script
* Fixed math (all of it)
* Added a simple example
* Renamed core to classes
* Made allocation of attention mask optionnal
* Style
* Relaxed assumptions on cache_config
* Review compliance
* Style
* Styyyle
* Removed default and added args
* Rebase mishapfix
* Propagate args to TorchExportableModuleForDecoderOnlyLM
* Fix the test I wanted fixed in this PR
* Added some AMD expectation related to cache tests
* draft update two models for now
* batch update all VLMs first
* update some more image processors
* update
* fix a few tests
* just make CI green for now
* fix copies
* update once more
* update
* unskip the test
* fix these two
* fix torchcodec audio loading
* maybe
* yay, i fixed torchcodec installation and now can actually test it
* fix copies deepseek
* make sure the metadata is returrned when users request it
* add docs
* update
* fixup
* Update src/transformers/audio_utils.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/glm4v/video_processing_glm4v.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* update
* what if we set some metadata attr to `None`
* fix CI
* fix one test
* fix 4 channel test
* fix glm timestemps
* rebase gone wrong
* raise warning once
* fixup
* typo
* fix copies
* ifx smolvlm test
* this is why torch's official benchmark was faster, set threads to `0`
* Apply style fixes
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* initial context_parallel_size support in trainer
* For context parallelism, use AVG instead of SUM to avoid over-accounting tokens
* use parallelism_config.cp_enabled
* add parallelism_config to trainer state
* warn when auto-enabling FSDP
* fix some reviews
* WIP: somewhat matching loss
* Feat: add back nested_gather
* Feat: cleanup
* Fix: raise on non-sdpa attn
* remove context_parallel_size from TrainingArguments
* if we have parallelism_config, we defer to get_state_dict from accelerate
* fix form review
* Feat: add parallelism config support
* Chore: revert some unwanted formatting changes
* Fix: check None
* Check none 2
* Fix: remove duplicate import
* Update src/transformers/trainer.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/training_args.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fin
* require accerelate 1.10.1 and higer
---------
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Matej Sirovatka <54212263+S1ro1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Add `tokenizer_kwargs` arg to text generation pipeline.
* chore: re-run CI
* Rename `tokenizer_kwargs` to `tokenizer_encode_kwargs` for text generation pipeline
* Fix `tokenizer_encode_kwargs` doc string.
* Fix note related to `tokenizer _kwargs` in text generation pipeline
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* add a test
* tempdir
* fix import issue[
* wow I am tired
* properly init
* i am not super familiar with quantizer api :|
* set to TRUE fro now
* full support
* push current changes
* will clean this later but the imports are a shitshow here
* this correctly saves the block and scales but forward seems broken
* quanitze was not correct
* fix storage
* why were bias even included
* finally!
* style
* fix style
* remove print
* lazy import
* up
* not sure what happens this works now?
* holy molly it was not so far
* okay this seems to work!
* workings!!!
* allow save_pretrained to create PR
* Apply suggestions from code review
* fixup
* add deqyabtze fakse as wek
* working new
* fix
* rm swizzle and unswizzle during saving
* rm print
* Update src/transformers/modeling_utils.py
* fix
* style
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
* Fix label smoothing incompatibility with multi-label classification (#40258)
* Improve label smoothing multi-label check based on reviewer feedback
- Move check from LabelSmoother to Trainer.__init__() for better architecture
- Use model.config.problem_type instead of tensor inference for robustness
- Warn and disable smoothing instead of raising error for better UX
- Update test to verify warning behavior
Renamed wer metric variable to wer_metric to avoid naming conflict
with local variable assignment in compute_metrics function.
Co-authored-by: pranam-gf <pranam@goodfin.com>
Fixed 4 instances of the typo "seperator" → "separator" in variable names:
- 2 instances in src/transformers/models/shieldgemma2/convert_shieldgemma2_weights_orbax_to_hf.py
- 2 instances in src/transformers/models/gemma3/convert_gemma3_weights_orbax_to_hf.py
These typos were in variable names used for parsing path components in weight conversion scripts.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <noreply@anthropic.com>
* fix to the typings which are unmatched to FA function signature
cumulative_seqlens_q/k -> cu_seq_lens_q/k:
- in the FlashAttentionKwargs in modeling_flash_attention_utils
- in the TransformersKwargs in generic
- in the PagedAttentionArgs in continuous_batching
It is **BC**, because they are created in `ContinuousBatchProcessor.setup_static_tensors:L762`, used in `ContinuousBatchingManager._model_forward:L1233` and destroyed with `ContinuousBatchProcessor`
* format changes by ruff
* Update src/transformers/integrations/flash_paged.py
unused function arg in `PagedAttentionCache.update`
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* revert continuous_batching signiture, which is more meaningful
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* this should be explicit
* fix Nth ESM exception
* tryout decoder
* this as well
* revert again
* 🧠
* aaah ESM has two modelings aaah
* broom broom
* format
* wrong copies
* copies
* modular cleanups
* format
* modularities
* wrong mergefix
* seriously
* align with new model
* new model
* update everywhere
* style
* pipelines
* switch it everywhere in tests
* switch it everywhere in docs
* switch in converters everywhere
* update in examples
* update in model docstrings
* style
* warnings
* style
* Update configuration_utils.py
* fix
* Update configuration_utils.py
* fixes and add first test
* add pipeline tests
* Update test_pipelines_common.py
* add config test
* Update test_modeling_common.py
* add new ones
* post rebase
* add new
* post rebase adds
* Update trainer.md
* Update trainer.md
Removed the detail about label_names argument usage from the tip/ warning section
* Update training_args.py
Added the label_names usage clarification in the docstring
* Update trainer.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* handle support for cache classes when num enc layers != num dec layers
* handle overwrites
* one more corner case
* Update src/transformers/generation/utils.py
* Update src/transformers/generation/utils.py
* Apply suggestions from code review
* handle corner case :o
* fix
* cleanup, revert aimv2 fa changes
* fix aria
* i searched a long time but the cross dependency is for the recent models so...
* this was something... evolla
* fix modernbert decoder + make fa test more robust
* nit
* Clean up xcodec addition.
* Clean up config.
* Switch to fixtures test.
* Small stuff.
* Polish XCodec and standardize across codecs.
* Update src/transformers/models/xcodec/modeling_xcodec.py
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* Format and fix test.
* Update tol.
---------
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* make visualizer rely on create causal mask
* format
* fixup
* fixup
* read token
* read token, duh
* what is up with that token
* small tests?
* adjust
* try with flush
* normalize for ANSI
* buffer shenanigans
* Fix links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
* run fixup to update links in Glm4vMoe configuration classes to point to the correct Hugging Face model repository
* add basic type hints to import module
* run make fixup
* remove optional
* fixes
---------
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* it was long due!
* use the official kernel
* more permissive
* update the kernel as well
* mmm should it be this?
* up pu
* fixup
* Update test_modeling_gpt_oss.py
* style
* start with 20b
* Update modeling_utils.py
* make sure we update with the module's plan
* use public api
* oups
* update
* fix failing test
* Update src/transformers/integrations/tensor_parallel.py
* Update src/transformers/integrations/tensor_parallel.py
* fix
* make the API more friendly!
* fix tests
* fix styling
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* init
* add modular
* fixup
* update configuration
* add processing file
* update auto files
* update
* update modular
* green setup_and_quality ci
* it works
* fix some tests
* commit florence2
* update test
* make test cases done - 16 left
* style
* fix few test cases
* fix some tests
* fix init test
* update florence2 vision style
* hope is green
* fix init test
* fix init
* update modular
* refactor vision module
* fix: channel attention use dynamic scale
* update modular
* update
* update attention mask
* update
* fix naming
* Update src/transformers/models/florence2/processing_florence2.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* spatial block works
* more beautiful
* more more beautiful
* merge main
* merge main and fixup
* fix typing hint
* update modeling
* fix eager matches sdpa
* fix style
* fix compile test - all green
* remove florence2 language
* remove Florence2LanguageModel things
* fix style
* update florence2 model
* override prepare encoder_decoder for generation
* add weight conversion script
* rewrite channel attention to use sdpa
* eleminate 1 tranpose op
* support fa2
* fix quality check
* chore: reformat `test_modeling_florence2.py`
* some refactor for processor
* some refactor for processor
* update naming convention and remove BC
* make it pass the test
* fix: correct Embedding Cosine
* update comments and docstring
* support input_embeds
* support input embeds ideally
* fix style
* fix style
* fix style again :D
* add test prcoessor
* refactor processor and add test for processor
* reformat test processor
* make fixup
* fix schema check
* remove image_token
* ensure image token in tokenizer and fix integration tests
* fix processor test
* add more integration tests for large model and rename test_processor to test_processing
* test_assisted_decoding_sample should pass
* update doc and make model work with image text to text pipeline
* docs: add sdpa bagde
* resolve cyril's comments
* fix import torch error
* add helper get_placeholder_mask
* inherit from llava
* florence2 may not _supports_attention_backend because of bart ...
* move florence2 model card to multimodal
* let base model always return_dict
* fix style
* tiny update doc
* set _checkpoint_conversion_mapping = {}
* fix code quality
* support flex and compile graph and move external func to internal func
* remove condition because it always true
* remove window funcs
* move post processor config out
* fix ci
* new intro to trigger test
* remove `kernel_size` argument
---------
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix: pass adamw optimizer parameters to StableAdamW
* add test for stable_adamw initialization with trainer arguments
* address copilot suggestion
* fix: update weight_decay handling in stable_adamw kwargs
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update GPT-NeoX-Japanese model card
* Apply suggestions from code review
* Update gpt_neox_japanese.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Standardize RAG model card
Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added pipeline and AutoModel usage examples
- Included quantization example with BitsAndBytesConfig
- Added notes and resources sections
- Removed abstract and FlashAttention badge
* Standardize RAG model card
Update rag.md to follow the new Hugging Face model card template:
- Added friendly overview in plain language
- Added AutoModel usage example
- Included quantization example with BitsAndBytesConfig
* Fix chat CLI GPU loading and request_id validation issues (#40230)
This commit addresses two critical bugs in the transformers chat CLI:
1. **GPU Loading Issue**: Changed default device from "cpu" to "auto" in ChatArguments
- Chat CLI now automatically uses GPU when available instead of defaulting to CPU
- Matches the behavior of the underlying serving infrastructure
2. **Request ID Validation Error**: Added request_id field to TransformersCompletionCreateParamsStreaming schema
- Fixes "Unexpected keys in the request: {'request_id'}" error on second message
- Allows request_id to be properly sent and validated by the server
Both fixes target the exact root causes identified in issue #40230:
- Users will now get GPU acceleration by default when available
- Chat sessions will no longer break after the second message
* Remove unrelated request_id field from TransformersCompletionCreateParamsStreaming
* Update image_processing_perception_lm_fast.py
Allow for a proper override of vision_input_type in hf fast image processor, otherwise we need to resort to manually setting the attribute.
* Update processing_perception_lm.py to match kwargs vision input type
* Update image_processing_perception_lm_fast.py kwargs to signature args
* Skipping pytree registration in case fsdp is enabled
* Beauty changes
* Beauty changes
* Moved the is_fsdp_available function to import utils
* Moved is_fsdp_available to integrations.fsdp
* Skipping pytree registration in case fsdp is enabled
* Beauty changes
* Beauty changes
* Moved the is_fsdp_available function to import utils
* Moved is_fsdp_available to integrations.fsdp
* Added pytree registration inside dynamic cache class
* Making ci/cd lords happy
* Adding a check if DynamicCache is already a leaf
* Adding try/catch for multiple initializations of DynamicCache in test suites
* Moving dynamic cache pytree registration to executorch
* Adding try catch back
* set inputs_embeds to None while generate to avoid audio encoder forward in generation process
* set input_features to none instead
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
* Add expectation to t5 for rocm 9.4
* Made EncoderDecoderCache compatible with nn.DataParallel
* Fixed t5gemma EncoderDecoderCache
* Added todos in autoformer
* Ruff
* Init is self-contained
* Review compliance
* Fixed kwargs init of EncoderDecoderCache
* add jinja2 as a dependency
* Make jinja2 a core dependency in install_requires
- Add jinja2 to install_requires list in setup.py for automatic installation
- Add jinja2 to runtime version checks in dependency_versions_check.py
- Resolves issue where pip install transformers doesn't install jinja2
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* Make jinja2 a core dependency in install_requires
* Make jinja2 an extra dependency instead of adding a core dep
---------
Co-authored-by: Claude <noreply@anthropic.com>
* remove transpose_for_scores call
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* fix copied evolla code
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
* fix error vocab_size at Qwen2_5_VLForConditionalGeneration loss_function
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* fix similar errer at qwen2_vl and do make fix-copies
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* pass in kwargs for loss_func at qwen2_vl and qwen2_5_vl
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
* Apply style fixes
---------
Signed-off-by: luoxiaoc <xiaochuan.luo@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Revert "Pin torch to 2.7.1 on CircleCI for now (#40174)"
This reverts commit 31b6e6e1dac0d32f74ec5cd6b3c1868534ccd7b5.
* fix
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* docs: Update LayoutLM model card with standardized format
* Apply suggestions from code review
This commit incorporates all suggestions provided in the recent review. Further changes will be committed separately to address remaining comments.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Address remaining review comments
* Address few more review comments:
1. remove transformer-cli section
2. put resources after notes
3. change API refs to 2nd level header
* Update layoutlm.md
* Update layoutlm.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update check_tokenizers.py
chore(typing): add type hints to check_tokenizers script
- Annotate params/returns for helper functions
- Keep tokenizer instances as `Any` to avoid runtime coupling
- Make `check_LTR_mark` return `bool` explicitly (no behavior change)
* Update check_tokenizers.py
chore(typing): replace Any with PreTrainedTokenizerBase in check_tokenizers.py
- Use transformers.tokenization_utils_base.PreTrainedTokenizerBase for `slow` and `fast` params
- Covers both PreTrainedTokenizer and PreTrainedTokenizerFast
- Exposes required methods (encode, decode, encode_plus, tokenize)
- Removes generic Any typing while staying implementation-agnostic
* [MINOR:TYPO] Update base.py
All other occurrences in the docs use lowercase. (https://github.com/search?q=repo%3Ahuggingface%2Ftransformers%20translation_XX_to_YY&type=code)
Also, using uppercase doesn't work: tested with "translation_EN_to_FR" which doesn't work and instead returns: `ValueError: The task does not provide any default models for options ('EN', 'FR')`
It might be a good idea to allow for uppercase, but that's for another issue.
* [MINOR:TYPO] Update __init__.py
* update
* fix the test for DepthPro
* PR comments
* wait, I didn't delete this in prev commit?
* fix
* better way
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* added dates to the models with a single hf papers link
* added the dates for models with multiple papers
* half of no_papers models done
* rest of no_papers models also done, only the exceptions left
* added copyright disclaimer to sam_hw, cohere, cohere2 + dates
* some more fixes, hf links + typo
* some new models + a rough script
* the script looks robust, changed all paper links to hf
* minor change to handle technical reports along with blogs
* ran make fixup to remove the white space
* refactor
* build: add TvpImageProcessorFast
- Introduced TvpImageProcessorFast to enhance image processing capabilities.
- Updated image processing auto registration to include the new fast processor.
- Modified tests to accommodate both TvpImageProcessor and TvpImageProcessorFast, ensuring comprehensive coverage for both classes.
* fix: TvpImageProcessorFast with new resize method and update processing logic
* build: add TvpImageProcessorFast
* refactor: clean up whitespace and formatting in TvpImageProcessorFast and related tests
- Removed unnecessary whitespace and ensured consistent formatting in image_processing_tvp_fast.py.
- Updated import order in test_image_processing_tvp.py for clarity.
- Minor adjustments to maintain code readability and consistency.
* fix: Enhance TvpFastImageProcessorKwargs and update documentation
- Added TvpFastImageProcessorKwargs class to define valid kwargs for TvpImageProcessorFast.
- Updated the documentation in tvp.md to include the new class and its parameters.
- Refined the image processing logic in image_processing_tvp_fast.py for better handling of padding and resizing.
- Improved test cases in test_image_processing_tvp.py to ensure compatibility with the new processing logic and tensor inputs.
* fix: tested now with python 3.9
* fix: remove tvp kwargs from docs
* simplify processing
* remove import and fix tests
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* fix: changed is_causal to be False
* fix: Added original cross attention bug
* fix: fixed the way bordel removal is computed
* fix: added missing normalization on coarse features
* test: fixed integration tests
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* initial comment
* test
* initial conversion for outline
* intermediate commit for configuration
* chore:init files for sam2
* adding arbitary undefined config
* check
* add vision
* make style
* init sam2 base model
* Fix imports
* Linting
* chore:sam to sam2 classes
* Linting
* Add sam2 to models.__init__
* chore:match prompt encoder with sam2 code
* chore:prepare kwargs for mask decoder
* Add image/video predictors
* Add CUDA kernel
* Add output classes
* linting
* Add logging info
* tmp commit
* docs for sam2
* enable image processing
* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
* enable promptencoder of sam2
* fix promprencoder
* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)
* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)
* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)
* SamModel is now available (Need more chore for name)
* make fix-copies
* make style
* make CI happy
* Refactor VisionEncoder and PostioinEmbedding
* TO DO : fix the image_embeddings and sparse_embeddings part
* pure image inference done
* reusable features fix and make style
* styling
* refactor memoryattention
* tmp
* tmp
* refactor memoryencoder
TO DO : convert and inference the video pipeline
* TO DO : fix the image_encoder shape
* conversion finish
TO DO: need to check video inference
* make style
* remove video model
* lint
* change
* python utils/check_docstringspy --check_all
* python utils/check_config_attributes.py
* remove copies for sam2promptencoder due to configuration
* change __init__.py
* remove tensorflow version
* fix that to not use direct comparison
* make style
* add missing import
* fix image_embedding_size
* refactor Sam2 Attention
* add fully working video inference (refactoring todo)
* clarify _prepare_memory_conditioned_features
* simplify modeling code, remove unused paths
* use one model
* use auto_docstring
* refactor rope embeddings
* nit
* not using multimask when several points given
* add all sam2.1
* add video tmp
* add Sam2VideoSessionState + fast image proc + video proc
* remove init_states from model
* fix batch inference
* add image integration tests
* uniformize modeling code with other sam models and use modular
* pass vision tests an most model tests
* All tests passing
* add offloading inference state and video to cpu
* fix inference from image embedding and existing mask
* fix multi_boxes mask inference
* Fix batch images + batch boxes inference
* improve processing for image inference
* add support for mask generation pipeline
* add support for get_connected_components post processing in mask generation
* add fast image processor sam, image processor tests and use modular for sam2 image processor
* fix mistake in sam after #39120
* fix init weights
* refactor convert
* add integration tests for video + other improvements
* add needed missing docstrings
* Improve docstrings and
* improve inference speed by avoiding cuda sync
* add test
* skip test for vision_model
* minor fix for vision_model
* fix vision_model by adding sam2model and change the torch dependencies
* remove patch_size
* remove image_embedding_size
* fix patch_size
* fix test
* make style
* Separate hieradet and vision encoder in sam2
* fixup
* review changes part 1
* remove MemoryEncoderConfig and MemoryAttentionConfig
* pass q_stride instead of q_pool module
* add inference on streamed videos
* explicitely process streamed frames
* nit
* Improve docstrings in Sam2Model
* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel
* improve video inference api
* change inference_state to inference_session
* use modular for Sam2Model
* fix convert sam2 hf
* modular
* Update src/transformers/models/sam2/video_processing_sam2.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix minor config
* fix attention loading error
* update modeling tests to use hub checkpoints
* Use CI A10 runner for integration tests values + higher tolerance for video integration tests
* PR review part 1
* fix doc
* nit improvements
* enforce one input format for points, labels and boxes
* nit
* last few nits from PR review
* fix style
* fix the input type
* fix docs
* add sam2 model as conversion script
* improve sam2 doc
* nit fixes + optimization
* split sam2 and sam2_video in two models
* PR review part 1
* fix None for default slow processor of sam2
* remove unecessary code path in sam2_video
* refactor/simplify RoPE
* replace embedding module list with embedding matrix
* fix tests
* remove kernel
* nit
* use lru_cache for sine_pos_embeddings
* reorder sam2_video methods
* simplify sam2_video
* PR review part 1
* simplify sam2 video a lot
* more simplification
* update integration tests with updated conftest
* more explicit config for hieradet
* do post_processing outside of sam2 video model
* Improve Sam2VideoVisionRotaryEmbedding
* fix tests
* update docs and fix mask2former/oneformer
* avoid unnecessary reshapes/permute
* fix device concatenating points
* small dtype fix
* PR review
* nit
* fix style and finish up doc
* fix style
* fix docstrings
* fix modular
---------
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* docs: ko: main_classes/optimizer_schedules
* feat: nmt draft
* fix: improve TOC anchors and expressions in optimizer_schedules
- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions
* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'
Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.
* fix: Use more natural Korean inheritance expression
Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.
* fix: Use consistent '미세 조정' translation for 'finetuned models'
Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT
* fix min torchvision version
* use InterpolationMode directly
* remove unused is_torchvision_greater_or_equal,
* nit
* Add initial collated reports script and job definition
* provide commit hash for this run. Also use hash in generated artifact name. Json formatting
* tidy
* Add option to upload collated reports to hf hub
* Add glob pattern for test report folders
* Fix glob
* Use machine_type as path filter instead of glob. Include machine_type in collated report
* fix flash attention
* i got a stroke reading that comment
* change dropout kwarg back to before
* rename _fa3... as it's used for multiple variants and should work as fallback instead
* simplify imports and support kwargs for fa
* style
* fix comments order
* small fix
* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart
* style
* allow fullgraph by preloading on init
* make globals "private"
* ci pls be happy
* change skip conditions based on backend flag (indicating missing mask interface)
* move globals support to a function to prepare kwargs
* style
* generalize supported kwargs
* small change to doc
* fix
* add comments
* style
* revert prep during generate
* style
* revert weird style changes
* add fa kwarg prep during generate with fixes back
* how did this even happen
* how
* add comment
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before including its tensor module.
* Fix PerceptionLM image preprocessing for non-tiled image input.
* Add test for single tile vanilla image processing.
* ruff format
* recover missing test skip
* Simplify test.
* minor test name fix
* Update HuBERT model card according to template
Standardized HuBERT doc, added ASR examples, Flash Attention 2 support, and quantization section.
* Address review comments and changes requested to hubert.md
* Update hubert.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* init
* update
* uupdate
* ruff
* t patch is 2 defalut not 1
* draft
* back
* back1
* update
* config update
* update using glm-41 format
* add self.rope_scaling = config.rope_scaling
* update config
* update
* remove the processor
* update
* fix tests
* update
* for test
* update
* update 2126
* self.rope_scaling is missing in GLM4MOE lets add it
* update
* update
* Update modular_glm4v_moe.py
* change config
* update apply_multimodal_rotary_pos_emb
* format
* update
* Delete 3-rollout_qas_thinking_answers.py
* use right name
* update with place holder
* update
* use right rotary
* Update image_processing_glm4v_fast.py
* rope_config_validation needs to rewrite the entire config file in modular
* update
* changed name
* update
* Update modeling_glm4v_moe.py
* _init_weights shoud be add in Glm4vMoePreTrainedModel
* remove use_qk_norm
* Update modular_glm4v_moe.py
* remove use_qk_norm as it is not use
* fix style
* deprecations are not needed on new models
* fix merge issues
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* all modulars and llama
* apply modular
* bert and gpt2 copies
* fix imports
* do it everywhere
* fix import
* finalize it
* fix
* oups set it in modular
* style
* fix
* Add 1 version to deprecation cycle
* Update modeling_layers.py
* Fix missing video inputs for PerceptionLM.
* Minor fix for vanilla input image (only C,H,W, no tiles dim).
* Revert "Minor fix for vanilla input image (only C,H,W, no tiles dim)."
This reverts commit 181d87b964e59c4118035a9fd4f530c6e551ba9f.
* Add amd expectation in internvl
* Add amd expectation to llama
* Added bnb decorator for a llava test that requires bnb
* Added amd expectation for mistral3
* Style
* Support input_embeds in torch exportable decoders
* Hybrid cache update
* Manually change some callsites
* AI changes the rest of the call sites
* Make either input_ids/inputs_embeds mandatory
* Clean up
* Ruff check --fix
* Fix test
* pr review
* Revert config/generation_config changes
* Ruff check
* chore: update Deformable_Detr model card
* fix: added pipeline, automodel examples and checkpoints link
* Update deformable_detr.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Fix MXFP4 quantizer validation to enable CPU dequantization
Move dequantize check before CUDA availability check to allow
CPU inference when quantization_config.dequantize is True.
This enables users to run MXFP4 models on CPU by automatically
converting them to BF16 format.
* Add tests for MXFP4 quantizer CPU dequantization validation
* fix: format mxfp4 test file with ruff
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat temlate + special tokens back into the script.
* initial commmit
* update
* working with shards
* add model.safetensors.index.json
* fix
* fix
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony.
* add in tokenizer config as well.
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning 4-tuple
```
[rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]: model = AutoModelForCausalLM.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```
* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working !
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non nan, gibberish output
* working EP + quantization finally !
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commmit
* more cleaning
* more cleaning
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix !
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores, kernel probably no ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* upate pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle !
* update final chat template by Matt.
* fix responses; pin oai
* sinplify
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt. x2
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Fix installation
* Update setup.py
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now, its just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* fix merge
* fix
* Fix line break when custom model indentity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration
It was just a leftover.
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (#51)
* Add oai fix fast tests (#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* Revert fixup
* skip 2 test remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt to changes made to modeling
* style
* isort
* fix opies for the loss
* mmmm
---------
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* style?
When users set `report_to="wandb"` but also have `WANDB_DISABLED=true` in their environment,
the previous error message was misleading: "WandbCallback requires wandb to be installed. Run pip install wandb."
This was confusing because wandb was actually installed, just disabled via the environment variable.
The fix detects this specific case and provides a clear, actionable error message explaining
the conflict and how to resolve it.
* Update model card for DETR
* fix: applied suggested changes
* fix: simplified pipeline and modified notes and resources
* Update detr.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* added code for handling video object ,as dictionary of frames and metadata, in chat template
* added new test where videos are passed as objects (dict of frames, metadata) in the chat template
* modified hardcoded video_len check that does not match with increased number of tests cases.
* Modify hardcoded video_len check that fails with increased number of tests
* update documentation of multi-modal chat templating with extra information about including video object in chat template.
* add array handling in load_video()
* temporary test video inlcuded
* skip testing smolvlm with videos that are list of frames
* update documentation & make fixup
* Address review comments
* fix: deprecate plot_keypoint_matching and make visualize_keypoint_matching for all Keypoint Matching models
* refactor: added copied from
* fix: make style
* fix: repo consistency
* fix: make style
* docs: added missing method in SuperGlue docs
* first commit
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.
TODO: Some tests are failing so that needs to be fixed.
* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model
* cleaned up a hack in the conversion script
* Fixed the expected values in integration tests
Cross att masking and cpu-gpu consistency tests are still failing however.
* changes for make style and quality
* add documentation
* clean up contrastive embedding
* add mm grounding dino to loss mapping
* add model link to config docstring
* hack fix for mm grounding dino consistency tests
* add special cases for unused config attr check
* add all models and update docs
* update model doc to the new style
* Use super_kwargs for modular config
* Move init to the _init_weights function
* Add copied from for tests
* fixup
* update typehints
* Fix-copies for tests
* fix-copies
* Fix init test
* fix snippets in docs
* fix consistency
* fix consistency
* update conversion script
* fix nits in readme and remove old comments from conversion script
* add license
* remove unused config args
* remove unnecessary if/else in model init
* fix quality
* Update references
* fix test
* fixup
---------
Co-authored-by: qubvel <qubvel@gmail.com>
* fix?
* fixme and style
* Update src/transformers/modeling_utils.py
* update
* update
* fix
* small fixees
* nit
* nits
* fix init check?
* fix
* fix default
* or fucks me
* nits
* include a small nit
* does this make it hapy?
* fixup
* fix the remaining ones
* Add cohere2_vision to support CohereLabs/command-a-vision-07-2025
* update and add modualr file
* update processors and check with orig impl later
* delete unused files
* image processor reduce LOC and re-use GotOCR2
* update the config to use modular
* model tests pass
* processor fixes
* check model outputs decorator
* address one more comment
* Update tokens. Temp - need to read from tokenizer'
* fix for multi-gpu
* Fix image token handling
* upadte image token expansion logic
* fix a few issues with remote code loading
* not related but modular forces us to change all files now
* Add overview and code sample to cohere vision docs
* add scripts. TMP.
* Update inference script
* Create script
* set dtype in export script
* TO revert: modular export fix
* Fix scripts
* Revert "TO revert: modular export fix"
This reverts commit bdb2f305b61027a05f0032ce70d6ca698879191c.
* Use modular weights
* Upload to hub
Removed OOD weights ad script
* Updated docs
* fix import error
Update docs
Added pipeline test
* Updated docs
* Run modular script
remove modular for config
Added patch_size
Added docstrings in modular
Fix OOM
Add docs, fixup integration tests. 8-gpu passing
* tiny updates
* address comments + fixup
* add test for chat template
* check model outputs workaround
* aya vision fix check model inputs
* Revert "add test for chat template"
This reverts commit 42c756e397f588d76b449ff1f93292d8ee0202d8.
* reveert more changes
* last revert
* skip and merge
* faulty copy from
---------
Co-authored-by: Julian Mack <julian.mack@cohere.com>
Co-authored-by: kyle-cohere <kyle@cohere.com>
* feat(tokenization): add encode_message to tokenize messages one by one
* Fix the `encode_message` method, remove the `add_generation_prompt` parameter and add the corresponding error handling. Update the document to reflect this change and verify the error handling in the test.
* Optimize the `encode_message` method, improve the processing logic of the empty dialogue history, and ensure that the chat template can be applied correctly when the dialogue history is empty. Update the document to reflect these changes.
* The `_encode_message` method is deleted, the message coding logic is simplified, and the functional integrity of the `encode_message` method is ensured. Update the document to reflect these changes.
* Docs fix
* Revert changes in docstring of pad()
* Revert changes in docstring
* Update src/transformers/tokenization_utils_base.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Repair the call of the `encode_message` method, update it to `encode_message_with_chat_template` to support the chat template, and adjust the relevant test cases to reflect this change.
* Optimize the call format of the `apply_chat_template` method, and merge multi-line calls into a single line to improve code readability.
---------
Co-authored-by: pco111 <15262555+pco111@user.noreply.gitee.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix: cache_position: RuntimeError: Boolean value of Tensor with more than one value is ambiguous
* test cache_position
* move test
* propagate changes
---------
Co-authored-by: Masataro Asai <guicho2.71828@gmail.com>
* Add callback to monitor progress in whisper transcription
* Added `` around variables, rewording
* Add example of `monitor_progress`.
---------
Co-authored-by: Eric B <ebezzam@gmail.com>
* docs: ko: main_classes/peft.md
* feat: nmt draft
* docs: add missing TOC to documentation for `PeftAdapterMixin` section
Added a table of contents (TOC) to the documentation, specifically for the `transformers.integrations.PeftAdapterMixin` section, following the structure and content outlined in [this link](https://huggingface.co/docs/transformers/main/en/main_classes/peft#transformers.integrations.PeftAdapterMixin).
* fix: Improve naturalness of purpose expression in Korean
Changed '관리하기 위한' to '관리할 수 있도록' for more natural Korean expression when describing the purpose of providing functions.
* fix: Simplify plural form and make expression more concise
Changed '~할 수 없기 때문에' to '~할 수 없어' for more concise expression while maintaining clarity.
* fix: Replace technical term '주입' with more natural '적용'
Changed '주입할 수 없어' to '적용할 수 없어' for better readability.
Considered alternatives:
'삽입': Too literal translation of 'inject'
'입력': Could be misunderstood as data input
'통합': Implies merging two systems
'추가': Simple but less precise
'적용' was chosen as it's the most natural and widely used term in Korean technical documentation for this context.
* fix: update toctree path for PEFT to lowercase
Changed the toctree path from 'PEFT' (uppercase) to 'peft' (lowercase) to match the correct directory naming convention and prevent broken links.
* docs: update as per reviewer feedback after rebase
* Add Fast Segformer Processor
* Modified the params according to segformer model
* modified test_image_processing_Segformer_fast args
- removed redundant params like do_center_crop,center_crop which aren't present in the original segformer class
* added segmentation_maps processing logic form the slow segformer processing module with references from beitimageprocessing fast
* fixed code_quality
* added recommended fixes and tests to make sure everything processess smoothly
* Fixed SegmentationMapsLogic
- modified the preprocessing of segmentation maps to use tensors
- added batch support
* fixed some mismatched files
* modified the tolerance for tests
* use modular
* fix ci
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* feat: superpoint fast image processor
* fix: reran fast cli command to generate fast config
* feat: updated test cases
* fix: removed old model add
* fix: format fix
* Update src/transformers/models/superpoint/image_processing_superpoint_fast.py
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
* fix: ported to torch and made requested changes
* fix: removed changes to init
* fix: init fix
* fix: init format fix
* fixed testcases and ported to torch
* fix: format fixes
* failed
test case fix
* fix superpoint fast
* fix docstring
---------
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* Add missing cache_position argument.
* Pass cache_position to language model.
* Overwrite prepare_inputs_for_generation.
* Set model to half precision for Flash Attention test.
* Cast model to bfloat16.
* add tests for helpers
* duplicate test for each model
* why llava next video has no helper
* oops must have been in the commit
* fix test after rebase
* add copy from
* support `typing.Literal` as type of tool parameters
* validate the `args` of `typing.Literal` roughly
* add test to get json schema for `typing.Literal` type hint
* fix: add `"type"` attribute to the parsed result of `typing.Literal`
* test: add argument `booleanish` to test multi-type literal
* style: auto fixup
* EP + updates
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* udpate
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update
---------
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* upload initial code
* update deepseek-vl adaptor
* update hierarchy of vision model classes
* udpate aligner model
* add text model
* Added Image Processor
* Added Image Processor
* Added Image Processor
* apply masks
* remove projection; add aligner
* remove interpolate_pos_encoding
* remove unused params in config
* cleaning
* Add the __init__ file
* added processing deepseek_vl class
* modified the deepseek-vl processor
* modified the deepseek-vl processor
* update __init__
* Update the image processor class name
* Added Deepseek to src/transformers/__init__.py file
* Added Deepseek to image_processing_auto.py
* update the __init__ file
* update deepseek_vl image processor
* Update Deepseek Processor
* upload fast image processor
* Revert "upload fast image processor"
This reverts commit 68c8fd50bafbb9770ac70c9de02448e2519219b4.
* update image processor
* flatten heirarchy
* remove DeepseekVLModel
* major update (complete modeling)
* auto modeling and other files
* formatting
* fix quality
* replace torchvision in modeling
* set default do_normalize to False
* add fast image processor template using tool
* update image processors
* add fast image processor to other files
* update liscense
* Added deepseek image testcases
* update image test
* update processor
* write CHAT_TEMPLATE
* update model for processor
* fix processor
* minor fixes and formatting
* fix image processing and tests
* fix interpolation in sam
* fix output_attentions in DeepseekVLModel
* upload test_modeling
* fix tests because of vocab size
* set use_high_res_vision=False in tests
* fix all modeling tests
* fix styling
* remove explicit background_color from image processors
* added test_processor
* added test_processor
* fix processor tests
* update docs
* update docs
* update docs
* update conversion script
* Fixed typos
* minor fixes from review
- remove model_id comments in examples
- remove from pre-trained auto mapping
- move to image-text-to-text from vision-to-seq in auto mapping
- add image_token_index to __init__ for config
- remove outdated temporary config in conversion script
- update example to use chat_template in docstring example
- update liscense 2021->2025
* fix type in config docstring
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* update get_image_features
* fix config
* improve DeepseekVLImageProcessor.preprocess
* return image_hidden_states
* use AutoTokenizer and AutoImageProcessor in Processor
* fix model outputs
* make num_image_tokens configurable
* fix docstring of processor
* move system prompt to chat template
* fix repo consistency
* fix return_dict
* replace SamVisionEncoder with SamVisionModel
* update to remove deepcopy
* 🛠️ Major Architectural Changes (Adds DeepseekVLHybrid)
* fix quality checks
* add missing hybrid in auto modeling
* run make style
* update sam_hq
* update high_res_size in test
* update docs following #36979
* update code with auto_docstring
* update conversion scripts
* fix style
* fix failing test because of tuple
* set weights_only=True in conversion script
* use safetensors.torch.load_file instead of torch.load in conversion script
* make output_dir optional in conversion script
* fix code snippets in docs (now the examples work fine)
* integration tests for DeepseekVL
* update expected texts
* make style
* integration tests for DeepseekVLHybrid
* fix class name
* update expected texts for hybrid
* run "make style"
* update since changes in main
* run make-style
* nits since changes in main
* undo changes in sam
* fix tests
* fix tests; update with main
* update with main: output_attention/output_hidden_states
* fix copied part in deepseek_vl
* run fix-copies
* fix output_hidden_states
* sam: fix _init_weigths
* use modular for DeepseekVL
* make image processor more modular
* modular: use JanusPreTrainedModel
* janus: provide kwargs in loss
* update processors in conversion script
* Revert "sam: fix _init_weigths"
This reverts commit db625d0c68956c0dad45edd7a469b6a074905c27.
* run fix-copies
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* init
* Force qwen2VL image proc to fast
* refactor qwen2 vl fast
* fix copies
* Update after PR review and update tests to use return_tensors="pt"
* fix processor tests
* add BC for min pixels/max pixels
* fix most tests
* skip a few more tests
* address comments
* fix chameleon tests
* forgot to uncomment
* qwen has its own tests with images, rename it as well
* add owlv2 fast image processor
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* add Owlv2ImageProcessorFast to Owlv2Processor image_processor_class
* change references to owlVit to owlv2 in docstrings for post process methods
* change type hints from List, Dict, Tuple to list, dict, tuple
* remove unused typing imports
* add disable grouping argument to group images by shape
* run make quality and repo-consistency
* use modular
* fix auto_docstring
---------
Co-authored-by: Lewis Marshall <lewism@elderda.co.uk>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
* docs: Standardize OPT model card with enhanced details
* Remove incorrect link from OPT model card
* Address review feedback on OPT model card
* Update opt.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
- Fix Cyrillic 'Р' to Latin 'P' in Portuguese language link (README.md)
- Fix 'meanginful' to 'meaningful' in training documentation
- Fix duplicate 'Cohere' reference in modular transformers documentation
- Fix duplicate 'the the' in trainer and chat command comments
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
* First attempt
* fix
* fix
* Enhance TrackioCallback to log GPU memory usage and allocation
* Enhance Trackio integration in callbacks and training arguments documentation
* re order
* remove unused lines
* fix torch optional
* use partial to wrap around `transformers` utils!
* try to refactor?
* revert one wrong change
* just a nit
* push
* reverter watever was wrong!
* some nits
* fixes when there is no attention mask
* bring the licence back
* some fixes
* nit
* style
* remove prints
* correct dtype
* fa flags for testing
* update
* use paged attention if requested!
* updates
* a clone was needed, not sure why
* automatically create cu seq lens when input is flash, this at least makes sure layers don't re-compute
* simplify and improve?
* flash attention is kinda broken on recent cuda version so allow the opportunity to use something else
* fix!
* protect kernels import
* update
* properly parse generation config being passed
* revert and update
* add two tests
* some fixes
* fix test FA2
* takes comment into account
* fixup
* revert changes
* revert the clone, it is only needed because the metal kernel is not doing it?
* [docs] update attention implementation and cache docs (#39547)
* update docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* applu suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* fix mps on our side for now
* Update src/transformers/integrations/flash_paged.py
* no qa
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* feat: add support for gradient checkpointing in TimmWrapperModel and TimmWrapperForImageClassification
* ruff fix
* refactor + add test for not supported model
* ruff
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Update src/transformers/models/timm_wrapper/modeling_timm_wrapper.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
---------
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* initial commit
* Apply suggestions from code review
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix: various typos, typehints, refactors from suggestions
* fix: fine_matching method
* Added EfficientLoFTRModel and AutoModelForKeypointMatching class
* fix: got rid of compilation breaking instructions
* docs: added todo for plot
* fix: used correct hub repo
* docs: added comments
* fix: run modular
* doc: added PyTorch badge
* fix: model repo typo in config
* fix: make modular
* fix: removed mask values from outputs
* feat: added plot_keypoint_matching to EfficientLoFTRImageProcessor
* feat: added SuperGlueForKeypointMatching to AutoModelForKeypointMatching list
* fix: reformat
* refactor: renamed aggregation_sizes config parameter into q, kv aggregation kernel size and stride
* doc: added q, kv aggregation kernel size and stride doc to config
* refactor: converted efficientloftr implementation from modular to copied from mechanism
* tests: overwrote batching_equivalence for "keypoints" specific tests
* fix: changed EfficientLoFTRConfig import in test_modeling_rope_utils
* fix: make fix-copies
* fix: make style
* fix: update rope function to make meta tests pass
* fix: rename plot_keypoint_matching to visualize_output for clarity
* refactor: optimize image pair processing by removing redundant target size calculations
* feat: add EfficientLoFTRImageProcessor to image processor mapping
* refactor: removed logger and updated attention forward
* refactor: added auto_docstring and can_return_tuple decorators
* refactor: update type imports
* refactor: update type hints from List/Dict to list/dict for consistency
* refactor: update MODEL_MAPPING_NAMES and __all__ to include LightGlue and AutoModelForKeypointMatching
* fix: change type hint for size parameter in EfficientLoFTRImageProcessor to Optional[dict]
* fix typing
* fix some typing issues
* nit
* a few more typehint fixes
* Remove output_attentions and output_hidden_states from modeling code
* else -> elif to support efficientloftr
* nit
* tests: added EfficientLoFTR image processor tests
* refactor: reorder functions
* chore: update copyright year in EfficientLoFTR test file
* Use default rope
* Add docs
* Update visualization method
* fix doc order
* remove 2d rope test
* Update src/transformers/models/efficientloftr/modeling_efficientloftr.py
* fix docs
* Update src/transformers/models/efficientloftr/image_processing_efficientloftr.py
* update gradient
* refactor: removed unused codepath
* Add motivation to keep postprocessing in modeling code
* refactor: removed unnecessary variable declarations
* docs: use load_image from image_utils
* refactor: moved stage in and out channels computation to configuration
* refactor: set an intermediate_size parameter to be more explicit
* refactor: removed all mentions of attention masks as they are not used
* refactor: moved position_embeddings to be computed once in the model instead of every layer
* refactor: removed unnecessary hidden expansion parameter from config
* refactor: removed completely hidden expansions
* refactor: removed position embeddings slice function
* tests: fixed broken tests because of previous commit
* fix is_grayscale typehint
* not refactoring
* not renaming
* move h/w to embeddings class
* Precompute embeddings in init
* fix: replaced cuda device in convert script to accelerate device
* fix: replaced stevenbucaille repo to zju-community
* Remove accelerator.device from conversion script
* refactor: moved parameter computation in configuration instead of figuring it out when instantiating a Module
* fix: removed unused attributes in configuration
* fix: missing self
* fix: refactoring and tests
* fix: make style
---------
Co-authored-by: steven <steven.bucaille@buawei.com>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* improve handlike of other image-like inputs in fast image processors
* fix issues with _prepare_images_structure
* update sam image processor fast
* use dict update
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove addition forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as its basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also in remote done what a timing), begin of tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flasky tests
* move repo id, update when merged on the hub
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* revert again
* 🧠
* aaah ESM has two modelings aaah
* add informative but short comment
* add `input_embed_layer` mixin attribute
* style
* walrus has low precedence
* modular fix
* this was breaking parser
Enable average_tokens_across_devices by default in TrainingArguments
Fixes#39392
This change improves loss calculation correctness for multi-GPU training by enabling proper token averaging across devices by default.
Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix qwen2 vl packing in FA2
* why? delete!
* qwen2-5-vl seems to work now
* update
* fix tests
* start by adapting FA2 tests
* add similar tests for sdpa/eager
* address comments
* why is this even in conditional model and not base model?
* fix type order
* change all Union[str, dict] to Union[dict, str]
* add hf_parser test && fix test order
* add deepspeed dependency
* replace deepspeed with accelerator
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* Scaffolding
* Explicit content
* Naïve Responses API streaming implementation
* Cleanup
* use openai
* validate request, including detecting unused fields
* dict indexing
* dict var access
* tmp commit (tests failing)
* add slow
* use oai output type in completions
* (little rebase errors)
* working spec?
* guard type hint
* type hints. fix state (CB can now load different models)
* type hints; fn names; error type
* add docstrings
* responses + kv cache
* metadata support; fix kv cache; error event
* add output_index and content_index
* docstrings
* add test_build_response_event
* docs/comments
* gate test requirements; terminate cb manager on model switch
* nasty type hints
* more type hints
* disable validation by default; enable force models
* todo
* experiment: base model from typed dict
* audio working
* fix bad rebase
* load audio with librosa
* implement timed models
* almost working
* make fixup
* fix tests
* transcription request type
* tokenizer -> processor
* add example in docs
---------
Co-authored-by: Lysandre <hi@lysand.re>
* Add the `device` option for `generate()`
* Add device for default tensors to avoid tensor mismatch
* [test] Enable test_static_cache_exportability for torch_device
* infer device from the prompt_token_ids
* Add device for generated tensor
* [Test] Make `test_export_static_cache` tests to run on devices rather than only CPU
* fix format
* infer device from the model
* wip: adding first version of the IJEPA model card.
* refactor based on the @stevhliu feedbacks
* refactor:
- revert the accidental removal of the autodoc api description and the image reerece architecture
- general context updation.
* - changes of model for example quantization.
- merging the quantization content.
Fix indentation bug in Idefics3 image processor
- Fix KeyError when do_image_splitting=False
- Move split_images_grouped assignment inside loop
- Ensures all image shapes are stored, not just the last one
- This fixes the bug in both Idefics3 and generated SmolVLM processors
cc @yonigozlan
Co-authored-by: Krishnan Vignesh <krishnanvignesh@Krishnans-MacBook-Air.local>
* Fix typo in generation configuration for Janus model weight conversion
* Fix typo
* Update Janus model generation configuration
* Update Janus model to use generation_kwargs
* dump
* push other models
* fix simple greedy generation
* xmod
* add fmst and clean up some mentions of old cache format
* gpt-bigcode now follows standards
* delete tuple cache reference in generation
* fix some models
* fix some models
* fix mambas and support cache in tapas
* fix some more tests
* fix copies
* delete `_reorder_cache`
* another fix copies
* fix typos and delete unnecessary test
* fix rag generate, needs special cache reordering
* fix tapas and superglue
* reformer create special cache
* recurrent gemma `reorder_cache` was a no-op, delete
* fix-copies
* fix blio and musicgen pipeline tests
* fix reformer
* fix reformer, again...
* delete `_supports_cache_class`
* delete `supports_quantized_cache`
* fix failing tests
* fix copies
* some minor clean up
* style
* style
* fix copies
* fix tests
* fix copies
* create causal mask now needs positions?
* fixc copies
* style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* clean-up of non-generative model after merging main
* check `is_decoder` for cache
* delete transpose for scores
* remove tuple cache from docs everywhere
* fix tests
* fix copies
* fix copies once more
* properly deprecate `encoder_attention_mask` in Bert-like models
* import `deprecate_kwarg` where needed
* fix copies again
* fix copies
* delete `nex_decoder_cache`
* fix copies asks to update for PLM
* fix copies
* rebasing had a few new models, fix them and merge asap!
* fix copies once more
* fix slow tests
* fix tests and updare PLM checkpoint
* add read token and revert accidentally removed line
* oh com -on, style
* just skip it, read token has no access to PLM yet
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* Added StableAdamW as an optimizer option for Trainer. Also wrote tests to verify its behaviour.
* Fixed issue with
* Added docs for StableAdamW. Also fixed a typo in schedule free optimizers
---------
Co-authored-by: Gautham Krithiwas <gauthamkrithiwas2003@gmail.com>
* add test scanner
* add doc + license
* refactor for only 1 tree traversal
* add back test of only one method
* document single method scan
* format
* fixup generate tests
* minor fix
* fixup
* fixup doc
* add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in trainer
* Update src/transformers/optimization.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update optimization.py
fix the error of the unclosed "("
* Update optimization.py
remove whitespace in line 402 in order to pass the quality test
* Update src/transformers/optimization.py
* Update src/transformers/optimization.py
* Apply style fixes
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
fix: 🐛 Fixed a bug in calculating Cross Entropy loss in JetMoeForCausalLM
In the original code, we shift the logits and pass shift_logits into the self.loss_function, but in self.loss_function, the shift_logits will be shifted again, so we are actually doing "next next token prediction", which is incorrect. I have removed the logits shifting before calling self.loss_function.
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix vlm with retrieval
* we can't use AutoModel because new ColQwen was released after refactor
* no need for colqwen
* tied weight keys are necessary, if using IMageTextToText
* need to apply renaming in tied weights, only for ColPali
* overwrite tied keys in ColPali
* fix copies, modular can't handle if-statements
* working locally; need to style and test
* added docs and initial tests; need to debug and flesh out
* fixed tests
* working long context; batches
* working fa2 and eager
* update tests
* add missing confnigs
* remove default autoset
* fix spacing
* fix most tests
* fixed tests
* fix to init
* refactor to match new transformers updates
* remove static cache option
* fa2 fix
* fix docs
* in progress
* working on tests
* fixed issue with attn outputs
* remove debug
* fix local config attr
* update doc string
* fix docstring
* add docs to toc
* correct typo in toc
* add new updates from main w.r.t. ModernBERT RoPE
* fix local param
---------
Co-authored-by: oweller2 <oweller2@dsailogin.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l07.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@n02.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l08.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l01.mgmt.ai.cluster>
Co-authored-by: oweller2 <oweller2@l02.mgmt.ai.cluster>
* Update modeling_qwen2_5_vl.py
### 🐛 Bug Description
When using Unsloth’s Qwen2.5-VL vision models (both 3B and 7B) with the latest HuggingFace Transformers (commit: 520b9dcb42cef21662c304583368ff6645116a45), the model crashes due to a type mismatch in the attention mask handling.
---
### 🔥 Error Traceback
* Fix dtype compatibility in attention mask processing
Replace hardcoded torch.finfo() usage with dtype-aware function selection to handle both integer and floating-point attention mask tensors.
Technical Details:
Problem: Line 1292 assumes floating-point dtype for attention_mask_tensor
Solution: Add dtype check to use torch.iinfo() for integer types and torch.finfo() for float types
Files Modified: transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py
* Update modeling_qwen2_5_vl.py
* Update modeling_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* # Fix: Use appropriate function based on dtype
* Update modular_qwen2_5_vl.py
* Fix: Cast to float before applying torch.finfo
* Fix: Use appropriate function based on dtype
* Fix: Use appropriate function based on dtype
* Updatet modeling_glm4v.py
* Only apply conversion for floating point tensors (inverted masks)
* corrected the format issue
reformatted modeling_glm4v.py
All done! ✨🍰✨
1 file reformatted
* Fix: Cast to float before applying torch.finfo
Corrected the format issue
* Fix torch.finfo() for integer attention mask
#39333
* Run make fix-copies and make style for CI compliance
- Updated dependency versions table
- Fixed code formatting and style issues
- Sorted auto mappings
- Updated documentation TOC
* Fix torch.finfo() TypeError for
Fix torch.finfo() TypeError for integer attention_mask_tensor #39333
* Fix torch.finfo() TypeError for integer
* Updated CamemBERT model card to new standardized format
* Applied review suggestions for CamemBERT: restored API refs, added examples, badges, and attribution
* Updated CamemBERT usage examples, quantization, badges, and format
* Updated CamemBERT badges
* Fixed CLI Section
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`
More verbose exceptions in `fix_docstring` on docstring formatting issues.
* plm template
* A working plm with fixed image features
* hacked processor
* First version that reproduced PLM output using PE from timm.
* Simplify and fix tie_word_embeddings
* Use PIL resize. Simplify converstion.
* First version that works with video input.
* simplifed image preprocessing (not batched)
* Minor fixes after rebasing on main.
* Video processor based on new API.
* Revert to use _preprocess for image processor.
* refactor with modular
* fix tie_word_embedding
* Testing with timm PE
* check in missed converstion from modular to model.py
* First working version of PLM with Eva PE. PLM-1B and 3B outputs are exactly the same as before. PLM-8B output has some differences.
* address review comments
* Fixed batching if video and image examples mixed.
* Simplify PE configuration.
* Enable AutoModel for PerceptionEncoder.
* Update PE config style.
* update all headers
* Minor fixes.
* Move lm_head to PerceptionLMForConditionalGeneration.
Fix vit_G model specification.
* Fix for testing_modeling_perception_lm.py
* Image processing refactoring to use more common parts.
* Fix processor test.
* update tests to use model from hub
* More test fixes.
* integration test GT update after rebasing; probably due to video preprocessing
* update test media path to hub
* Stop tracking local scripts
* address some review comments
* refactor image processing.
* small fixes
* update documentation and minor fixes
* remove scripts
* Minor fix for CI
* Fix image processing
* CI and doc fix
* CI formatting fix
* ruff fix
* ruff formatting
* ran utils/sort_auto_mappings.py
* update docstring
* more docstring udpates
* add vision_input_type default fallback for image processing
* more verbose variable naming
* test update
* Remove PE and PEConfig use AutoModel(TimmWrapper) instead
* Minor cleanup.
* Minor Fix: remove any ref to PE. Ruff format and check.
* fix docstring
* Fix modular/model consistency.Improvex docstringfor .
* Fix PerceptionLMForConditionalGenerationModelTest
* ruff fix
* fix for check_repo
* minor formatting
* dummy size arg to fix for processor test.
* Update docstring for PerceptionLMConfig
* Minor fixes from review feedback.
* Revert some minor changes per reviewer feedback.
* update base_model_prefix
* address reviewer feedback
* fix comment in modeling file
* address reviewer feedback
* ruff format
* Pre-merge test update.
* reapply modular and fix checkpoint name
* processor test path
* use modular a bit more
* remove dead code
* add token decorator
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Updated Switch Transformers model card with standardized format (Issue #36979)
* Apply reviewer suggestions to the new standardised Switch Transformer's model card
* Update switch_transformers.md
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* changes for video
* update modular
* change get_video_features
* update video token replacement
* update modular
* add test and fix typo
* lint
* fix order
* lint
* fix
* remove dependency
* lint
* lint
* remove todo
* resize video for test
* lint..
* fix test
* new a processor for video_test
* fix test
Also add notes asking users to set `TORCHDYNAMO_CAPTURE_SCALAR_OUTPUTS=1`
or call `torch._dynamo.config.capture_scalar_outputs = True`, as currently
this will cause a graph break.
Signed-off-by: Hollow Man <hollowman@opensuse.org>
* ensure the query is updated during training
avoid unused parameters that DDP does not like
* avoid a crash when `kwargs` contain `padding=True`
trainers often pass this argument automatically
* minor
* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)
* minor - most feature extractors has a `sampling_rate` property
* speedup relative position embeddings
* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.
* minor
* minor
* minor
* fixing a crash without peft active
* add todo to replace einsum
* granite speech speedups:
1. register attention_dist to avoid cpu-to-gpu transfer every layer.
2. pad_sequence is much faster than per-sample-padding + concat.
3. avoid returning audio back to cpu when using a compute device.
* support audio.shape=(1,L)
* add initial structure
* doc fixes, add model base logic
* update init files
* some fixes to config and modular
* some improvements for attention
* format
* remove unused attn
* some fixes for moe layer and for decoder
* adapt _compute_yarn_parameters for deepseek
* format
* small fix
* fix for decoder forward
* add tests, small refactoring
* fix dummies
* fix init
* fix doc
* fix config docs
* add sequce doc, fix init for gate
* fix issues in tests
* fix config doc
* remove unused args
* some fixes and refactoring after review
* fix doc for config
* small fixes for config args
* revert config refactoring
* small refactoring
* minor fixes after rebase
* small fix after merge
* fix modular
* remove rotaryembd from public init
* small test fix
* some rotary pos calculation improvement
* fix format
* some improvements and fixes
* fix config
* some refactoring
* adjust some unit tests
* skip test
* small fixes and tests adjustment
* reapply modular
* fix all tests except Integration
* fix integration testzs
* cleanup BC stuff
* rope
* fix integrations tests based on a10
* style
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-07-09 17:04:28 +02:00
3498 changed files with 234025 additions and 308063 deletions
# copilot-instructions.md Guide for Hugging Face Transformers
This copilot-instructions.md file provides guidance for code agents working with this codebase.
## Core Project Structure
-`/src/transformers`: This contains the core source code for the library
-`/models`: Code for individual models. Models inherit from base classes in the root `/src/transformers` directory.
-`/tests`: This contains the core test classes for the library. These are usually inherited rather than directly run.
-`/models`: Tests for individual models. Model tests inherit from common tests in the root `/tests` directory.
-`/docs`: This contains the documentation for the library, including guides, tutorials, and API references.
## Coding Conventions for Hugging Face Transformers
- PRs should be as brief as possible. Bugfix PRs in particular can often be only one or two lines long, and do not need large comments, docstrings or new functions in this case. Aim to minimize the size of the diff.
- When writing tests, they should be added to an existing file. The only exception is for PRs to add a new model, when a new test directory should be created for that model.
- Code style is enforced in the CI. You can install the style tools with `pip install -e .[quality]`. You can then run `make fixup` to apply style and consistency fixes to your code.
## Copying and inheritance
Many models in the codebase have similar code, but it is not shared by inheritance because we want each model file to be self-contained.
We use two mechanisms to keep this code in sync:
- "Copied from" syntax. Functions or entire classes can have a comment at the top like this: `# Copied from transformers.models.llama.modeling_llama.rotate_half` or `# Copied from transformers.models.t5.modeling_t5.T5LayerNorm with T5->MT5`
These comments are actively checked by the style tools, and copies will automatically be updated when the base code is updated. If you need to update a copied function, you should
either update the base function and use `make fixup` to propagate the change to all copies, or simply remove the `# Copied from` comment if that is inappropriate.
- "Modular" files. These files briefly define models by composing them using inheritance from other models. They are not meant to be used directly. Instead, the style tools
automatically generate a complete modeling file, like `modeling_bert.py`, from the modular file like `modular_bert.py`. If a model has a modular file, the modeling file
should never be edited directly! Instead, changes should be made in the modular file, and then you should run `make fixup` to update the modeling file automatically.
When adding new models, you should prefer `modular` style and inherit as many classes as possible from existing models.
## Testing
After making changes, you should usually run `make fixup` to ensure any copies and modular files are updated, and then test all affected models. This includes both
the model you made the changes in and any other models that were updated by `make fixup`. Tests can be run with `pytest tests/models/[name]/test_modeling_[name].py`
If your changes affect code in other classes like tokenizers or processors, you should run those tests instead, like `test_processing_[name].py` or `test_tokenization_[name].py`.
In order to run tests, you may need to install dependencies. You can do this with `pip install -e .[testing]`. You will probably also need to `pip install torch accelerate` if your environment does not already have them.
RUN_SLOW:yes# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access. # This token is created under the bot `hf-transformers-bot`.
name:Self-hosted runner scale set (AMD mi300 scheduled CI caller)
name:Self-hosted runner scale set (AMD mi325 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
name:Self-hosted runner scale set (AMD mi355 scheduled CI caller)
# Note: For every job in this workflow, the name of the runner scale set is finalized in the runner yaml i.e. huggingface/hf-workflows/.github/workflows/transformers_amd_ci_scheduled_arc_scale_set.yaml
# For example, 1gpu : amd-mi355-ci-1gpu
# 2gpu : amd-mi355-ci-2gpu
on:
workflow_run:
workflows:["Self-hosted runner (AMD scheduled CI caller)"]
@ -38,7 +38,6 @@ In particular all "Please explain" questions or objectively very user-specific f
* "How to train T5 on De->En translation?"
## The GitHub Issues
Everything which hints at a bug should be opened as an [issue](https://github.com/huggingface/transformers/issues).
@ -247,7 +246,6 @@ You are not required to read the following guidelines before opening an issue. H
Try not use italics and bold text too much as these often make the text more difficult to read.
12. If you are cross-referencing a specific comment in a given thread or another issue, always link to that specific comment, rather than using the issue link. If you do the latter it could be quite impossible to find which specific comment you're referring to.
To get the link to the specific comment do not copy the url from the location bar of your browser, but instead, click the `...` icon in the upper right corner of the comment and then select "Copy Link".
@ -257,7 +255,6 @@ You are not required to read the following guidelines before opening an issue. H
13. If you are replying to a last comment, it's totally fine to make your reply with just your comment in it. The readers can follow the information flow here.
But if you're replying to a comment that happened some comments back it's always a good practice to quote just the relevant lines you're replying it. The `>` is used for quoting, or you can always use the menu to do so. For example your editor box will look like:
and adjacent modeling libraries (llama.cpp, mlx, ...) which leverage the model definition from `transformers`.
@ -80,7 +80,7 @@ Explore the [Hub](https://huggingface.com/) today to find a model and use Transf
## Installation
Transformers works with Python 3.9+ [PyTorch](https://pytorch.org/get-started/locally/) 2.1+, [TensorFlow](https://www.tensorflow.org/install/pip) 2.6+, and [Flax](https://flax.readthedocs.io/en/latest/) 0.4.1+.
Transformers works with Python 3.9+, and [PyTorch](https://pytorch.org/get-started/locally/) 2.1+.
Create and activate a virtual environment with [venv](https://docs.python.org/3/library/venv.html) or [uv](https://docs.astral.sh/uv/), a fast Rust-based Python package and project manager.
@ -147,7 +147,7 @@ chat = [
{"role":"user","content":"Hey, can you tell me any fun things to do in New York?"}
- This library is not a modular toolbox of building blocks for neural nets. The code in the model files is not refactored with additional abstractions on purpose, so that researchers can quickly iterate on each of the models without diving into additional abstractions/files.
- The training API is optimized to work with PyTorch models provided by Transformers. For generic machine learning loops, you should use another library like [Accelerate](https://huggingface.co/docs/accelerate).
- The [example scripts]((https://github.com/huggingface/transformers/tree/main/examples)) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
- The [example scripts](https://github.com/huggingface/transformers/tree/main/examples) are only *examples*. They may not necessarily work out-of-the-box on your specific use case and you'll need to adapt the code for it to work.
## 100 projects using Transformers
@ -280,8 +279,8 @@ Expand each modality below to see a few example models for various use cases.
- Automatic mask generation with [SAM](https://huggingface.co/facebook/sam-vit-base)
- Depth estimation with [DepthPro](https://huggingface.co/apple/DepthPro-hf)
- Image classification with [DINO v2](https://huggingface.co/facebook/dinov2-base)
- Keypoint detection with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue)
- Keypoint detection with [SuperPoint](https://huggingface.co/magic-leap-community/superpoint)
- Keypoint matching with [SuperGlue](https://huggingface.co/magic-leap-community/superglue_outdoor)
- Object detection with [RT-DETRv2](https://huggingface.co/PekingU/rtdetr_v2_r50vd)
- Pose Estimation with [VitPose](https://huggingface.co/usyd-community/vitpose-base-simple)
- Universal segmentation with [OneFormer](https://huggingface.co/shi-labs/oneformer_ade20k_swin_large)
@ -14,7 +14,7 @@ Models uploaded on the Hugging Face Hub come in different formats. We heavily re
models in the [`safetensors`](https://github.com/huggingface/safetensors) format (which is the default prioritized
by the transformers library), as developed specifically to prevent arbitrary code execution on your system.
To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
To avoid loading models from unsafe formats(e.g. [pickle](https://docs.python.org/3/library/pickle.html), you should use the `use_safetensors` parameter. If doing so, in the event that no .safetensors file is present, transformers will error when loading the model.
@ -6,7 +6,7 @@ developers, researchers, students, professors, engineers, and anyone else to bui
In this list, we showcase incredibly impactful and novel projects that have pushed the field forward. We celebrate
100 of these projects as we reach the milestone of 100k stars as a community; but we're very open to pull requests
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
adding other projects to the list. If you believe a project should be here and it's not, then please, open a PR
to add it.
## [gpt4all](https://github.com/nomic-ai/gpt4all)
@ -49,7 +49,7 @@ Keywords: LLMs, Large Language Models, Agents, Chains
[LlamaIndex](https://github.com/run-llama/llama_index) is a project that provides a central interface to connect your LLM's with external data. It provides various kinds of indices and retrieval mechanisms to perform different LLM tasks and obtain knowledge-augmented results.
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
Keywords: LLMs, Large Language Models, Data Retrieval, Indices, Knowledge Augmentation
@ -257,7 +257,7 @@ Stable-Dreamfusion is a pytorch implementation of the text-to-3D model Dreamfusi
Keywords: Text-to-3D, Stable Diffusion
## [txtai](https://github.com/neuml/txtai)
[txtai](https://github.com/neuml/txtai) is an open-source platform for semantic search and workflows powered by language models. txtai builds embeddings databases, which are a union of vector indexes and relational databases enabling similarity search with SQL. Semantic workflows connect language models together into unified applications.
Keywords: Semantic search, LLM
@ -309,8 +309,8 @@ Keywords: OCR, LaTeX, Math formula
OpenCLIP is an open source implementation of OpenAI's CLIP.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
The goal of this repository is to enable training models with contrastive image-text supervision, and to investigate their properties such as robustness to distribution shift.
The starting point is an implementation of CLIP that matches the accuracy of the original CLIP models when trained on the same dataset.
Specifically, a ResNet-50 model trained with this codebase on OpenAI's 15 million image subset of YFCC achieves 32.7% top-1 accuracy on ImageNet.
@ -596,7 +596,7 @@ Keywords: Data-Centric AI, Data Quality, Noisy Labels, Outlier Detection, Active
## [BentoML](https://github.com/bentoml/BentoML)
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
[BentoML](https://github.com/bentoml) is the unified framework for building, shipping, and scaling production-ready AI applications incorporating traditional ML, pre-trained AI models, Generative and Large Language Models.
All Hugging Face models and pipelines can be seamlessly integrated into BentoML applications, enabling the running of models on the most suitable hardware and independent scaling based on usage.
Keywords: BentoML, Framework, Deployment, AI Applications
@ -606,4 +606,3 @@ Keywords: BentoML, Framework, Deployment, AI Applications
[LLaMA Factory](https://github.com/hiyouga/LLaMA-Factory) offers a user-friendly fine-tuning framework that incorporates PEFT. The repository includes training(fine-tuning) and inference examples for LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, and other LLMs. A ChatGLM version is also available in [ChatGLM-Efficient-Tuning](https://github.com/hiyouga/ChatGLM-Efficient-Tuning).
For uploading results, you need a HuggingFace token with write permissions to the target dataset. You can provide the token in several ways (in order of precedence):
1. Command line: `--token hf_your_token_here`
3. Environment variable: `HF_TOKEN`
### Running Specific Benchmarks
```bash
# Include only specific benchmarks
python run_benchmarks.py --include llama
# Exclude specific benchmarks
python run_benchmarks.py --exclude old_benchmark
## Output Format
Results are saved as JSON files with the following structure:
```json
{
"model_name": "llama_2_7b",
"benchmark_scenarios": [
{
"scenario_name": "eager_variant",
"metadata": {
"timestamp": "2025-01-XX...",
"commit_id": "abc123...",
"hardware_info": {
"gpu_name": "NVIDIA A100",
"gpu_memory_total": 40960,
"cpu_count": 64
},
"config": {
"variant": "eager",
"warmup_iterations": 3,
"measurement_iterations": 5
}
},
"measurements": {
"latency": {
"mean": 2.45,
"median": 2.43,
"std": 0.12,
"min": 2.31,
"max": 2.67,
"p95": 2.61,
"p99": 2.65
},
"time_to_first_token": {
"mean": 0.15,
"std": 0.02
},
"tokens_per_second": {
"mean": 87.3,
"unit": "tokens/sec"
}
},
"gpu_metrics": {
"gpu_utilization_mean": 85.2,
"gpu_memory_used_mean": 12450
}
}
]
}
```
### Debug Mode
```bash
python run_benchmarks.py --log-level DEBUG
```
## Contributing
To add new benchmarks:
1. Create a new file in `benches/`
2. Implement the `ModelBenchmark` interface
3. Add a runner function (`run_<benchmark_name>` or `run_benchmark`)
@ -20,22 +20,21 @@ To generate the documentation, you first have to build it. Several packages are
you can install them with the following command, at the root of the code repository:
```bash
pip install -e ".[docs]"
pip install -e ".[dev]"
```
> [!NOTE]
> This command might fail for some OS that are missing dependencies. Check step 4 in [Create a Pull Request](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#create-a-pull-request) to workaround it.
Then you need to install our special tool that builds the documentation:
The docs will be viewable at [http://localhost:3000](http://localhost:3000). You can also preview the docs once you have opened a PR. You will see a bot add a comment to a link where the documentation with your changes lives.
---
**NOTE**
The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml`& restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
---
> [!NOTE]
> The `preview` command only works with existing doc files. When you add a completely new file, you need to update `_toctree.yml` & restart `preview` command (`ctrl-c` to stop it & call `doc-builder preview ...` again).
## Adding a new element to the navigation bar
@ -164,6 +159,9 @@ These classes should be added using our Markdown syntax. Usually as follows:
[[autodoc]] XXXConfig
```
> [!IMPORTANT]
> Always add a blank line after `[[autodoc]]` to ensure it passes the CI/CD checks.
This will include every public method of the configuration that is documented. If for some reason you wish for a method
not to be displayed in the documentation, you can do so by specifying which methods should be in the docs:
تسمح لك فئات `AutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`AutoModelForSequenceClassification.from_pretrained`]:
```py
@ -143,25 +141,4 @@
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `AutoModelFor` لتحميل مثيلات مُدربة مسبقًا من النماذج. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، تعرف على كيفية استخدام المحلل اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
</pt>
<tf>
أخيرًا، تسمح لك فئات `TFAutoModelFor` بتحميل نموذج مُدرب مسبقًا لمهمة معينة (راجع [هنا](model_doc/auto) للحصول على قائمة كاملة بالمهام المتاحة). على سبيل المثال، قم بتحميل نموذج لتصنيف التسلسل باستخدام [`TFAutoModelForSequenceClassification.from_pretrained`]:
بشكل عام، نوصي باستخدام فئة `AutoTokenizer` وفئة `TFAutoModelFor` لتحميل نسخ لنماذج مُدربة مسبقًا. سيساعدك هذا في تحميل البنية الصحيحة في كل مرة. في البرنامج التعليمي التالي، ستتعرف على كيفية استخدام المُجزّئ اللغوي ومعالج الصور ومستخرج الميزات والمعالج الذي تم تحميله حديثًا لمعالجة مجموعة بيانات للضبط الدقيق.
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "torch_dtype" كما فعلنا أعلاه.
بشكل افتراضي، تقوم فئات Hugging Face مثل [`TextGenerationPipeline`] أو [`AutoModelForCausalLM`] بتحميل النموذج في دقة "float32". وهذا يعني أنه يحتاج إلى 4 بايتات (32 بت) لكل معلمة، لذا فإن نموذج "8B" بحجم 8 مليار معلمة سيحتاج إلى ~32 جيجابايت من الذاكرة. ومع ذلك، يمكن أن يكون هذا مضيعة للموارد! يتم تدريب معظم نماذج اللغة الحديثة في دقة "bfloat16"، والتي تستخدم فقط 2 بايت لكل معلمة. إذا كان عتادك يدعم ذلك (Nvidia 30xx/Axxx أو أحدث)، فيمكنك تحميل النموذج في دقة "bfloat16"، باستخدام معامل "dtype" كما فعلنا أعلاه.
ومن الممكن أيضًا النزول إلى أقل من 16 بت باستخدام "التكميم"، وهي طريقة لضغط أوزان النموذج بطريقة تفقد بعض المعلومات. يسمح هذا بضغط كل معلمة إلى 8 بتات أو 4 بتات أو حتى أقل. لاحظ أنه، خاصة في 4 بتات، قد تتأثر جودة ناتج النموذج سلبًا، ولكن غالبًا ما يكون هذا مقايضة تستحق القيام بها لتناسب نموذج محادثة أكبر وأكثر قدرة في الذاكرة. دعنا كيف يمكننا تطبيق ذلك باستخدام مكتبة `bitsandbytes`:
الخطوة التالية هي إنشاء [نموذج](main_classes/models). النموذج - ويُشار إليه أحيانًا باسم البنية - يُحدد وظيفة كل طبقة والعمليات الحسابية المُنفذة. تُستخدم خصائص مثل `num_hidden_layers` من التكوين لتحديد هذه البنية. تشترك جميع النماذج في فئة أساسية واحدة هي [`PreTrainedModel`] وبعض الوظائف المُشتركة مثل غيير حجم مُدخلات الكلمات وتقليص رؤوس آلية الانتباه الذاتي. بالإضافة إلى ذلك، فإن جميع النماذج هي فئات فرعية إما من [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html)، [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) أو [`flax.linen.Module`](https://flax.readthedocs.io/en/latest/api_reference/flax.linen/module.html) . هذا يعني النماذج متوافقة مع كل استخدام لإطار العمل الخاص بها.
<frameworkcontent>
<pt>
قم بتحميل خصائص التكوين المخصصة الخاصة بك في النموذج:
هذا ينشئ نموذجًا بقيم عشوائية بدلاً من الأوزان المُدربة مسبقًا. لن يكون هذا النموذج مفيدًا حتى يتم تدريبه. تُعد عملية التدريب مكلفة وتستغرق وقتًا طويلاً. من الأفضل بشكل عام استخدام نموذج مُدرب مسبقًا للحصول على نتائج أفضل بشكل أسرع، مع استخدام جزء بسيط فقط من الموارد المطلوبة للتدريب.
قم بإنشاء نموذج مُدرب مسبقًا باستخدام [`~TFPreTrainedModel.from_pretrained`]:
عندما تقوم بتحميل الأوزان المُدربة مسبقًا،يتم تحميل إعدادات النموذج الافتراضي تلقائيًا إذا كان النموذج من مكتبة 🤗 Transformers. ومع ذلك، يمكنك أيضًا استبدال - بعض أو كل - إعدادات النموذج الافتراضية بإعداداتك الخاصة:
في هذه المرحلة، لديك نموذج DistilBERT الأساسي الذي يخرج *حالات الكامنة*. تُمرَّر هذه الحالات الكامنة كمدخلات لرأس النموذج لإنتاج المخرجات النهائية. توفر مكتبة 🤗 Transformers رأس نموذج مختلف لكل مهمة طالما أن النموذج يدعم المهمة (أي لا يمكنك استخدام DistilBERT لمهمة تسلسل إلى تسلسل مثل الترجمة).
<frameworkcontent>
<pt>
على سبيل المثال، [`DistilBertForSequenceClassification`] هو نموذج DistilBERT الأساس مزودًا برأس تصنيف تسلسلي. يُشكّل رأس التصنيف التسلسلي طبقة خطية فوق المخرجات المجمعة.
على سبيل المثال، [`TFDistilBertForSequenceClassification`] هو نموذج DistilBERT الأساسي برأس تصنيف تسلسل. رأس التصنيف التسلسلي هو طبقة خطية أعلى المخرجات المجمعة.
أعد استخدام هذا نقطة التحقق لمهمة أخرى عن طريق التبديل إلى رأس نموذج مختلف. لمهمة الإجابة على الأسئلة، ستستخدم رأس النموذج [`TFDistilBertForQuestionAnswering`]. رأس الإجابة على الأسئلة مشابه لرأس التصنيف التسلسلي باستثناء أنه طبقة خطية أعلى حالات الإخراج المخفية.
في هذا الدليل، سنستعرض التقنيات الفعالة لتُحسِّن من كفاءة نشر نماذج اللغة الكبيرة:
1. سنتناول تقنية "دقة أقل" التي أثبتت الأبحاث فعاليتها في تحقيق مزايا حسابية دون التأثير بشكل ملحوظ على أداء النموذج عن طريق العمل بدقة رقمية أقل [8 بت و4 بت](/main_classes/quantization.md).
1. سنتناول تقنية "دقة أقل" التي أثبتت الأبحاث فعاليتها في تحقيق مزايا حسابية دون التأثير بشكل ملحوظ على أداء النموذج عن طريق العمل بدقة رقمية أقل [8 بت و4 بت](/main_classes/quantization).
2.**اFlash Attention:** إن Flash Attention وهي نسخة مُعدَّلة من خوارزمية الانتباه التي لا توفر فقط نهجًا أكثر كفاءة في استخدام الذاكرة، ولكنها تحقق أيضًا كفاءة متزايدة بسبب الاستخدام الأمثل لذاكرة GPU.
3.**الابتكارات المعمارية:** حيث تم اقتراح هياكل متخصصة تسمح باستدلال أكثر فعالية نظرًا لأن نماذج اللغة الكبيرة يتم نشرها دائمًا بنفس الطريقة أثناء عملية الاستدلال، أي توليد النص التنبؤي التلقائي مع سياق الإدخال الطويل، فقد تم اقتراح بنيات نموذج متخصصة تسمح بالاستدلال الأكثر كفاءة. أهم تقدم في بنيات النماذج هنا هو [عذر](https://huggingface.co/papers/2108.12409)، [الترميز الدوار](https://huggingface.co/papers/2104.09864)، [الاهتمام متعدد الاستعلامات (MQA)](https://huggingface.co/papers/1911.02150) و [مجموعة الانتباه بالاستعلام (GQA)]((https://huggingface.co/papers/2305.13245)).
3.**الابتكارات المعمارية:** حيث تم اقتراح هياكل متخصصة تسمح باستدلال أكثر فعالية نظرًا لأن نماذج اللغة الكبيرة يتم نشرها دائمًا بنفس الطريقة أثناء عملية الاستدلال، أي توليد النص التنبؤي التلقائي مع سياق الإدخال الطويل، فقد تم اقتراح بنيات نموذج متخصصة تسمح بالاستدلال الأكثر كفاءة. أهم تقدم في بنيات النماذج هنا هو [عذر](https://huggingface.co/papers/2108.12409)، [الترميز الدوار](https://huggingface.co/papers/2104.09864)، [الاهتمام متعدد الاستعلامات (MQA)](https://huggingface.co/papers/1911.02150) و [مجموعة الانتباه بالاستعلام (GQA)](https://huggingface.co/papers/2305.13245).
على مدار هذا الدليل، سنقدم تحليلًا للتوليد التنبؤي التلقائي من منظور المُوتِّرات. نتعمق في مزايا وعيوب استخدام دقة أقل، ونقدم استكشافًا شاملاً لخوارزميات الانتباه الأحدث، ونناقش بنيات نماذج نماذج اللغة الكبيرة المحسنة. سندعم الشرح بأمثلة عملية تُبرِز كل تحسين على حدة.
@ -73,7 +73,7 @@ model = AutoModelForCausalLM.from_pretrained("bigscience/bloom", device_map="aut
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import torch
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", torch_dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
model = AutoModelForCausalLM.from_pretrained("bigcode/octocoder", dtype=torch.bfloat16, device_map="auto", pad_token_id=0)
> يتم تدريب جميع النماذج تقريبًا بتنسيق bfloat16 في الوقت الحالي، ولا يوجد سبب لتشغيل النموذج بدقة float32 الكاملة إذا [كانت وحدة معالجة الرسومات (GPU) الخاصة بك تدعم bfloat16](https://discuss.pytorch.org/t/bfloat16-native-support/117155/5). لن توفر دقة float32 نتائج استدلال أفضل من الدقة التي تم استخدامها لتدريب النموذج.
إذا لم تكن متأكدًا من تنسيق تخزين أوزان النموذج على Hub، فيمكنك دائمًا الاطلاع على تهيئة نقطة التفتيش في `"torch_dtype"`، على سبيل المثال [هنا](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). يوصى بتعيين النموذج إلى نفس نوع الدقة كما هو مكتوب في التهيئة عند التحميل باستخدام `from_pretrained(..., torch_dtype=...)` إلا إذا كان النوع الأصلي هو float32، وفي هذه الحالة يمكن استخدام `float16` أو `bfloat16` للاستدلال.
إذا لم تكن متأكدًا من تنسيق تخزين أوزان النموذج على Hub، فيمكنك دائمًا الاطلاع على تهيئة نقطة التفتيش في `"dtype"`، على سبيل المثال [هنا](https://huggingface.co/meta-llama/Llama-2-7b-hf/blob/6fdf2e60f86ff2481f2241aaee459f85b5b0bbb9/config.json#L21). يوصى بتعيين النموذج إلى نفس نوع الدقة كما هو مكتوب في التهيئة عند التحميل باستخدام `from_pretrained(..., dtype=...)` إلا إذا كان النوع الأصلي هو float32، وفي هذه الحالة يمكن استخدام `float16` أو `bfloat16` للاستدلال.
دعونا نحدد وظيفة `flush(...)` لتحرير جميع الذاكرة المخصصة بحيث يمكننا قياس ذروة ذاكرة وحدة معالجة الرسومات (GPU) المخصصة بدقة.
قبل مشاركة نموذج على Hub، ستحتاج إلى بيانات اعتماد حساب Hugging Face الخاصة بك. إذا كنت تستخدم منصة الأوامر، فقم بتشغيل الأمر التالي في بيئة افتراضية حيث تم تثبيت 🤗 Transformers. سيقوم هذا الأمر بتخزين رمز الدخول الخاص بك في مجلد تخزين المؤقت لـ Hugging Face (`~/.cache/` بشكل افتراضي):
```bash
huggingface-cli login
hf auth login
```
إذا كنت تستخدم دفتر ملاحظات مثل Jupyter أو Colaboratory، فتأكد من تثبيت مكتبة [`huggingface_hub`](https://huggingface.co/docs/hub/adding-a-library). تسمح لك هذه المكتبة بالتفاعل برمجيًا مع Hub.
@ -65,43 +65,15 @@ pip install huggingface_hub
تحويل نقطة التحقق لإطار عمل آخر أمر سهل. تأكد من تثبيت PyTorch و TensorFlow (راجع [هنا](installation) لتعليمات التثبيت)، ثم ابحث عن النموذج الملائم لمهمتك في الإطار الآخر.
<frameworkcontent>
<pt>
حدد `from_tf=True` لتحويل نقطة تحقق من TensorFlow إلى PyTorch:
مشاركة نموذجك على Hub مر بسيط للغاية كل ما عليك هو إضافة معلمة أو استدعاء رد إضافي. كما تذكر من درس [التدريب الدقيق](training)، فإن فئة [`TrainingArguments`] هي المكان الذي تحدد فيه المعلمات الفائقة وخيارات التدريب الإضافية. تشمل إحدى خيارات التدريب هذه القدرة على دفع النموذج مباشرة إلى المنصة Hub. قم بتعيين `push_to_hub=True` في [`TrainingArguments`]:
@ -127,29 +99,6 @@ pip install huggingface_hub
```py
>>>trainer.push_to_hub()
```
</pt>
<tf>
شارك نموذجًا على Hub باستخدام [`PushToHubCallback`]. في دالة [`PushToHubCallback`], أضف:
- دليل إخراج لنموذجك.
- مُجزّئ اللغوي.
-`hub_model_id`، والذي هو اسم مستخدم Hub واسم النموذج الخاص بك.
* انقر فوق الزر **Edit model card** في مستودع نموذجك.
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).
الق نظرة على بطاقة [DistilBert](https://huggingface.co/distilbert/distilbert-base-uncased) للحصول على مثال جيد على نوع المعلومات التي يجب أن تتضمنها بطاقة النموذج. للحصول على مزيد من التفاصيل حول الخيارات الأخرى التي يمكنك التحكم فيها في ملف `README.md` مثل البصمة الكربونية للنموذج أو أمثلة الأداة، راجع الوثائق [هنا](https://huggingface.co/docs/hub/models-cards).
| [كيفية ضبط نموذج بدقة على التلخيص](https://github.com/huggingface/notebooks/blob/main/examples/summarization.ipynb)| يوضح كيفية معالجة البيانات مسبقًا وضبط نموذج مُدرَّب مسبقًا بدقة على XSUM. | [](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/notebooks/blob/main/examples/summarization.ipynb)|
| [كيفية تدريب نموذج لغة من البداية](https://github.com/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| تسليط الضوء على جميع الخطوات لتدريب نموذج Transformer بشكل فعال على بيانات مخصصة | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/01_how_to_train.ipynb)|
| [كيفية إنشاء نص](https://github.com/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| كيفية استخدام أساليب فك التشفير المختلفة لإنشاء اللغة باستخدام المحولات | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/02_how_to_generate.ipynb)|
| [كيفية إنشاء نص (مع قيود)](https://github.com/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| كيفية توجيه إنشاء اللغة باستخدام القيود التي يوفرها المستخدم | [](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)| [](https://studiolab.sagemaker.aws/import/github/huggingface/blog/blob/main/notebooks/53_constrained_beam_search.ipynb)|
| [Reformer](https://github.com/huggingface/blog/blob/main/notebooks/03_reformer.ipynb)| كيف يدفع Reformer حدود النمذجة اللغوية | [](https://colab.research.google.com/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)| [](https://studiolab.sagemaker.aws/import/github/patrickvonplaten/blog/blob/main/notebooks/03_reformer.ipynb)|
إذا كان النموذج كبيرًا جدًا بالنسبة لوحدة معالجة الرسومات (GPU) واحدة، وأنت تستخدم PyTorch، فيمكنك تعيين `torch_dtype='float16'` لتمكين الاستدلال بدقة FP16. عادةً ما لا يتسبب ذلك في حدوث انخفاضات كبيرة في الأداء، ولكن تأكد من تقييمه على نماذجك!
إذا كان النموذج كبيرًا جدًا بالنسبة لوحدة معالجة الرسومات (GPU) واحدة، وأنت تستخدم PyTorch، فيمكنك تعيين `dtype='float16'` لتمكين الاستدلال بدقة FP16. عادةً ما لا يتسبب ذلك في حدوث انخفاضات كبيرة في الأداء، ولكن تأكد من تقييمه على نماذجك!
بدلاً من ذلك، يمكنك تعيين `device_map="auto"` لتحديد كيفية تحميل مخزنات النموذج وتخزينها تلقائيًا. يتطلب استخدام معامل `device_map` مكتبه 🤗 [Accelerate](https://huggingface.co/docs/accelerate):
استخدم [`AutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `AutoClass` في القسم التالي):
```py
@ -132,18 +120,6 @@ label: NEGATIVE, with score: 0.5309
استخدم [`TFAutoModelForSequenceClassification`] و [`AutoTokenizer`] لتحميل النموذج المُدرب مسبقًا ومعالجته المرتبط به (مزيد من المعلومات حول `TFAutoClass` في القسم التالي):
حدد النموذج والمعالج في [`pipeline`]. الآن يمكنك تطبيق `classifier` على النص الفرنسي:
@ -192,8 +168,6 @@ label: NEGATIVE, with score: 0.5309
يمكن المجزئ أيضًا قبول قائمة من المدخلات، ويقوم بـ "حشو" و"تقصير" النص لإرجاع كدفعة بطول موحد:
<frameworkcontent>
<pt>
```py
>>>pt_batch=tokenizer(
@ -204,20 +178,6 @@ label: NEGATIVE, with score: 0.5309
...return_tensors="pt",
...)
```
</pt>
<tf>
```py
>>>tf_batch=tokenizer(
...["We are very happy to show you the 🤗 Transformers library.","We hope you don't hate it."],
...padding=True,
...truncation=True,
...max_length=512,
...return_tensors="tf",
...)
```
</tf>
</frameworkcontent>
<Tip>
@ -227,8 +187,6 @@ label: NEGATIVE, with score: 0.5309
### AutoModel
<frameworkcontent>
<pt>
تقدم مكتبة 🤗 Transformers طريقة بسيطة وموحدة لتحميل نماذج مدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`AutoModel`] كما لو كنت تقوم بتحميل [`AutoTokenizer`]. الفرق الوحيد هو اختيار فئة [`AutoModel`] المناسبة للمهمة. بالنسبة لتصنيف النص (أو التسلسل)، يجب عليك تحميل [`AutoModelForSequenceClassification`]:
```py
@ -264,39 +222,6 @@ label: NEGATIVE, with score: 0.5309
يوفر 🤗 Transformers طريقة بسيطة وموحدة لتحميل مثيلات مُدربة مسبقًا. وهذا يعني أنه يمكنك تحميل [`TFAutoModel`] مثل تحميل [`AutoTokenizer`]. والفرق الوحيد هو تحديد [`TFAutoModel`] الصحيح للمهمة. للتصنيف النصي (أو التسلسلي)، يجب تحميل [`TFAutoModelForSequenceClassification`]:
من الميزات الرائعة في 🤗 Transformers القدرة على حفظ نموذج وإعادة تحميله كنموذج PyTorch أو TensorFlow. يمكن أن يحول معامل `from_pt` أو `from_tf` النموذج من إطار عمل إلى آخر:
- يقوم النص البرمجي التوضيحي بتنزيل مجموعة بيانات ومعالجتها مسبقًا من مكتبة 🤗 [Datasets](https://huggingface.co/docs/datasets/).
- ثم يقوم النص البرمجي بضبط نموذج بيانات دقيق باستخدام Keras على بنية تدعم الملخص.
- يوضح المثال التالي كيفية ضبط نموذج [T5-small](https://huggingface.co/google-t5/t5-small) على مجموعة بيانات [CNN/DailyMail](https://huggingface.co/datasets/cnn_dailymail).
- يتطلب نموذج T5 ماعمل `source_prefix` إضافية بسبب الطريقة التي تم تدريبه بها. يتيح هذا المطالبة لـ T5 معرفة أن هذه مهمة التلخيص.
## تشغيل نص برمجي على وحدة معالجة الدقة الفائقة (TPU)
<frameworkcontent>
<pt>
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. يدعم PyTorch وحدات معالجة الدقة الفائقة (TPUs) مع [XLA](https://www.tensorflow.org/xla) مجمع الدقة الفائقة للتعلم العميق (راجع [هنا](https://github.com/pytorch/xla/blob/master/README.md) لمزيد من التفاصيل). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتشغيل نص `xla_spawn.py` البرمجي واستخدم معامل `num_cores` لتعيين عدد وحدات معالجة الدقة الفائقة (TPU) التي تريد استخدامها.
تُعد وحدات معالجة الدقة الفائقة (TPUs) مصممة خصيصًا لتسريع الأداء. تستخدم نصوص TensorFlow البرمجية استراتيجية [`TPUStrategy`](https://www.tensorflow.org/guide/distributed_training#tpustrategy) للتدريب على وحدات معالجة الدقة الفائقة (TPUs). لاستخدام وحدة معالجة الدقة الفائقة (TPU)، قم بتمرير اسم مورد وحدة معالجة الدقة الفائقة (TPU) إلى حجة `tpu`.
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForLanguageModeling`]. من الأفضل أن تقوم بـ *الحشو الديناميكي* للجمل إلى الطول الأطول في الدفعة أثناء التجميع، بدلاً من حشو كامل المجموعة من البيانات إلى الطول الأقصى.
<frameworkcontent>
<pt>
استخدم رمز نهاية التسلسل كرمز للحشو، وحدد `mlm_probability` لحجب الرموز بشكل عشوائي عند كل تكرار للبيانات:
حول مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...lm_dataset["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_test_set=model.prepare_tf_dataset(
...lm_dataset["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة الافتراضية، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# لا يوجد حجة للخسارة!
```
يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومجمّع البيانات في [`~transformers.PushToHubCallback`]:
أخيراً، أنت جاهز لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة، وعدد العصور، والتعليقات الخاصة بك لتدريب النموذج:
بمجرد اكتمال التدريب، يتم تحميل نموذجك تلقائيًا إلى Hub حتى يتمكن الجميع من استخدامه!
</tf>
</frameworkcontent>
<Tip>
@ -365,8 +280,6 @@ Perplexity: 49.61
[{'generated_text':"Somatic hypermutation allows the immune system to be able to effectively reverse the damage caused by an infection.\n\n\nThe damage caused by an infection is caused by the immune system's ability to perform its own self-correcting tasks."}]
["Somatic hypermutation allows the immune system to react to drugs with the ability to adapt to a different environmental situation. In other words, a system of 'hypermutation' can help the immune system to adapt to a different environmental situation or in some cases even a single life. In contrast, researchers at the University of Massachusetts-Boston have found that 'hypermutation' is much stronger in mice than in humans but can be found in humans, and that it's not completely unknown to the immune system. A study on how the immune system"]
```
</pt>
<tf>
قم بتقسيم النص وإرجاع `input_ids` كـ TensorFlow tensors:
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء الملخص. للمزيد من التفاصيل حول استراتيجيات توليد النص المختلفة والبارامترات للتحكم في التوليد، راجع صفحة [استراتيجيات توليد النص](../generation_strategies).
['Somatic hypermutation allows the immune system to detect the presence of other viruses as they become more prevalent. Therefore, researchers have identified a high proportion of human viruses. The proportion of virus-associated viruses in our study increases with age. Therefore, we propose a simple algorithm to detect the presence of these new viruses in our samples as a sign of improved immunity. A first study based on this algorithm, which will be published in Science on Friday, aims to show that this finding could translate into the development of a better vaccine that is more effective for']
الآن، قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForLanguageModeling`]. من الأكثر كفاءة أن تقوم بـ *الحشو الديناميكي* ليصل طولها إلى أطول جملة في الدفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الطول الأقصى.
<frameworkcontent>
<pt>
استخدم رمز نهاية التسلسل كرمز الحشو وحدد `mlm_probability` لحجب الرموز عشوائياً كل مرة تكرر فيها البيانات:
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...lm_dataset["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_test_set=model.prepare_tf_dataset(
...lm_dataset["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن نماذج Transformers لديها جميعها دالة خسارة افتراضية ذات صلة بالمهمة، لذلك لا تحتاج إلى تحديد واحدة ما لم تكن تريد ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# لا توجد حجة للخسارة!
```
يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومعالج الرموز في [`~transformers.PushToHubCallback`]:
أخيراً، أنت مستعد لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق، وعدد العصور، والتعليقات الخاصة بك لتعديل النموذج:
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers تحتوي على دالة خسارة مناسبة للمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>>model.compile(optimizer=optimizer)# لا توجد وسيطة خسارة!
```
الخطوتان الأخيرتان قبل بدء التدريب هما: حساب دقة التنبؤات، وتوفير طريقة لرفع النموذج إلى Hub. ويمكن تحقيق ذلك باستخدام [استدعاءات Keras](../main_classes/keras_callbacks)
مرر دالتك `compute_metrics` إلى [`~transformers.KerasMetricCallback`]:
أخيرًا، أنت جاهز لبدء تدريب نموذجك! استدعِ[`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب والاستدعاءات لضبط النموذج:
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DefaultDataCollator`]. بخلاف مجمّعات البيانات الأخرى في 🤗 Transformers، لا يطبق [`DefaultDataCollator`] أي معالجة مسبقة إضافية مثل الحشو.
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...tokenized_squad["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_validation_set=model.prepare_tf_dataset(
...tokenized_squad["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method):
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)
```
آخر شيء يجب إعداده قبل بدء التدريب هو توفير طريقة لدفع نموذجك إلى Hub. يمكن القيام بذلك عن طريق تحديد مكان دفع نموذجك ومعالجك المعجمي في [`~transformers.PushToHubCallback`]:
أخيرًا، أنت جاهز لبدء تدريب نموذجك! اتصل بـ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة، وعدد العهود، ومعاودة الاتصال الخاصة بك لضبط النموذج:
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorWithPadding`]. الأكثر كفاءة هو استخدام الحشو الديناميكي لجعل الجمل متساوية في الطول داخل كل دفعة، بدلًا من حشو كامل البيانات إلى الحد الأقصى للطول.
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...tokenized_imdb["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_validation_set=model.prepare_tf_dataset(
...tokenized_imdb["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتهيئة النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة ما لم ترغب في ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# No loss argument!
```
آخر أمرين يجب إعدادهما قبل بدء التدريب هو حساب الدقة من التوقعات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم ذلك باستخدام [Keras callbacks](../main_classes/keras_callbacks).
قم بتمرير دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
أخيرًا، أنت مستعد لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق، وعدد الحقبات، واستدعاءاتك لضبط النموذج:
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorForSeq2Seq`]. الأكثر كفاءة *الحشو الديناميكي* للجمل إلى أطول طول في دفعة أثناء عملية التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الحد الأقصى للطول.
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...tokenized_billsum["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_test_set=model.prepare_tf_dataset(
...tokenized_billsum["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers لديها دالة خسارة ذات صلة بالمهمة افتراضيًا، لذلك لست بحاجة إلى تحديد واحدة ما لم تكن ترغب في ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# No loss argument!
```
آخر شيئين يجب إعدادهما قبل بدء التدريب هما حساب درجة ROUGE من التنبؤات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم كلاهما باستخدام [استدعاءات Keras](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
أخيرًا، أنت جاهز لبدء تدريب نموذجك! اتصل بـ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب واستدعاءاتك لضبط النموذج:
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it'sthemostaggressiveactionontacklingtheclimatecrisisinamericanhistory.itwillasktheultra-wealthyandcorporationstopaytheirfairshare.'
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء التلخيص. لمزيد من التفاصيل حول استراتيجيات توليد النص المختلفة والمعلمات للتحكم في التوليد، راجع واجهة برمجة تطبيقات [توليد النص](../main_classes/text_generation).
'the inflation reduction act lowers prescription drug costs, health care costs, and energy costs. it'sthemostaggressiveactionontacklingtheclimatecrisisinamericanhistory.itwillasktheultra-wealthyandcorporationstopaytheirfairshare.'
الآن قم بإنشاء دفعة من الأمثلة باستخدام [`DataCollatorWithPadding`].من الأفضل استخدام *الحشو الديناميكي* للجمل إلى أطول طول في دفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بالكامل إلى الطول الأقصى.
قم بتحويل مجموعات بياناتك إلى تنسيق `tf.data.Dataset` مع [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...tokenized_wnut["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_validation_set=model.prepare_tf_dataset(
...tokenized_wnut["validation"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
هيّئ النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن نماذج Transformers تتضمن دالة خسارة افتراضية مرتبطة بالمهمة، لذلك لا تحتاج إلى تحديد واحدة إلا إذا كنت ترغب في ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# No loss argument!
```
آخر أمرين يجب إعدادهما قبل بدء التدريب هو حساب درجات seqeval من التنبؤات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم ذلك باستخدام [Keras callbacks](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
أخيرًا، أنت جاهز الآن لبدء تدريب نموذجك! قم باستدعاء [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع بيانات التدريب والتحقق، وعدد الحقبات، وcallbacks لتعديل النموذج:
الآن أنشئ دفعة من الأمثلة باستخدام [`DataCollatorForSeq2Seq`]. من الأكثر كفاءة *الحشو الديناميكي* للجمل إلى أطول طول في دفعة أثناء التجميع، بدلاً من حشو مجموعة البيانات بأكملها إلى الحد الأقصى للطول.
حوّل مجموعات البيانات الخاصة بك إلى تنسيق `tf.data.Dataset` باستخدام [`~transformers.TFPreTrainedModel.prepare_tf_dataset`]:
```py
>>>tf_train_set=model.prepare_tf_dataset(
...tokenized_books["train"],
...shuffle=True,
...batch_size=16,
...collate_fn=data_collator,
...)
>>>tf_test_set=model.prepare_tf_dataset(
...tokenized_books["test"],
...shuffle=False,
...batch_size=16,
...collate_fn=data_collator,
...)
```
قم بتكوين النموذج للتدريب باستخدام [`compile`](https://keras.io/api/models/model_training_apis/#compile-method). لاحظ أن جميع نماذج Transformers تحتوي على دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذلك لا تحتاج إلى تحديد واحدة إلا إذا كنت ترغب في ذلك:
```py
>>>importtensorflowastf
>>>model.compile(optimizer=optimizer)# No loss argument!
```
آخر شيئين يجب إعدادهما قبل بدء التدريب هما حساب مقياس SacreBLEU من التوقعات، وتوفير طريقة لدفع نموذجك إلى Hub. يتم كلاهما باستخدام [استدعاءات Keras](../main_classes/keras_callbacks).
مرر دالة `compute_metrics` الخاصة بك إلى [`~transformers.KerasMetricCallback`]:
أخيرًا، أنت جاهز لبدء تدريب نموذجك! اتصل بـ [`fit`](https://keras.io/api/models/model_training_apis/#fit-method) مع مجموعات بيانات التدريب والتحقق من الصحة وعدد الحقب واستدعاءاتك لضبط النموذج:
استخدم طريقة [`~transformers.generation_tf_utils.TFGenerationMixin.generate`] لإنشاء الترجمة. لمزيد من التفاصيل حول استراتيجيات توليد النصوص المختلفة والمعلمات للتحكم في التوليد، تحقق من واجهة برمجة تطبيقات [توليد النصوص](../main_classes/text_generation).
[TensorFlow Lite](https://www.tensorflow.org/lite/guide) هو إطار عمل خفيف الوزن لنشر نماذج التعلم الآلي على الأجهزة المحدودة الموارد، مثل الهواتف المحمولة، والأنظمة المدمجة، وأجهزة إنترنت الأشياء (IoT). تم تصميم TFLite لتشغيل النماذج وتحسينها بكفاءة على هذه الأجهزة ذات الطاقة الحاسوبية والذاكرة واستهلاك الطاقة المحدودة.
يُمثَّل نموذج TensorFlow Lite بتنسيق محمول فعال خاص يُعرَّف بامتداد الملف `.tflite`.
🤗 Optimum يقدم وظيفة لتصدير نماذج 🤗 Transformers إلى TFLite من خلال الوحدة النمطية `exporters.tflite`. بالنسبة لقائمة هندسات النماذج المدعومة، يرجى الرجوع إلى [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/exporters/tflite/overview).
لتصدير نموذج إلى TFLite، قم بتثبيت متطلبات البرنامج المطلوبة:
```bash
pip install optimum[exporters-tf]
```
للاطلاع على جميع المغامﻻت المتاحة، راجع [وثائق 🤗 Optimum](https://huggingface.co/docs/optimum/main/en/exporters/tflite/usage_guides/export_a_model)، أو عرض المساعدة في سطر الأوامر:
```bash
optimum-cli export tflite --help
```
لتصدير نسخة النموذج ل 🤗 Hub، على سبيل المثال، `google-bert/bert-base-uncased`، قم بتشغيل الأمر التالي:
ستظهر لك السجلات التي تُبيّن التقدم وموقع حفظ ملف `model.tflite` الناتج، كما في المثال التالي:
```bash
Validating TFLite model...
-[✓] TFLite model output names match reference model (logits)
- Validating TFLite Model output "logits":
-[✓](1, 128, 30522) matches (1, 128, 30522)
-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff= 5.817413330078125e-05.
The exported model was saved at: bert_tflite
```
يُبيّن المثال أعلاه كيفية تصدير نسخة من النموذج ل 🤗 Hub. عند تصدير نموذج محلي، تأكد أولاً من حفظ ملفات أوزان النموذج المجزء اللغوى في نفس المسار (`local_path`). عند استخدام CLI، قم بتمرير `local_path` إلى معامل `model` بدلاً من اسم النسخة على 🤗 Hub.
# Tokenizer returns a BatchEncoding, but we convert that to a dict for Keras
tokenized_data=dict(tokenized_data)
labels=np.array(dataset["label"])# Label is already an array of 0 and 1
```
أخيرًا، قم بتحميل وتجميع وتناسب النموذج. لاحظ أن نماذج Transformers تحتوي جميعها على دالة خسارة ذات صلة بالمهمة بشكل افتراضي، لذا فأنت لست بحاجة إلى تحديد واحدة ما لم ترغب في ذلك:
# معدلات التعلم المنخفضة أفضل غالبًا لضبط النماذج الدقيقة
model.compile(optimizer=Adam(3e-5))# لا توجد دالة خسارة!
model.fit(tokenized_data,labels)
```
<Tip>
أنت لست مضطرًا لتمرير دالة خسارة إلى نماذجك عند تجميعها! تختار نماذج Hugging Face تلقائيًا
دالة خسارة مناسبة لمهمتها وهندسة نموذجها إذا تُركت هذه الحجة فارغة. يمكنك دائمًا
تجاوز ذلك عن طريق تحديد دالة خسارة بنفسك إذا كنت تريد ذلك!
</Tip>
يعمل هذا النهج بشكل رائع لمجموعات البيانات الصغيرة، ولكن بالنسبة لمجموعات البيانات الأكبر، فقد تجد أنه يصبح مشكلة. لماذا؟
لأن المصفوفة المرمزة والتصنيفات يجب أن يتم تحميلها بالكامل في الذاكرة، ولأن NumPy لا يتعامل مع
المصفوفات"غير المنتظمة"، لذا حشو كل عينة إلى طول أطول عينة في مجموعة البيانات بأكملها. سيؤدي ذلك إلى زيادة حجم المصفوفة لديك، وستبطئ الرموز الزائده من عملية التدريب أيضًا!
### تحميل البيانات كـ tf.data.Dataset
إذا كنت تريد تجنب إبطاء التدريب، فيمكنك تحميل بياناتك كـ `tf.data.Dataset` بدلاً من ذلك. على الرغم من أنه يمكنك كتابة خط أنابيب `tf.data` الخاص بك إذا كنت تريد، إلا أن لدينا طريقتين مختصرتين للقيام بذلك:
- [`~TFPreTrainedModel.prepare_tf_dataset`]: هذه هي الطريقة التي نوصي بها في معظم الحالات. نظرًا لأنه طريقة
واستبعاد الأعمدة الأخرى لإنشاء مجموعة بيانات أبسط وأكثر كفاءة.
- [`~datasets.Dataset.to_tf_dataset`]: هذه الطريقة أكثر أساسية، وهي مفيدة عندما تريد التحكم بدقة في كيفية
إنشاء مجموعة البيانات الخاصة بك، عن طريق تحديد أعمدة `columns` و `label_cols` المحددة التي سيتم تضمينها.
قبل أن تتمكن من استخدام [`~TFPreTrainedModel.prepare_tf_dataset`]، ستحتاج إلى إضافة مخرجات المُجزئ إلى مجموعة البيانات الخاصة بك كأعمدة، كما هو موضح في
عينة التعليمات البرمجية التالية:
```py
deftokenize_dataset(data):
# ستتم إضافة مفاتيح القاموس الذي تمت إعادته كأعمدة إلى مجموعة البيانات
returntokenizer(data["text"])
dataset=dataset.map(tokenize_dataset)
```
تذكر أن مجموعات بيانات Hugging Face يتم تخزينها على القرص بشكل افتراضي، لذا فلن يؤدي ذلك إلى تضخيم استخدام الذاكرة لديك! بمجرد إضافة الأعمدة، يمكنك بث الدفعات من مجموعة البيانات وإضافة الترميز إلى كل دفعة، مما يقلل بشكل كبير من عدد رموز الترقيم مقارنة بترميز مجموعة البيانات بأكملها.
لاحظ أنه في عينة التعليمات البرمجية أعلاه، تحتاج إلى تمرير المُجزئ اللغوي إلى `prepare_tf_dataset` حتى تتمكن من حشو الدُفعات بشكل صحيح أثناء تحميلها.
إذا كانت جميع العينات في مجموعة البيانات الخاصة بك بنفس الطول ولم يكن الترميز ضروريًا، فيمكنك تخطي هذا المعامل.
إذا كنت بحاجة إلى القيام بشيء أكثر تعقيدًا من مجرد ترميز العينات (على سبيل المثال، إفساد الرموز للنمذجة اللغوية المُقنعة)،
فيمكنك استخدام معامل `collate_fn` بدلاً من ذلك لتمرير دالة يتم استدعاؤها لتحويل
قائمة العينات إلى دفعة وتطبيق أي معالجة مسبقة تريدها. راجع أمثلةنا [examples](https://github.com/huggingface/transformers/tree/main/examples) أو
[دفاتر الملاحظات](https://huggingface.co/docs/transformers/notebooks) لرؤية هذا النهج في العمل.
بمجرد إنشاء `tf.data.Dataset`، يمكنك تجميع النموذج وتناسبه كما هو الحال من قبل:
```py
model.compile(optimizer=Adam(3e-5))# No loss argument!
model.fit(tf_dataset)
```
</tf>
</frameworkcontent>
<aid='pytorch_native'></a>
## تدريب في PyTorch الأصلي
<frameworkcontent>
<pt>
<Youtubeid="Dh9CL8fyG80"/>
[`Trainer`] يهتم بحلقة التدريب ويسمح لك بضبط نموذج في سطر واحد من التعليمات البرمجية. بالنسبة للمستخدمين الذين يفضلون كتابة حلقة التدريب الخاصة بهم، يمكنك أيضًا ضبط نموذج 🤗 Transformers في PyTorch الأصلي.
@ -397,8 +281,6 @@ torch.cuda.empty_cache()
>>> metric.compute()
```
</pt>
</frameworkcontent>
<aid='additional-resources'></a>
@ -409,4 +291,4 @@ torch.cuda.empty_cache()
- [🤗 أمثلة المحولات](https://github.com/huggingface/transformers/tree/main/examples) تتضمن
النصوص البرمجية لتدريب مهام NLP الشائعة في PyTorch وTensorFlow.
- [🤗 دفاتر ملاحظات المحولات](notebooks) يحتوي على دفاتر ملاحظات مختلفة حول كيفية ضبط نموذج لمهمة محددة في PyTorch وTensorFlow.
- [🤗 دفاتر ملاحظات المحولات](notebooks) يحتوي على دفاتر ملاحظات مختلفة حول كيفية ضبط نموذج لمهمة محددة في PyTorch وTensorFlow.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.