Compare commits

...

84 Commits

Author SHA1 Message Date
896e9cea1a Release: v4.53.1 2025-07-04 09:53:04 +02:00
bc0a47e64f Add packed tensor format support for flex/sdpa/eager through the mask! (#39194)
* Add the necesary logic to mask_utils

* add it everywhere

* Update masking_utils.py

* style

* Update masking_utils.py

* Update modeling_mimi.py

* Update masking_utils.py

* add support for more than batch size 1

* Update masking_utils.py

* add test

* style

* Update test_masking_utils.py

* Update masking_utils.py

* add require_token

* fix tests

* fix
2025-07-04 09:47:02 +02:00
63af3d7a07 when delaying optimizer creation only prepare the model (#39152) 2025-07-04 09:46:45 +02:00
a10c2bf840 Fix multimodal processor get duplicate arguments when receive kwargs for initialization (#39125)
* fix processor tokenizer override

Signed-off-by: Isotr0py <2037008807@qq.com>

* code format

Signed-off-by: Isotr0py <2037008807@qq.com>

* add regression test

Signed-off-by: Isotr0py <2037008807@qq.com>

* fix

Signed-off-by: Isotr0py <2037008807@qq.com>

* check image processor same

Signed-off-by: Isotr0py <2037008807@qq.com>

---------

Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-04 09:46:11 +02:00
650bc67950 [smolvlm] fix video inference (#39147)
* fix smolvlm

* better do as before, set sampling params in overwritten `apply_chat_template`

* style

* update with `setdefault`
2025-07-04 09:40:50 +02:00
3c5f91089a [qwen2-vl] fix FA2 inference (#39121)
* fix FA2

* update is causal flag and remove mask for FA2

* update for FA2 with varlen path

* how the tests were passing with different devices?

* add comment and ref to the PR

* move mask preparation to base pretrained model

* seq len is the first dim, not second

* fix copies to fix GLM4V
2025-07-04 09:39:09 +02:00
5e1c91462f Several fixes for Gemma3n (#39135)
* remove the skips

* fix the epsilon to a small value (does not make sense otherwise)

* safeguard

* overload test_eager_matches_sdpa

* Update test_modeling_common.py

* skip appropriate tests

* correct no_split_layer

* fix all devices issue

* fix backward

* fix
2025-07-04 09:38:07 +02:00
8446e2a5ee Fix key mapping for VLMs (#39029)
* fix key mapping for VLMs

* use __mro__ instead

* update key mapping in save_pretrained
2025-07-04 09:38:07 +02:00
6900fe8a3f Fix: unprotected import of tp plugin (#39083) 2025-07-04 09:38:07 +02:00
67ddc82fbc Release: v4.53.0 2025-06-26 18:02:11 +02:00
0a8081b03d [Modeling] Fix encoder CPU offloading for whisper (#38994)
* fix cpu offloading for whisper

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* unskip offloading tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* revert small change

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

* remove tests

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>

---------

Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-26 15:56:33 +00:00
c63cfd6a83 Gemma 3n (#39059)
* Gemma 3n

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemm3p5RMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3p5 overall and text config with vision and audio config placeholders (#3)

* Adding gemma3p5 text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* regenerating modeling file after syncing to HEAD

* Use torch.std(..., unbiased=False) for activation sparsity (#8)

* Refactoring to a single QVK Norm (#13)

* AltUp: support scale_corrected_output (#14)

* Converts einsums to nn.Linear (#7)

* Converts einsums to nn.Linear

* Removing unused variables

* Aligning SharedKVCache with HybridCache (#11)

* Alinging SharedKVStore with HybridCache

* Remove KVStore. Refactor apply_rotary_pos_emb for sharing

* Addressing review comments

* Supporting split modality embeddings in Gemma3n (#10)

* Adding the Embedder class

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, adding audio embedding layers, integrating embedder with the remaining architecture, adding a forward method for conditional generation

* Apply suggestions from code review

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Update modular

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>

* Addressing review comments, prop drilling audio and vision configs to the text config

* Removing TODO's that have been addressed

* Simplify Embedder init and add audio embeddings

* Embeddings refactor. Adds Gemma3nAudioEmbedder and Gemma3nVisionEmbedder

* Refactoring vision and audio embeddings into ConditionalGeneration model

---------

Co-authored-by: Ryan Mullins <ryan@ryanmullins.org>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating attention mask for Gemma 3.5 (#15)

* xxx_token_index to xxx_token_id

* remvoing deprecated last_cache_position

* Removing references to SigLIP

* Always init per-layer inputs

* Using torch.finfo().min for epsilon_tensor

* Gemma3nDecoderLayer inherits from Gemma3DecoderLayer. Remove gating lambdas

* fix modular GEMMA3N_INPUTS_DOCSTRING

* Gemma3nAttention inherits from Gemma3Attention

* Modular inheritance fixes

* CausalLM conversion script for 4B model (#16)

* Add Gemma3n Audio Encoder (#6)

* initial commit of Gemma 3.5 scaffold

* Fixing param pass through on Gemm3nRMSNorm

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma3n overall and text config with vision and audio config placeholders (#3)

* Adding gemma3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3.5 (#3)

* Initial Gemm3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3.5

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right Gemma 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3.5

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implemnenting sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* CausalLM conversion script for 4B model

* inv_timescales to non-persistent buffer

* Addressing audio encoder Attention feedback

* Addressing Gemma3nAudioSSCPConvBlock feedback

* Addressing Gemma3nAudioConformerAttention feedback

* Addressing padding feedback

* Weights conversion loads audio state dict

* Always use vision_config so saving works

* Token id updates for configs

* Stubs for interleaving audio embs

* Addressing reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Fixing cache access error

* Removing duplicate code from a bad merge

* Gemma 3n Text + Vision Part 1 (#17)

* testing utilities for numerics comparisons

* Corrected einsum to nn.Linear weights conversion

* Inherit scaled word embs from Gemma3 not Bart

* Fixing transposes for collapsed linears

* More transpose fixes

* numpy api fix

* RMSNorm: Explicit kwargs, scale_shift=0.0 when with_scale=True

* Force AltUp  to float32

* Updating debugging script for AudioEncoder debugging

* Support divide_weight_by_sqrt_fan_in from JAX for per-layer inputs

* Correcting attention einsum conversions

* RMSNorm in type of x

* Fixing douplicate laurel norm/gating

* KV sharing using the right previous indices

* Refactor kv shared index computation. Correct frac_shared_layers

* Use num_shared_layers instead of inferring from a fraction

* fixing a bug for logging

* Fix shared data_ptrs in altup inits

* rope: adjust proj -> norm -> rope to preserve computation (#20)

* rope: adjust proj -> norm -> rope to preserve computation

* Removing some breaking language model fluff in ConditionalGeneration

* Consolidate query_states transforms

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Vectorize the loops in AltUp (#19)

* Vectorize the loops in AltUp

* fix typo

* Expanding to support batched inputs

* remove extra debug script

* Fix AltUp.forward

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Add 'scale_shift=0.0, with_scale=True' to the final norm in TextModel

* Convert norm to 1/sqrt (#21)

* Convert norm to 1/sqrt

* Scale shift change per Phil's rec

* Adding default activation sparsity

* Fixing 2B config in weights conversion script

* Fixing RMSNorm parameters - adding scale_shift and with_scale

* Correcting query pre-attention scaling

* Adding query_rescale_scalar to text config

* Adding layer_idx to MLP

* Permafix for input_layernorm

* Use 1/sqrt instead of rsqrt in DecoderLayer

* Fix o_proj conversion

* Conversion script update for vision encoder

* Removing logging for debugging timm model

* Fixing bugs in Gemma3nForConditionalGeneration for text generation

* Generating the modeling_gemma3n.py file

* Removing the addition of an erroneous line in the modeling file

* Adding gemma3n text model to modeling_auto

* Bugfix: Updating the interleaving of inputs_embeds and vision_embeds

* Updating the modeling file with the latest bugfix changes

* Updating models/auto for Gemma 3n

* using AutoTokenizer in forward test

* Adding processing_gemma3n.py

* Gemma 3n configured for AutoModel. Conversion script updated.

* Removing errant merge artifacts

---------

Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>

* Removing errant debugging statements from Gemma 3

* Gemma3n audio model (#18)

* testing utilities for numerics comparisons

* Implement CumulativeGroupNorm and add to SubSampleConvProjection and SSCPConvBlock

* Add audio version of forward script based on RyanMullins' implementation

* Updating to match encoder tests. WIP: config question needs resolving

* Updates to audio classes to enable end-to-end running

* Removing vestigial classes, cleaning up print statements

* Adding SiLU / Swish to audio conformer feed forward block

* Shifted Gemma3p5Audio naming prefix to Gemma3NanoAudio

* Adding outputs to audio test

* Fixes to padding in SSCP and 1D convolution, align RMS Norm with wider model

* Update forward test to load from local weights

* Update conversion to process / output audio layers

* Update __all__ to export audio encoder

* AutoModel registration for Gemma 3n Audio

* Use AutoModel for ConditionalGeneration.audio_tower

* Fixing input_proj_linear transpose

* Fixing Gemma3NanoAudioConformerAttention.post conversion

* Fixing Gemma3NanoAudioSSCPConvBlock.conv weights conversion

* Correcting indentation issue on Gemma3p5RMSNorm

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Text + Vision Part 2 (#23)

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3p5.py

* Update src/transformers/models/gemma3p5/modular_gemma3p5.py

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Updating configs for the 2B variant in the conversion script

* Using final generation config in conversion script

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Audio Integration (#12)

* initial commit of Gemma 3n scaffold

* Fixing param pass through on Gemm3nRMSNorm

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Adds AltUp to Gemma 3n

* Adding Gemma 3n overall and text config with vision and audio config placeholders (#3)

* Adding Gemma 3n text configs

* Adding audio config placeholders

* Adding a placeholder for vision configs

* Updating MobileNetVisionConfig, inheriting TimmWrapperConfig

* Updating text configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Removing altup configs to accept the suggested configs

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating altup config

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Addressing review comments and updating text configs

* Adding a config for activation sparsity

* Updating configs to pass through options to super class init and adjust some name prefixes

* Updating laurel and altup with corrected config values

* Normalizing sub_config initializers

---------

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Updating MLP with activation sparsity (#2)

* Updating DecoderBlock for Gemma 3n (#3)

* Initial Gemma3nTextModel (#4)

NOTE: This implementation WILL CHANGE in the coming weeks, however, changes will be strictly additive and this will remain a suitable baseline for downstream implementations to reference.

* Adding KV Cache Sharing

* Adds Einsum layer to Gemma 3n

* Updating EinsumLayer API

* Refactored kv cache sharing in attention

* Adding KVStore for cache sharing

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update modular

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Update src/transformers/cache_utils.py

Co-authored-by: Ryan Mullins <ryanmullins@google.com>

* Undoing erroneous force push

* Reverting RMSNorm to with_scale by default

* Adds LAuReL to Gemma 3n

* Updating KV Cache Sharing implementation

* Updating the q and k norm definitions in the attention module

* Fixing name error for q,k,v RMS norm to use the right 3n module

* Updating MLP with activation sparsity

* Updating DecoderBlock for Gemma 3n

* Updating kv cache sharing implementation with the use of a cache buffer and refactoring some lines of code

* Isolating KV Cache logic to relevant components

* Fixing logic error in Gemma3nAttention.forward

* Refactoring caching contributions and fixing kv_store initialization

* Simplifying Configs

* Remove errant self from super init call

* Bug fix in the Attention module - changing self.head_dim to config.head_dim

* Bug fixes in the LaurelBlock and RMS Norm super init call

* removing redundant code from a merge

* Adding per_layer_inputs to TextModel

* Adding preprocess embeddings with altup

* Adds per-layer-to-single output and a host of TODOs

* Integrating altup predict with the model workflow and other minor bug fixes

* Using nn.Embedding temporarily for text model

* It goes forward

* Minor refactor of attention sparsity and RoPE initialization

* Fixing duplicate rope_scaling param bug when loading from pretrained

---------

Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Normalizing on altup_num_inputs config option

* Adding audio encoder config

* Adds high-level components for Audio Encoder

* Implement uniform reducer for Audio Encoder

* Adding placeholders for Conformer components in Audio Encoder

* Adding placeholders for SubSampleConvProjection components in Audio Encoder

* Adding SequenceLayer component placeholders

* Implementing Gemma3nAudioEncoder with nn.Sequential

* Implementing Gemma3nAudioSubSampleConvProjection with nn.Sequential

* Implementing Conformer model with SequenceLayers

* Use OrderedDict in nn.Sequential initializers

* Implements sl.Residual in Torch with nn.Sequential and OrderedDict

* Adopting a base SequenceLayer class with default forward() method

* Implementing sl.GatedLinearUnit in Torch

* Implementing sl.Swish in Torch

* Implementing sl.ReLU in Torch

* Implementing sl.Scale in Torch

* Removing sl.Dropout after tree-shaking

* Implementing sl.RMSNorm in Torch with fake shape

* Implementing sl.GroupNorm in Torch

* Implementing sl.Conv2d in Torch

* Implementing sl.Dense in Torch

* Removing sl.Delay layers, which act as pass-throughs

* Connecting shapes to configs in initializers

* Removing sl.Emit

* Implementing sl.ExpandDims in Torch

* Adding sl.GradientClipping to Torch

* Implementing sl.DenseShaped in Torch

* Implementing sl.LDPA in Torch

* Removing unused sl.CombinedQKVProj class

* Fixing erroneous type hint

* Implemnenting sl.DepthwiseConv1D in Torch

* Implementing sl.MaskInvalid in Torch

* Fixes for initialization

* Fixes for saving weights

* Removing einsums per feedback from HF staff

* Removing Sequence Layers idioms from audio encoder

* Fixes for reviewer comments

* Converting sl.Frontend to FeatureExtractor

* Updates for ConditionalGeneration.get_image_features

* Adding a WIP draft of image_processing_gemma3n.py

* Update modular

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>

* Modular conversion after github suggested change

* Text + image gives good results

* Fixing image size preset

* Draft of audio data in chat template

* Removing image processing. Using SigLIP instead.

* Audio input going end-to-end

* Fixing dtype issues in audio encoder

* x-lib formatting consistency

* Adding example data

* Save preprocessor_config.json from conversion script

* Instrumentaiton for debugging

* Additional instrumentation for preprocessing debugging

* Updates to preprocessor, padding; produces correct end-to-end results on sample

* Tackling configuraiton TODOs

* Start of feature extractor refatcor

* Adds Numpy version of USM extractor, removes Torch version and dependencies

* Fixing AltUp.correct coef permute

* Supporting batches of single audio segment inputs

* Docstrings updates for config

* In-lining audio feature extraction

* Adjustments to conversion script and smoke test script

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>

* Gemma 3n renaming

* Removing test data and utilities

* Renaming test files

* Gemma 3n refactor

* Fix tokenizer config in conversion script

* Address reviewer feedback

* FeatureExtractor returns float32 by default

* Adding basic tests for audio, and input name for audio encoder

* Audio integration test, updates to model_id for other integration tests

* Use scales for q and k norms (#26)

* Update audio integration test to use HF dataset

* Reviewer feedback

* Expand embedding table to full vocab size in weights conversion

* Mix-n-match MatFormers for Gemma 3n (#25)

* Remove in-place operations (#30)

* chore: removing inplace ops

* remove [tensor] * n pattern

* chore: reviewer feedback in AudioEncoder and AltUp

* More grad clipping

* Dynamo compatibility

* fix: cache slicing error

* chore: simplify shared kv cache slicing

* chore: vision encoder rename in timm

* fix: image processor do_normalize=False

* fixup: style

* chore: model_doc

* fix: docs for code quality

* chore: repo consistency

* fix: RMSNorm in float as in prior Gemmas

* fix: per_layer_inputs = None

* chore: Gemma3nForCausalLM from Gemma3nForConditionalGeneration checkpoint

* chore: repo consistency

* Add initial unit tests for Gemma3nAudioFeatureExtractor (#27)

* Add initial unit tests for Gemma3nAudioFeatureExtractor

* Add basic unit tests for Gemma3nProcessor (#28)

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* parameterize tests

---------

Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>

* chore: code style

* fix: test cases

* style and consistency

* fix config in the test to be coherent with layer cache sharing

* fix hidden states in tests and code

* inits and mappings

* fix modality prefixes

* test order and prefixes

* fix test exception

* fix class order and reduce model size for faster tests

* restore _checkpoint_conversion_mapping to load Caual from Conditional

* fix config mapping!

* fix: reviewer feedback

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix import test

* add model args

* auto_docstring

* replace test path

* consistency

* skip tests for now

* fix docstring for doc builder

* skip unused attr

---------

Co-authored-by: SindhuRaghuram97 <114270661+SindhuRaghuram97@users.noreply.github.com>
Co-authored-by: Sindhu Raghuram <sindhuraghuram@google.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Mayank Chaturvedi <imayank@google.com>
Co-authored-by: Douglas Reid <douglas-reid@users.noreply.github.com>
Co-authored-by: Douglas Reid <21148125+douglas-reid@users.noreply.github.com>
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Co-authored-by: pculliton <phillipculliton@gmail.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-26 17:55:47 +02:00
3e5cc12855 [tests] remove tests from libraries with deprecated support (flax, tensorflow_text, ...) (#39051)
* rm tf/flax tests

* more flax deletions

* revert fixture change

* reverted test that should not be deleted; rm tf/flax test

* revert

* fix a few add-model-like tests

* fix add-model-like checkpoint source

* a few more

* test_get_model_files_only_pt fix

* fix test_retrieve_info_for_model_with_xxx

* fix test_retrieve_model_classes

* relative paths are the devil

* add todo
2025-06-26 16:25:00 +01:00
cfff7ca9a2 [Whisper] Pipeline: handle long form generation (#35750)
* handle long form generation

* add warning

* correct incorrect in place token change

* update test to catch edge case

* make style

* update warning

* add doc
2025-06-26 14:33:31 +00:00
02ecdcfc0f add _keep_in_fp32_modules_strict (#39058)
* add _keep_in_fp32_modules_strict

* complete test
2025-06-26 13:55:28 +00:00
vb
d973e62fdd fix condition where torch_dtype auto collides with model_kwargs. (#39054)
* fix condition where torch_dtype auto collides with model_kwargs.

* update tests

* update comment

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-26 14:52:57 +02:00
44b231671d [qwen2-vl] fix vision attention scaling (#39043)
scale lost its `-` when refactoring
2025-06-26 14:06:52 +02:00
ae15715df1 polishing docs: error fixes for clarity (#39042)
* fix duplicate deprecate_models.py

* fix duplicate modular_model_converter.py
2025-06-26 11:56:31 +00:00
3abeaba7e5 Create test for #38916 (custom generate from local dir with imports) (#39015)
* create test for #38916 (custom generate from local dir with imports)
2025-06-26 13:54:36 +02:00
25c44d4b68 Internvl fix (#38946)
* Image processor compile fix (#38540)

* Added a compile-friendly versiom of resize to BaseImgProcessorFast

* Changed qwen2 processor to use its parent class .resize

* Style

* underlined issue only happens on AMD w/ comment and bool check

* Fixed some utils functions

* Fixed the same issue for bridgetower

* Fixed the same issue for llava_next

* Repo consistency for llava onevision

* Update src/transformers/image_processing_utils_fast.py

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>

* Added an Expectation to an internvl test

* Made qwen2_vl use the resize method of its parent clas

* Changed to torch.where

---------

Co-authored-by: Mohit Sharma <mohit21sharma.ms@gmail.com>
2025-06-26 13:44:59 +02:00
f85b47d1b8 [Generate] Fix no grad on some models (#39008)
fixes on torch no grad for generate
2025-06-26 13:06:09 +02:00
583db52bc6 Add Dia model (#38405)
* add dia model

* add tokenizer files

* cleanup some stuff

* brut copy paste code

* rough cleanup of the modeling code

* nuke some stuff

* more nuking

* more cleanups

* updates

* add mulitLayerEmbedding vectorization

* nits

* more modeling simplifications

* updates

* update rope

* update rope

* just fixup

* update configuration files

* more cleanup!

* default config values

* update

* forgotten comma

* another comma!

* update, more cleanups

* just more nits

* more config cleanups

* time for the encoder

* fix

* sa=mall nit

* nits

* n

* refacto a bit

* cleanup

* update cv scipt

* fix last issues

* fix last nits

* styling

* small fixes

* just run 1 generation

* fixes

* nits

* fix conversion

* fix

* more fixes

* full generate

* ouf!

* fixes!

* updates

* fix

* fix cvrt

* fixup

* nits

* delete wrong test

* update

* update

* test tokenization

* let's start changing things bit by bit - fix encoder step

* removing custom generation, moving to GenerationMixin

* add encoder decoder attention masks for generation

* mask changes, correctness checked against ad29837 in dia repo

* refactor a bit already --> next cache

* too important not to push :)

* minimal cleanup + more todos

* make main overwrite modeling utils

* add cfg filter & eos filter

* add eos countdown & delay pattern

* update eos countdown

* add max step eos countdown

* fix tests

* fix some things

* fix generation with testing

* move cfg & eos stuff to logits processor

* make RepetitionPenaltyLogitsProcessor flexible

- can accept 3D scores like (batch_size, channel, vocab)

* fix input_ids concatenation dimension in GenerationMixin for flexibility

* Add DiaHangoverLogitsProcessor and DiaExponentialDecayLengthPenalty classes; refactor logits processing in DiaForConditionalGeneration to utilize new configurations and improve flexibility.

* Add stopping criteria

* refactor

* move delay pattern from processor to modeling like musicgen.

- add docs
- change eos countdown to eos delay pattern

* fix processor & fix tests

* refactor types

* refactor imports

* format code

* fix docstring to pass ci

* add docstring to DiaConfig & add DiaModel to test

* fix docstring

* add docstring

* fix some bugs

* check

* porting / merging results from other branch - IMPORTANT: it very likely breaks generation, the goal is to have a proper forward path first

* experimental testing of left padding for first channel

* whoops

* Fix merge to make generation work

* fix cfg filter

* add position ids

* add todos, break things

* revert changes to generation --> we will force 2d but go 3d on custom stuff

* refactor a lot, change prepare decoder ids to work with left padding (needs testing), add todos

* some first fixes to get to 10. in generation

* some more generation fixes / adjustment

* style + rope fixes

* move cfg out, simplify a few things, more todos

* nit

* start working on custom logit processors

* nit

* quick fixes

* cfg top k

* more refactor of logits processing, needs a decision if gen config gets the new attributes or if we move it to config or similar

* lets keep changes to core code minimal, only eos scaling is questionable atm

* simpler eos delay logits processor

* that was for debugging :D

* proof of concept rope

* small fix on device mismatch

* cfg fixes + delay logits max len

* transformers rope

* modular dia

* more cleanup

* keep modeling consistently 3D, generate handles 2D internally

* decoder starts with bos if nothing

* post processing prototype

* style

* lol

* force sample / greedy + fixes on padding

* style

* fixup tokenization

* nits

* revert

* start working on dia tests

* fix a lot of tests

* more test fixes

* nit

* more test fixes + some features to simplify code more

* more cleanup

* forgot that one

* autodocs

* small consistency fixes

* fix regression

* small fixes

* dia feature extraction

* docs

* wip processor

* fix processor order

* processing goes brrr

* transpose before

* small fix

* fix major bug but needs now a closer look into the custom processors esp cfg

* small thing on logits

* nits

* simplify indices and shifts

* add simpler version of padding tests back (temporarily)

* add logit processor tests

* starting tests on processor

* fix mask application during generation

* some fixes on the weights conversion

* style + fixup logits order

* simplify conversion

* nit

* remove padding tests

* nits on modeling

* hmm

* fix tests

* trigger

* probably gonna be reverted, just a quick design around audio tokenizer

* fixup typing

* post merge + more typing

* initial design for audio tokenizer

* more design changes

* nit

* more processor tests and style related things

* add to init

* protect import

* not sure why tbh

* add another protect

* more fixes

* wow

* it aint stopping :D

* another missed type issue

* ...

* change design around audio tokenizer to prioritize init and go for auto - in regards to the review

* change to new causal mask function + docstrings

* change ternary

* docs

* remove todo, i dont think its essential tbh

* remove pipeline as current pipelines do not fit in the current scheme, same as csm

* closer to wrapping up the processor

* text to audio, just for demo purposes (will likely be reverted)

* check if it's this

* save audio function

* ensure no grad

* fixes on prefixed audio, hop length is used via preprocess dac, device fixes

* integration tests (tested locally on a100) + some processor utils / fixes

* style

* nits

* another round of smaller things

* docs + some fixes (generate one might be big)

* msytery solved

* small fix on conversion

* add abstract audio tokenizer, change init check to abstract class

* nits

* update docs + fix some processing :D

* change inheritance scheme for audio tokenizer

* delete dead / unnecessary code in copied generate loop

* last nits on new pipeline behavior (+ todo on tests) + style

* trigger

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
2025-06-26 11:04:23 +00:00
5995cfa0a0 Fix Bad Outputs in Fast Path for GraniteMoeHybrid (#39033)
Fix bug in previous state setting
2025-06-26 09:45:57 +02:00
22b0a89878 Granite speech speedup + model saving bugfix (#39028)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property

* speedup relative position embeddings

* fix several issues in model saving/loading:
- avoid modifying `self._hf_peft_config_loaded` when saving
- adapter_config automatically points to the original base model - a finetuned version should point to the model save dir.
- fixing model weights names, that are changed by adding an adapter.

* minor

* minor

* minor

* fixing a crash without peft active

* add todo to replace einsum
2025-06-26 09:44:17 +02:00
1d45d90e5d [tests] remove TF tests (uses of require_tf) (#38944)
* remove uses of require_tf

* remove redundant import guards

* this class has no tests

* nits

* del tf rng comment
2025-06-25 17:29:10 +00:00
d37f751797 Two ReDOS fixes (#39013)
* two_redos_fixes

* Fix two redos issues

* Just don't use RE at all
2025-06-25 17:31:26 +01:00
551e48f182 [Kyutai-STT] correct model type + model id (#39035)
* correct model type + model id

* udpate doc

* init fix

* style !!!
2025-06-25 16:09:00 +00:00
dad0e87c79 Add SmolLM3 (#38755)
* init smollm3

* integration tests

* config quirks

* docs stub

* rests round 2

* tests round 3

* tests round 4

* bring SWA back

* config checker pls

* final checkpoint

* style and copies

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/smollm3/modular_smollm3.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2025-06-25 15:12:15 +00:00
3233e9b7c3 refactor: remove custom BarkLayerNorm (#39003)
`nn.LayerNorm` supports `bias=False` since Pytorch 2.1
2025-06-25 16:07:52 +01:00
3c1d4dfbac Fix grammatical error in models documentation (#39019) 2025-06-25 14:55:22 +00:00
858f9b71a8 Remove script datasets in tests (#38940)
* remove trust_remote_code

* again

* Revert "Skip some tests for now (#38931)"

This reverts commit 31d30b72245aacfdf70249165964b53790d9c4d8.

* again

* style

* again

* again

* style

* fix integration test

* fix tests

* style

* fix

* fix

* fix the last ones

* style

* last one

* fix last

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-25 14:31:20 +00:00
3c322c9cdf fix gemma3 grad acc (#37208)
* fix gemma3 grad acc

* fix

* fix

* fix

* fix

* rmv print

* rm

* Update setup.py

* Apply style fixes

* propagate the changes

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:28:44 +02:00
860b898d03 fix: astronomical loss with ModernBERT when using gradient checkpointing (#38982) (#38983)
* fix: astronomical loss with ModernBERT when using gradient checkpointing

* update the modling fix

---------

Co-authored-by: Arthur <arthur.zucker@gmail.com>
2025-06-25 16:11:18 +02:00
a2eb75c891 Support for Flash Attention 3 (#38972)
* Support `flash_attn_3`
Implements fwd and tests for Flash Attention 3 https://github.com/Dao-AILab/flash-attention/commits/main/hopper

- Includes checks for dropout>0 and ALiBi in `modeling_utils.PreTrainedModel._check_and_enable_flash_attn_3` (Dropout will likely be supported soon, so this will need to be updated and `modeling_flash_attention_utils._flash_attention_forward` at the `if _IS_FLASH_ATTN_3_AVAILABLE: ...`

An example Llama implementation is included in `modeling_llama.py` but other models would still need to be updated

Based on https://github.com/huggingface/transformers/pull/36190 which has model implementations and examples which could be merged

* Add tests for Flash Attention 2 and 3 parity

* ci fix

* FA2 compatibiity
- `_prepare_flash_attention_from_position_ids` ->`prepare_fa2_from_position_ids`
- Remove bettertransformer check in Flash Attention 3
- Merge tests
- Add licensing

* ci fix

* Test naming consistency

* ci fix

* Deprecation warning for `prepare_fa2_from_position_ids`

* ci fix
2025-06-25 14:39:27 +02:00
de98fb25a3 Fix the seamless_m4t cannot work on Gaudi (#38363)
* Fix the seamless_m4t cannot work on Gaudi

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Refine the patch

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Fix seamless_m4t_v2 crash

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Use the patched_gather

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove debug logs

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Remove useless modifications

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add hpu check

Signed-off-by: yuanwu <yuan.wu@intel.com>

* Add comments

Signed-off-by: yuanwu <yuan.wu@intel.com>

---------

Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-06-25 12:40:01 +02:00
7503cb9113 [Model] add dots1 (#38143)
* add dots1

* address comments

* fix

* add link to dots1 doc

* format

---------

Co-authored-by: taishan <rgtjf1@163.com>
2025-06-25 11:38:25 +02:00
3ef8896906 Encoder-Decoder Gemma (#38332)
* Initial submit

* Fix bugs:
1. add __init__ file
2. tied word embedding
3. support flash/flex attention
4. model saving and loading

* Code refactor:
* Rename encdecgemma to t5gemma.
* Split attention into self- and cross-attention
* Split stack into encoder and decoder
* Add test cases
* Add auto configuration

* Update configurations.

* Fix bugs related to copy and attribute checks

* Fix type union

* Fix merge errors

* run ruff format

* Run make style and update tests.

* Add t5gemma model doc.

* ruff and style formatting.

* Add missed module config.

* Add dummy checkpoint link to pass tests (need updated when real checkpoints are uplioaded.).

* Update model doc.

* Minor updates following Arthur's comments:
* replace docstrings with auto_docstrings
* remove checkpoint layers
* remove deprecate_kwargs

* fix rebase errors

* Fix docstring issues.

* fix t5gemma doc issue.

* run ruff format

* Updates:
* split encoder-only model out
* make t5gemmamodel encoder-decoder only
* update token and sequence classification
* update tests
2025-06-25 09:05:10 +00:00
af9870265e GLM-4.1V Model support (#38431)
* 20250508 Model Architecture

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* Update modeling_glm4v.py

* update 1447

* 0526

* update

* format

* problem

* update

* update with only image embed diff

* Final

* upload

* update

* 1

* upload with ruff

* update

* update

* work

* 1

* 1

* update with new note

* 2

* Update convert_glm4v_mgt_weights_to_hf.py

* Update tokenization_auto.py

* update with new format

* remove rmsnrom

* draft with videos

* draft

* update

* update

* fix for review problem

* try to remove min_pixel

* update

* for test

* remove timestamps

* remove item

* update with remove

* change

* update 2200

* update

* Delete app.py

* format

* update

* Update test_video_processing_glm4v.py

* 1

* 2

* use new name

* Update test_video_processing_glm4v.py

* remove docs

* change

* update for image processors update

* 2108

* 2128

* Update modular_glm4v.py

* 1

* update some

* update

* rename

* 1

* remove tests output

* 2

* add configuration

* update

* Update test_video_processing_glm4v.py

* fix simple forward tests

* update with modular

* 1

* fix more tests

* fix generation test

* fix beam search and init

* modular changed

* fix beam search in case of single-image/video. Fails if multiple visuals per text

* update processor

* update test

* pass

* fix beam search

* update

* param correct

* Update convert_glm4v_mgt_weights_to_hf.py

* 1

* Update test_modeling_glm4v.py

* 4

* 2

* 2123 video process

* 2

* revert

* 1

* 2

* revert processing

* update preprocesor

* changed

* 1

* update

* update

* 6

* update

* update

* update

* Delete tmp.txt

* config

* Update video_processing_glm4v.py

* apply modular correctly

* move functions

* fix order

* update the longest_edge

* style

* simplify a lot

* fix random order of classes

* skip integration tests

* correctly fix the tests

* fix TP plan

---------

Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-25 10:43:05 +02:00
7b3807387b Drop unnecessary tokens in GPT2Model generation (#39016)
Drop unnecessary tokens in GPT2Model generation.

Co-authored-by: Yi Pan <conlesspan@outlook.com>
2025-06-25 08:29:00 +00:00
e212ff9e6a [video processor] support torchcodec and decrease cuda memory usage (#38880)
* don't move the whole video to GPU

* add torchcodec

* add tests

* make style

* instrucblip as well

* consistency

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

* Update src/transformers/video_utils.py

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>

---------

Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-25 08:23:37 +00:00
11d0feacce [AutoModelForMaskGeneration] Remove duplicate code (#38622)
Remove duplicate code
2025-06-25 10:00:13 +02:00
3ee72af6b6 Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1 (#37332)
* Fix graph break in torch.compile when using FA2 with attention_mask=None and batch size > 1

* fix code format

* add test; replace position_ids with query_states becasue position_ids.shape[0] is always 1

* add assert loss is not nan
2025-06-25 07:58:34 +00:00
ae32f1ad11 Add zero dim tensor check when using flash_attention (#38280)
* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

* Add zero dim tensor check when using flash_attention

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>

---------

Signed-off-by: ranzhejiang <zhejiang.ran@intel.com>
2025-06-25 09:48:50 +02:00
ca402e2116 [LightGlue] Fixed attribute usage from descriptor_dim to keypoint_detector_descriptor_dim (#39021)
fix: fix descriptor dimension handling in LightGlue model
2025-06-24 23:32:07 +01:00
48b6ef0238 Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code,… (#38954)
* Add Hugging Face authentication procedure for IDEs (PyCharm, VS Code, etc.)

* Update quicktour.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-24 11:48:15 -07:00
ea9a30923e [HPU][Critical Issue Fix] ThreadPool instead of Pool for parallel pre-processing (#39002)
* ThreadPool instead of Pool for parallel pre-processing

* ThreadPool only if hpu available
2025-06-24 20:24:50 +02:00
995666edb5 Skip sdpa dispatch on flash test due to unsupported head dims (#39010) 2025-06-24 20:16:56 +02:00
f367c6337d Update self-comment-ci.yml user list (#39014)
add ivarflakstad to self-comment-ci.yml
2025-06-24 20:13:36 +02:00
67d36dc1d7 Fix bugs in DynamicCache (#37880)
* Fix bugs in DynamicCache

* Updarte

* Update

* Lint

* lint

* Rename test

* update

* update
2025-06-24 19:43:40 +02:00
6bdd4ec952 Add kyutai stt (#38909)
* first draft

* cleaner version

* udpate tests + modeling

* add tests

* init

* udpate test_modeling_common

* fix tests

* csm Processor draft

* convertion update

* mimi cache padding convolutions draft

* mimi streaming udpates

* update mimi padding cache test

* udpate cache padding mimi test

* make style mimi

* updates generate moshi asr

* moshi asr integration tests (single + batched)

* update tests

* update conversion script

* good default sliding window value

* udpdate generate

* update test checkpoint

* nit

* fix mimi

* fix codec prefix

* revert

* revert

* update config

* update config

* unnecessary mimi input restriction

* remove delay in tokens

* remove _prepare_4d_causal_attention_mask_with_cache_position and _update_causal_mask

* test update

* modular update

* make style

* nit

* rename

* create codec model generation config at init

* remove delay

* max_new_tokens/length warning

* correct conv1 padding cache import for modular

* nit

* fix on encoder_past_key_values

* convert modular

* move frame_size to config

* move frame_size to config

* update test name

* handle first token is bos

* better handling of max_new_tokens

* fix

* fix batch size in test input prep

* update docstring

* convert modular

* make style

* make style

* add feature extractor

* correct modular convention name for feature_extraction file

* update convertion script

* doc processor

* update doc

* udpate init

* update model type

* fixes

* update tests

* fix

* make

* add doc

* nit

* fix

* doc

* auto mappings

* doc

* nit

* convert modular

* doc

* nit

* extend _keep_in_fp32_modules to enforce fp32

* renaming to stt

* doc update + test update

* doc fixes

* doc fix

* doc fix

* fix musicgen tests

* fix musicgen tests

* make style

* fix musicgen tests

* correct frame_rate config param for mimi

* update mimi test

* revert update mimi test

* enforce cpu test

* move cache init in cache class

* convert modular

* docstring update

* update model id

* feature_extractor -> feature_extraction (SEW)

* convert modular

* update model id
2025-06-24 18:01:15 +02:00
08bf7f1afe Add kernelize to transformers (#38205)
* fix

* fix

* fix flow

* remove non compiling path

* change

* style

* fix

* update

* update pin

* revert
2025-06-24 17:38:54 +02:00
be10d4df60 Granite speech - minor fixes to support training with the HF trainer (#38833)
* ensure the query is updated during training

avoid unused parameters that DDP does not like

* avoid a crash when `kwargs` contain `padding=True`

trainers often pass this argument automatically

* minor

* Remove mel_spec lazy init, and rename to mel_filters.
this ensures save_pretrained will not crash when saving the processor during training
d5d007a1a0/src/transformers/feature_extraction_utils.py (L595)

* minor - most feature extractors has a `sampling_rate` property
2025-06-24 17:06:52 +02:00
e1e11b0299 Fix undeterministic order in modular dependencies (#39005)
* sort correctly

* Update modeling_minimax.py

* Update modular_model_converter.py
2025-06-24 17:04:33 +02:00
bdf5fb70aa Skip non-selected experts for qwen3_moe (#38133)
* fix(qwen3moe): skip experts with no workload

* avoid tolist and also update other moe models

* fix: should squeeze 0-dim only
2025-06-24 16:33:48 +02:00
719058c625 Update attention_visualizer.py (#37860) 2025-06-24 16:21:36 +02:00
9f42c1f192 Added scikit-learn to the example image-classification requirements.txt (#37506)
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
2025-06-24 15:24:02 +02:00
1636a7bcb9 Fixes for Arcee model (#39001)
* fix modular

* Update modular_arcee.py

* fix
2025-06-24 15:23:52 +02:00
71de20b818 Add Arcee model support (#38621)
* Add Arcee model support to transformers

- Add ArceeConfig and model mappings for all task types (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add auto-loading support through AutoModel, AutoConfig, and AutoTokenizer
- Use LlamaTokenizer for tokenization
- Add FX graph support for Arcee models
- Create lazy loading module structure for Arcee

* feat: update YARN scaling and RoPE validation for Arcee model

* feat: add auto_docstring checkpoint config to Arcee model classes

* docs: add pre-trained model weights reference to Arcee configuration files

* refactor: move RoPE utilities to dedicated modeling_rope_utils module

* Add comprehensive test suite for Arcee model

- Add test_modeling_arcee.py following standard transformers test patterns
- Include tests for all model variants (CausalLM, SequenceClassification, QuestionAnswering, TokenClassification)
- Add specific test for ReLU² activation in ArceeMLP
- Add RoPE scaling tests including YARN support
- Follow CausalLMModelTest pattern used by similar models

* Add documentation for Arcee model

- Add comprehensive model documentation with usage examples
- Include all model variants in autodoc
- Add to table of contents in proper alphabetical order
- Fixes documentation coverage for Arcee model classes

* Make style/fixup

* fix copyright year

* Sync modular conversion

* revert in legacy supported models in src/transformers/utils/fx

* cleaned redundant code in modular_arcee.py

* cleaned testing

* removed pretraining tp

* fix styles

* integration testing

---------

Co-authored-by: Pranav <veldurthipranav@gmail.com>
Co-authored-by: Pranav <56645758+pranav4501@users.noreply.github.com>
2025-06-24 15:05:29 +02:00
23c89a6732 [Attention] Small fix on output attentions (#38948)
small fix
2025-06-24 14:42:10 +02:00
4f650040a6 Removing extra space in large command for speech-pretraining example (#38705)
Removing extra space in Large command
2025-06-24 12:24:56 +00:00
d3d835d4fc [qwen] refactor attentions for vision/audio (#38930)
* refactor attentions in vision/audio

* remove fa2 import

* make config the only args

* pass along kwargs from modality encoders

* style
2025-06-24 10:53:52 +02:00
vb
2e4c045540 🔴 Update default dtype for pipelines to auto (#38882)
* check typing

* Fallback to fp32 if auto not supported.

* up.

* feedback from review.

* make style.
2025-06-24 10:39:18 +02:00
21cb353b7b [docs] Typos - Single GPU efficient training features (#38964)
* Typos

- corrected bf16 training argument
- corrected header for SDPA

* improved readability for SDPA suggested by @stevhliu

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-23 12:33:10 -07:00
f9be71b34d Fix rag (#38585)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:42:46 +02:00
9eac19eb59 [Feature] Support is_split_into_words in the TokenClassificationPipeline. (#38818)
* some fixes

* some fixes

* now the pipeline can take list of tokens as input and is_split_into_words argument

* now the pipeline can take list of tokens as input and is_split_into_words argument

* now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input

* now the pipeline can take list of tokens as input and is_split_into_words argument and we can handle batches of tokenized input

* solving test problems

* some fixes

* some fixes

* modify tests

* aligning start and end correctly

* adding tests

* some formatting

* some formatting

* some fixes

* some fixes

* some fixes

* resolve conflicts

* removing unimportant lines

* removing unimportant lines

* generalize to other languages

* generalize to other languages

* generalize to other languages

* generalize to other languages
2025-06-23 15:31:32 +00:00
2ce02b98bf fix mistral and mistral3 tests (#38978)
* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-23 17:07:18 +02:00
b6b4d43d6d Add support for auto_docstring with model outputs (#38242)
* experiment auto_docstring model outputs

* Fix PatchTSMixer

* Add check model output docstring to check_auto_docstring and fix all model outputs docstring

* add reordering of docstring in check_docstrings

* add check for redundant docstring in check_docstrings, remove redundant docstrings

* refactor check_auto_docstring

* make style

* fix copies

* remove commented code

* change List-> list Tuple-> tuple in docstrings

* fix modular

* make style

* Fix modular vipllava

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-06-23 10:39:41 -04:00
0c98f24889 fix: add __bool__ operator to tokenizer to avoid bloated asserts (#38899)
* fix: add __bool__ operator to tokenizer to avoid bloated asserts

When a user does 'assert tokenizer' to ensure that the tokenizer is not None, they inadvertently set off a rather expensive process in the '__len__()' operator. This fix adds a trivial '__bool__()' that returns True, so that a None tokenizer asserts and an actual tokenizer returns True when asserted, without calling length op.

* typo
2025-06-23 14:32:16 +00:00
d29482cc91 Add Idefics2/3 and SmolVLM Fast image processors + improvements for fast image processors (#38157)
* add working idefics2 fast and improvements for fast nested images processing

* add fast image processors idefics 3 and smolvlm

* cleanup tests

* fic doc idefics2

* PR review and fix issues after merge

* Force providing disable_grouping to group_images_by_shape

* simplify group_images_by_shape

* fix modular

* Fix nits after review
2025-06-23 14:17:25 +00:00
1a96127e46 Break tie in Expectations and gemma3 fixes (#38943)
* Added major / minor version to Expectations ordering

* Added fixes to gemma3

* Style
2025-06-23 15:13:27 +02:00
84d19be41e Apply GradientCheckpointingLayer to the whole repo (#38913)
* first batch (4)

* align

* altclip

* beit

* bert

* yolos

* dino, pvt_v2

* bark, bart, bert_generation

* big_bird, biogpt

* blnderbot, bloom

* bridgetower

* camambert, canine, chameleon

* chinese clip, clap, clip

* codegen, conditional detr, convbert

* dab_detr, data2vec

* dbrx, deberta

* deberta, decicion_tranformer, deformable_detr

* deit, deta, mctct

* detr, dinov2, distilbert

* donut, dpt, electra

* ernie, esm, falcon

* flava, fnet, falcon_mamba

* focalnet, git, gpt2

* gpt - bigcode, neo, neox

* gptj, groupvit

* idefics2, idefics3

* ijepa, imagegpt, internvl

* jetmoe, kosmos2, layoutlm

* layoutlm2-3, led

* lilt, longformer, longt5, luke

* m2m, mamba1-2

* marian, markuplm, mask2former

* maskformer

* mbart, megatron_bert, mimi

* mixtral, mlcd

* mobilevit1-2, modernbert

* moshi, mpt, mra

* mt5, musicgen

* mvp, nemotron

* nllb_moe

* nystromformer, omdet_turbo

* opt, owlvit, owlv2

* pegasus, pegasus_x, presimmon

* phimoe, pix2struct, pixtral

* plbart, pop2piano, prophetnet

* qwen2*

* qwen2, qwen3 moe,  rec gemma

* rembert

* roberta

* roberta prelayernorm

* roc_bert, roformer, rwkv

* sam, sam_hq

* seggpt, smolvlm, speech_to_text

* splinter, stablelm, swin

* swin2sr, switch_transformer, t5, table_transformer

* tapas, time_series_tranformer, timesformer

* trocr, tvp, umt5

* videomae, vilt, visual_bert

* vit, vit_mae, vit_msn

* vitpose_backbone, vits, vivit

* whisper. x_clip, xglm

* xlm_roberta, xmod

* yoso

* zamba

* vitdet, wav2vec2, wav2vec2_bert

* unispeech, wav2vec_conformer

* wavlm

* speecht5

* swinv2

* sew / _d

* seamless_mt4 / _v2

* deprecated models update

* bros

* gemma2, gemma3

* got, hiera, hubert, llama4, mllama, oneformer, phi, olmoe, informer

* fixup

* Add use_cache=False and past_key_value=None to  GradientCheckpointingLayer

* fixup

* fix prophetnet

* fix bigbird_pegasus

* fix blenderbot

* fix mbart

* fix mvp

* fix zamba2

* fix bart

* fix blenderbot_small

* fix codegen

* Update gradient checkpointing layer to support more past_key_values arg names

* fix data2vec vision

* fix deformable_detr

* fix gptj

* fix led

* fix m2m_100

* add comment

* fix nnlb_moe

* Fix pegasus_x

* fix plbart

* udop

* fix-copies: beit, wav2vec2

* fix gpt_bigcode

* fixup

* fix t5

* fix switch_transformers

* fix longt5

* fix mt5

* update tapas

* fix blip2

* update blip

* fix musicgen

* fix gpt2, trocr

* fix copies

* !!! Revert zamba, mllama

* update autoformer

* update bros

* update args / kwargs for BERT and copies

* 2nd round of updates

* update conditional detr

* Pass encoder_hidden_states as positional arg

* Update to pass encoder_decoder_position_bias as positional arg

* fixup

* biogpt modular

* modular gemma2

* modular gemma3

* modular gpt_neox

* modular informer

* modular internvl

* modular mixtral

* modular mlcd

* modular modernbert

* modular phi

* modular qwen2_5_omni

* modular qwen2_5_vl

* modular sam_hq

* modular sew

* wav2vec2_bert

* modular wav2vec2_conformer

* modular wavlm

* fixup

* Update by modular instructblipvideo

* modular data2vec_audio

* nit modular mistral

* apply modular minimax

* fix modular moonshine

* revert zamba2

* fix mask2former

* refactor idefics
2025-06-23 14:24:48 +02:00
07aab1af1e Remove dead protected imports (#38980)
* remove them

* more
2025-06-23 13:44:50 +02:00
74f5e4a1fa [modular] CLI allows positional arguments, and more defaults names for the optional arg (#38979)
* More defaults

* Update modular_model_converter.py
2025-06-23 12:40:01 +02:00
334bf913dc Fix(informer): Correct tensor shape for input_size=1 (#38856)
* Fix(time_series): Correct scaler tensor shape in base model

The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.

This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.

Fixes #38745

* Fix(time_series): Correct scaler tensor shape in base model

The create_network_inputs function in TimeSeriesTransformerModel
handled the scaler's loc and scale tensors inconsistently.
When input_size=1, the tensors were not squeezed, leading to
downstream dimension errors for models like Informer.

This commit refactors the logic to unconditionally apply .squeeze(1),
which correctly handles all input_size cases and fixes the bug at its source.

Fixes #38745

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2025-06-23 11:50:51 +02:00
c184550daf Fix DTensor import compatibility for PyTorch < 2.5 (#38836) 2025-06-23 11:25:56 +02:00
984ff89e73 Gaudi3 CI (#38790) 2025-06-23 10:56:51 +02:00
2166b6b4ff Update blip model card (#38513)
* Update docs/source/en/model_doc/blip.md

* fix(docs/source/en/model_doc/blip.md): fix redundent typo error

* fix (docs/source/en/model_doc/blip.md): modify of review contents

* fix(docs/source/en/model_doc/blip.md): modify code block

* Update blip.md

---------

Co-authored-by: devkade <mouseku@moana-master>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-06-20 13:46:19 -07:00
166e823f77 Fix custom generate from local directory (#38916)
Fix custom generate from local directory:
1. Create parent dirs before copying files (custom_generate dir)
2. Correctly copy relative imports to the submodule file.
3. Update docs.
2025-06-20 17:36:57 +01:00
3d34b92116 Switch to use A10 progressively (#38936)
* try

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 16:10:35 +00:00
b8059e1f8f Fix more flaky test_initialization (#38932)
* try

* try

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-20 17:28:32 +02:00
5ee60f970a Correctly raise error for awq quantization (#38945)
fix warning
2025-06-20 17:18:06 +02:00
8ac2d75353 Pin PyTorch extras for AMD containers (#38941)
* Pin additional Torch packages

* Remove unused def

---------

Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
2025-06-20 12:17:21 +00:00
9120567b02 Add kwargs for timm.create_model in TimmWrapper (#38860)
* Add init kwargs for timm wrapper

* model_init_kwargs -> model_args

* add save-load test

* fixup
2025-06-20 12:00:09 +00:00
ff95974bc6 [static cache] fix device map per layer in VLMs (#38488)
return lm as decoder
2025-06-20 13:49:29 +02:00
1010 changed files with 49207 additions and 93279 deletions

View File

@ -12,8 +12,8 @@ on:
slice_id:
required: true
type: number
runner:
required: true
runner_map:
required: false
type: string
docker:
required: true
@ -45,7 +45,7 @@ jobs:
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: '${{ inputs.machine_type }}'
group: ${{ fromJson(inputs.runner_map)[matrix.folders][inputs.machine_type] }}
container:
image: ${{ inputs.docker }}
options: --gpus all --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/

View File

@ -1,128 +0,0 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
machine_type:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
docker:
required: true
type: string
env:
HF_HOME: /mnt/cache
TRANSFORMERS_IS_CI: yes
OMP_NUM_THREADS: 8
MKL_NUM_THREADS: 8
RUN_SLOW: yes
# For gated repositories, we still need to agree to share information on the Hub repo. page in order to get access.
# This token is created under the bot `hf-transformers-bot`.
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
TF_FORCE_GPU_ALLOW_GROWTH: true
CUDA_VISIBLE_DEVICES: 0,1
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 1 # For now, not to parallelize. Can change later if it works well.
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on: ['${{ inputs.machine_type }}', self-hosted, amd-gpu, '${{ inputs.runner }}']
container:
image: ${{ inputs.docker }}
options: --device /dev/kfd --device /dev/dri --env ROCR_VISIBLE_DEVICES --shm-size "16gb" --ipc host -v /mnt/cache/.cache/huggingface:/mnt/cache/
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
# For folders like `models/bert`, set an env. var. (`matrix_folders`) to `models_bert`, which will be used to
# set the artifact folder names (because the character `/` is not allowed).
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Update clone
working-directory: /transformers
run: git fetch && git checkout ${{ github.sha }}
- name: Reinstall transformers in edit mode (remove the one installed during docker image build)
working-directory: /transformers
run: python3 -m pip uninstall -y transformers && python3 -m pip install -e .
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') }}
working-directory: /transformers
run: |
python3 -m pip install -U datasets
- name: Update / Install some packages (for Past CI)
if: ${{ contains(inputs.docker, '-past-') && contains(inputs.docker, '-pytorch-') }}
working-directory: /transformers
run: |
python3 -m pip install --no-cache-dir git+https://github.com/huggingface/accelerate@main#egg=accelerate
- name: ROCM-SMI
run: |
rocm-smi
- name: ROCM-INFO
run: |
rocminfo | grep "Agent" -A 14
- name: Show ROCR environment
run: |
echo "ROCR: $ROCR_VISIBLE_DEVICES"
- name: Environment
working-directory: /transformers
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
working-directory: /transformers
run: pip freeze
- name: Run all tests on GPU
working-directory: /transformers
run: python3 -m pytest -rsfE -v --make-reports=${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }} -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
run: |
mkdir -p /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports
echo "hello" > /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ inputs.machine_type }}_run_models_gpu_${{ env.matrix_folders }}_test_reports
path: /transformers/reports/${{ inputs.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports

View File

@ -0,0 +1,121 @@
name: model jobs
on:
workflow_call:
inputs:
folder_slices:
required: true
type: string
slice_id:
required: true
type: number
runner:
required: true
type: string
machine_type:
required: true
type: string
report_name_prefix:
required: false
default: run_models_gpu
type: string
env:
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
run_models_gpu:
name: " "
strategy:
max-parallel: 8
fail-fast: false
matrix:
folders: ${{ fromJson(inputs.folder_slices)[inputs.slice_id] }}
runs-on:
group: ${{ inputs.runner }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Echo input and matrix info
shell: bash
run: |
echo "${{ inputs.folder_slices }}"
echo "${{ matrix.folders }}"
echo "${{ toJson(fromJson(inputs.folder_slices)[inputs.slice_id]) }}"
- name: Echo folder ${{ matrix.folders }}
shell: bash
run: |
echo "${{ matrix.folders }}"
matrix_folders=${{ matrix.folders }}
matrix_folders=${matrix_folders/'models/'/'models_'}
echo "$matrix_folders"
echo "matrix_folders=$matrix_folders" >> $GITHUB_ENV
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ inputs.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ inputs.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ inputs.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all tests on Gaudi
run: python3 -m pytest -v --make-reports=${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: cat reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/failures_short.txt
- name: Run test
shell: bash
run: |
mkdir -p reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports
echo "hello" > reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports/hello.txt
echo "${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports"
- name: "Test suite reports artifacts: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ env.matrix_folders }}_test_reports
path: reports/${{ env.machine_type }}_${{ inputs.report_name_prefix }}_${{ matrix.folders }}_test_reports

View File

@ -29,7 +29,7 @@ jobs:
runs-on: ubuntu-22.04
name: Get PR number
# For security: only allow team members to run
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
if: ${{ github.event.issue.state == 'open' && contains(fromJSON('["ydshieh", "ArthurZucker", "zucchini-nlp", "qubvel", "molbap", "gante", "LysandreJik", "Cyrilvallez", "Rocketknight1", "SunMarc", "muellerzr", "eustlb", "MekkCyber", "manueldeprada", "vasqu", "ivarflakstad"]'), github.actor) && (startsWith(github.event.comment.body, 'run-slow') || startsWith(github.event.comment.body, 'run slow') || startsWith(github.event.comment.body, 'run_slow')) }}
outputs:
PR_NUMBER: ${{ steps.set_pr_number.outputs.PR_NUMBER }}
steps:

View File

@ -22,7 +22,7 @@ on:
default: ""
# Used for `push` to easily modiffy the target workflow runs to compare against
# Used for `push` to easily modify the target workflow runs to compare against
env:
prev_workflow_run_id: ""
other_workflow_run_id: ""
@ -51,7 +51,6 @@ jobs:
with:
job: run_models_gpu
slack_report_channel: "#transformers-ci-daily-models"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
@ -63,7 +62,6 @@ jobs:
with:
job: run_pipelines_torch_gpu
slack_report_channel: "#transformers-ci-daily-pipeline-torch"
runner: daily-ci
docker: huggingface/transformers-pytorch-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
@ -75,7 +73,6 @@ jobs:
with:
job: run_examples_gpu
slack_report_channel: "#transformers-ci-daily-examples"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
@ -87,7 +84,6 @@ jobs:
with:
job: run_trainer_and_fsdp_gpu
slack_report_channel: "#transformers-ci-daily-training"
runner: daily-ci
docker: huggingface/transformers-all-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci
@ -99,7 +95,6 @@ jobs:
with:
job: run_torch_cuda_extensions_gpu
slack_report_channel: "#transformers-ci-daily-training"
runner: daily-ci
docker: huggingface/transformers-pytorch-deepspeed-latest-gpu
ci_event: Daily CI
working-directory-prefix: /workspace
@ -112,7 +107,6 @@ jobs:
with:
job: run_quantization_torch_gpu
slack_report_channel: "#transformers-ci-daily-quantization"
runner: daily-ci
docker: huggingface/transformers-quantization-latest-gpu
ci_event: Daily CI
report_repo_id: hf-internal-testing/transformers_daily_ci

View File

@ -0,0 +1,345 @@
name: Self-hosted runner (scheduled-intel-gaudi)
on:
workflow_call:
inputs:
job:
required: true
type: string
slack_report_channel:
required: true
type: string
runner_scale_set:
required: true
type: string
ci_event:
required: true
type: string
report_repo_id:
required: true
type: string
env:
NUM_SLICES: 2
RUN_SLOW: yes
PT_HPU_LAZY_MODE: 0
TRANSFORMERS_IS_CI: yes
PT_ENABLE_INT64_SUPPORT: 1
HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
SIGOPT_API_TOKEN: ${{ secrets.SIGOPT_API_TOKEN }}
HF_HOME: /mnt/cache/.cache/huggingface
jobs:
setup:
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Setup
runs-on: ubuntu-latest
outputs:
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
quantization_matrix: ${{ steps.set-matrix.outputs.quantization_matrix }}
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.10"
- id: set-matrix
if: contains(fromJSON('["run_models_gpu", "run_trainer_and_fsdp_gpu"]'), inputs.job)
name: Identify models to test
working-directory: tests
run: |
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
fi
- id: set-matrix-quantization
if: ${{ inputs.job == 'run_quantization_torch_gpu' }}
name: Identify quantization method to test
working-directory: tests
run: |
echo "quantization_matrix=$(python3 -c 'import os; tests = os.getcwd(); quantization_tests = os.listdir(os.path.join(tests, "quantization")); d = sorted(list(filter(os.path.isdir, [f"quantization/{x}" for x in quantization_tests]))) ; print(d)')" >> $GITHUB_OUTPUT
run_models_gpu:
if: ${{ inputs.job == 'run_models_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_models_gpu
secrets: inherit
run_trainer_and_fsdp_gpu:
if: ${{ inputs.job == 'run_trainer_and_fsdp_gpu' }}
name: " "
needs: setup
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs_intel_gaudi.yml
with:
slice_id: ${{ matrix.slice_id }}
machine_type: ${{ matrix.machine_type }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
runner: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit
run_pipelines_gpu:
if: ${{ inputs.job == 'run_pipelines_gpu' }}
name: Pipelines
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: python3 utils/print_env.py
- name: Show installed libraries and their versions
run: pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all pipeline tests on Intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_pipelines_gpu_test_reports tests/pipelines -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_pipelines_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_pipelines_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_pipelines_gpu_test_reports
path: reports/${{ env.machine_type }}_run_pipelines_gpu_test_reports
run_examples_gpu:
if: ${{ inputs.job == 'run_examples_gpu' }}
name: Examples directory
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run examples tests on Intel Gaudi
run: |
pip install -r examples/pytorch/_tests_requirements.txt
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_examples_gpu_test_reports examples/pytorch -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_examples_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_examples_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_examples_gpu_test_reports
path: reports/${{ env.machine_type }}_run_examples_gpu_test_reports
run_deepspeed_gpu:
if: ${{ inputs.job == 'run_deepspeed_gpu' }}
name: Intel Gaudi deepspeed tests
strategy:
fail-fast: false
matrix:
machine_type: [1gaudi, 2gaudi]
runs-on:
group: ${{ inputs.runner_scale_set }}-${{ matrix.machine_type }}
container:
image: vault.habana.ai/gaudi-docker/1.21.1/ubuntu22.04/habanalabs/pytorch-installer-2.6.0:latest
options: --runtime=habana
-v /mnt/cache/.cache/huggingface:/mnt/cache/.cache/huggingface
--env OMPI_MCA_btl_vader_single_copy_mechanism=none
--env HABANA_VISIBLE_DEVICES
--env HABANA_VISIBLE_MODULES
--cap-add=sys_nice
--shm-size=64G
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Install dependencies
run: |
pip install -e .[testing,torch] "numpy<2.0.0" scipy scikit-learn librosa soundfile
pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.20.0
- name: HL-SMI
run: |
hl-smi
echo "HABANA_VISIBLE_DEVICES=${HABANA_VISIBLE_DEVICES}"
echo "HABANA_VISIBLE_MODULES=${HABANA_VISIBLE_MODULES}"
- name: Environment
run: |
python3 utils/print_env.py
- name: Show installed libraries and their versions
run: |
pip freeze
- name: Set `machine_type` for report and artifact names
shell: bash
run: |
if [ "${{ matrix.machine_type }}" = "1gaudi" ]; then
machine_type=single-gpu
elif [ "${{ matrix.machine_type }}" = "2gaudi" ]; then
machine_type=multi-gpu
else
machine_type=${{ matrix.machine_type }}
fi
echo "machine_type=$machine_type" >> $GITHUB_ENV
- name: Run all deepspeed tests on intel Gaudi
run: |
python3 -m pytest -v --make-reports=${{ env.machine_type }}_run_deepspeed_gpu_test_reports tests/deepspeed -m "not not_device_test"
- name: Failure short reports
if: ${{ failure() }}
continue-on-error: true
run: |
cat reports/${{ env.machine_type }}_run_deepspeed_gpu_test_reports/failures_short.txt
- name: "Test suite reports artifacts: ${{ env.machine_type }}_run_deepspeed_gpu_test_reports"
if: ${{ always() }}
uses: actions/upload-artifact@v4
with:
name: ${{ env.machine_type }}_run_deepspeed_gpu_test_reports
path: reports/${{ env.machine_type }}_run_deepspeed_gpu_test_reports
send_results:
name: Slack Report
needs:
[
setup,
run_models_gpu,
run_examples_gpu,
run_pipelines_gpu,
run_deepspeed_gpu,
run_trainer_and_fsdp_gpu,
]
if: ${{ always() }}
uses: ./.github/workflows/slack-report.yml
with:
job: ${{ inputs.job }}
setup_status: ${{ needs.setup.result }}
slack_report_channel: ${{ inputs.slack_report_channel }}
quantization_matrix: ${{ needs.setup.outputs.quantization_matrix }}
folder_slices: ${{ needs.setup.outputs.folder_slices }}
report_repo_id: ${{ inputs.report_repo_id }}
ci_event: ${{ inputs.ci_event }}
secrets: inherit

View File

@ -0,0 +1,67 @@
name: Self-hosted runner (Intel Gaudi3 scheduled CI caller)
on:
repository_dispatch:
workflow_dispatch:
schedule:
- cron: "17 2 * * *"
jobs:
model-ci:
name: Model CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_models_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
pipeline-ci:
name: Pipeline CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_pipelines_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
example-ci:
name: Example CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_examples_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
deepspeed-ci:
name: DeepSpeed CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_deepspeed_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit
trainer-fsdp-ci:
name: Trainer/FSDP CI
uses: ./.github/workflows/self-scheduled-intel-gaudi.yml
with:
job: run_trainer_and_fsdp_gpu
ci_event: Scheduled CI (Intel) - Gaudi3
runner_scale_set: itac-bm-emr-gaudi3-dell
slack_report_channel: "#transformers-ci-daily-intel-gaudi3"
report_repo_id: optimum-intel/transformers_daily_ci_intel_gaudi3
secrets: inherit

View File

@ -15,9 +15,6 @@ on:
slack_report_channel:
required: true
type: string
runner:
required: true
type: string
docker:
required: true
type: string
@ -62,6 +59,7 @@ jobs:
outputs:
folder_slices: ${{ steps.set-matrix.outputs.folder_slices }}
slice_ids: ${{ steps.set-matrix.outputs.slice_ids }}
runner_map: ${{ steps.set-matrix.outputs.runner_map }}
quantization_matrix: ${{ steps.set-matrix-quantization.outputs.quantization_matrix }}
steps:
- name: Update clone
@ -88,6 +86,7 @@ jobs:
if [ "${{ inputs.job }}" = "run_models_gpu" ]; then
echo "folder_slices=$(python3 ../utils/split_model_tests.py --num_splits ${{ env.NUM_SLICES }})" >> $GITHUB_OUTPUT
echo "slice_ids=$(python3 -c 'd = list(range(${{ env.NUM_SLICES }})); print(d)')" >> $GITHUB_OUTPUT
echo "runner_map=$(python3 ../utils/get_runner_map.py)" >> $GITHUB_OUTPUT
elif [ "${{ inputs.job }}" = "run_trainer_and_fsdp_gpu" ]; then
echo "folder_slices=[['trainer'], ['fsdp']]" >> $GITHUB_OUTPUT
echo "slice_ids=[0, 1]" >> $GITHUB_OUTPUT
@ -111,14 +110,14 @@ jobs:
strategy:
fail-fast: false
matrix:
machine_type: [aws-g4dn-4xlarge-cache, aws-g4dn-12xlarge-cache]
machine_type: [single-gpu, multi-gpu]
slice_id: ${{ fromJSON(needs.setup.outputs.slice_ids) }}
uses: ./.github/workflows/model_jobs.yml
with:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
runner_map: ${{ needs.setup.outputs.runner_map }}
docker: ${{ inputs.docker }}
secrets: inherit
@ -136,7 +135,6 @@ jobs:
folder_slices: ${{ needs.setup.outputs.folder_slices }}
machine_type: ${{ matrix.machine_type }}
slice_id: ${{ matrix.slice_id }}
runner: ${{ inputs.runner }}
docker: ${{ inputs.docker }}
report_name_prefix: run_trainer_and_fsdp_gpu
secrets: inherit

View File

@ -3,6 +3,9 @@ LABEL maintainer="Hugging Face"
ARG DEBIAN_FRONTEND=noninteractive
ARG TORCH_VISION='0.21.0'
ARG TORCH_AUDIO='2.6.0'
RUN apt update && \
apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip python3-dev ffmpeg git-lfs && \
apt clean && \
@ -20,6 +23,7 @@ WORKDIR /
ADD https://api.github.com/repos/huggingface/transformers/git/refs/heads/main version.json
RUN git clone https://github.com/huggingface/transformers && cd transformers && git checkout $REF
RUN python3 -m pip install --no-cache-dir torchvision==$TORCH_VISION torchaudio==$TORCH_AUDIO
RUN python3 -m pip install --no-cache-dir -e ./transformers[dev-torch,testing,video]
RUN python3 -m pip uninstall -y tensorflow flax

View File

@ -473,13 +473,6 @@ Hier ist zum Beispiel ein Test, der nur ausgeführt werden muss, wenn 2 oder meh
def test_example_with_multi_gpu():
```
Wenn ein Test `tensorflow` benötigt, verwenden Sie den Dekorator `require_tf`. Zum Beispiel:
```python no-style
@require_tf
def test_tf_thing_with_tensorflow():
```
Diese Dekors können gestapelt werden. Wenn zum Beispiel ein Test langsam ist und mindestens eine GPU unter pytorch benötigt, können Sie
wie Sie ihn einrichten können:
@ -1204,9 +1197,6 @@ if torch.cuda.is_available():
import numpy as np
np.random.seed(seed)
# tf RNG
tf.random.set_seed(seed)
```
### Tests debuggen

View File

@ -363,6 +363,8 @@
- sections:
- local: model_doc/albert
title: ALBERT
- local: model_doc/arcee
title: Arcee
- local: model_doc/bamba
title: Bamba
- local: model_doc/bart
@ -431,6 +433,8 @@
title: DiffLlama
- local: model_doc/distilbert
title: DistilBERT
- local: model_doc/dots1
title: dots1
- local: model_doc/dpr
title: DPR
- local: model_doc/electra
@ -653,6 +657,8 @@
title: SwitchTransformers
- local: model_doc/t5
title: T5
- local: model_doc/t5gemma
title: T5Gemma
- local: model_doc/t5v1.1
title: T5v1.1
- local: model_doc/tapex
@ -833,6 +839,8 @@
title: CSM
- local: model_doc/dac
title: dac
- local: model_doc/dia
title: Dia
- local: model_doc/encodec
title: EnCodec
- local: model_doc/fastspeech2_conformer
@ -841,6 +849,8 @@
title: GraniteSpeech
- local: model_doc/hubert
title: Hubert
- local: model_doc/kyutai_speech_to_text
title: Kyutai Speech-To-Text
- local: model_doc/mctct
title: MCTCT
- local: model_doc/mimi
@ -949,8 +959,12 @@
title: FLAVA
- local: model_doc/gemma3
title: Gemma3
- local: model_doc/gemma3n
title: Gemma3n
- local: model_doc/git
title: GIT
- local: model_doc/glm4v
title: glm4v
- local: model_doc/got_ocr2
title: GOT-OCR2
- local: model_doc/granitevision
@ -1043,6 +1057,8 @@
title: SigLIP
- local: model_doc/siglip2
title: SigLIP2
- local: model_doc/smollm3
title: SmolLM3
- local: model_doc/smolvlm
title: SmolVLM
- local: model_doc/speech-encoder-decoder

View File

@ -468,9 +468,17 @@ def generate(model, input_ids, generation_config=None, left_padding=None, **kwar
Follow the recommended practices below to ensure your custom decoding method works as expected.
- Feel free to reuse the logic for validation and input preparation in the original [`~GenerationMixin.generate`].
- Pin the `transformers` version in the requirements if you use any private method/attribute in `model`.
- You can add other files in the `custom_generate` folder, and use relative imports.
- Consider adding model validation, input validation, or even a separate test file to help users sanity-check your code in their environment.
Your custom `generate` method can relative import code from the `custom_generate` folder. For example, if you have a `utils.py` file, you can import it like this:
```py
from .utils import some_function
```
Only relative imports from the same-level `custom_generate` folder are supported. Parent/sibling folder imports are not valid. The `custom_generate` argument also works locally with any directory that contains a `custom_generate` structure. This is the recommended workflow for developing your custom decoding method.
#### requirements.txt
You can optionally specify additional Python requirements in a `requirements.txt` file inside the `custom_generate` folder. These are checked at runtime and an exception will be thrown if they're missing, nudging users to update their environment accordingly.

View File

@ -0,0 +1,104 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
# Arcee
Arcee is a decoder-only transformer model based on the Llama architecture with a key modification: it uses ReLU² (ReLU-squared) activation in the MLP blocks instead of SiLU, following recent research showing improved training efficiency with squared activations. This architecture is designed for efficient training and inference while maintaining the proven stability of the Llama design.
The Arcee model is architecturally similar to Llama but uses `x * relu(x)` in MLP layers for improved gradient flow and is optimized for efficiency in both training and inference scenarios.
> [!TIP]
> The Arcee model supports extended context with RoPE scaling and all standard transformers features including Flash Attention 2, SDPA, gradient checkpointing, and quantization support.
The example below demonstrates how to generate text with Arcee using [`Pipeline`] or the [`AutoModel`].
<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline
pipeline = pipeline(
task="text-generation",
model="arcee-ai/AFM-4.5B",
torch_dtype=torch.float16,
device=0
)
output = pipeline("The key innovation in Arcee is")
print(output[0]["generated_text"])
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoTokenizer, ArceeForCausalLM
tokenizer = AutoTokenizer.from_pretrained("arcee-ai/AFM-4.5B")
model = ArceeForCausalLM.from_pretrained(
"arcee-ai/AFM-4.5B",
torch_dtype=torch.float16,
device_map="auto"
)
inputs = tokenizer("The key innovation in Arcee is", return_tensors="pt")
with torch.no_grad():
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
</hfoption>
</hfoptions>
## ArceeConfig
[[autodoc]] ArceeConfig
## ArceeModel
[[autodoc]] ArceeModel
- forward
## ArceeForCausalLM
[[autodoc]] ArceeForCausalLM
- forward
## ArceeForSequenceClassification
[[autodoc]] ArceeForSequenceClassification
- forward
## ArceeForQuestionAnswering
[[autodoc]] ArceeForQuestionAnswering
- forward
## ArceeForTokenClassification
[[autodoc]] ArceeForTokenClassification
- forward

View File

@ -350,6 +350,10 @@ The following auto classes are available for the following audio tasks.
[[autodoc]] AutoModelForTextToWaveform
### AutoModelForAudioTokenization
[[autodoc]] AutoModelForAudioTokenization
## Multimodal
The following auto classes are available for the following multimodal tasks.

View File

@ -14,35 +14,76 @@ rendered properly in your Markdown viewer.
-->
# BLIP
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="TensorFlow" src="https://img.shields.io/badge/TensorFlow-FF6F00?style=flat&logo=tensorflow&logoColor=white">
</div>
</div>
## Overview
# BLIP
The BLIP model was proposed in [BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation](https://huggingface.co/papers/2201.12086) by Junnan Li, Dongxu Li, Caiming Xiong, Steven Hoi.
[BLIP](https://huggingface.co/papers/2201.12086) (Bootstrapped Language-Image Pretraining) is a vision-language pretraining (VLP) framework designed for *both* understanding and generation tasks. Most existing pretrained models are only good at one or the other. It uses a captioner to generate captions and a filter to remove the noisy captions. This increases training data quality and more effectively uses the messy web data.
BLIP is a model that is able to perform various multi-modal tasks including:
- Visual Question Answering
- Image-Text retrieval (Image-text matching)
- Image Captioning
The abstract from the paper is the following:
You can find all the original BLIP checkpoints under the [BLIP](https://huggingface.co/collections/Salesforce/blip-models-65242f40f1491fbf6a9e9472) collection.
*Vision-Language Pre-training (VLP) has advanced the performance for many vision-language tasks.
However, most existing pre-trained models only excel in either understanding-based tasks or generation-based tasks. Furthermore, performance improvement has been largely achieved by scaling up the dataset with noisy image-text pairs collected from the web, which is a suboptimal source of supervision. In this paper, we propose BLIP, a new VLP framework which transfers flexibly to both vision-language understanding and generation tasks. BLIP effectively utilizes the noisy web data by bootstrapping the captions, where a captioner generates synthetic captions and a filter removes the noisy ones. We achieve state-of-the-art results on a wide range of vision-language tasks, such as image-text retrieval (+2.7% in average recall@1), image captioning (+2.8% in CIDEr), and VQA (+1.6% in VQA score). BLIP also demonstrates strong generalization ability when directly transferred to videolanguage tasks in a zero-shot manner. Code, models, and datasets are released.*
> [!TIP]
> This model was contributed by [ybelkada](https://huggingface.co/ybelkada).
>
> Click on the BLIP models in the right sidebar for more examples of how to apply BLIP to different vision language tasks.
![BLIP.gif](https://cdn-uploads.huggingface.co/production/uploads/1670928184033-62441d1d9fdefb55a0b7d12c.gif)
The example below demonstrates how to visual question answering with [`Pipeline`] or the [`AutoModel`] class.
This model was contributed by [ybelkada](https://huggingface.co/ybelkada).
The original code can be found [here](https://github.com/salesforce/BLIP).
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
import torch
from transformers import pipeline
pipeline = pipeline(
task="visual-question-answering",
model="Salesforce/blip-vqa-base",
torch_dtype=torch.float16,
device=0
)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
pipeline(question="What is the weather in this image?", image=url)
```
</hfoption>
<hfoption id="AutoModel">
```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVisualQuestionAnswering
processor = AutoProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = AutoModelForVisualQuestionAnswering.from_pretrained(
"Salesforce/blip-vqa-base",
torch_dtype=torch.float16,
device_map="auto"
)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
image = Image.open(requests.get(url, stream=True).raw)
question = "What is the weather in this image?"
inputs = processor(images=image, text=question, return_tensors="pt").to("cuda", torch.float16)
output = model.generate(**inputs)
processor.batch_decode(output, skip_special_tokens=True)[0]
```
</hfoption>
</hfoptions>
## Resources
- [Jupyter notebook](https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_blip.ipynb) on how to fine-tune BLIP for image captioning on a custom dataset
Refer to this [notebook](https://github.com/huggingface/notebooks/blob/main/examples/image_captioning_blip.ipynb) to learn how to fine-tune BLIP for image captioning on a custom dataset.
## BlipConfig

View File

@ -0,0 +1,162 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Dia
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
## Overview
Dia is an opensource text-to-speech (TTS) model (1.6B parameters) developed by [Nari Labs](https://huggingface.co/nari-labs).
It can generate highly realistic dialogue from transcript including nonverbal communications such as laughter and coughing.
Furthermore, emotion and tone control is also possible via audio conditioning (voice cloning).
**Model Architecture:**
Dia is an encoder-decoder transformer based on the original transformer architecture. However, some more modern features such as
rotational positional embeddings (RoPE) are also included. For its text portion (encoder), a byte tokenizer is utilized while
for the audio portion (decoder), a pretrained codec model [DAC](./dac.md) is used - DAC encodes speech into discrete codebook
tokens and decodes them back into audio.
## Usage Tips
### Generation with Text
```python
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
text = ["[S1] Dia is an open weights text to dialogue model."]
processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)
model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=256) # corresponds to around ~2s
# save audio to a file
outputs = processor.batch_decode(outputs)
processor.save_audio(outputs, "example.wav")
```
### Generation with Text and Audio (Voice Cloning)
```python
from datasets import load_dataset, Audio
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
ds = load_dataset("hf-internal-testing/dailytalk-dummy", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=44100))
audio = ds[-1]["audio"]["array"]
# text is a transcript of the audio + additional text you want as new audio
text = ["[S1] I know. It's going to save me a lot of money, I hope. [S2] I sure hope so for you."]
processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, audio=audio, padding=True, return_tensors="pt").to(torch_device)
prompt_len = processor.get_audio_prompt_len(inputs["decoder_attention_mask"])
model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=256) # corresponds to around ~2s
# retrieve actually generated audio and save to a file
outputs = processor.batch_decode(outputs, audio_prompt_len=prompt_len)
processor.save_audio(outputs, "example_with_audio.wav")
```
### Training
```python
from datasets import load_dataset, Audio
from transformers import AutoProcessor, DiaForConditionalGeneration
torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"
ds = load_dataset("hf-internal-testing/dailytalk-dummy", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=44100))
audio = ds[-1]["audio"]["array"]
# text is a transcript of the audio
text = ["[S1] I know. It's going to save me a lot of money, I hope."]
processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(
text=text,
audio=audio,
generation=False,
output_labels=True,
padding=True,
return_tensors="pt"
).to(torch_device)
model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
out = model(**inputs)
out.loss.backward()
```
This model was contributed by [Jaeyong Sung](https://huggingface.co/buttercrab), [Arthur Zucker](https://huggingface.co/ArthurZ),
and [Anton Vlasjuk](https://huggingface.co/AntonV). The original code can be found [here](https://github.com/nari-labs/dia/).
## DiaConfig
[[autodoc]] DiaConfig
## DiaDecoderConfig
[[autodoc]] DiaDecoderConfig
## DiaEncoderConfig
[[autodoc]] DiaEncoderConfig
## DiaTokenizer
[[autodoc]] DiaTokenizer
- __call__
## DiaFeatureExtractor
[[autodoc]] DiaFeatureExtractor
- __call__
## DiaProcessor
[[autodoc]] DiaProcessor
- __call__
- batch_decode
- decode
## DiaModel
[[autodoc]] DiaModel
- forward
## DiaForConditionalGeneration
[[autodoc]] DiaForConditionalGeneration
- forward
- generate

View File

@ -0,0 +1,40 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# dots.llm1
## Overview
The `dots.llm1` model was proposed in [dots.llm1 technical report](https://www.arxiv.org/pdf/2506.05767) by rednote-hilab team.
The abstract from the report is the following:
*Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that activates 14B parameters out of a total of 142B parameters, delivering performance on par with state-of-the-art models while reducing training and inference costs. Leveraging our meticulously crafted and efficient data processing pipeline, dots.llm1 achieves performance comparable to Qwen2.5-72B after pretraining on high-quality corpus and post-training to fully unlock its capabilities. Notably, no synthetic data is used during pretraining. To foster further research, we open-source intermediate training checkpoints spanning the entire training process, providing valuable insights into the learning dynamics of large language models.*
## Dots1Config
[[autodoc]] Dots1Config
## Dots1Model
[[autodoc]] Dots1Model
- forward
## Dots1ForCausalLM
[[autodoc]] Dots1ForCausalLM
- forward

View File

@ -0,0 +1,204 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
# Gemma3n
## Overview
Gemma3n is a multimodal model with pretrained and instruction-tuned variants, available in E4B and E2B sizes. While
large portions of the language model architecture are shared with prior Gemma releases, there are many new additions in
this model, including [Alternating Updates][altup] (AltUp), [Learned Augmented Residual Layer][laurel] (LAuReL),
[MatFormer][matformer], Per-Layer Embeddings (PLE), activation sparsity, and KV cache sharing. The language model uses
a similar attention pattern to [Gemma 3](./gemma3.md) with alternating 4 local sliding window self-attention layers for
every global self-attention layer with a maximum context length of 32k tokens. Gemma 3n introduces
[MobileNet v5][mobilenetv5] as the vision encoder, using a default resolution of 768x768 pixels, and adds a
[Universal Speech Model][usm] (USM) as the audio encoder.
The instruction-tuned variant was post-trained with knowledge distillation and reinforcement learning.
You can find all the original Gemma 3n checkpoints under the [Gemma 3n][gemma3n-collection] release.
> [!TIP]
> Click on the Gemma 3n models in the right sidebar for more examples of how to apply Gemma to different vision, audio,
> and language tasks.
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline
pipeline = pipeline(
task="image-text-to-text",
model="google/gemma-3n-e4b",
device=0,
torch_dtype=torch.bfloat16
)
pipeline(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
text="<start_of_image> What is shown in this image?"
)
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration
model = Gemma3nForConditionalGeneration.from_pretrained(
"google/gemma-3n-e4b-it",
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="sdpa"
)
processor = AutoProcessor.from_pretrained(
"google/gemma-3n-e4b-it",
padding_side="left"
)
messages = [
{
"role": "system",
"content": [
{"type": "text", "text": "You are a helpful assistant."}
]
},
{
"role": "user", "content": [
{"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"},
{"type": "text", "text": "What is shown in this image?"},
]
},
]
inputs = processor.apply_chat_template(
messages,
tokenize=True,
return_dict=True,
return_tensors="pt",
add_generation_prompt=True,
).to("cuda")
output = model.generate(**inputs, max_new_tokens=50, cache_implementation="static")
print(processor.decode(output[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers CLI">
```bash
echo -e "Plants create energy through a process known as" | transformers run --task text-generation --model google/gemma-3n-e2b --device 0
```
</hfoption>
</hfoptions>
## Notes
- Use [`Gemma3nForConditionalGeneration`] for image-audio-and-text, image-and-text, image-and-audio, audio-and-text,
image-only and aduio-only inputs.
- Gemma 3n supports multiple images per input, but make sure the images are correctly batched before passing them to
the processor. Each batch should be a list of one or more images.
```py
url_cow = "https://media.istockphoto.com/id/1192867753/photo/cow-in-berchida-beach-siniscola.jpg?s=612x612&w=0&k=20&c=v0hjjniwsMNfJSuKWZuIn8pssmD5h5bSN1peBd1CmH4="
url_cat = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
messages =[
{
"role": "system",
"content": [
{"type": "text", "text": "You are a helpful assistant."}
]
},
{
"role": "user",
"content": [
{"type": "image", "url": url_cow},
{"type": "image", "url": url_cat},
{"type": "text", "text": "Which image is cuter?"},
]
},
]
```
- Text passed to the processor should have a `<image_soft_token>` token wherever an image should be inserted.
- Gemma 3n accept at most one target audio clip per input, though multiple audio clips can be provided in few-shot
prompts, for example.
- Text passed to the processor should have a `<audio_soft_token>` token wherever an audio clip should be inserted.
- The processor has its own [`~ProcessorMixin.apply_chat_template`] method to convert chat messages to model inputs.
## Gemma3nAudioFeatureExtractor
[[autodoc]] Gemma3nAudioFeatureExtractor
## Gemma3nProcessor
[[autodoc]] Gemma3nProcessor
## Gemma3nTextConfig
[[autodoc]] Gemma3nTextConfig
## Gemma3nVisionConfig
[[autodoc]] Gemma3nVisionConfig
## Gemma3nAudioConfig
[[autodoc]] Gemma3nAudioConfig
## Gemma3nConfig
[[autodoc]] Gemma3nConfig
## Gemma3nTextModel
[[autodoc]] Gemma3nTextModel
- forward
## Gemma3nModel
[[autodoc]] Gemma3nModel
- forward
## Gemma3nForCausalLM
[[autodoc]] Gemma3nForCausalLM
- forward
## Gemma3nForConditionalGeneration
[[autodoc]] Gemma3nForConditionalGeneration
- forward
[altup]: https://proceedings.neurips.cc/paper_files/paper/2023/hash/f2059277ac6ce66e7e5543001afa8bb5-Abstract-Conference.html
[attention-mask-viz]: https://github.com/huggingface/transformers/blob/beb9b5b02246b9b7ee81ddf938f93f44cfeaad19/src/transformers/utils/attention_visualizer.py#L139
[gemma3n-collection]: https://huggingface.co/collections/google/gemma-3n
[laurel]: https://arxiv.org/abs/2411.07501
[matformer]: https://arxiv.org/abs/2310.07707
[usm]: https://arxiv.org/abs/2303.01037

View File

@ -0,0 +1,180 @@
<!--Copyright 2025 The ZhipuAI Inc. and The HuggingFace Inc. team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white"> </div>
</div>
# GLM-4.1V
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
<hfoptions id="usage">
<hfoption id="Pipeline">
```py
import torch
from transformers import pipeline
pipe = pipeline(
task="image-text-to-text",
model="THUDM/GLM-4.1V-9B-Thinking",
device=0,
torch_dtype=torch.bfloat16
)
messages = [
{
"role": "user",
"content": [
{
"type": "image",
"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg",
},
{ "type": "text", "text": "Describe this image."},
]
}
]
pipe(text=messages,max_new_tokens=20, return_full_text=False)
```
</hfoption>
<hfoption id="AutoModel">
```py
import torch
from transformers import Glm4vForConditionalGeneration, AutoProcessor
model = Glm4vForConditionalGeneration.from_pretrained(
"THUDM/GLM-4.1V-9B-Thinking",
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="sdpa"
)
processor = AutoProcessor.from_pretrained("THUDM/GLM-4.1V-9B-Thinking")
messages = [
{
"role":"user",
"content":[
{
"type":"image",
"url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
},
{
"type":"text",
"text":"Describe this image."
}
]
}
]
inputs = processor.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt"
).to("cuda")
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
```
</hfoption>
</hfoptions>
Using GLM-4.1V with video input is similar to using it with image input.
The model can process video data and generate text based on the content of the video.
```python
from transformers import AutoProcessor, Glm4vForConditionalGeneration
import torch
processor = AutoProcessor.from_pretrained("THUDM/GLM-4.1V-9B-Thinking")
model = Glm4vForConditionalGeneration.from_pretrained(
pretrained_model_name_or_path="THUDM/GLM-4.1V-9B-Thinking",
torch_dtype=torch.bfloat16,
device_map="cuda:0"
)
messages = [
{
"role": "user",
"content": [
{
"type": "video",
"url": "https://test-videos.co.uk/vids/bigbuckbunny/mp4/h264/720/Big_Buck_Bunny_720_10s_10MB.mp4",
},
{
"type": "text",
"text": "discribe this video",
},
],
}
]
inputs = processor.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_dict=True, return_tensors="pt", padding=True).to("cuda:0")
generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=1.0)
output_text = processor.decode(generated_ids[0][inputs["input_ids"].shape[1] :], skip_special_tokens=True)
print(output_text)
```
## Glm4vConfig
[[autodoc]] Glm4vConfig
## Glm4vTextConfig
[[autodoc]] Glm4vTextConfig
## Glm4vImageProcessor
[[autodoc]] Glm4vImageProcessor
- preprocess
## Glm4vVideoProcessor
[[autodoc]] Glm4vVideoProcessor
- preprocess
## Glm4vImageProcessorFast
[[autodoc]] Glm4vImageProcessorFast
- preprocess
## Glm4vProcessor
[[autodoc]] Glm4vProcessor
## Glm4vTextModel
[[autodoc]] Glm4vTextModel
- forward
## Glm4vModel
[[autodoc]] Glm4vModel
- forward
## Glm4vForConditionalGeneration
[[autodoc]] Glm4vForConditionalGeneration
- forward

View File

@ -162,7 +162,7 @@ To load and run a model using Flash Attention-2, simply change the code snippet
```diff
model = Idefics2ForConditionalGeneration.from_pretrained(
"HuggingFaceM4/idefics2-8b",
+ torch_dtype=torch.float16,
+ torch_dtype=torch.float16,
+ attn_implementation="flash_attention_2",
).to(device)
```
@ -184,7 +184,7 @@ Quantizing a model is as simple as passing a `quantization_config` to the model.
+ )
model = Idefics2ForConditionalGeneration.from_pretrained(
"HuggingFaceM4/idefics2-8b",
+ torch_dtype=torch.float16,
+ torch_dtype=torch.float16,
+ quantization_config=quantization_config,
).to(device)
```
@ -218,7 +218,10 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
[[autodoc]] Idefics2ImageProcessor
- preprocess
## Idefics2ImageProcessorFast
[[autodoc]] Idefics2ImageProcessorFast
- preprocess
## Idefics2Processor
[[autodoc]] Idefics2Processor
- __call__
- __call__

View File

@ -80,6 +80,9 @@ This model was contributed by [amyeroberts](https://huggingface.co/amyeroberts)
[[autodoc]] Idefics3ImageProcessor
- preprocess
## Idefics3ImageProcessorFast
[[autodoc]] Idefics3ImageProcessorFast
- preprocess
## Idefics3Processor
[[autodoc]] Idefics3Processor

View File

@ -0,0 +1,122 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Kyutai Speech-To-Text
## Overview
Kyutai STT is a speech-to-text model architecture based on the [Mimi codec](https://huggingface.co/docs/transformers/en/model_doc/mimi), which encodes audio into discrete tokens in a streaming fashion, and a [Moshi-like](https://huggingface.co/docs/transformers/en/model_doc/moshi) autoregressive decoder. Kyutais lab has released two model checkpoints:
- [kyutai/stt-1b-en_fr](https://huggingface.co/kyutai/stt-1b-en_fr): a 1B-parameter model capable of transcribing both English and French
- [kyutai/stt-2.6b-en](https://huggingface.co/kyutai/stt-2.6b-en): a 2.6B-parameter model focused solely on English, optimized for maximum transcription accuracy
<div class="flex justify-center">
<img src="https://huggingface.co/datasets/eustlb/documentation-images/resolve/main/kyutai_stt.png"/>
</div>
## Usage Tips
### Inference
```python
import torch
from datasets import load_dataset, Audio
from transformers import KyutaiSpeechToTextProcessor, KyutaiSpeechToTextForConditionalGeneration
# 1. load the model and the processor
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "kyutai/stt-2.6b-en-trfs"
processor = KyutaiSpeechToTextProcessor.from_pretrained(model_id)
model = KyutaiSpeechToTextForConditionalGeneration.from_pretrained(model_id, device_map=torch_device, torch_dtype="auto")
# 2. load audio samples
ds = load_dataset(
"hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
)
ds = ds.cast_column("audio", Audio(sampling_rate=24000))
# 3. prepare the model inputs
inputs = processor(
ds[0]["audio"]["array"],
)
inputs.to(torch_device)
# 4. infer the model
output_tokens = model.generate(**inputs)
# 5. decode the generated tokens
print(processor.batch_decode(output_tokens, skip_special_tokens=True))
```
### Batched Inference
```python
import torch
from datasets import load_dataset, Audio
from transformers import KyutaiSpeechToTextProcessor, KyutaiSpeechToTextForConditionalGeneration
# 1. load the model and the processor
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "kyutai/stt-2.6b-en-trfs"
processor = KyutaiSpeechToTextProcessor.from_pretrained(model_id)
model = KyutaiSpeechToTextForConditionalGeneration.from_pretrained(model_id, device_map=torch_device, torch_dtype="auto")
# 2. load audio samples
ds = load_dataset(
"hf-internal-testing/librispeech_asr_dummy", "clean", split="validation"
)
ds = ds.cast_column("audio", Audio(sampling_rate=24000))
# 3. prepare the model inputs
audio_arrays = [ds[i]["audio"]["array"] for i in range(4)]
inputs = processor(audio_arrays, return_tensors="pt", padding=True)
inputs = inputs.to(torch_device)
# 4. infer the model
output_tokens = model.generate(**inputs)
# 5. decode the generated tokens
decoded_outputs = processor.batch_decode(output_tokens, skip_special_tokens=True)
for output in decoded_outputs:
print(output)
```
This model was contributed by [Eustache Le Bihan](https://huggingface.co/eustlb).
The original code can be found [here](https://github.com/kyutai-labs/moshi).
## KyutaiSpeechToTextConfig
[[autodoc]] KyutaiSpeechToTextConfig
## KyutaiSpeechToTextProcessor
[[autodoc]] KyutaiSpeechToTextProcessor
- __call__
## KyutaiSpeechToTextFeatureExtractor
[[autodoc]] KyutaiSpeechToTextFeatureExtractor
## KyutaiSpeechToTextForConditionalGeneration
[[autodoc]] KyutaiSpeechToTextForConditionalGeneration
- forward
- generate
## KyutaiSpeechToTextModel
[[autodoc]] KyutaiSpeechToTextModel

View File

@ -56,7 +56,7 @@ Here is how to use the processor to process text and audio:
```python
>>> # let's load an audio sample from an Arabic speech corpus
>>> from datasets import load_dataset
>>> dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True, trust_remote_code=True)
>>> dataset = load_dataset("halabi2016/arabic_speech_corpus", split="test", streaming=True)
>>> audio_sample = next(iter(dataset))["audio"]
>>> # now, process it

View File

@ -56,7 +56,7 @@ Here is how to use the processor to process text and audio:
```python
>>> # let's load an audio sample from an Arabic speech corpus
>>> from datasets import load_dataset
>>> dataset = load_dataset("arabic_speech_corpus", split="test", streaming=True, trust_remote_code=True)
>>> dataset = load_dataset("halabi2016/arabic_speech_corpus", split="test", streaming=True)
>>> audio_sample = next(iter(dataset))["audio"]
>>> # now, process it

View File

@ -0,0 +1,173 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
<div style="float: right;">
<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
<img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
<img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
</div>
</div>
# SmolLM3
SmolLM3 is a fully open, compact language model designed for efficient deployment while maintaining strong performance. It uses a Transformer decoder architecture with Grouped Query Attention (GQA) to reduce the kv cache, and no RoPE, enabling improved performance on long-context tasks. It is trained using a multi-stage training approach on high-quality public datasets across web, code, and math domains. The model is multilingual and supports very large context lengths. The instruct variant is optimized for reasoning and tool use.
> [!TIP]
> Click on the SmolLM3 models in the right sidebar for more examples of how to apply SmolLM3 to different language tasks.
The example below demonstrates how to generate text with [`Pipeline`], [`AutoModel`], and from the command line using the instruction-tuned models.
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
import torch
from transformers import pipeline
pipe = pipeline(
task="text-generation",
model="HuggingFaceTB/SmolLM3-3B",
torch_dtype=torch.bfloat16,
device_map=0
)
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me about yourself."},
]
outputs = pipe(messages, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"][-1]['content'])
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"HuggingFaceTB/SmolLM3-3B",
torch_dtype=torch.bfloat16,
device_map="auto",
attn_implementation="sdpa"
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
prompt = "Give me a short introduction to large language models."
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to("cuda")
generated_ids = model.generate(
model_inputs.input_ids,
cache_implementation="static",
max_new_tokens=512,
do_sample=True,
temperature=0.7,
top_k=50,
top_p=0.95
)
generated_ids = [
output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
</hfoption>
<hfoption id="transformers CLI">
```bash
# pip install -U flash-attn --no-build-isolation
transformers chat HuggingFaceTB/SmolLM3-3B --torch_dtype auto --attn_implementation flash_attention_2 --device 0
```
</hfoption>
</hfoptions>
Quantization reduces the memory burden of large models by representing the weights in a lower precision. Refer to the [Quantization](../quantization/overview) overview for more available quantization backends.
The example below uses [bitsandbytes](../quantization/bitsandbytes) to quantize the weights to 4-bits.
```python
# pip install -U flash-attn --no-build-isolation
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_quant_type="nf4",
bnb_4bit_use_double_quant=True,
)
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
model = AutoModelForCausalLM.from_pretrained(
"HuggingFaceTB/SmolLM3-3B",
torch_dtype=torch.bfloat16,
device_map="auto",
quantization_config=quantization_config,
attn_implementation="flash_attention_2"
)
inputs = tokenizer("Gravity is the force", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Notes
- Ensure your Transformers library version is up-to-date. SmolLM3 requires Transformers>=4.53.0 for full support.
## SmolLM3Config
[[autodoc]] SmolLM3Config
## SmolLM3Model
[[autodoc]] SmolLM3Model
- forward
## SmolLM3ForCausalLM
[[autodoc]] SmolLM3ForCausalLM
- forward
## SmolLM3ForSequenceClassification
[[autodoc]] SmolLM3ForSequenceClassification
- forward
## SmolLM3ForTokenClassification
[[autodoc]] SmolLM3ForTokenClassification
- forward
## SmolLM3ForQuestionAnswering
[[autodoc]] SmolLM3ForQuestionAnswering
- forward

View File

@ -32,7 +32,7 @@ SmolVLM2 is an adaptation of the Idefics3 model with two main differences:
Input images are processed either by upsampling (if resizing is enabled) or at their original resolution. The resizing behavior depends on two parameters: do_resize and size.
Videos should not be upsampled.
Videos should not be upsampled.
If `do_resize` is set to `True`, the model resizes images so that the longest edge is 4*512 pixels by default.
The default resizing behavior can be customized by passing a dictionary to the `size` parameter. For example, `{"longest_edge": 4 * 512}` is the default, but you can change it to a different value if needed.
@ -192,11 +192,14 @@ print(generated_texts[0])
[[autodoc]] SmolVLMForConditionalGeneration
- forward
## SmolVLMImageProcessor
[[autodoc]] SmolVLMImageProcessor
- preprocess
## SmolVLMImageProcessorFast
[[autodoc]] SmolVLMImageProcessorFast
- preprocess
## SmolVLMVideoProcessor
[[autodoc]] SmolVLMVideoProcessor
- preprocess

View File

@ -0,0 +1,107 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# T5Gemma
T5Gemma (aka encoder-decoder Gemma) was proposed in a [research paper](https://arxiv.org/abs/2504.06225) by Google. It is a family of encoder-decoder large langauge models, developed by adapting pretrained decoder-only models into encoder-decoder. T5Gemma includes pretrained and instruction-tuned variants. The architecture is based on transformer encoder-decoder design following T5, with improvements from Gemma 2: GQA, RoPE, GeGLU activation, RMSNorm, and interleaved local/global attention.
T5Gemma has two groups of model sizes: 1) [Gemma 2](https://ai.google.dev/gemma/docs/core/model_card_2) sizes (2B-2B, 9B-2B, and 9B-9B), which are based on the offical Gemma 2 models (2B and 9B); and 2) [T5](https://arxiv.org/abs/1910.10683) sizes (Small, Base, Large, and XL), where are pretrained under the Gemma 2 framework following T5 configuration. In addition, we also provide a model at ML size (medium large, ~2B in total), which is in-between T5 Large and T5 XL.
The pretrained varaints are trained with two objectives: prefix language modeling with knowledge distillation (PrefixLM) and UL2, separately. We release both variants for each model size. The instruction-turned varaints was post-trained with supervised fine-tuning and reinforcement learning.
The example below demonstrates how to chat with the model with [`Pipeline`] or the [`AutoModel`] class, and from the command line.
<hfoptions id="usage">
<hfoption id="Pipeline">
```python
import torch
from transformers import pipeline
pipe = pipeline(
task="text2text-generation",
model="google/t5gemma-placeholder",
torch_dtype=torch.bfloat16,
device="cuda",
)
pipe("Question: Why is the sky blue?\nAnswer:", max_new_tokens=50)
```
</hfoption>
<hfoption id="AutoModel">
```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("google/t5gemma-placeholder")
model = AutoModelForSeq2SeqLM.from_pretrained(
"google/t5gemma-placeholder",
torch_dtype=torch.bfloat16,
device_map="auto"
)
input_text = "Question: Why is the sky blue?\nAnswer:"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
</hfoption>
<hfoption id="transformers CLI">
```
echo -e "Question: Why is the sky blue? Answer:" | transformers run --task text2text-generation --model google/t5gemma-placeholder --device 0
```
## T5GemmaConfig
[[autodoc]] T5GemmaConfig
## T5GemmaModuleConfig
[[autodoc]] T5GemmaModuleConfig
## T5GemmaModel
[[autodoc]] T5GemmaModel
- forward
## T5GemmaEncoderModel
[[autodoc]] T5GemmaEncoderModel
- forward
## T5GemmaForConditionalGeneration
[[autodoc]] T5GemmaForConditionalGeneration
- forward
## T5GemmaForSequenceClassification
[[autodoc]] T5GemmaForSequenceClassification
- forward
## T5GemmaForTokenClassification
[[autodoc]] T5GemmaForTokenClassification
- forward

View File

@ -18,7 +18,7 @@ rendered properly in your Markdown viewer.
Transformers provides many pretrained models that are ready to use with a single line of code. It requires a model class and the [`~PreTrainedModel.from_pretrained`] method.
Call [`~PreTrainedModel.from_pretrained`] to download and load a models weights and configuration stored on the Hugging Face [Hub](https://hf.co/models).
Call [`~PreTrainedModel.from_pretrained`] to download and load a model's weights and configuration stored on the Hugging Face [Hub](https://hf.co/models).
> [!TIP]
> The [`~PreTrainedModel.from_pretrained`] method loads weights stored in the [safetensors](https://hf.co/docs/safetensors/index) file format if they're available. Traditionally, PyTorch model weights are serialized with the [pickle](https://docs.python.org/3/library/pickle.html) utility which is known to be unsecure. Safetensor files are more secure and faster to load.

View File

@ -24,7 +24,7 @@ A linter "unravels" the modular file into a `modeling.py` file to preserve the s
Run the command below to automatically generate a `modeling.py` file from a modular file.
```bash
python utils/modular_model_converter.py --files_to_parse src/transformers/models/<your_model>/modular_<your_model>.py
python utils/modular_model_converter.py --files-to-parse src/transformers/models/<your_model>/modular_<your_model>.py
```
For example:

View File

@ -31,7 +31,7 @@ Refer to the table below to quickly help you identify the features relevant to y
| data preloading | yes | no |
| torch_empty_cache_steps | no | yes |
| torch.compile | yes | no |
| PEFT | no | yes |
| scaled dot production attention (SDPA) | yes | yes |
## Trainer
@ -128,7 +128,7 @@ fp16 isn't memory-optimized because the gradients that are computed in fp16 are
[bf16](https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus) trades off some precision for a much larger dynamic range, which is helpful for avoiding overflow and underflow errors. You can use bf16 without adding any loss scaling methods like you would with fp16. bf16 is supported by NVIDIAs Ampere architecture or newer.
Configure [`~TrainingArguments.fp16`] in [`TrainingArguments`] to enable mixed precision training with the bf16 data type.
Configure [`~TrainingArguments.bf16`] in [`TrainingArguments`] to enable mixed precision training with the bf16 data type.
```py
from transformers import TrainingArguments

View File

@ -32,12 +32,29 @@ To start, we recommend creating a Hugging Face [account](https://hf.co/join). An
Create a [User Access Token](https://hf.co/docs/hub/security-tokens#user-access-tokens) and log in to your account.
<hfoptions id="authenticate">
<hfoption id="notebook">
Paste your User Access Token into [`~huggingface_hub.notebook_login`] when prompted to log in.
```py
from huggingface_hub import notebook_login
notebook_login()
```
</hfoption>
<hfoption id="CLI">
Make sure the [huggingface_hub[cli]](https://huggingface.co/docs/huggingface_hub/guides/cli#getting-started) package is installed and run the command below. Paste your User Access Token when prompted to log in.
```bash
huggingface-cli login
```
</hfoption>
</hfoptions>
Install a machine learning framework.
<hfoptions id="installation">

View File

@ -474,13 +474,6 @@ For example, here is a test that must be run only when there are 2 or more GPUs
def test_example_with_multi_gpu():
```
If a test requires `tensorflow` use the `require_tf` decorator. For example:
```python no-style
@require_tf
def test_tf_thing_with_tensorflow():
```
These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is
how to set it up:
@ -1226,11 +1219,6 @@ if torch.cuda.is_available():
import numpy as np
np.random.seed(seed)
# tf RNG
import tensorflow as tf
tf.random.set_seed(seed)
```
### Debugging tests

View File

@ -445,13 +445,6 @@ CUDA_VISIBLE_DEVICES="1" pytest tests/utils/test_logging.py
def test_example_with_multi_gpu():
```
テストに `tensorflow` が必要な場合は、`require_tf` デコレータを使用します。例えば:
```python no-style
@require_tf
def test_tf_thing_with_tensorflow():
```
これらのデコレータは積み重ねることができます。たとえば、テストが遅く、pytorch で少なくとも 1 つの GPU が必要な場合は、次のようになります。
設定方法:
@ -1135,9 +1128,6 @@ if torch.cuda.is_available():
import numpy as np
np.random.seed(seed)
# tf RNG
tf.random.set_seed(seed)
```

View File

@ -473,13 +473,6 @@ GPU 요구 사항을 표로 정리하면 아래와 같습니디ㅏ:
def test_example_with_multi_gpu():
```
`tensorflow`가 필요한 경우 `require_tf` 데코레이터를 사용합니다. 예를 들어 다음과 같습니다:
```python no-style
@require_tf
def test_tf_thing_with_tensorflow():
```
이러한 데코레이터는 중첩될 수 있습니다.
예를 들어, 느린 테스트로 진행되고 pytorch에서 적어도 하나의 GPU가 필요한 경우 다음과 같이 설정할 수 있습니다:

View File

@ -60,7 +60,7 @@ from transformers.utils import check_min_version, send_example_telemetry
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
Array = Any
Dataset = datasets.arrow_dataset.Dataset

View File

@ -59,7 +59,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risk.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/flax/speech-recognition/requirements.txt")

View File

@ -264,7 +264,6 @@ class ExamplesTests(TestCasePlus):
--dataset_config clean
--train_split_name validation
--eval_split_name validation
--trust_remote_code
--output_dir {tmp_dir}
--overwrite_output_dir
--num_train_epochs=2

View File

@ -55,7 +55,7 @@ from transformers.utils import check_min_version, send_example_telemetry
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
Array = Any
Dataset = datasets.arrow_dataset.Dataset

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/token-classification/requirements.txt")

View File

@ -14,6 +14,7 @@ class MyNewModelConfig(PretrainedConfig):
This is the configuration class to store the configuration of a [`MyNewModelModel`]. It is used to instantiate an MyNewModel
model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
defaults will yield a similar configuration to that of the MyNewModel-7B.
e.g. [meta-my_new_model/MyNewModel-2-7b-hf](https://huggingface.co/meta-my_new_model/MyNewModel-2-7b-hf)
Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
documentation from [`PretrainedConfig`] for more information.

View File

@ -4,37 +4,25 @@
# the file from the modular. If any change should be done, please apply the change to the
# modular_dummy.py file directly. One of our CI enforces this.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
from typing import Callable, Optional, Union
from typing import Callable, Optional
import torch
from torch import nn
from ...activations import ACT2FN
from ...cache_utils import Cache, DynamicCache, StaticCache
from ...cache_utils import Cache, DynamicCache
from ...integrations import use_kernel_forward_from_hub
from ...modeling_attn_mask_utils import AttentionMaskConverter
from ...masking_utils import create_causal_mask
from ...modeling_flash_attention_utils import FlashAttentionKwargs
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutputWithPast
from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...processing_utils import Unpack
from ...utils import (
add_start_docstrings,
add_start_docstrings_to_model_forward,
can_return_tuple,
is_torch_flex_attn_available,
logging,
)
from ...utils import auto_docstring, can_return_tuple, logging
from .configuration_dummy import DummyConfig
if is_torch_flex_attn_available():
from torch.nn.attention.flex_attention import BlockMask
from ...integrations.flex_attention import make_flex_block_causal_mask
logger = logging.get_logger(__name__)
@ -232,15 +220,8 @@ class DummyAttention(nn.Module):
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
attention_interface: Callable = eager_attention_forward
if self.config._attn_implementation != "eager":
if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
logger.warning_once(
"`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
else:
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attn_output, attn_weights = attention_interface(
self,
@ -311,27 +292,7 @@ class DummyDecoderLayer(GradientCheckpointingLayer):
return outputs
DUMMY_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`DummyConfig`]):
Model configuration class with all the parameters of the model. Initializing with a config file does not
load the weights associated with the model, only the configuration. Check out the
[`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
@add_start_docstrings(
"The bare Dummy Model outputting raw hidden-states without any specific head on top.",
DUMMY_START_DOCSTRING,
)
@auto_docstring
class DummyPreTrainedModel(PreTrainedModel):
config_class = DummyConfig
base_model_prefix = "model"
@ -360,88 +321,8 @@ class DummyPreTrainedModel(PreTrainedModel):
module.weight.data.fill_(1.0)
DUMMY_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
it.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length) or `BlockMask`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
If the model is configured to use flex_attention, it will attempt to convert the mask Tensor into a BlockMask,
but you can also pass a `BlockMask` object directly here.
[What are attention masks?](../glossary#attention-mask)
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
`past_key_values`).
If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
and modify to your needs. See diagram 1 in [the paper](https://huggingface.co/papers/1910.13461) for more
information on the default strategy.
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.n_positions - 1]`.
[What are position IDs?](../glossary#position-ids)
past_key_values (`Cache`, *optional*):
Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
of shape `(batch_size, sequence_length)`.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
the complete sequence length.
"""
@add_start_docstrings(
"The bare Dummy Model outputting raw hidden-states without any specific head on top.",
DUMMY_START_DOCSTRING,
)
@auto_docstring
class DummyModel(DummyPreTrainedModel):
"""
Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`DummyDecoderLayer`]
Args:
config: DummyConfig
"""
def __init__(self, config: DummyConfig):
super().__init__(config)
self.padding_idx = config.pad_token_id
@ -465,7 +346,7 @@ class DummyModel(DummyPreTrainedModel):
self.embed_tokens = value
@can_return_tuple
@add_start_docstrings_to_model_forward(DUMMY_INPUTS_DOCSTRING)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
@ -513,8 +394,12 @@ class DummyModel(DummyPreTrainedModel):
if position_ids is None:
position_ids = cache_position.unsqueeze(0)
causal_mask = self._update_causal_mask(
attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
causal_mask = create_causal_mask(
config=self.config,
input_embeds=inputs_embeds,
attention_mask=attention_mask,
cache_position=cache_position,
past_key_values=past_key_values,
)
hidden_states = inputs_embeds
@ -559,126 +444,3 @@ class DummyModel(DummyPreTrainedModel):
hidden_states=all_hidden_states,
attentions=all_self_attns,
)
def _update_causal_mask(
self,
attention_mask: Union[torch.Tensor, "BlockMask"],
input_tensor: torch.Tensor,
cache_position: torch.Tensor,
past_key_values: Cache,
output_attentions: bool = False,
):
if self.config._attn_implementation == "flash_attention_2":
if attention_mask is not None and (attention_mask == 0.0).any():
return attention_mask
return None
if self.config._attn_implementation == "flex_attention":
if isinstance(attention_mask, torch.Tensor):
attention_mask = make_flex_block_causal_mask(attention_mask)
return attention_mask
# For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
# order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
# to infer the attention mask.
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
using_static_cache = isinstance(past_key_values, StaticCache)
# When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
if AttentionMaskConverter._ignore_causal_mask_sdpa(
attention_mask,
inputs_embeds=input_tensor,
past_key_values_length=past_seen_tokens,
is_training=self.training,
):
return None
dtype = input_tensor.dtype
sequence_length = input_tensor.shape[1]
if using_static_cache:
target_length = past_key_values.get_max_cache_shape()
else:
target_length = (
attention_mask.shape[-1]
if isinstance(attention_mask, torch.Tensor)
else past_seen_tokens + sequence_length + 1
)
# In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
attention_mask,
sequence_length=sequence_length,
target_length=target_length,
dtype=dtype,
cache_position=cache_position,
batch_size=input_tensor.shape[0],
)
if (
self.config._attn_implementation == "sdpa"
and attention_mask is not None
and attention_mask.device.type in ["cuda", "xpu", "npu"]
and not output_attentions
):
# Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
# using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
# Details: https://github.com/pytorch/pytorch/issues/110213
min_dtype = torch.finfo(dtype).min
causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
return causal_mask
@staticmethod
def _prepare_4d_causal_attention_mask_with_cache_position(
attention_mask: torch.Tensor,
sequence_length: int,
target_length: int,
dtype: torch.dtype,
cache_position: torch.Tensor,
batch_size: int,
**kwargs,
):
"""
Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
`(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
Args:
attention_mask (`torch.Tensor`):
A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
`(batch_size, 1, query_length, key_value_length)`.
sequence_length (`int`):
The sequence length being processed.
target_length (`int`):
The target length: when generating with static cache, the mask should be as long as the static cache,
to account for the 0 padding, the part of the cache that is not filled yet.
dtype (`torch.dtype`):
The dtype to use for the 4D attention mask.
cache_position (`torch.Tensor`):
Indices depicting the position of the input sequence tokens in the sequence.
batch_size (`torch.Tensor`):
Batch size.
"""
if attention_mask is not None and attention_mask.dim() == 4:
# In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
causal_mask = attention_mask
else:
min_dtype = torch.finfo(dtype).min
causal_mask = torch.full(
(sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=cache_position.device
)
if sequence_length != 1:
causal_mask = torch.triu(causal_mask, diagonal=1)
causal_mask *= torch.arange(target_length, device=cache_position.device) > cache_position.reshape(-1, 1)
causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
if attention_mask is not None:
causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
mask_length = attention_mask.shape[-1]
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
causal_mask.device
)
padding_mask = padding_mask == 0
causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
padding_mask, min_dtype
)
return causal_mask

View File

@ -14,24 +14,16 @@ from torch import nn
from ...activations import ACT2FN
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask_for_sdpa, _prepare_4d_causal_attention_mask_for_sdpa
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import (
add_code_sample_docstrings,
add_start_docstrings,
add_start_docstrings_to_model_forward,
get_torch_version,
logging,
)
from ...utils import auto_docstring, get_torch_version, logging
from .configuration_dummy_bert import DummyBertConfig
logger = logging.get_logger(__name__)
_CHECKPOINT_FOR_DOC = "google-dummy_bert/dummy_bert-base-uncased"
_CONFIG_FOR_DOC = "DummyBertConfig"
class DummyBertEmbeddings(nn.Module):
"""Construct the embeddings from word, position and token_type embeddings."""
@ -432,7 +424,7 @@ class DummyBertOutput(nn.Module):
return hidden_states
class DummyBertLayer(nn.Module):
class DummyBertLayer(GradientCheckpointingLayer):
def __init__(self, config):
super().__init__()
self.chunk_size_feed_forward = config.chunk_size_feed_forward
@ -557,27 +549,15 @@ class DummyBertEncoder(nn.Module):
layer_head_mask = head_mask[i] if head_mask is not None else None
past_key_value = past_key_values[i] if past_key_values is not None else None
if self.gradient_checkpointing and self.training:
layer_outputs = self._gradient_checkpointing_func(
layer_module.__call__,
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)
else:
layer_outputs = layer_module(
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)
layer_outputs = layer_module(
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states, # as a positional argument for gradient checkpointing
encoder_attention_mask=encoder_attention_mask,
past_key_value=past_key_value,
output_attentions=output_attentions,
)
hidden_states = layer_outputs[0]
if use_cache:
@ -739,12 +719,8 @@ def load_tf_weights_in_dummy_bert(model, config, tf_checkpoint_path):
return model
@auto_docstring
class DummyBertPreTrainedModel(PreTrainedModel):
"""
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
models.
"""
config_class = DummyBertConfig
load_tf_weights = load_tf_weights_in_dummy_bert
base_model_prefix = "dummy_bert"
@ -770,79 +746,8 @@ class DummyBertPreTrainedModel(PreTrainedModel):
module.bias.data.zero_()
DUMMY_BERT_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`DummyBertConfig`]): Model configuration class with all the parameters of the model.
Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
DUMMY_BERT_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `({0})`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.FloatTensor` of shape `({0})`or `(batch_size, sequence_length, target_length)`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
[What are attention masks?](../glossary#attention-mask)
token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
1]`:
- 0 corresponds to a *sentence A* token,
- 1 corresponds to a *sentence B* token.
[What are token type IDs?](../glossary#token-type-ids)
position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.max_position_embeddings - 1]`.
[What are position IDs?](../glossary#position-ids)
head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
@add_start_docstrings(
"The bare DummyBert Model transformer outputting raw hidden-states without any specific head on top.",
DUMMY_BERT_START_DOCSTRING,
)
class DummyBertModel(DummyBertPreTrainedModel):
"""
@auto_docstring(
custom_intro="""
The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
cross-attention is added between the self-attention layers, following the architecture described in [Attention is
all you need](https://huggingface.co/papers/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
@ -852,10 +757,15 @@ class DummyBertModel(DummyBertPreTrainedModel):
to `True`. To be used in a Seq2Seq model, the model needs to initialized with both `is_decoder` argument and
`add_cross_attention` set to `True`; an `encoder_hidden_states` is then expected as an input to the forward pass.
"""
)
class DummyBertModel(DummyBertPreTrainedModel):
_no_split_modules = ["DummyBertEmbeddings", "DummyBertLayer"]
def __init__(self, config, add_pooling_layer=True):
r"""
add_pooling_layer (bool, *optional*, defaults to `True`):
Whether to add a pooling layer
"""
super().__init__(config)
self.config = config
@ -884,12 +794,7 @@ class DummyBertModel(DummyBertPreTrainedModel):
for layer, heads in heads_to_prune.items():
self.encoder.layer[layer].attention.prune_heads(heads)
@add_start_docstrings_to_model_forward(DUMMY_BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings(
checkpoint=_CHECKPOINT_FOR_DOC,
output_type=BaseModelOutputWithPoolingAndCrossAttentions,
config_class=_CONFIG_FOR_DOC,
)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
@ -906,26 +811,6 @@ class DummyBertModel(DummyBertPreTrainedModel):
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
r"""
encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
the model is configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)` or `(batch_size, sequence_length, target_length)`, *optional*):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
past_key_values (`tuple(tuple(torch.FloatTensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
`decoder_input_ids` of shape `(batch_size, sequence_length)`.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states

View File

@ -10,6 +10,7 @@ import torch
from torch import nn
from ...activations import ACT2FN
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS
from ...utils import logging
from .configuration_from_uppercase_model import FromUppercaseModelTextConfig, FromUppercaseModelVisionConfig
@ -138,7 +139,7 @@ class FromUppercaseModelMLP(nn.Module):
return hidden_states
class FromUppercaseModelEncoderLayer(nn.Module):
class FromUppercaseModelEncoderLayer(GradientCheckpointingLayer):
def __init__(self, config: Union[FromUppercaseModelVisionConfig, FromUppercaseModelTextConfig]):
super().__init__()
self.embed_dim = config.hidden_size

View File

@ -4,37 +4,25 @@
# the file from the modular. If any change should be done, please apply the change to the
# modular_multimodal1.py file directly. One of our CI enforces this.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
from typing import Callable, Optional, Union
from typing import Callable, Optional
import torch
from torch import nn
from ...activations import ACT2FN
from ...cache_utils import Cache, DynamicCache, StaticCache
from ...cache_utils import Cache, DynamicCache
from ...integrations import use_kernel_forward_from_hub
from ...modeling_attn_mask_utils import AttentionMaskConverter
from ...masking_utils import create_causal_mask
from ...modeling_flash_attention_utils import FlashAttentionKwargs
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutputWithPast
from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...processing_utils import Unpack
from ...utils import (
add_start_docstrings,
add_start_docstrings_to_model_forward,
can_return_tuple,
is_torch_flex_attn_available,
logging,
)
from ...utils import auto_docstring, can_return_tuple, logging
from .configuration_multimodal1 import Multimodal1TextConfig
if is_torch_flex_attn_available():
from torch.nn.attention.flex_attention import BlockMask
from ...integrations.flex_attention import make_flex_block_causal_mask
logger = logging.get_logger(__name__)
@ -232,15 +220,8 @@ class Multimodal1TextAttention(nn.Module):
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
attention_interface: Callable = eager_attention_forward
if self.config._attn_implementation != "eager":
if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
logger.warning_once(
"`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
else:
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attn_output, attn_weights = attention_interface(
self,
@ -311,27 +292,7 @@ class Multimodal1TextDecoderLayer(GradientCheckpointingLayer):
return outputs
MULTIMODAL1_TEXT_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`Multimodal1TextConfig`]):
Model configuration class with all the parameters of the model. Initializing with a config file does not
load the weights associated with the model, only the configuration. Check out the
[`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
@add_start_docstrings(
"The bare Multimodal1Text Model outputting raw hidden-states without any specific head on top.",
MULTIMODAL1_TEXT_START_DOCSTRING,
)
@auto_docstring
class Multimodal1TextPreTrainedModel(PreTrainedModel):
config_class = Multimodal1TextConfig
base_model_prefix = "model"
@ -360,88 +321,8 @@ class Multimodal1TextPreTrainedModel(PreTrainedModel):
module.weight.data.fill_(1.0)
MULTIMODAL1_TEXT_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
it.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length) or `BlockMask`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
If the model is configured to use flex_attention, it will attempt to convert the mask Tensor into a BlockMask,
but you can also pass a `BlockMask` object directly here.
[What are attention masks?](../glossary#attention-mask)
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
`past_key_values`).
If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
and modify to your needs. See diagram 1 in [the paper](https://huggingface.co/papers/1910.13461) for more
information on the default strategy.
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.n_positions - 1]`.
[What are position IDs?](../glossary#position-ids)
past_key_values (`Cache`, *optional*):
Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
of shape `(batch_size, sequence_length)`.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
the complete sequence length.
"""
@add_start_docstrings(
"The bare Multimodal1Text Model outputting raw hidden-states without any specific head on top.",
MULTIMODAL1_TEXT_START_DOCSTRING,
)
@auto_docstring
class Multimodal1TextModel(Multimodal1TextPreTrainedModel):
"""
Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`Multimodal1TextDecoderLayer`]
Args:
config: Multimodal1TextConfig
"""
def __init__(self, config: Multimodal1TextConfig):
super().__init__(config)
self.padding_idx = config.pad_token_id
@ -465,7 +346,7 @@ class Multimodal1TextModel(Multimodal1TextPreTrainedModel):
self.embed_tokens = value
@can_return_tuple
@add_start_docstrings_to_model_forward(MULTIMODAL1_TEXT_INPUTS_DOCSTRING)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
@ -513,8 +394,12 @@ class Multimodal1TextModel(Multimodal1TextPreTrainedModel):
if position_ids is None:
position_ids = cache_position.unsqueeze(0)
causal_mask = self._update_causal_mask(
attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
causal_mask = create_causal_mask(
config=self.config,
input_embeds=inputs_embeds,
attention_mask=attention_mask,
cache_position=cache_position,
past_key_values=past_key_values,
)
hidden_states = inputs_embeds
@ -559,126 +444,3 @@ class Multimodal1TextModel(Multimodal1TextPreTrainedModel):
hidden_states=all_hidden_states,
attentions=all_self_attns,
)
def _update_causal_mask(
self,
attention_mask: Union[torch.Tensor, "BlockMask"],
input_tensor: torch.Tensor,
cache_position: torch.Tensor,
past_key_values: Cache,
output_attentions: bool = False,
):
if self.config._attn_implementation == "flash_attention_2":
if attention_mask is not None and (attention_mask == 0.0).any():
return attention_mask
return None
if self.config._attn_implementation == "flex_attention":
if isinstance(attention_mask, torch.Tensor):
attention_mask = make_flex_block_causal_mask(attention_mask)
return attention_mask
# For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
# order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
# to infer the attention mask.
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
using_static_cache = isinstance(past_key_values, StaticCache)
# When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
if AttentionMaskConverter._ignore_causal_mask_sdpa(
attention_mask,
inputs_embeds=input_tensor,
past_key_values_length=past_seen_tokens,
is_training=self.training,
):
return None
dtype = input_tensor.dtype
sequence_length = input_tensor.shape[1]
if using_static_cache:
target_length = past_key_values.get_max_cache_shape()
else:
target_length = (
attention_mask.shape[-1]
if isinstance(attention_mask, torch.Tensor)
else past_seen_tokens + sequence_length + 1
)
# In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
attention_mask,
sequence_length=sequence_length,
target_length=target_length,
dtype=dtype,
cache_position=cache_position,
batch_size=input_tensor.shape[0],
)
if (
self.config._attn_implementation == "sdpa"
and attention_mask is not None
and attention_mask.device.type in ["cuda", "xpu", "npu"]
and not output_attentions
):
# Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
# using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
# Details: https://github.com/pytorch/pytorch/issues/110213
min_dtype = torch.finfo(dtype).min
causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
return causal_mask
@staticmethod
def _prepare_4d_causal_attention_mask_with_cache_position(
attention_mask: torch.Tensor,
sequence_length: int,
target_length: int,
dtype: torch.dtype,
cache_position: torch.Tensor,
batch_size: int,
**kwargs,
):
"""
Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
`(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
Args:
attention_mask (`torch.Tensor`):
A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
`(batch_size, 1, query_length, key_value_length)`.
sequence_length (`int`):
The sequence length being processed.
target_length (`int`):
The target length: when generating with static cache, the mask should be as long as the static cache,
to account for the 0 padding, the part of the cache that is not filled yet.
dtype (`torch.dtype`):
The dtype to use for the 4D attention mask.
cache_position (`torch.Tensor`):
Indices depicting the position of the input sequence tokens in the sequence.
batch_size (`torch.Tensor`):
Batch size.
"""
if attention_mask is not None and attention_mask.dim() == 4:
# In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
causal_mask = attention_mask
else:
min_dtype = torch.finfo(dtype).min
causal_mask = torch.full(
(sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=cache_position.device
)
if sequence_length != 1:
causal_mask = torch.triu(causal_mask, diagonal=1)
causal_mask *= torch.arange(target_length, device=cache_position.device) > cache_position.reshape(-1, 1)
causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
if attention_mask is not None:
causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
mask_length = attention_mask.shape[-1]
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
causal_mask.device
)
padding_mask = padding_mask == 0
causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
padding_mask, min_dtype
)
return causal_mask

View File

@ -13,15 +13,10 @@ from torch import nn
from transformers.utils import add_start_docstrings
from ...activations import ACT2FN
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutput, BaseModelOutputWithPooling
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...utils import (
add_start_docstrings_to_model_forward,
can_return_tuple,
logging,
replace_return_docstrings,
torch_int,
)
from ...utils import auto_docstring, can_return_tuple, logging, torch_int
from .configuration_multimodal2 import Multimodal2Config, Multimodal2TextConfig, Multimodal2VisionConfig
@ -229,7 +224,7 @@ class Multimodal2Attention(nn.Module):
return attn_output, attn_weights
class Multimodal2VisionEncoderLayer(nn.Module):
class Multimodal2VisionEncoderLayer(GradientCheckpointingLayer):
def __init__(self, config):
super().__init__()
self.embed_dim = config.hidden_size
@ -344,21 +339,12 @@ class Multimodal2VisionEncoder(nn.Module):
for idx, encoder_layer in enumerate(self.layers):
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
if self.gradient_checkpointing and self.training:
layer_outputs = self._gradient_checkpointing_func(
encoder_layer.__call__,
hidden_states,
attention_mask,
causal_attention_mask,
output_attentions,
)
else:
layer_outputs = encoder_layer(
hidden_states,
attention_mask,
causal_attention_mask,
output_attentions=output_attentions,
)
layer_outputs = encoder_layer(
hidden_states,
attention_mask,
causal_attention_mask,
output_attentions=output_attentions,
)
hidden_states = layer_outputs[0]
@ -458,24 +444,6 @@ class Multimodal2VisionEmbeddings(nn.Module):
return embeddings
MULTIMODAL2_VISION_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Padding will be ignored by default should you provide it. Pixel values can be obtained using
[`AutoImageProcessor`]. See [`Multimodal2ImageProcessor.__call__`] for details.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
interpolate_pos_encoding (`bool`, *optional*, defaults `False`):
Whether to interpolate the pre-trained position encodings.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
class Multimodal2VisionTransformer(nn.Module):
def __init__(self, config):
super().__init__()
@ -488,8 +456,7 @@ class Multimodal2VisionTransformer(nn.Module):
self.post_layernorm = nn.LayerNorm(embed_dim, eps=config.layer_norm_eps)
@can_return_tuple
@add_start_docstrings_to_model_forward(MULTIMODAL2_VISION_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class=Multimodal2VisionConfig)
@auto_docstring
def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
@ -497,10 +464,6 @@ class Multimodal2VisionTransformer(nn.Module):
output_hidden_states: Optional[bool] = None,
interpolate_pos_encoding: Optional[bool] = False,
) -> BaseModelOutputWithPooling:
r"""
Returns:
"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
@ -530,17 +493,15 @@ class Multimodal2VisionTransformer(nn.Module):
)
@auto_docstring
class Multimodal2VisionPreTrainedModel(PreTrainedModel):
"""
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
models.
"""
config_class = Multimodal2Config
base_model_prefix = "multimodal2_vision"
supports_gradient_checkpointing = True
_supports_sdpa = True
_supports_flash_attn_2 = True
_supports_flex_attn = True
_supports_attention_backend = True
def _init_weights(self, module):
"""Initialize the weights"""
@ -567,8 +528,7 @@ class Multimodal2VisionModel(Multimodal2VisionPreTrainedModel):
return self.vision_model.embeddings.patch_embedding
@can_return_tuple
@add_start_docstrings_to_model_forward(MULTIMODAL2_VISION_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=BaseModelOutputWithPooling, config_class=Multimodal2VisionConfig)
@auto_docstring
def forward(
self,
pixel_values: Optional[torch.FloatTensor] = None,
@ -577,9 +537,7 @@ class Multimodal2VisionModel(Multimodal2VisionPreTrainedModel):
interpolate_pos_encoding: bool = False,
) -> BaseModelOutputWithPooling:
r"""
Returns:
Examples:
Example:
```python
>>> from PIL import Image

View File

@ -4,36 +4,24 @@
# the file from the modular. If any change should be done, please apply the change to the
# modular_my_new_model2.py file directly. One of our CI enforces this.
# 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
from typing import Callable, Optional, Union
from typing import Callable, Optional
import torch
from torch import nn
from ...activations import ACT2FN
from ...cache_utils import Cache, DynamicCache, StaticCache
from ...modeling_attn_mask_utils import AttentionMaskConverter
from ...cache_utils import Cache, DynamicCache
from ...masking_utils import create_causal_mask
from ...modeling_flash_attention_utils import FlashAttentionKwargs
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutputWithPast, SequenceClassifierOutputWithPast
from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...processing_utils import Unpack
from ...utils import (
add_start_docstrings,
add_start_docstrings_to_model_forward,
can_return_tuple,
is_torch_flex_attn_available,
logging,
)
from ...utils import auto_docstring, can_return_tuple, logging
from .configuration_my_new_model2 import MyNewModel2Config
if is_torch_flex_attn_available():
from torch.nn.attention.flex_attention import BlockMask
from ...integrations.flex_attention import make_flex_block_causal_mask
logger = logging.get_logger(__name__)
@ -230,15 +218,8 @@ class MyNewModel2Attention(nn.Module):
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
attention_interface: Callable = eager_attention_forward
if self.config._attn_implementation != "eager":
if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
logger.warning_once(
"`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
else:
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attn_output, attn_weights = attention_interface(
self,
@ -309,27 +290,7 @@ class MyNewModel2DecoderLayer(GradientCheckpointingLayer):
return outputs
MY_NEW_MODEL2_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`MyNewModel2Config`]):
Model configuration class with all the parameters of the model. Initializing with a config file does not
load the weights associated with the model, only the configuration. Check out the
[`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
@add_start_docstrings(
"The bare MyNewModel2 Model outputting raw hidden-states without any specific head on top.",
MY_NEW_MODEL2_START_DOCSTRING,
)
@auto_docstring
class MyNewModel2PreTrainedModel(PreTrainedModel):
config_class = MyNewModel2Config
base_model_prefix = "model"
@ -358,88 +319,8 @@ class MyNewModel2PreTrainedModel(PreTrainedModel):
module.weight.data.fill_(1.0)
MY_NEW_MODEL2_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
it.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length) or `BlockMask`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
If the model is configured to use flex_attention, it will attempt to convert the mask Tensor into a BlockMask,
but you can also pass a `BlockMask` object directly here.
[What are attention masks?](../glossary#attention-mask)
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
`past_key_values`).
If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
and modify to your needs. See diagram 1 in [the paper](https://huggingface.co/papers/1910.13461) for more
information on the default strategy.
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.n_positions - 1]`.
[What are position IDs?](../glossary#position-ids)
past_key_values (`Cache`, *optional*):
Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
of shape `(batch_size, sequence_length)`.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
the complete sequence length.
"""
@add_start_docstrings(
"The bare MyNewModel2 Model outputting raw hidden-states without any specific head on top.",
MY_NEW_MODEL2_START_DOCSTRING,
)
@auto_docstring
class MyNewModel2Model(MyNewModel2PreTrainedModel):
"""
Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`MyNewModel2DecoderLayer`]
Args:
config: MyNewModel2Config
"""
def __init__(self, config: MyNewModel2Config):
super().__init__(config)
self.padding_idx = config.pad_token_id
@ -463,19 +344,19 @@ class MyNewModel2Model(MyNewModel2PreTrainedModel):
self.embed_tokens = value
@can_return_tuple
@add_start_docstrings_to_model_forward(MY_NEW_MODEL2_INPUTS_DOCSTRING)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_values: Optional[Union[Cache, list[torch.FloatTensor]]] = None,
past_key_values: Optional[Cache] = None,
inputs_embeds: Optional[torch.FloatTensor] = None,
use_cache: Optional[bool] = None,
output_attentions: Optional[bool] = None,
output_hidden_states: Optional[bool] = None,
cache_position: Optional[torch.LongTensor] = None,
**kwargs, # NOOP kwarg for now
**kwargs: Unpack[FlashAttentionKwargs],
) -> BaseModelOutputWithPast:
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
@ -507,8 +388,12 @@ class MyNewModel2Model(MyNewModel2PreTrainedModel):
if position_ids is None:
position_ids = cache_position.unsqueeze(0)
causal_mask = self._update_causal_mask(
attention_mask, inputs_embeds, cache_position, past_key_values, output_attentions
causal_mask = create_causal_mask(
config=self.config,
input_embeds=inputs_embeds,
attention_mask=attention_mask,
cache_position=cache_position,
past_key_values=past_key_values,
)
# embed positions
@ -540,6 +425,7 @@ class MyNewModel2Model(MyNewModel2PreTrainedModel):
use_cache=use_cache,
cache_position=cache_position,
position_embeddings=position_embeddings,
**kwargs,
)
hidden_states = layer_outputs[0]
@ -560,132 +446,9 @@ class MyNewModel2Model(MyNewModel2PreTrainedModel):
attentions=all_self_attns,
)
def _update_causal_mask(
self,
attention_mask: Union[torch.Tensor, "BlockMask"],
input_tensor: torch.Tensor,
cache_position: torch.Tensor,
past_key_values: Cache,
output_attentions: bool = False,
):
if self.config._attn_implementation == "flash_attention_2":
if attention_mask is not None and (attention_mask == 0.0).any():
return attention_mask
return None
if self.config._attn_implementation == "flex_attention":
if isinstance(attention_mask, torch.Tensor):
attention_mask = make_flex_block_causal_mask(attention_mask)
return attention_mask
# For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
# order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
# to infer the attention mask.
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
using_static_cache = isinstance(past_key_values, StaticCache)
# When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
if AttentionMaskConverter._ignore_causal_mask_sdpa(
attention_mask,
inputs_embeds=input_tensor,
past_key_values_length=past_seen_tokens,
is_training=self.training,
):
return None
dtype = input_tensor.dtype
sequence_length = input_tensor.shape[1]
if using_static_cache:
target_length = past_key_values.get_max_cache_shape()
else:
target_length = (
attention_mask.shape[-1]
if isinstance(attention_mask, torch.Tensor)
else past_seen_tokens + sequence_length + 1
)
# In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
attention_mask,
sequence_length=sequence_length,
target_length=target_length,
dtype=dtype,
cache_position=cache_position,
batch_size=input_tensor.shape[0],
)
if (
self.config._attn_implementation == "sdpa"
and attention_mask is not None
and attention_mask.device.type in ["cuda", "xpu", "npu"]
and not output_attentions
):
# Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
# using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
# Details: https://github.com/pytorch/pytorch/issues/110213
min_dtype = torch.finfo(dtype).min
causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
return causal_mask
@staticmethod
def _prepare_4d_causal_attention_mask_with_cache_position(
attention_mask: torch.Tensor,
sequence_length: int,
target_length: int,
dtype: torch.dtype,
cache_position: torch.Tensor,
batch_size: int,
**kwargs,
):
"""
Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
`(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
Args:
attention_mask (`torch.Tensor`):
A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
`(batch_size, 1, query_length, key_value_length)`.
sequence_length (`int`):
The sequence length being processed.
target_length (`int`):
The target length: when generating with static cache, the mask should be as long as the static cache,
to account for the 0 padding, the part of the cache that is not filled yet.
dtype (`torch.dtype`):
The dtype to use for the 4D attention mask.
cache_position (`torch.Tensor`):
Indices depicting the position of the input sequence tokens in the sequence.
batch_size (`torch.Tensor`):
Batch size.
"""
if attention_mask is not None and attention_mask.dim() == 4:
# In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
causal_mask = attention_mask
else:
min_dtype = torch.finfo(dtype).min
causal_mask = torch.full(
(sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=cache_position.device
)
if sequence_length != 1:
causal_mask = torch.triu(causal_mask, diagonal=1)
causal_mask *= torch.arange(target_length, device=cache_position.device) > cache_position.reshape(-1, 1)
causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
if attention_mask is not None:
causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
mask_length = attention_mask.shape[-1]
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
causal_mask.device
)
padding_mask = padding_mask == 0
causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
padding_mask, min_dtype
)
return causal_mask
@add_start_docstrings(
"""
@auto_docstring(
custom_intro="""
The MyNewModel2 Model transformer with a sequence classification head on top (linear layer).
[`MyNewModel2ForSequenceClassification`] uses the last token in order to do the classification, as other causal models
@ -696,8 +459,7 @@ class MyNewModel2Model(MyNewModel2PreTrainedModel):
no `pad_token_id` is defined, it simply takes the last value in each row of the batch. Since it cannot guess the
padding tokens when `inputs_embeds` are passed instead of `input_ids`, it does the same (take the last value in
each row of the batch).
""",
MY_NEW_MODEL2_START_DOCSTRING,
"""
)
class MyNewModel2ForSequenceClassification(MyNewModel2PreTrainedModel):
def __init__(self, config):
@ -716,7 +478,7 @@ class MyNewModel2ForSequenceClassification(MyNewModel2PreTrainedModel):
self.model.embed_tokens = value
@can_return_tuple
@add_start_docstrings_to_model_forward(MY_NEW_MODEL2_INPUTS_DOCSTRING)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.LongTensor] = None,

View File

@ -22,68 +22,48 @@ from .configuration_new_task_model import NewTaskModelConfig
@dataclass
class NewTaskModelModelOutputWithPast(BaseModelOutputWithPast):
"""
@auto_docstring(
custom_intro="""
Base class for NewTaskModel outputs, with hidden states and attentions.
"""
)
class NewTaskModelModelOutputWithPast(BaseModelOutputWithPast):
r"""
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
`(batch_size, num_heads, sequence_length, embed_size_per_head)`)
Args:
last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
`(batch_size, num_heads, sequence_length, embed_size_per_head)`)
Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
`past_key_values` input) to speed up sequential decoding.
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
image_hidden_states (`torch.FloatTensor`, *optional*):
A `torch.FloatTensor` of size `(batch_size, num_images, sequence_length, hidden_size)`.
image_hidden_states of the model produced by the vision encoder and after projecting the last hidden state.
Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
`past_key_values` input) to speed up sequential decoding.
image_hidden_states (`torch.FloatTensor`, *optional*):
A `torch.FloatTensor` of size `(batch_size, num_images, sequence_length, hidden_size)`.
image_hidden_states of the model produced by the vision encoder and after projecting the last hidden state.
"""
image_hidden_states: Optional[torch.FloatTensor] = None
@dataclass
class NewTaskModelCausalLMOutputWithPast(ModelOutput):
"""
@auto_docstring(
custom_intro="""
Base class for NewTaskModel causal language model (or autoregressive) outputs.
"""
)
class NewTaskModelCausalLMOutputWithPast(ModelOutput):
r"""
loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
Language modeling loss (for next-token prediction).
logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.text_config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
`(batch_size, num_heads, sequence_length, embed_size_per_head)`)
Args:
loss (`torch.FloatTensor` of shape `(1,)`, *optional*, returned when `labels` is provided):
Language modeling loss (for next-token prediction).
logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.text_config.vocab_size)`):
Prediction scores of the language modeling head (scores for each vocabulary token before SoftMax).
past_key_values (`tuple(tuple(torch.FloatTensor))`, *optional*, returned when `use_cache=True` is passed or when `config.use_cache=True`):
Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of shape
`(batch_size, num_heads, sequence_length, embed_size_per_head)`)
Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
`past_key_values` input) to speed up sequential decoding.
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings, if the model has an embedding layer, +
one for the output of each layer) of shape `(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`.
Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
heads.
image_hidden_states (`torch.FloatTensor`, *optional*):
A `torch.FloatTensor` of size `(batch_size, num_images, sequence_length, hidden_size)`.
image_hidden_states of the model produced by the vision encoder after projecting last hidden state.
Contains pre-computed hidden-states (key and values in the self-attention blocks) that can be used (see
`past_key_values` input) to speed up sequential decoding.
image_hidden_states (`torch.FloatTensor`, *optional*):
A `torch.FloatTensor` of size `(batch_size, num_images, sequence_length, hidden_size)`.
image_hidden_states of the model produced by the vision encoder after projecting last hidden state.
"""
loss: Optional[torch.FloatTensor] = None
@ -157,6 +137,12 @@ class NewTaskModelModel(NewTaskModelPreTrainedModel):
def set_input_embeddings(self, value):
self.language_model.set_input_embeddings(value)
def set_decoder(self, decoder):
self.language_model = decoder
def get_decoder(self):
return self.language_model
def _update_causal_mask(
self,
attention_mask,
@ -406,10 +392,13 @@ class NewTaskModelForNewTask(NewTaskModelPreTrainedModel, GenerationMixin):
self.lm_head = new_embeddings
def set_decoder(self, decoder):
self.model = decoder
self.model.set_decoder(decoder)
def get_decoder(self):
return self.model
return self.model.get_decoder()
def get_image_features(self, pixel_values):
return self.model.get_image_features(pixel_values)
# Make modules available throught conditional class for BC
@property

View File

@ -14,24 +14,16 @@ from packaging import version
from ...activations import ACT2FN
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask_for_sdpa, _prepare_4d_causal_attention_mask_for_sdpa
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutputWithPastAndCrossAttentions, BaseModelOutputWithPoolingAndCrossAttentions
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import apply_chunking_to_forward, find_pruneable_heads_and_indices, prune_linear_layer
from ...utils import (
add_code_sample_docstrings,
add_start_docstrings,
add_start_docstrings_to_model_forward,
get_torch_version,
logging,
)
from ...utils import auto_docstring, get_torch_version, logging
from .configuration_roberta import RobertaConfig
logger = logging.get_logger(__name__)
_CHECKPOINT_FOR_DOC = "google-roberta/roberta-base-uncased"
_CONFIG_FOR_DOC = "RobertaConfig"
class RobertaEmbeddings(nn.Module):
"""Construct the embeddings from word, position and token_type embeddings."""
@ -435,7 +427,7 @@ class RobertaOutput(nn.Module):
return hidden_states
class RobertaLayer(nn.Module):
class RobertaLayer(GradientCheckpointingLayer):
def __init__(self, config):
super().__init__()
self.chunk_size_feed_forward = config.chunk_size_feed_forward
@ -560,27 +552,15 @@ class RobertaEncoder(nn.Module):
layer_head_mask = head_mask[i] if head_mask is not None else None
past_key_value = past_key_values[i] if past_key_values is not None else None
if self.gradient_checkpointing and self.training:
layer_outputs = self._gradient_checkpointing_func(
layer_module.__call__,
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)
else:
layer_outputs = layer_module(
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states,
encoder_attention_mask,
past_key_value,
output_attentions,
)
layer_outputs = layer_module(
hidden_states,
attention_mask,
layer_head_mask,
encoder_hidden_states, # as a positional argument for gradient checkpointing
encoder_attention_mask=encoder_attention_mask,
past_key_value=past_key_value,
output_attentions=output_attentions,
)
hidden_states = layer_outputs[0]
if use_cache:
@ -742,12 +722,8 @@ def load_tf_weights_in_roberta(model, config, tf_checkpoint_path):
return model
@auto_docstring
class RobertaPreTrainedModel(PreTrainedModel):
"""
An abstract class to handle weights initialization and a simple interface for downloading and loading pretrained
models.
"""
config_class = RobertaConfig
load_tf_weights = load_tf_weights_in_roberta
base_model_prefix = "roberta"
@ -773,79 +749,8 @@ class RobertaPreTrainedModel(PreTrainedModel):
module.bias.data.zero_()
ROBERTA_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`RobertaConfig`]): Model configuration class with all the parameters of the model.
Initializing with a config file does not load the weights associated with the model, only the
configuration. Check out the [`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
ROBERTA_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `({0})`):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.FloatTensor` of shape `({0})`or `(batch_size, sequence_length, target_length)`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
[What are attention masks?](../glossary#attention-mask)
token_type_ids (`torch.LongTensor` of shape `({0})`, *optional*):
Segment token indices to indicate first and second portions of the inputs. Indices are selected in `[0,
1]`:
- 0 corresponds to a *sentence A* token,
- 1 corresponds to a *sentence B* token.
[What are token type IDs?](../glossary#token-type-ids)
position_ids (`torch.LongTensor` of shape `({0})`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.max_position_embeddings - 1]`.
[What are position IDs?](../glossary#position-ids)
head_mask (`torch.FloatTensor` of shape `(num_heads,)` or `(num_layers, num_heads)`, *optional*):
Mask to nullify selected heads of the self-attention modules. Mask values selected in `[0, 1]`:
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
inputs_embeds (`torch.FloatTensor` of shape `({0}, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
"""
@add_start_docstrings(
"The bare Roberta Model transformer outputting raw hidden-states without any specific head on top.",
ROBERTA_START_DOCSTRING,
)
class RobertaModel(RobertaPreTrainedModel):
"""
@auto_docstring(
custom_intro="""
The model can behave as an encoder (with only self-attention) as well as a decoder, in which case a layer of
cross-attention is added between the self-attention layers, following the architecture described in [Attention is
all you need](https://huggingface.co/papers/1706.03762) by Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit,
@ -855,10 +760,15 @@ class RobertaModel(RobertaPreTrainedModel):
to `True`. To be used in a Seq2Seq model, the model needs to initialized with both `is_decoder` argument and
`add_cross_attention` set to `True`; an `encoder_hidden_states` is then expected as an input to the forward pass.
"""
)
class RobertaModel(RobertaPreTrainedModel):
_no_split_modules = ["RobertaEmbeddings", "RobertaLayer"]
def __init__(self, config, add_pooling_layer=True):
r"""
add_pooling_layer (bool, *optional*, defaults to `True`):
Whether to add a pooling layer
"""
super().__init__(config)
self.config = config
@ -887,12 +797,7 @@ class RobertaModel(RobertaPreTrainedModel):
for layer, heads in heads_to_prune.items():
self.encoder.layer[layer].attention.prune_heads(heads)
@add_start_docstrings_to_model_forward(ROBERTA_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
@add_code_sample_docstrings(
checkpoint=_CHECKPOINT_FOR_DOC,
output_type=BaseModelOutputWithPoolingAndCrossAttentions,
config_class=_CONFIG_FOR_DOC,
)
@auto_docstring
def forward(
self,
input_ids: Optional[torch.Tensor] = None,
@ -909,26 +814,6 @@ class RobertaModel(RobertaPreTrainedModel):
output_hidden_states: Optional[bool] = None,
return_dict: Optional[bool] = None,
) -> Union[tuple[torch.Tensor], BaseModelOutputWithPoolingAndCrossAttentions]:
r"""
encoder_hidden_states (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
the model is configured as a decoder.
encoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, sequence_length)` or `(batch_size, sequence_length, target_length)`, *optional*):
Mask to avoid performing attention on the padding token indices of the encoder input. This mask is used in
the cross-attention if the model is configured as a decoder. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
past_key_values (`tuple(tuple(torch.FloatTensor))` of length `config.n_layers` with each tuple having 4 tensors of shape `(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
Contains precomputed key and value hidden states of the attention blocks. Can be used to speed up decoding.
If `past_key_values` are used, the user can optionally input only the last `decoder_input_ids` (those that
don't have their past key value states given to this model) of shape `(batch_size, 1)` instead of all
`decoder_input_ids` of shape `(batch_size, sequence_length)`.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
"""
output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = (
output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states

View File

@ -12,33 +12,17 @@ from torch import nn
from transformers.modeling_outputs import CausalLMOutputWithPast
from ...activations import ACT2FN
from ...cache_utils import Cache, StaticCache
from ...cache_utils import Cache
from ...integrations import use_kernel_forward_from_hub
from ...modeling_attn_mask_utils import AttentionMaskConverter
from ...modeling_flash_attention_utils import FlashAttentionKwargs
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_rope_utils import ROPE_INIT_FUNCTIONS, dynamic_rope_update
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
from ...processing_utils import Unpack
from ...utils import (
add_start_docstrings,
add_start_docstrings_to_model_forward,
can_return_tuple,
is_torch_flex_attn_available,
logging,
)
from ...utils import auto_docstring, can_return_tuple
from .configuration_super import SuperConfig
if is_torch_flex_attn_available():
from torch.nn.attention.flex_attention import BlockMask
from ...integrations.flex_attention import make_flex_block_causal_mask
logger = logging.get_logger(__name__)
@use_kernel_forward_from_hub("RMSNorm")
class SuperRMSNorm(nn.Module):
def __init__(self, hidden_size, eps=1e-6):
@ -233,15 +217,8 @@ class SuperAttention(nn.Module):
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
attention_interface: Callable = eager_attention_forward
if self.config._attn_implementation != "eager":
if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
logger.warning_once(
"`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
else:
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attn_output, attn_weights = attention_interface(
self,
@ -312,27 +289,7 @@ class SuperDecoderLayer(GradientCheckpointingLayer):
return outputs
SUPER_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`SuperConfig`]):
Model configuration class with all the parameters of the model. Initializing with a config file does not
load the weights associated with the model, only the configuration. Check out the
[`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
@add_start_docstrings(
"The bare Super Model outputting raw hidden-states without any specific head on top.",
SUPER_START_DOCSTRING,
)
@auto_docstring
class SuperPreTrainedModel(PreTrainedModel):
config_class = SuperConfig
base_model_prefix = "model"
@ -361,88 +318,8 @@ class SuperPreTrainedModel(PreTrainedModel):
module.weight.data.fill_(1.0)
SUPER_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
it.
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
[What are input IDs?](../glossary#input-ids)
attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length) or `BlockMask`, *optional*):
Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
- 1 for tokens that are **not masked**,
- 0 for tokens that are **masked**.
If the model is configured to use flex_attention, it will attempt to convert the mask Tensor into a BlockMask,
but you can also pass a `BlockMask` object directly here.
[What are attention masks?](../glossary#attention-mask)
Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
[`PreTrainedTokenizer.__call__`] for details.
If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
`past_key_values`).
If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
and modify to your needs. See diagram 1 in [the paper](https://huggingface.co/papers/1910.13461) for more
information on the default strategy.
- 1 indicates the head is **not masked**,
- 0 indicates the head is **masked**.
position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
Indices of positions of each input sequence tokens in the position embeddings. Selected in the range `[0,
config.n_positions - 1]`.
[What are position IDs?](../glossary#position-ids)
past_key_values (`Cache`, *optional*):
Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
blocks) that can be used to speed up sequential decoding. This typically consists in the `past_key_values`
returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
It is a [`~cache_utils.Cache`] instance. For more details, see our [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache).
If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
of shape `(batch_size, sequence_length)`.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
model's internal embedding lookup matrix.
use_cache (`bool`, *optional*):
If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
`past_key_values`).
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
Indices depicting the position of the input sequence tokens in the sequence. Contrarily to `position_ids`,
this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
the complete sequence length.
"""
@add_start_docstrings(
"The bare Super Model outputting raw hidden-states without any specific head on top.",
SUPER_START_DOCSTRING,
)
@auto_docstring
class SuperModel(SuperPreTrainedModel):
"""
Transformer decoder consisting of *config.num_hidden_layers* layers. Each layer is a [`SuperDecoderLayer`]
Args:
config: SuperConfig
"""
def __init__(self, config: SuperConfig):
super().__init__(config)
self.padding_idx = config.pad_token_id
@ -466,7 +343,7 @@ class SuperModel(SuperPreTrainedModel):
self.embed_tokens = value
@can_return_tuple
@add_start_docstrings_to_model_forward(SUPER_INPUTS_DOCSTRING)
@auto_docstring
def forward(
self,
input_ids: torch.LongTensor = None,
@ -494,126 +371,3 @@ class SuperModel(SuperPreTrainedModel):
)
out.logits *= 2**4
return out
def _update_causal_mask(
self,
attention_mask: Union[torch.Tensor, "BlockMask"],
input_tensor: torch.Tensor,
cache_position: torch.Tensor,
past_key_values: Cache,
output_attentions: bool = False,
):
if self.config._attn_implementation == "flash_attention_2":
if attention_mask is not None and (attention_mask == 0.0).any():
return attention_mask
return None
if self.config._attn_implementation == "flex_attention":
if isinstance(attention_mask, torch.Tensor):
attention_mask = make_flex_block_causal_mask(attention_mask)
return attention_mask
# For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
# order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
# to infer the attention mask.
past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
using_static_cache = isinstance(past_key_values, StaticCache)
# When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
if AttentionMaskConverter._ignore_causal_mask_sdpa(
attention_mask,
inputs_embeds=input_tensor,
past_key_values_length=past_seen_tokens,
is_training=self.training,
):
return None
dtype = input_tensor.dtype
sequence_length = input_tensor.shape[1]
if using_static_cache:
target_length = past_key_values.get_max_cache_shape()
else:
target_length = (
attention_mask.shape[-1]
if isinstance(attention_mask, torch.Tensor)
else past_seen_tokens + sequence_length + 1
)
# In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
attention_mask,
sequence_length=sequence_length,
target_length=target_length,
dtype=dtype,
cache_position=cache_position,
batch_size=input_tensor.shape[0],
)
if (
self.config._attn_implementation == "sdpa"
and attention_mask is not None
and attention_mask.device.type in ["cuda", "xpu", "npu"]
and not output_attentions
):
# Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
# using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
# Details: https://github.com/pytorch/pytorch/issues/110213
min_dtype = torch.finfo(dtype).min
causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
return causal_mask
@staticmethod
def _prepare_4d_causal_attention_mask_with_cache_position(
attention_mask: torch.Tensor,
sequence_length: int,
target_length: int,
dtype: torch.dtype,
cache_position: torch.Tensor,
batch_size: int,
**kwargs,
):
"""
Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
`(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
Args:
attention_mask (`torch.Tensor`):
A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
`(batch_size, 1, query_length, key_value_length)`.
sequence_length (`int`):
The sequence length being processed.
target_length (`int`):
The target length: when generating with static cache, the mask should be as long as the static cache,
to account for the 0 padding, the part of the cache that is not filled yet.
dtype (`torch.dtype`):
The dtype to use for the 4D attention mask.
cache_position (`torch.Tensor`):
Indices depicting the position of the input sequence tokens in the sequence.
batch_size (`torch.Tensor`):
Batch size.
"""
if attention_mask is not None and attention_mask.dim() == 4:
# In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
causal_mask = attention_mask
else:
min_dtype = torch.finfo(dtype).min
causal_mask = torch.full(
(sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=cache_position.device
)
if sequence_length != 1:
causal_mask = torch.triu(causal_mask, diagonal=1)
causal_mask *= torch.arange(target_length, device=cache_position.device) > cache_position.reshape(-1, 1)
causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
if attention_mask is not None:
causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
mask_length = attention_mask.shape[-1]
padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
causal_mask.device
)
padding_mask = padding_mask == 0
causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
padding_mask, min_dtype
)
return causal_mask

View File

@ -14,13 +14,9 @@ from ...cache_utils import Cache
from ...modeling_flash_attention_utils import FlashAttentionKwargs
from ...modeling_utils import ALL_ATTENTION_FUNCTIONS
from ...processing_utils import Unpack
from ...utils import logging
from .configuration_switch_function import SwitchFunctionConfig
logger = logging.get_logger(__name__)
def rotate_half(x):
# Split and rotate. Note that this function is different from e.g. Llama.
x1 = x[..., ::2]
@ -145,15 +141,8 @@ class SwitchFunctionAttention(nn.Module):
key_states, value_states = past_key_value.update(key_states, value_states, self.layer_idx, cache_kwargs)
attention_interface: Callable = eager_attention_forward
if self.config._attn_implementation != "eager":
if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
logger.warning_once(
"`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
)
else:
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
attn_output, attn_weights = attention_interface(
self,

View File

@ -16,17 +16,11 @@ from torch import Tensor, nn
from ...activations import ACT2FN
from ...integrations import use_kernel_forward_from_hub
from ...modeling_attn_mask_utils import _prepare_4d_attention_mask
from ...modeling_layers import GradientCheckpointingLayer
from ...modeling_outputs import BaseModelOutput
from ...modeling_utils import PreTrainedModel
from ...pytorch_utils import meshgrid
from ...utils import (
ModelOutput,
add_start_docstrings,
add_start_docstrings_to_model_forward,
is_timm_available,
replace_return_docstrings,
requires_backends,
)
from ...utils import ModelOutput, auto_docstring, is_timm_available, requires_backends
from ...utils.backbone_utils import load_backbone
from .configuration_test_detr import TestDetrConfig
@ -34,8 +28,6 @@ from .configuration_test_detr import TestDetrConfig
if is_timm_available():
from timm import create_model
_CONFIG_FOR_DOC = "TestDetrConfig"
@use_kernel_forward_from_hub("MultiScaleDeformableAttention")
class MultiScaleDeformableAttention(nn.Module):
@ -93,32 +85,24 @@ class MultiScaleDeformableAttention(nn.Module):
@dataclass
class TestDetrDecoderOutput(ModelOutput):
"""
@auto_docstring(
custom_intro="""
Base class for outputs of the TestDetrDecoder. This class adds two attributes to
BaseModelOutputWithCrossAttentions, namely:
- a stacked tensor of intermediate decoder hidden states (i.e. the output of each decoder layer)
- a stacked tensor of intermediate reference points.
Args:
last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the model.
intermediate_hidden_states (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, hidden_size)`):
Stacked intermediate hidden states (output of each layer of the decoder).
intermediate_reference_points (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, sequence_length, hidden_size)`):
Stacked intermediate reference points (reference points of each layer of the decoder).
hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the model at the output of each layer
plus the initial embedding outputs.
attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`. Attentions weights after the attention softmax, used to compute the weighted average in
the self-attention heads.
cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax,
used to compute the weighted average in the cross-attention heads.
"""
)
class TestDetrDecoderOutput(ModelOutput):
r"""
intermediate_hidden_states (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, hidden_size)`):
Stacked intermediate hidden states (output of each layer of the decoder).
intermediate_reference_points (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, sequence_length, hidden_size)`):
Stacked intermediate reference points (reference points of each layer of the decoder).
cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` and `config.add_cross_attention=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, sequence_length,
sequence_length)`. Attentions weights of the decoder's cross-attention layer, after the attention softmax,
used to compute the weighted average in the cross-attention heads.
"""
last_hidden_state: Optional[torch.FloatTensor] = None
@ -130,47 +114,27 @@ class TestDetrDecoderOutput(ModelOutput):
@dataclass
class TestDetrModelOutput(ModelOutput):
"""
@auto_docstring(
custom_intro="""
Base class for outputs of the Deformable DETR encoder-decoder model.
Args:
init_reference_points (`torch.FloatTensor` of shape `(batch_size, num_queries, 4)`):
Initial reference points sent through the Transformer decoder.
last_hidden_state (`torch.FloatTensor` of shape `(batch_size, num_queries, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the decoder of the model.
intermediate_hidden_states (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, hidden_size)`):
Stacked intermediate hidden states (output of each layer of the decoder).
intermediate_reference_points (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, 4)`):
Stacked intermediate reference points (reference points of each layer of the decoder).
decoder_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
shape `(batch_size, num_queries, hidden_size)`. Hidden-states of the decoder at the output of each layer
plus the initial embedding outputs.
decoder_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_heads, num_queries,
num_queries)`. Attentions weights of the decoder, after the attention softmax, used to compute the weighted
average in the self-attention heads.
cross_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_queries, num_heads, 4, 4)`.
Attentions weights of the decoder's cross-attention layer, after the attention softmax, used to compute the
weighted average in the cross-attention heads.
encoder_last_hidden_state (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Sequence of hidden-states at the output of the last layer of the encoder of the model.
encoder_hidden_states (`tuple(torch.FloatTensor)`, *optional*, returned when `output_hidden_states=True` is passed or when `config.output_hidden_states=True`):
Tuple of `torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer) of
shape `(batch_size, sequence_length, hidden_size)`. Hidden-states of the encoder at the output of each
layer plus the initial embedding outputs.
encoder_attentions (`tuple(torch.FloatTensor)`, *optional*, returned when `output_attentions=True` is passed or when `config.output_attentions=True`):
Tuple of `torch.FloatTensor` (one for each layer) of shape `(batch_size, num_queries, num_heads, 4, 4)`.
Attentions weights of the encoder, after the attention softmax, used to compute the weighted average in the
self-attention heads.
enc_outputs_class (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`, *optional*, returned when `config.with_box_refine=True` and `config.two_stage=True`):
Predicted bounding boxes scores where the top `config.two_stage_num_proposals` scoring bounding boxes are
picked as region proposals in the first stage. Output of bounding box binary classification (i.e.
foreground and background).
enc_outputs_coord_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, 4)`, *optional*, returned when `config.with_box_refine=True` and `config.two_stage=True`):
Logits of predicted bounding boxes coordinates in the first stage.
"""
)
class TestDetrModelOutput(ModelOutput):
r"""
init_reference_points (`torch.FloatTensor` of shape `(batch_size, num_queries, 4)`):
Initial reference points sent through the Transformer decoder.
last_hidden_state (`torch.FloatTensor` of shape `(batch_size, num_queries, hidden_size)`):
Sequence of hidden-states at the output of the last layer of the decoder of the model.
intermediate_hidden_states (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, hidden_size)`):
Stacked intermediate hidden states (output of each layer of the decoder).
intermediate_reference_points (`torch.FloatTensor` of shape `(batch_size, config.decoder_layers, num_queries, 4)`):
Stacked intermediate reference points (reference points of each layer of the decoder).
enc_outputs_class (`torch.FloatTensor` of shape `(batch_size, sequence_length, config.num_labels)`, *optional*, returned when `config.with_box_refine=True` and `config.two_stage=True`):
Predicted bounding boxes scores where the top `config.two_stage_num_proposals` scoring bounding boxes are
picked as region proposals in the first stage. Output of bounding box binary classification (i.e.
foreground and background).
enc_outputs_coord_logits (`torch.FloatTensor` of shape `(batch_size, sequence_length, 4)`, *optional*, returned when `config.with_box_refine=True` and `config.two_stage=True`):
Logits of predicted bounding boxes coordinates in the first stage.
"""
init_reference_points: Optional[torch.FloatTensor] = None
@ -635,7 +599,7 @@ class TestDetrMultiheadAttention(nn.Module):
return attn_output, attn_weights_reshaped
class TestDetrEncoderLayer(nn.Module):
class TestDetrEncoderLayer(GradientCheckpointingLayer):
def __init__(self, config: TestDetrConfig):
super().__init__()
self.embed_dim = config.d_model
@ -724,7 +688,7 @@ class TestDetrEncoderLayer(nn.Module):
return outputs
class TestDetrDecoderLayer(nn.Module):
class TestDetrDecoderLayer(GradientCheckpointingLayer):
def __init__(self, config: TestDetrConfig):
super().__init__()
self.embed_dim = config.d_model
@ -837,6 +801,7 @@ class TestDetrDecoderLayer(nn.Module):
return outputs
@auto_docstring
class TestDetrPreTrainedModel(PreTrainedModel):
config_class = TestDetrConfig
base_model_prefix = "model"
@ -1001,29 +966,16 @@ class TestDetrEncoder(TestDetrPreTrainedModel):
for i, encoder_layer in enumerate(self.layers):
if output_hidden_states:
encoder_states = encoder_states + (hidden_states,)
if self.gradient_checkpointing and self.training:
layer_outputs = self._gradient_checkpointing_func(
encoder_layer.__call__,
hidden_states,
attention_mask,
position_embeddings,
reference_points,
spatial_shapes,
spatial_shapes_list,
level_start_index,
output_attentions,
)
else:
layer_outputs = encoder_layer(
hidden_states,
attention_mask,
position_embeddings=position_embeddings,
reference_points=reference_points,
spatial_shapes=spatial_shapes,
spatial_shapes_list=spatial_shapes_list,
level_start_index=level_start_index,
output_attentions=output_attentions,
)
layer_outputs = encoder_layer(
hidden_states,
attention_mask,
position_embeddings=position_embeddings,
reference_points=reference_points,
spatial_shapes=spatial_shapes,
spatial_shapes_list=spatial_shapes_list,
level_start_index=level_start_index,
output_attentions=output_attentions,
)
hidden_states = layer_outputs[0]
@ -1155,31 +1107,17 @@ class TestDetrDecoder(TestDetrPreTrainedModel):
if output_hidden_states:
all_hidden_states += (hidden_states,)
if self.gradient_checkpointing and self.training:
layer_outputs = self._gradient_checkpointing_func(
decoder_layer.__call__,
hidden_states,
position_embeddings,
reference_points_input,
spatial_shapes,
spatial_shapes_list,
level_start_index,
encoder_hidden_states,
encoder_attention_mask,
output_attentions,
)
else:
layer_outputs = decoder_layer(
hidden_states,
position_embeddings=position_embeddings,
encoder_hidden_states=encoder_hidden_states,
reference_points=reference_points_input,
spatial_shapes=spatial_shapes,
spatial_shapes_list=spatial_shapes_list,
level_start_index=level_start_index,
encoder_attention_mask=encoder_attention_mask,
output_attentions=output_attentions,
)
layer_outputs = decoder_layer(
hidden_states,
position_embeddings,
reference_points_input,
spatial_shapes,
spatial_shapes_list,
level_start_index,
encoder_hidden_states, # as a positional argument for gradient checkpointing
encoder_attention_mask,
output_attentions,
)
hidden_states = layer_outputs[0]
@ -1253,67 +1191,11 @@ def build_position_encoding(config):
return position_embedding
TEST_DETR_START_DOCSTRING = r"""
This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
etc.)
This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage
and behavior.
Parameters:
config ([`TestDetrConfig`]):
Model configuration class with all the parameters of the model. Initializing with a config file does not
load the weights associated with the model, only the configuration. Check out the
[`~PreTrainedModel.from_pretrained`] method to load the model weights.
"""
TEST_DETR_INPUTS_DOCSTRING = r"""
Args:
pixel_values (`torch.FloatTensor` of shape `(batch_size, num_channels, height, width)`):
Pixel values. Padding will be ignored by default should you provide it.
Pixel values can be obtained using [`AutoImageProcessor`]. See [`TestDetrImageProcessor.__call__`]
for details.
pixel_mask (`torch.LongTensor` of shape `(batch_size, height, width)`, *optional*):
Mask to avoid performing attention on padding pixel values. Mask values selected in `[0, 1]`:
- 1 for pixels that are real (i.e. **not masked**),
- 0 for pixels that are padding (i.e. **masked**).
[What are attention masks?](../glossary#attention-mask)
decoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, num_queries)`, *optional*):
Not used by default. Can be used to mask object queries.
encoder_outputs (`tuple(tuple(torch.FloatTensor)`, *optional*):
Tuple consists of (`last_hidden_state`, *optional*: `hidden_states`, *optional*: `attentions`)
`last_hidden_state` of shape `(batch_size, sequence_length, hidden_size)`, *optional*) is a sequence of
hidden-states at the output of the last layer of the encoder. Used in the cross-attention of the decoder.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing the flattened feature map (output of the backbone + projection layer), you
can choose to directly pass a flattened representation of an image.
decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, num_queries, hidden_size)`, *optional*):
Optionally, instead of initializing the queries with a tensor of zeros, you can choose to directly pass an
embedded representation.
output_attentions (`bool`, *optional*):
Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
tensors for more detail.
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
return_dict (`bool`, *optional*):
Whether or not to return a [`~file_utils.ModelOutput`] instead of a plain tuple.
"""
@add_start_docstrings(
"""
@auto_docstring(
custom_intro="""
The bare Deformable DETR Model (consisting of a backbone and encoder-decoder Transformer) outputting raw
hidden-states without any specific head on top.
""",
TEST_DETR_START_DOCSTRING,
"""
)
class TestDetrModel(TestDetrPreTrainedModel):
def __init__(self, config: TestDetrConfig):
@ -1486,8 +1368,7 @@ class TestDetrModel(TestDetrPreTrainedModel):
object_query = self.enc_output_norm(self.enc_output(object_query))
return object_query, output_proposals
@add_start_docstrings_to_model_forward(TEST_DETR_INPUTS_DOCSTRING)
@replace_return_docstrings(output_type=TestDetrModelOutput, config_class=_CONFIG_FOR_DOC)
@auto_docstring
def forward(
self,
pixel_values: torch.FloatTensor,
@ -1501,7 +1382,14 @@ class TestDetrModel(TestDetrPreTrainedModel):
return_dict: Optional[bool] = None,
) -> Union[tuple[torch.FloatTensor], TestDetrModelOutput]:
r"""
Returns:
decoder_attention_mask (`torch.FloatTensor` of shape `(batch_size, num_queries)`, *optional*):
Not used by default. Can be used to mask object queries.
inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
Optionally, instead of passing the flattened feature map (output of the backbone + projection layer), you
can choose to directly pass a flattened representation of an image.
decoder_inputs_embeds (`torch.FloatTensor` of shape `(batch_size, num_queries, hidden_size)`, *optional*):
Optionally, instead of initializing the queries with a tensor of zeros, you can choose to directly pass an
embedded representation.
Examples:

View File

@ -44,7 +44,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.14.0", "To fix: pip install -r examples/pytorch/audio-classification/requirements.txt")

View File

@ -53,7 +53,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/contrastive-image-text/requirements.txt")

View File

@ -2,4 +2,5 @@ accelerate>=0.12.0
torch>=1.5.0
torchvision>=0.6.0
datasets>=2.14.0
evaluate
evaluate
scikit-learn

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/image-classification/requirements.txt")

View File

@ -48,7 +48,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)

View File

@ -42,7 +42,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/image-pretraining/requirements.txt")

View File

@ -47,7 +47,7 @@ Any model supported by the AutoModelForMaskedImageModeling API can be used.
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/image-pretraining/requirements.txt")

View File

@ -52,7 +52,7 @@ Any model supported by the AutoModelForMaskedImageModeling API can be used.
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/image-pretraining/requirements.txt")

View File

@ -46,7 +46,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/instance-segmentation/requirements.txt")

View File

@ -52,7 +52,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/instance-segmentation/requirements.txt")

View File

@ -54,7 +54,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)

View File

@ -57,7 +57,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

View File

@ -59,7 +59,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)

View File

@ -53,7 +53,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

View File

@ -46,7 +46,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.14.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")

View File

@ -45,7 +45,7 @@ from transformers.utils import check_min_version, send_example_telemetry
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = logging.getLogger(__name__)

View File

@ -53,7 +53,7 @@ from transformers.utils import check_min_version, send_example_telemetry
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)
# You should update this to your particular problem to have better documentation of `model_type`

View File

@ -48,7 +48,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/object-detection/requirements.txt")

View File

@ -51,7 +51,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logging.basicConfig(level=logging.INFO)
logger = get_logger(__name__)

View File

@ -49,7 +49,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt")

View File

@ -47,7 +47,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt")

View File

@ -54,7 +54,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt")

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt")

View File

@ -45,7 +45,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/question-answering/requirements.txt")

View File

@ -50,7 +50,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=2.0.0", "To fix: pip install -r examples/pytorch/semantic-segmentation/requirements.txt")

View File

@ -49,7 +49,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)

View File

@ -129,7 +129,7 @@ To pre-train `"large-sized"` Wav2Vec2 model, *e.g.* [facebook/wav2vec2-large-lv6
on [librispeech_asr](https://huggingface.co/datasets/librispeech_asr), the following command can be run:
```bash
accelerate launch run_wav2vec2_pretraining_no_trainer.py \
accelerate launch run_wav2vec2_pretraining_no_trainer.py \
--dataset_name=librispeech_asr \
--dataset_config_names clean clean other \
--dataset_split_names train.100 train.360 train.500 \
@ -141,7 +141,7 @@ accelerate launch run_wav2vec2_pretraining_no_trainer.py \
--weight_decay=0.01 \
--max_duration_in_seconds=20.0 \
--min_duration_in_seconds=2.0 \
--model_name_or_path=./
--model_name_or_path=./ \
--logging_steps=1 \
--saving_steps=10000 \
--per_device_train_batch_size=2 \

View File

@ -49,7 +49,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.18.0", "To fix: pip install -r examples/pytorch/speech-recognition/requirements.txt")

View File

@ -52,7 +52,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.18.0", "To fix: pip install -r examples/pytorch/speech-recognition/requirements.txt")

View File

@ -47,7 +47,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.18.0", "To fix: pip install -r examples/pytorch/speech-recognition/requirements.txt")

View File

@ -51,7 +51,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/summarization/requirements.txt")

View File

@ -55,7 +55,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/summarization/requirements.txt")

View File

@ -312,7 +312,6 @@ class ExamplesTestsNoTrainer(TestCasePlus):
{self.examples_dir}/pytorch/image-classification/run_image_classification_no_trainer.py
--model_name_or_path google/vit-base-patch16-224-in21k
--dataset_name hf-internal-testing/cats_vs_dogs_sample
--trust_remote_code
--learning_rate 1e-4
--per_device_train_batch_size 2
--per_device_eval_batch_size 1

View File

@ -17,7 +17,6 @@ import json
import logging
import os
import sys
import unittest
from unittest.mock import patch
from transformers import ViTMAEForPreTraining, Wav2Vec2ForPreTraining
@ -391,7 +390,6 @@ class ExamplesTests(TestCasePlus):
--output_dir {tmp_dir}
--model_name_or_path google/vit-base-patch16-224-in21k
--dataset_name hf-internal-testing/cats_vs_dogs_sample
--trust_remote_code
--do_train
--do_eval
--learning_rate 1e-4
@ -415,7 +413,6 @@ class ExamplesTests(TestCasePlus):
result = get_results(tmp_dir)
self.assertGreaterEqual(result["eval_accuracy"], 0.8)
@unittest.skip("temporary to avoid failing on circleci")
def test_run_speech_recognition_ctc(self):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
@ -426,7 +423,6 @@ class ExamplesTests(TestCasePlus):
--dataset_config_name clean
--train_split_name validation
--eval_split_name validation
--trust_remote_code
--do_train
--do_eval
--learning_rate 1e-4
@ -447,7 +443,6 @@ class ExamplesTests(TestCasePlus):
result = get_results(tmp_dir)
self.assertLess(result["eval_loss"], result["train_loss"])
@unittest.skip("temporary to avoid failing on circleci")
def test_run_speech_recognition_ctc_adapter(self):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
@ -458,7 +453,6 @@ class ExamplesTests(TestCasePlus):
--dataset_config_name clean
--train_split_name validation
--eval_split_name validation
--trust_remote_code
--do_train
--do_eval
--learning_rate 1e-4
@ -481,7 +475,6 @@ class ExamplesTests(TestCasePlus):
self.assertTrue(os.path.isfile(os.path.join(tmp_dir, "./adapter.tur.safetensors")))
self.assertLess(result["eval_loss"], result["train_loss"])
@unittest.skip("temporary to avoid failing on circleci")
def test_run_speech_recognition_seq2seq(self):
tmp_dir = self.get_auto_remove_tmp_dir()
testargs = f"""
@ -492,7 +485,6 @@ class ExamplesTests(TestCasePlus):
--dataset_config_name clean
--train_split_name validation
--eval_split_name validation
--trust_remote_code
--do_train
--do_eval
--learning_rate 1e-4
@ -520,7 +512,6 @@ class ExamplesTests(TestCasePlus):
--output_dir {tmp_dir}
--model_name_or_path hf-internal-testing/tiny-random-wav2vec2
--dataset_name anton-l/superb_demo
--trust_remote_code
--dataset_config_name ks
--train_split_name test
--eval_split_name test
@ -555,7 +546,6 @@ class ExamplesTests(TestCasePlus):
--dataset_name hf-internal-testing/librispeech_asr_dummy
--dataset_config_names clean
--dataset_split_names validation
--trust_remote_code
--learning_rate 1e-4
--per_device_train_batch_size 4
--per_device_eval_batch_size 4
@ -576,7 +566,6 @@ class ExamplesTests(TestCasePlus):
run_mae.py
--output_dir {tmp_dir}
--dataset_name hf-internal-testing/cats_vs_dogs_sample
--trust_remote_code
--do_train
--do_eval
--learning_rate 1e-4

View File

@ -46,7 +46,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/text-classification/requirements.txt")

View File

@ -48,7 +48,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/text-classification/requirements.txt")

View File

@ -48,7 +48,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)

View File

@ -47,7 +47,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/text-classification/requirements.txt")

View File

@ -48,7 +48,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/token-classification/requirements.txt")

View File

@ -55,7 +55,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/token-classification/requirements.txt")

View File

@ -51,7 +51,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/translation/requirements.txt")

View File

@ -56,7 +56,7 @@ from transformers.utils.versions import require_version
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = get_logger(__name__)
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/translation/requirements.txt")

View File

@ -50,7 +50,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version(
"datasets>=1.8.0", "To fix: pip install -r examples/tensorflow/contrastive-image-text/requirements.txt"

View File

@ -54,7 +54,7 @@ from transformers.utils.versions import require_version
logger = logging.getLogger(__name__)
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/image-classification/requirements.txt")

View File

@ -49,7 +49,7 @@ from transformers.utils import check_min_version, send_example_telemetry
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = logging.getLogger(__name__)

View File

@ -61,7 +61,7 @@ except (ModuleNotFoundError, ImportError):
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
logger = logging.getLogger(__name__)

View File

@ -52,7 +52,7 @@ from transformers.utils.versions import require_version
# region Checking dependencies
# Will error if the minimal version of Transformers is not installed. Remove at your own risks.
check_min_version("4.53.0.dev0")
check_min_version("4.53.0")
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/summarization/requirements.txt")

Some files were not shown because too many files have changed in this diff Show More