* init swissai model
* AutoModelForCausalLM
* AutoModelForCausalLM mapping
* qk norm and post ln optional
* fix wrong shape of qk norm: megatron uses head_dim
* automodel fixes
* minor fix in forward
* fix rope validation to accept llama3 scaling
* `SwissAIForTokenClassification` support
* Align `SwissAI` to v4.52.4
* Align `SwissAI` to v4.53.1
* Init CUDA xIELU
* `SwissAI*`->`Apertus*`
* ci fix
* check_docstring ignore ApertusConfig
* Licensing and placeholder tests
* Placeholder doc
* XIELU syntax
* `_xielu_python` optimization
* Fix xIELU
* [tmp] `{beta,eps}` persistent=False
until {beta,eps} saved in checkpoint
* Modular `Apertus`
* CUDA xIELU logging
* ci fix
* ci fix
* ci fix
* Update license
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* Update tests/models/apertus/test_modeling_apertus.py
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* `.utils.import_utils.is_torchdynamo_compiling`
* `Apertus` class ordering
* `past_key_value{->s}`, `make fix-copies`
* ci fix
* Remove unused configuration parameters
* `{beta,eps}` saved in checkpoint
* `{beta,eps}` Temporarily on CPU
* Suggestions
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* ci fix
* remove fx_compatible (deprecated)
* remove `rotary_embedding_layer`
The tests are written for a config without default scaling, which is not the case in Apertus. Rope scaling is also tested in other models, so removing it is safe.
* fully removing `Mask4DTestHard` class
Not needed (for now)
* switch to `dtype` instead of `torch_dtype`
Following this:
https://github.com/huggingface/transformers/pull/39782
* remove unused imports
* remove `cache_implementation="static"`
* +Apertus to `docs/source/en/_toctree.yml` for the doc builder
---------
Co-authored-by: Alexander Hagele <alexanderhagele@gmail.com>
Co-authored-by: dhia680 <garbayad@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
Co-authored-by: Dhia Garbaya <84809366+dhia680@users.noreply.github.com>
* fix in modular
* remove leftover print
* fix everything except when it's in assignment
* fix assignment as well
* more general
* better
* better
* better comment
* docstring
* cleaner
* remove base
* doc
* init
* add modular
* fixup
* update configuration
* add processing file
* update auto files
* update
* update modular
* green setup_and_quality ci
* it works
* fix some tests
* commit florence2
* update test
* make test cases done - 16 left
* style
* fix few test cases
* fix some tests
* fix init test
* update florence2 vision style
* hope is green
* fix init test
* fix init
* update modular
* refactor vision module
* fix: channel attention use dynamic scale
* update modular
* update
* update attention mask
* update
* fix naming
* Update src/transformers/models/florence2/processing_florence2.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* spatial block works
* more beautiful
* more more beautiful
* merge main
* merge main and fixup
* fix typing hint
* update modeling
* fix eager matches sdpa
* fix style
* fix compile test - all green
* remove florence2 language
* remove Florence2LanguageModel things
* fix style
* update florence2 model
* override prepare encoder_decoder for generation
* add weight conversion script
* rewrite channel attention to use sdpa
* eliminate 1 transpose op
* support fa2
* fix quality check
* chore: reformat `test_modeling_florence2.py`
* some refactor for processor
* some refactor for processor
* update naming convention and remove BC
* make it pass the test
* fix: correct Embedding Cosine
* update comments and docstring
* support input_embeds
* support input embeds ideally
* fix style
* fix style
* fix style again :D
* add test processor
* refactor processor and add test for processor
* reformat test processor
* make fixup
* fix schema check
* remove image_token
* ensure image token in tokenizer and fix integration tests
* fix processor test
* add more integration tests for large model and rename test_processor to test_processing
* test_assisted_decoding_sample should pass
* update doc and make model work with image text to text pipeline
* docs: add sdpa badge
* resolve cyril's comments
* fix import torch error
* add helper get_placeholder_mask
* inherit from llava
* florence2 may not _supports_attention_backend because of bart ...
* move florence2 model card to multimodal
* let base model always return_dict
* fix style
* tiny update doc
* set _checkpoint_conversion_mapping = {}
* fix code quality
* support flex and compile graph and move external func to internal func
* remove condition because it is always true
* remove window funcs
* move post processor config out
* fix ci
* new intro to trigger test
* remove `kernel_size` argument
---------
Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* added dates to the models with a single hf papers link
* added the dates for models with multiple papers
* half of no_papers models done
* rest of no_papers models also done, only the exceptions left
* added copyright disclaimer to sam_hw, cohere, cohere2 + dates
* some more fixes, hf links + typo
* some new models + a rough script
* the script looks robust, changed all paper links to hf
* minor change to handle technical reports along with blogs
* ran make fixup to remove the white space
* refactor
* initial comment
* test
* initial conversion for outline
* intermediate commit for configuration
* chore:init files for sam2
* adding arbitrary undefined config
* check
* add vision
* make style
* init sam2 base model
* Fix imports
* Linting
* chore:sam to sam2 classes
* Linting
* Add sam2 to models.__init__
* chore:match prompt encoder with sam2 code
* chore:prepare kwargs for mask decoder
* Add image/video predictors
* Add CUDA kernel
* Add output classes
* linting
* Add logging info
* tmp commit
* docs for sam2
* enable image processing
* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
* enable promptencoder of sam2
* fix promptencoder
* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)
* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)
* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)
* SamModel is now available (Need more chore for name)
* make fix-copies
* make style
* make CI happy
* Refactor VisionEncoder and PositionEmbedding
* TO DO : fix the image_embeddings and sparse_embeddings part
* pure image inference done
* reusable features fix and make style
* styling
* refactor memoryattention
* tmp
* tmp
* refactor memoryencoder
TO DO : convert and inference the video pipeline
* TO DO : fix the image_encoder shape
* conversion finish
TO DO: need to check video inference
* make style
* remove video model
* lint
* change
* python utils/check_docstrings.py --check_all
* python utils/check_config_attributes.py
* remove copies for sam2promptencoder due to configuration
* change __init__.py
* remove tensorflow version
* fix that to not use direct comparison
* make style
* add missing import
* fix image_embedding_size
* refactor Sam2 Attention
* add fully working video inference (refactoring todo)
* clarify _prepare_memory_conditioned_features
* simplify modeling code, remove unused paths
* use one model
* use auto_docstring
* refactor rope embeddings
* nit
* not using multimask when several points given
* add all sam2.1
* add video tmp
* add Sam2VideoSessionState + fast image proc + video proc
* remove init_states from model
* fix batch inference
* add image integration tests
* uniformize modeling code with other sam models and use modular
* pass vision tests and most model tests
* All tests passing
* add offloading inference state and video to cpu
* fix inference from image embedding and existing mask
* fix multi_boxes mask inference
* Fix batch images + batch boxes inference
* improve processing for image inference
* add support for mask generation pipeline
* add support for get_connected_components post processing in mask generation
* add fast image processor sam, image processor tests and use modular for sam2 image processor
* fix mistake in sam after #39120
* fix init weights
* refactor convert
* add integration tests for video + other improvements
* add needed missing docstrings
* Improve docstrings and
* improve inference speed by avoiding cuda sync
* add test
* skip test for vision_model
* minor fix for vision_model
* fix vision_model by adding sam2model and change the torch dependencies
* remove patch_size
* remove image_embedding_size
* fix patch_size
* fix test
* make style
* Separate hieradet and vision encoder in sam2
* fixup
* review changes part 1
* remove MemoryEncoderConfig and MemoryAttentionConfig
* pass q_stride instead of q_pool module
* add inference on streamed videos
* explicitly process streamed frames
* nit
* Improve docstrings in Sam2Model
* update sam2 modeling with better management of inference state and cache, and separate Sam2Model and Sam2VideoModel
* improve video inference api
* change inference_state to inference_session
* use modular for Sam2Model
* fix convert sam2 hf
* modular
* Update src/transformers/models/sam2/video_processing_sam2.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix minor config
* fix attention loading error
* update modeling tests to use hub checkpoints
* Use CI A10 runner for integration tests values + higher tolerance for video integration tests
* PR review part 1
* fix doc
* nit improvements
* enforce one input format for points, labels and boxes
* nit
* last few nits from PR review
* fix style
* fix the input type
* fix docs
* add sam2 model as conversion script
* improve sam2 doc
* nit fixes + optimization
* split sam2 and sam2_video in two models
* PR review part 1
* fix None for default slow processor of sam2
* remove unnecessary code path in sam2_video
* refactor/simplify RoPE
* replace embedding module list with embedding matrix
* fix tests
* remove kernel
* nit
* use lru_cache for sine_pos_embeddings
* reorder sam2_video methods
* simplify sam2_video
* PR review part 1
* simplify sam2 video a lot
* more simplification
* update integration tests with updated conftest
* more explicit config for hieradet
* do post_processing outside of sam2 video model
* Improve Sam2VideoVisionRotaryEmbedding
* fix tests
* update docs and fix mask2former/oneformer
* avoid unnecessary reshapes/permute
* fix device concatenating points
* small dtype fix
* PR review
* nit
* fix style and finish up doc
* fix style
* fix docstrings
* fix modular
---------
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* Add initial collated reports script and job definition
* provide commit hash for this run. Also use hash in generated artifact name. Json formatting
* tidy
* Add option to upload collated reports to hf hub
* Add glob pattern for test report folders
* Fix glob
* Use machine_type as path filter instead of glob. Include machine_type in collated report
* init
* update
* update
* ruff
* t patch is 2 by default, not 1
* draft
* back
* back1
* update
* config update
* update using glm-41 format
* add self.rope_scaling = config.rope_scaling
* update config
* update
* remove the processor
* update
* fix tests
* update
* for test
* update
* update 2126
* self.rope_scaling is missing in GLM4MOE, let's add it
* update
* update
* Update modular_glm4v_moe.py
* change config
* update apply_multimodal_rotary_pos_emb
* format
* update
* Delete 3-rollout_qas_thinking_answers.py
* use right name
* update with place holder
* update
* use right rotary
* Update image_processing_glm4v_fast.py
* rope_config_validation needs to rewrite the entire config file in modular
* update
* changed name
* update
* Update modeling_glm4v_moe.py
* _init_weights should be added in Glm4vMoePreTrainedModel
* remove use_qk_norm
* Update modular_glm4v_moe.py
* remove use_qk_norm as it is not used
* fix style
* deprecations are not needed on new models
* fix merge issues
---------
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* fix
* nice
* where i am at
* Bro this works
* Update src/transformers/integrations/tensor_parallel.py
* cleanups
* yups that was breaking
* Update src/transformers/models/openai_moe/modeling_openai_moe.py
* gather on experts and not mlp
* add changes for latest convert branch
* adds options to get output_router_logits from config
* bring chat template + special tokens back into the script.
* initial commit
* update
* working with shards
* add model.safetensors.index.json
* fix
* fix
* mxfp4 flag
* rm print
* Fix PAD/EOS/BOS (#18)
* fix pad/eos/bos
* base model maybe one day
* add some doc
* special tokens based on harmony.
* add in tokenizer config as well.
* prepare for rebase with main
* Fix for initialize_tensor_parallelism now returning 4-tuple
```
[rank0]: File "/fsx/edward/work/openai-tsm-examples/examples/generate.py", line 17, in <module>
[rank0]: model = AutoModelForCausalLM.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/models/auto/auto_factory.py", line 600, in from_pretrained
[rank0]: return model_class.from_pretrained(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 316, in _wrapper
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/fsx/edward/work/new-model-addition-openai/src/transformers/modeling_utils.py", line 4748, in from_pretrained
[rank0]: tp_plan, device_map, device_mesh = initialize_tensor_parallelism(tp_plan, tp_size=None)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: ValueError: too many values to unpack (expected 3)
```
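A minimal sketch of the failure mode in the traceback above, assuming only what the log shows: a helper that now returns four values is unpacked into three names. `initialize_parallelism_stub` and its return values are hypothetical stand-ins, not the real `initialize_tensor_parallelism` or its actual outputs.

```python
def initialize_parallelism_stub():
    # Hypothetical stand-in for a helper that was changed to return a 4-tuple.
    # The fourth element is a placeholder; the real return values are not shown here.
    return "tp_plan", "device_map", "device_mesh", "extra"


def load_with_old_call_site():
    # Old call site: three targets for four returned values -> ValueError.
    tp_plan, device_map, device_mesh = initialize_parallelism_stub()
    return tp_plan, device_map, device_mesh


def load_with_fixed_call_site():
    # Fixed call site: unpack all four values, ignoring the one not needed here.
    tp_plan, device_map, device_mesh, _extra = initialize_parallelism_stub()
    return tp_plan, device_map, device_mesh


try:
    load_with_old_call_site()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

fixed_result = load_with_fixed_call_site()
print(error_message)
print(fixed_result)
```

The fix is simply to keep the call site's unpacking in sync with the helper's return arity; the sketch does not claim anything about what the fourth value is in the actual library.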
* mxfp4
* mxfp4 draft
* fix
* fix import
* draft
* draft impl
* finally working !
* simplify
* add import
* working version
* consider blocks and scales
* device mesh fix
* initial commit
* add working dequant + quant logic
* update
* non nan, gibberish output
* working EP + quantization finally !
* start cleaning
* remove reversing process
* style
* some cleaning
* initial commit
* more cleaning
* more cleaning
* simplify
* more cleaning
* rm duplicated function
* changing tp_plan
* update tp plan check
* add loading attribute
* dequantizing logic
* use subfunctions
* import cleaning
* update_param_name
* adds clamped swiglu
* add clamping to training path
* simplify dequant logic
* update
* Bad merge
* more simplifications & tests
* fix !
* fix registering custom attention
* fix order
* fixes
* some test nits
* nits
* nit
* fix
* Clamp sink logits
* Clean
* Soft-max trick
* Clean up
* p
* fix deepspeed
* update both modeling and modular for cleanup
* contiguous
* update tests
* fix top_k router call
* revert renaming
* test nits
* small fixes for EP
* fix path for our local tests
* update as I should not have broken that!
* fix the loss of mixtral
* revert part of the changes related to router_scores, kernel probably not ready for that!
* deleting a small nit
* update arch
* fix post processing
* update
* running version but not expected output
* moving to cuda
* initial commit
* revert
* erroring when loading on cpu
* updates
* del blocks, scales
* fix
* style
* rm comm
* comment
* add comment
* style
* remove duplicated lines
* Fix minor issue with weight_map conversion script
* fix sampling params
* rename to final name
* update pre-final version of template
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* fix batched inference
* serve fixes
* swizzle !
* update final chat template by Matt.
* fix responses; pin oai
* simplify
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
* fix
* Use ROCm kernels from HUB
* Make kernel modes explicit
* update final chat template by Matt. x2
* Thanks Matt for his tireless efforts!
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
* Fix installation
* Update setup.py
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
* allow no content
* fix: update message handling in write_tokenizer function
* Fix template logic for user message role
* last nits for CB and flash_paged!
* there was one bad merge
* fix CB (hardcode for now, it's just using kv groups instead)
* fix
* better fix for device_map
* minor device fix
* Fix flash paged
* updates
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* Revert "remove dtensors, not explicit (#39840)"
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* fix merge
* fix
* Fix line break when custom model identity
* nits testing
* to locals first and pass sliding window to flash paged
* register modes for MegaBlocksMoeMlp
* add integration test in fixtures -> now update the tests to use it!
* update integration tests
* initial fix
* style and update tests
* fix
* chore(gpt oss): remove mlp_bias from configuration
It was just a leftover.
* stats
* Integration tests
* whoops
* Shouldn't move model
* Ensure assistant messages without thinking always go to "final" channel
* More checks to ensure expected format
* Add pad_token_id to model configuration in write_model function (#51)
* Add oai fix fast tests (#59)
* Fix some fast tests
* Force some updates
* Remove unnecessary fixes
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
* Update src/transformers/models/gpt_oss/convert_gpt_oss_weights_to_hf.py
* reasoning -> Reasoning
* Add additional integration tests
* fixup
* Slight fixes
* align chat template with harmony
* simplify
* Add comment
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* torch testing assert close
* Revert fixup
* skip 2 test remove todo
* merge
* padding side should be left for integration tests
* fix modular wrt changes made to modeling
* style
* isort
* fix copies for the loss
* mmmm
---------
Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: edbeeching <edbeeching@gmail.com>
Co-authored-by: Vaibhavs10 <vaibhavs10@gmail.com>
Co-authored-by: MekkCyber <mekk.cyber@gmail.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Edward Beeching <edbeeching@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Lewis Tunstall <lewis.c.tunstall@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan@openai.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: joao@huggingface.co <joao@ip-10-53-88-32.ec2.internal>
Co-authored-by: Rocketknight1 <Rocketknight1@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Akos Hadnagy <akos@ahadnagy.com>
Co-authored-by: Ákos Hadnagy <akos.hadnagy@gmail.com>
Co-authored-by: Alvaro Moran <alvaro.moran@huggingface.co>
Co-authored-by: Lysandre <hi@lysand.re>
Co-authored-by: Matt <rocketknight1@gmail.com>
* first commit
Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface.
TODO: Some tests are failing so that needs to be fixed.
* fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model
* cleaned up a hack in the conversion script
* Fixed the expected values in integration tests
Cross att masking and cpu-gpu consistency tests are still failing however.
* changes for make style and quality
* add documentation
* clean up contrastive embedding
* add mm grounding dino to loss mapping
* add model link to config docstring
* hack fix for mm grounding dino consistency tests
* add special cases for unused config attr check
* add all models and update docs
* update model doc to the new style
* Use super_kwargs for modular config
* Move init to the _init_weights function
* Add copied from for tests
* fixup
* update typehints
* Fix-copies for tests
* fix-copies
* Fix init test
* fix snippets in docs
* fix consistency
* fix consistency
* update conversion script
* fix nits in readme and remove old comments from conversion script
* add license
* remove unused config args
* remove unnecessary if/else in model init
* fix quality
* Update references
* fix test
* fixup
---------
Co-authored-by: qubvel <qubvel@gmail.com>