* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fox copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops forgot to update some modualr files
* copy from doesn't copy decorators?
* fix overriden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why it was deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* typing hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huuge update all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why i hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* oke, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* lets keep None for now, I will try to remove after checking tests
* some more fixes, i find and replace does not always find all cases
* last fix of tests
* arthur's comment for extra foreward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* [new-models] LFM2-MoE
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [docs] add in template lfm2_moe doc files
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [configuration] update configuration class
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modular][lfm] minor: fix rotary_emb typo
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modeling] modular/modeling files for Lfm2Moe
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modeling][lfm2_moe] fix Lfm2Moe modular/modeling
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [configuration][lfm2_moe] update configuration keys with latest config changes
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [misc] make fixup
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modular][lfm2_moe] address comments: dtype, mlp, buffers
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [configuration][lfm2_moe] add initializer_range
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modular][lfm2_moe] include init_weights to pass test_initialization
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [tests][causal_lm] include pos_emb as possible rope attribute
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modeling][lfm2_moe] remove load_balancing_loss_func due to lack of support for hooking expert biases
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [misc] make style
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [modeling][lfm2_moe] MoE refactor PR update in LFM2Moe
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [tests] lfm2_moe: unit tests
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [misc] update LFM2-8B-A1B repo id
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [tests] lfm2: update ModelTests for lfm2
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* Update LFM2 documentation
Updated the LFM2 documentation to reflect the addition of a new model size and clarified architectural details.
* Add Lfm2Moe documentation
Add Lfm2Moe model documentation with overview and example usage.
* [misc] fix ci
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [docs] remove trust_remote_code
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* [misc] ci: fix modular
Signed-off-by: Paul Pak <paulpak58@gmail.com>
* reapply modular
* simplify
* remove static address and inplace op
* simplify
* simplify a bit more the modular
* imports
---------
Signed-off-by: Paul Pak <paulpak58@gmail.com>
Co-authored-by: Maxime Labonne <81252890+mlabonne@users.noreply.github.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* update modeling mixtral
* oups[13;2u
* fix
* better naming?
* compute softmax and top_k inside the experts
* update minamax as well
* models that will need an update
* more models that need a fix
* stash
* fix mixtral
* update olmoe
* update
* update
* current changes
* nits
* molmoe is now fixed
* olmoe is good to go!
* refactor qwen2_moe
* fixes
* fixed moe
* fix qwen2 modular
* nit
* qwen2_moie test script works
* tricky rope !
* fix qwen3
* DeepSeek v3 MoE Standardization (#40538)
* DeepSeek-v3
Shared
Shared
* Dependents of DS3
* Standardize GLM4V MoE (#40539)
* up
* Standardize VitPose's MoE (#40549)
* VitPose
* outside
* outside
* outside
* fix
* update dbrx
* dbrx... the magix
* Refactor Ernie 4.5's MoE (#40547)
* Isolate Ernie fixes
* fix moe
---------
Co-authored-by: Vasqu <antonprogamer@gmail.com>
* fix style
* style
* fix copies
* style
* latest changes
* fixes
* had to stage
* current updaters
* up
* another modular
* modular graniteMoe
* some update
* draft another modular moe
* updaters
* up
* fix nit
* q3 nit
* fix phi moe
* we're going up up up up its our mooooment
* fix switch transformers this time around
* up
* gptsan japanese is deprecated forget about it
* fix mixtral to not be a linear (gives us more freedom)
* update
* fix copies gone wrong try catch nothing
* fix mixtral
* new refactor again
* update aria as well
* up dbrx and deepseekv3
* nit
* fix phimoe?
* fix deepseek v3
* nits
* don't bother with this one please
* up olmoe
* ??
* fix olmoe
* yups
* fiupx
* ish
* hot patch
* new qwen3
* updates
* up
* nit
* fix copies
* fix
* nits
* we're going up up up
* nits
* switch_transformesr edge case
* lol modular gptsan?
* fix deepseek
* finally all modeling match modular
* update
* up
* up
* dang
* up
* up aria
* fix dbrx
* nits here and there
* finish fixing dbrx
* fix deepseek
* upd
* up
* fix flex olmo
* updated
* update jamba
* JAMBA is stil a bit todo
* forward forward
* fix dots11
* update
* fix hunyuan
* fix some other
* update phimoe
* fuck you phimoe you are now submitted
* submit granitemoe as well
* try to fix some other models, reduces some of the failures
* fix olmoe and qwem2moe
* up
* up
* fix qwen2_moe
* update modular make it again, simpler
* nits
* up
* up
* fix
* someswitch reductions
* up
* fix qwen3vl
* some fixes to jetmo
* these should be shipped to the modular to fix jetmoe
* fix most of the nllb failures
* more nllb fixes
* fix the modular
* remove nllb modular as it sucks for now
* ?
* fix granitemoe
* granitemoehybrid don't have rope
* use rope when rope, no rope when no rope
* updates
* finish fixing dumbgrainite
* fix most of minimax
* fix
* update modular
* ?
* up
* up jetmoe still broken
* up
* fix, now align the moe
* fix jetmoe
* fix styling and qwen3 repo consitency
* updatge
* up up
* update ruff?
* nits
* modeling is goot now for switch
* fix
* more fixses to switch!
* fix some siwtch test
* ?
* ?
* up
* fix switch modular!
* nit?
* uip
* subtest
* can't believe I wasted so much time on this...
* fix
* updates
* nits
* nit jamba is fucking annoying
* ?
* fix?
* oups
* good good
* styling
* up
* make sure qwen2 sliding works!
* fix dbrx small
* lol
* nits
* fix one test
* fix load balancing loss issue
* fix jamba
* fix nllbmoe
* fix jamba consistency and doc?
* up
* thse are correct
* up
* up
* up
* some of the final cleanup
* update
* up
* fix some revert in granimoe
* bring back attention multipliers for the granite family we'll see later on if they need removal
* small jamba fix docstring and typing
* fix phimoe
* yup
* fix unk returndict in granitemoes
* up
* fix qwen config
* fix phiemoe check quality
* nits
* update based on caught non relative imports!
* fix dbrx
* Apply suggestions from code review
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* fix copies
* fiuxp
* fix dot1 regression!
* fix phimoe issue
* fix phi moe
* fix float() for some models
* fix jamba regression
* ui
* more dtype issues
* fix deepseek2 and 3?
* proper update
* fix modular deepseek!
* jamba jambaaaaaa
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Vasqu <antonprogamer@gmail.com>
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
* initial comment
* test
* initial conversion for outline
* intermediate commit for configuration
* chore:init files for sam2
* adding arbitary undefined config
* check
* add vision
* make style
* init sam2 base model
* Fix imports
* Linting
* chore:sam to sam2 classes
* Linting
* Add sam2 to models.__init__
* chore:match prompt encoder with sam2 code
* chore:prepare kwargs for mask decoder
* Add image/video predictors
* Add CUDA kernel
* Add output classes
* linting
* Add logging info
* tmp commit
* docs for sam2
* enable image processing
* check difference of original SAM2
- difference is the order of ToTensor()
- please see https://pytorch.org/vision/main/_modules/torchvision/transforms/functional.html#resize
* enable promptencoder of sam2
* fix promprencoder
* Confirmed that PromptEncoder is exactly same (Be aware of bfloat16 and float32 difference)
* Confirmed that ImageEncoder is exactly same (Be aware the linting of init)
* Confirmed that MaskDecoder is exactly same (TO DO: lint variable name)
* SamModel is now available (Need more chore for name)
* make fix-copies
* make style
* make CI happy
* Refactor VisionEncoder and PostioinEmbedding
* TO DO : fix the image_embeddings and sparse_embeddings part
* pure image inference done
* reusable features fix and make style
* styling
* refactor memoryattention
* tmp
* tmp
* refactor memoryencoder
TO DO : convert and inference the video pipeline
* TO DO : fix the image_encoder shape
* conversion finish
TO DO: need to check video inference
* make style
* remove video model
* lint
* change
* python utils/check_docstringspy --check_all
* python utils/check_config_attributes.py
* remove copies for sam2promptencoder due to configuration
* change __init__.py
* remove tensorflow version
* fix that to not use direct comparison
* make style
* add missing import
* fix image_embedding_size
* refactor Sam2 Attention
* add fully working video inference (refactoring todo)
* clarify _prepare_memory_conditioned_features
* simplify modeling code, remove unused paths
* use one model
* use auto_docstring
* refactor rope embeddings
* nit
* not using multimask when several points given
* add all sam2.1
* add video tmp
* add Sam2VideoSessionState + fast image proc + video proc
* remove init_states from model
* fix batch inference
* add image integration tests
* uniformize modeling code with other sam models and use modular
* pass vision tests an most model tests
* All tests passing
* add offloading inference state and video to cpu
* fix inference from image embedding and existing mask
* fix multi_boxes mask inference
* Fix batch images + batch boxes inference
* improve processing for image inference
* add support for mask generation pipeline
* add support for get_connected_components post processing in mask generation
* add fast image processor sam, image processor tests and use modular for sam2 image processor
* fix mistake in sam after #39120
* fix init weights
* refactor convert
* add integration tests for video + other improvements
* add needed missing docstrings
* Improve docstrings and
* improve inference speed by avoiding cuda sync
* add test
* skip test for vision_model
* minor fix for vision_model
* fix vision_model by adding sam2model and change the torch dependencies
* remove patch_size
* remove image_embedding_size
* fix patch_size
* fix test
* make style
* Separate hieradet and vision encoder in sam2
* fixup
* review changes part 1
* remove MemoryEncoderConfig and MemoryAttentionConfig
* pass q_stride instead of q_pool module
* add inference on streamed videos
* explicitely process streamed frames
* nit
* Improve docstrings in Sam2Model
* update sam2 modeling with better gestion of inference state and cache, and separate Sam2Model and Sam2VideoModel
* improve video inference api
* change inference_state to inference_session
* use modular for Sam2Model
* fix convert sam2 hf
* modular
* Update src/transformers/models/sam2/video_processing_sam2.py
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* fix minor config
* fix attention loading error
* update modeling tests to use hub checkpoints
* Use CI A10 runner for integration tests values + higher tolerance for video integration tests
* PR review part 1
* fix doc
* nit improvements
* enforce one input format for points, labels and boxes
* nit
* last few nits from PR review
* fix style
* fix the input type
* fix docs
* add sam2 model as conversion script
* improve sam2 doc
* add rough necessarry changes
* first working edgetam
* fix issue with object pointers
* Use modular as much as possible
* nit fixes + optimization
* refactor spatial perceiver
* cleanup after merge
* add working edgetam
* improve perceiver resampler code
* simplify/unify rope attention logic
* Improve comments in apply_rotary_pos_emb_2d
* add working tests
* fix test timmwrapper
* add docs
* make fixup
* nits
* fix modular
* fix modular
* PR review part 1
* split apply_rotary_pos_emb_2d
* add granularity to _prepare_memory_conditioned_features
* add dates to doc
* add separate mlp for memory attention
* Fix memory on wrong device
* store processed frames in dict
* update checkpoints in tests
* update dates
---------
Co-authored-by: sangbumchoi <danielsejong55@gmail.com>
Co-authored-by: RUFFY-369 <prakarshkaushik369@gmail.com>
Co-authored-by: Sangbum Daniel Choi <34004152+SangbumChoi@users.noreply.github.com>
Co-authored-by: Haitham Khedr <haithamkhedr@meta.com>
Co-authored-by: sangbum choi <sangbumchoi@sangbumui-MacBookAir.local>
Co-authored-by: Pavel Iakubovskii <qubvel@gmail.com>
* halfway through the models
* update test checks
* refactor all
* another one
* use tuples
* more deletions
* solve bad inheritance patterns
* type
* PR ready?
* automatic model class inference from the base class
* vaultgemma
* make fixup
* make fixup
* rebase with gpt2
* make fixup :'(
* gpt2 is special
* Add FA to docker
* Use caching mechanism for qwen2_5
* Fix a typo in important models list
* Partial fixes for gemma3
* Added a commit ID for FA repo
* Detailled the expectation storage format
* Rebase fix
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Adapt and test huggingface_hub v1.0.0.rc0
* forgot to bump hfh
* bump
* code quality
* code quality
* relax dependency table
* fix has_file
* install hfh 1.0.0.rc0 in circle ci jobs
* repostiryo
* push to hub now returns a commit url
* catch HfHubHTTPError
* check commit on branch
* add it back
* fix ?
* remove deprecated test
* uncomment another test
* trigger
* no proxies
* many more small changes
* fix load PIL Image from httpx
* require 1.0.0.rc0
* fix mocked tests
* fix others
* unchange
* unchange
* args
* Update .circleci/config.yml
* Bump to 1.0.0.rc1
* bump kernels version
* fix deps
* Add Qwen3Omni
* make fix-copies, import properly
* nit
* fix wrong setup. Why was audio_token_id renamed ?
* upds
* more processing fixes
* yup
* fix more generation tests
* down to 1?
* fix import issue
* style, update check repo
* up
* fix quality at my best
* final quality?
* fix doc building
* FINAL COMMIT: SKIP IMPORTANT BUT FAILING TESTS FOR MERGE
* SKIP THE TEMPLATE ONE
---------
Co-authored-by: lvyuanjun.lyj <lvyuanjun.lyj@alibaba-inc.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
* setup
* start the purge
* continue the purge
* more and more
* more
* continue the quest: remove loading tf/jax checkpoints
* style
* fix configs
* oups forgot conflict
* continue
* still grinding
* always more
* in tje zone
* never stop
* should fix doc
* fic
* fix
* fix
* fix tests
* still tests
* fix non-deterministic
* style
* remove last rebase issues
* onnx configs
* still on the grind
* always more references
* nearly the end
* could it really be the end?
* small fix
* add converters back
* post rebase
* latest qwen
* add back all converters
* explicitly add functions in converters
* re-add