* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fix copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops, forgot to update some modular files
* copy from doesn't copy decorators?
* fix overridden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why was it deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* type hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huge update to all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why I hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* okay, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* let's keep None for now, I will try to remove it after checking tests
* some more fixes, find-and-replace does not always catch all cases
* last fix of tests
* address Arthur's comment about extra forward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
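
The RoPE refactor above moves scaling parameters into a nested dict attribute on the config and builds the rotary embedding once per model forward. Below is a minimal sketch of configuring RoPE scaling through the long-standing public `rope_scaling` field; the exact nested key layout introduced by these commits is not reproduced, and the tiny model dimensions are illustrative only.

```python
# Minimal sketch, assuming the public `rope_scaling` dict on the config; the nested
# per-parameter format added by these commits may accept additional keys.
from transformers import LlamaConfig, LlamaForCausalLM

config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_attention_heads=4,
    num_hidden_layers=2,
    max_position_embeddings=256,
    rope_scaling={"rope_type": "linear", "factor": 2.0},  # validated by the rope utils
)
model = LlamaForCausalLM(config)
print(model.config.rope_scaling)
```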
* Default implementation - no time improvement
* Improved implementation - apparently 2 times faster with only a simple function refactor
* elementary torch-first approach, still needs further implementation of the torch-first method
* torch-first approach finished
* refactor processor
* refactor test
* partial doc update
* EfficientLoFTRImageProcessorFast based implementation
* EfficientLoFTRImageProcessorFast based implementation
* Logic checked - Test Passed - Validated execution speed
* use modular for efficientloftr
* fix import
---------
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
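
The commits above reimplement EfficientLoFTR preprocessing torch-first as `EfficientLoFTRImageProcessorFast`, reportedly about 2x faster. A hedged usage sketch follows; the checkpoint id and the pair-of-images input format are assumptions carried over from the existing keypoint-matching processors, not guaranteed by the commits.

```python
# Sketch: opting into the fast (torch-first) processor via `use_fast`. The checkpoint
# id "zju-community/efficientloftr" and the [image0, image1] pair input are assumptions.
import numpy as np
from transformers import AutoImageProcessor

image0 = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
image1 = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)

processor = AutoImageProcessor.from_pretrained("zju-community/efficientloftr", use_fast=True)
inputs = processor([image0, image1], return_tensors="pt")
print({k: tuple(v.shape) for k, v in inputs.items()})
```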
* Fix EncoderDecoder cache
* Add the option for the ddp data tuples to have 2 elems
* Modify the order of the KV and sliding caches
* Adapted RAG and Whisper to new EncoderDecoderCache
* A single comma
* Remove kwargs in map
* Fixed order in manual injection cache test
* Slight changes to support legacy format
* Removed Nones
* toggle the serialization
* this probably fixes it
* fix tests
* typo
* delete legacy save entirely
* remove extra nesting in if
* revert test and serialize a public attr instead of a private one
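
The cache fixes above concern how `EncoderDecoderCache` orders its self-attention and cross-attention parts and how it is serialized. A minimal construction sketch using the public classes; seq2seq models such as Whisper or RAG consume it through `past_key_values`, which is implied but not shown here.

```python
# Minimal sketch: an EncoderDecoderCache wraps one cache for decoder self-attention
# (first) and one for cross-attention (second).
from transformers import DynamicCache, EncoderDecoderCache

cache = EncoderDecoderCache(DynamicCache(), DynamicCache())
print(cache.self_attention_cache, cache.cross_attention_cache)
```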
* Add logits_to_keep to CausalLM models
* Skip failing test for git model
* Remove unused return_dict from kosmos2 signature
* Revert BlipForQuestionAnswering
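
`logits_to_keep` lets a CausalLM materialize logits only for the last N positions, which saves memory on long prompts. A small sketch, using a tiny random Llama checkpoint purely for illustration.

```python
# Sketch: only the final position's logits are kept when logits_to_keep=1.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "hf-internal-testing/tiny-random-LlamaForCausalLM"  # tiny test checkpoint
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

inputs = tok("def add(a, b):", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, logits_to_keep=1)
print(out.logits.shape)  # (batch_size, 1, vocab_size)
```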
* Add video processor for VideoMAE
* Document VideoMAE video processor
* Add regression tests for VideoMAE video processor
* refactor: Use direct batch key access for pixel_values_videos
* test: add parity test for VideoMAEVideoProcessor vs VideoMAEImageProcessor
* docs(videomae): update model docstring example to demonstrate VideoMAEVideoProcessor (TorchCodec-based decoding and sampling)
* Merge conflict
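
The parity test mentioned above checks that the new `VideoMAEVideoProcessor` produces the same tensors as the existing `VideoMAEImageProcessor` applied to the same frames. A rough sketch of that check; the class and output names come from the commits, but the default-constructor usage and exact call signature are assumptions.

```python
# Rough parity sketch between the video processor and the image processor.
import numpy as np
import torch
from transformers import VideoMAEImageProcessor, VideoMAEVideoProcessor

video = [np.random.randint(0, 256, (360, 640, 3), dtype=np.uint8) for _ in range(16)]

image_out = VideoMAEImageProcessor()(video, return_tensors="pt").pixel_values
video_out = VideoMAEVideoProcessor()(video, return_tensors="pt").pixel_values_videos

torch.testing.assert_close(image_out.float(), video_out.float(), atol=1e-4, rtol=1e-4)
```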
* add fast processor
* add fast processor
* make style
* add new convert rgb
* use nested group by shape in mllama fast, add support for multiple inputs in group by shape
* refactor after review
---------
Co-authored-by: Vincent <phamvinh257@gmail.com>
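
The fast-processor commits above lean on a "group by shape" step: images sharing a shape are stacked and transformed as one batched tensor, then restored to their original order. A generic illustration of that idea in plain torch (not the library's internal helper):

```python
# Generic sketch of grouping images by shape for batched processing.
from collections import defaultdict
import torch

def group_by_shape(images):
    # bucket tensors by shape and remember where each one came from
    groups, index = defaultdict(list), []
    for img in images:
        shape = tuple(img.shape)
        index.append((shape, len(groups[shape])))
        groups[shape].append(img)
    return {shape: torch.stack(imgs) for shape, imgs in groups.items()}, index

def restore_order(processed, index):
    return [processed[shape][pos] for shape, pos in index]

images = [torch.rand(3, 224, 224), torch.rand(3, 336, 336), torch.rand(3, 224, 224)]
grouped, index = group_by_shape(images)
# run a batched transform per group here, then put results back in order
restored = restore_order(grouped, index)
assert all(torch.equal(a, b) for a, b in zip(images, restored))
```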
* [wip][cwm] Code World Model stubs and setup in HF Transformers
* [wip] Get other things working
* [wip] Working
* Tokenizer pad
* fix: cwm window attn
* temp remove test
* temp remove test
* Fixes
* Temporarily add auto config remapping option until VLLM 0.11 is out
* Fix model type and add layer validation
* Lint, remove CwmForSequenceClassification
* Lint, tests
* Remove CwmForSequenceClassification
* Lint
* Remove intermediary layer exports/doc errors, fix tests
* Lint
* run python utils/sort_auto_mappings.py --check_only
* Remove Cwm processor mapping, get check_repo passing
* Remove CwmTextConfig from test
* Add docstring for CwmConfig
* remove global_window and window_pattern params from config
* Fix docstrings
* Revert change to auto docstring util
* lint
* Fixes minus test improvements
* Alter tests to simply check logits
* lint
* Have slow tests use repo, make CwmPretrainedModel passthrough
* Remove decoder layer implementation, use Llama3Decoder + CwmAttention
* Use linear w/o bias for CwmAttention, add token-level integration test
* Don't ignore config attention bias
* Remove attention bias parameter entirely from config
---------
Co-authored-by: galco <galco@meta.com>
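
The CWM commits end with a token-level integration test against the released repo. Below is a skeletal version of such a test; the repo id and the expected token ids are placeholders, not values from the commits.

```python
# Skeleton of a token-level integration check; "org/cwm-checkpoint" and EXPECTED_IDS
# are placeholders to be filled in from a reference run on the release checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "org/cwm-checkpoint"  # placeholder repo id
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("def fibonacci(n):", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8, do_sample=False)

EXPECTED_IDS = []  # recorded from a reference run
assert out[0, -8:].tolist() == EXPECTED_IDS
```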
* new masks
* fixes
* adjust comments
* fix unnecessary mask creation on sdpa
* simplify masks more
* propagate to other models
* style + repo consistency
* copies
* no comment
* fix attempt
* finally fix grounding dinos
* fix distilbert
* fix executorch
* move to own module
* address first few comments WIP
* revert device comments, simplify executorch further
* fix typo
* add a test for cuda graphs
* move cleanup...
* fix conflict with new main
* fix esm and evolla
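
One of the mask fixes above avoids building an explicit attention mask when SDPA can take its causal fast path. A small torch-level illustration of why the two are equivalent for unpadded causal attention:

```python
# Explicit additive causal mask vs. SDPA's is_causal fast path.
import torch
import torch.nn.functional as F

q = k = v = torch.rand(1, 8, 128, 64)  # (batch, heads, seq_len, head_dim)

# explicit mask path: -inf above the diagonal
mask = torch.full((128, 128), float("-inf")).triu(1)
out_masked = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)

# mask-free causal path
out_causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)

print((out_masked - out_causal).abs().max())  # ~0: the explicit mask is unnecessary here
```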