* update
* batch update model code
* typos
* too many diffs, dump
* dump again
* another dump
* fix copies
* make `rope_scaling_dict` self attr
* fix a few more tests
* another update
* fix a few more tests, hopefully last ones
* fix copies
* fix copies again
* fix newly added models, I hate rebasing on main
* update config files
* modular files
* fix rope utils test
* docstring has to be indented more, why?
* oops forgot to update some modular files
* copy from doesn't copy decorators?
* fix overridden test as well
* add a new test
* fix failing tests again
* update docstrings
* fix phi3
* fix two models
* fix copies
* forgot to add
* stupid bug from modular conversion
* fix slow tests
* update to call rotary emb once per model forward
* 3K tests failing?!
* update
* update more models
* fix copies
* fix the rest of tests hopefully
* fix after rebase
* fix the rope tests
* fix docs omni
* change a bit
* models with layer types
* why was it deleted?
* fix a few tests
* fix last test!
* delete extra empty lines
* add a test case
* more changes
* fix models
* typing hint for nested rope params
* missed when resolving conflicts
* delete layer types and fix typo
* fix copies
* fix copies
* update docs text
* docs
* huuge update all models
* fix copies
* rename attr to align with new format
* delete redundant rope tests
* trigger ci
* update the case
* this is why i hate rebasing
* maybe fixed?
* oops
* now fix?
* fix last tests and copies
* fix copies?
* fix minimax and gemma3n
* update typo
* deprecation end version
* final fix copies :fingers-crossed:
* oh my, add the docs in toctree
* oke, this is really the last fix
* fix copies and hope that tests won't start failing again
* use rope scaling if saved
* fix slow tests
* fix cwm and unrelated deepseek
* fix last
* update
* hope it works now, it took so long
* lets keep None for now, I will try to remove after checking tests
* some more fixes, find and replace does not always find all cases
* last fix of tests
* arthur's comment for extra forward kwargs
* delete unused code
* fix slow qwen tests
* delete layer types from models
* faulty modular conversion
* fix qwen omni
* fix copies and style
* address my comment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* Type hints and small fixes
* Remove unused params
* Made slice inputs the default
* ruffed
* Updated some var name and moved index slicing
* Logging arg in example
* Added some padding debug var and reformat out cg
* First working CG, fixed size
* Working flexible CG
* CG are compatible with all implementations
* Fixed CG API
* Update example
* Documentation
* Fix padding tokens in FA
* Review compliance
* Better doc around weird bug
* Style
* Fix for sliding with CG
* update all models
* fix copies
* skip aria tests
* update other models
* skip should be in test, not tester
* i think this is more descriptive as a name
* find and replace for new models
* first attempt at removing
* copies
* last bits in core
* quick fixes
* tests purge
* docs and examples
* some fixes
* more
* another round of cleanups
* fix
* fix a bunch of models
* fix dummy bert
* fix
* fix new model
* fix signature change
* fix
* fix style/copies
* new models
* fix copies didnt find that damn
* test
* this shouldnt have happened during model addition
* fix param_needs_quantization
* rewrite most hqq
* clean
* fix
* comment
* remove it from exception of safetensors
* start on bnb 4bits
* post-rebase fix
* make bnb4 bit a good citizen
* remove forgotten print
* make bnb 8bits a good citizen
* better hqq
* fix
* clean
* remove state dict from signature
* switch method
* make torchao a good citizen
* fixes
* fix torchao
* add check
* typo
* remove unexpected keys from inputs (they have nothing to do there)
* remove input
* simplify a lot init
* fix
* fix check for non-persistent buffer
* revert because too many old and bad models...
* remove comment
* type hint
* make it a real test
* remove model_to_load -> always use the same model
* typo
* remove legacy offload_folder (we never waste that memory anymore)
* do not change prefix anymore
* change very bad function name
* create adjust method
* remove useless method
* restrict
* BC
* remove unused method
* CI
* remove unused args
* small fix
* fix
* CI
* CI
* avoid too many loops
* fix regex
* cleaner
* typo
* fix
* fix
* setup
* start the purge
* continue the purge
* more and more
* more
* continue the quest: remove loading tf/jax checkpoints
* style
* fix configs
* oups forgot conflict
* continue
* still grinding
* always more
* in the zone
* never stop
* should fix doc
* fix
* fix
* fix
* fix tests
* still tests
* fix non-deterministic
* style
* remove last rebase issues
* onnx configs
* still on the grind
* always more references
* nearly the end
* could it really be the end?
* small fix
* add converters back
* post rebase
* latest qwen
* add back all converters
* explicitly add functions in converters
* re-add
* Fix for CB attn mask and refactor
* Tests for CB (not all passing)
* Passing tests and a logger fix
* Fixed the KV metrics that were broken when we moved to hybrid alloc
* Fix circular import and style
* Added tests for FA
* Unfolded test to have device expectations
* Fixes for H100
* more fixes for h100
* H100 are good
* Style
* Adding some comments from #40831
* Rename test
* Avoid 1 letter variables
* Dictionary is only removed during kwargs
* Test for supported sample
* Fix an involuntary slice
* Fixes for non-sliced inputs and small example improvements
* Slice inputs is more understandable
* Style
* CB example: better compare feature
* Cache managers, still issue w/ effective length
* WIP -- fix for effective length
* Renames
* Working, need better parity checks, we might be missing 1 token
* Small fixes
* Fixed wrong attn mask and broke cache into pieces
* Warmup is slowing down things, disabling it
* Cache was too big, fixed
* Simplified index objects
* Added a profile option to the example
* Avoid calls to memory reporting tools
* Restore full attention read indices for better latency
* Addressed some TODOs and style
* Docstrings for cache managers
* Docstrings for Schedulers
* Refactor schedulers
* [Important] Cache fix for sliding window, check with small sw size
* Updated doc for cache memory compute and cache as a whole
* Moved a todo
* Nits and style
* Fix for when sliding window is smaller than max batch per token
* Paged interface update
* Support for Flash in new API
* Fix example CB
* Fix bug in CB for paged
* Revert example
* Style
* Review compliance
* Style
* Styleeeee
* Removed NO_SLIDING_WINDOW
* Review #2 compliance
* Better art
* Turn cum_seqlens_k in a dict
* Attn mask is now a dict
* Update examples/pytorch/continuous_batching.py
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Addressed McPatate pro review
* Style and fix
---------
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
* Rework of the CB example
* Further rework of CB example
* Refactor PA cache, slice on tokens, add debug prints -- WIP
* Slice cache -- WIP
* Added a mechanism to check batched outputs in CB script
* Less logging, debug flag for slice, !better reset! -- WIP
* QOL and safety margins
* Refactor and style
* Better saving of cb example
* Fix
* Fixes and QOL
* More information about metrics
* Further logging
* Style
* Licenses
* Removed some comments
* Add a slice input flag
* Fix in example
* Added back some open-telemetry deps
* Removed some aux function
* Added FA2 option to example script
* Fixed math (all of it)
* Added a simple example
* Renamed core to classes
* Made allocation of attention mask optional
* Style
* simplify common get/set
* remove some noise
* change some 5 years old modeling utils
* update examples
* fix copies
* revert some changes
* fixes, gah
* format
* move to Mixin
* remove smolvlm specific require grad
* skip
* force defaults
* remodularise some stuff
* remodularise more stuff
* add safety for audio models
* style
* have a correct fallback, you daft donkey
* remove this argh
* change heuristic for audio models
* fixup
* revert
* this works
* this should be explicit
* fix Nth ESM exception
* tryout decoder
* this as well
* revert again
* 🧠
* aaah ESM has two modelings aaah
* broom broom
* format
* wrong copies
* copies
* modular cleanups
* format
* modularities
* wrong mergefix
* seriously
* align with new model
* new model
* update everywhere
* style
* pipelines
* switch it everywhere in tests
* switch it everywhere in docs
* switch in converters everywhere
* update in examples
* update in model docstrings
* style
* warnings
* style
* Update configuration_utils.py
* fix
* Update configuration_utils.py
* fixes and add first test
* add pipeline tests
* Update test_pipelines_common.py
* add config test
* Update test_modeling_common.py
* add new ones
* post rebase
* add new
* post rebase adds