b764c20b09
Fix: loading DBRX back from saved path ( #35728 )
...
* fix dtype as dict for some models + add test
* add comment in tests
2025-01-28 11:38:45 +01:00
b912f5ee43
use torch.testing.assert_close instead to get more details about errors in CIs ( #35659 )
...
* use torch.testing.assert_close instead to get more details about errors in CIs
* fix
* style
* test_all
* revert for I bert
* fixes and updates
* more image processing fixes
* more image processors
* fix mamba and co
* style
* less strict
* ok I won't be strict
* skip and be done
* up
2025-01-24 16:55:28 +01:00
8c1b5d3782
🚨 🚨 🚨 An attempt to fix #29554 . Include 'LayerNorm.' in gamma/beta rename scope, optimize string search. ( #35615 )
...
* An attempt to fix #29554 . Include 'LayerNorm.' in gamma/beta rename scope, reduce number of characters searched on every load considerably.
* Fix fix on load issue
* Fix gamma/beta warning test
* A style complaint
* Improve efficiency of weight norm key rename. Add better comments about weight norm and layer norm renaming.
* Habitual elif redundant with the return
2025-01-16 17:25:44 -08:00
aeeceb9916
[cache] add a test to confirm we can use cache at train time ( #35709 )
...
* add test
* augment test as suggested
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* rerun tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2025-01-16 17:02:34 +00:00
84a6789145
Enable different torch dtype in sub models ( #34873 )
...
* fix
* fix test
* add tests
* add more tests
* fix tests
* supposed to be a torch.dtype test
* handle BC and make fp32 default
2025-01-13 13:42:08 +01:00
2c47618c1a
🚨 All attention refactor 🚨 ( #35235 )
...
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modular and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oups
* oups2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* style
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oupsi
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com >
2024-12-18 16:53:39 +01:00
1eee1cedfd
Fix loading with only state dict and low_cpu_mem_usage = True ( #35217 )
...
* fix loading with only state dict and config
* style
* add tests
---------
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com >
2024-12-18 09:54:32 +01:00
b0a51e5cff
Fix flaky Hub CI (test_trainer.py) ( #35062 )
...
* fix
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com >
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* fix
* check
* check
* check
* check
* check
* check
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com >
* Update src/transformers/testing_utils.py
Co-authored-by: Lucain <lucainp@gmail.com >
* check
* check
* check
* Final space
* Final adjustment
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: Lucain <lucainp@gmail.com >
2024-12-05 17:02:27 +01:00
f297af55df
Fix: take into account meta device ( #34134 )
...
* Do not load for meta device
* Make some minor improvements
* Add test
* Update tests/utils/test_modeling_utils.py
Update test parameters
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Make the test simpler
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2024-11-20 11:32:07 +01:00
13493215ab
🧼 remove v4.44 deprecations ( #34245 )
...
* remove v4.44 deprecations
* PR comments
* deprecations scheduled for v4.50
* hub version update
* make fixup
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
2024-11-15 23:07:24 +01:00
34927b0f73
MPS: isin_mps_friendly can support 0D tensors ( #34538 )
...
* apply fix
* tested
* make fixup
2024-11-04 16:18:50 +00:00
655bec2da7
use a tiny model to test generation config to avoid timeouts ( #34482 )
...
* use a tiny model to test generation config to avoid timeouts
* remove trailing whitespace
2024-10-29 09:39:06 +01:00
409dd2d19c
Fix failing conversion ( #34010 )
...
* Fix
* Tests
* Typo
* Typo
2024-10-11 14:59:23 +02:00
3557f9a14a
Generate: can_generate() recursive check ( #33718 )
...
* add recursive check and test warnings
* missing space
* models without can_generate
2024-09-26 18:11:14 +01:00
e15687fffe
Generation: deprecate PreTrainedModel inheriting from GenerationMixin ( #33203 )
2024-09-23 18:28:36 +01:00
7542fac2c7
Pipeline: no side-effects on model.config and model.generation_config 🔫 ( #33480 )
2024-09-18 15:43:06 +01:00
72d4a3f9c1
mps: add isin_mps_friendly, a wrapper function for torch.isin ( #33099 )
2024-08-26 15:34:19 +01:00
970a16ec7f
Forbid PretrainedConfig from saving generate parameters; Update deprecations in generate-related code 🧹 ( #32659 )
...
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-08-23 11:12:53 +01:00
8ec028aded
Reduce the error log when using core models that need their weights renamed, and provide a step forward ( #32656 )
...
* Fin
* Modify msg
* Finish up nits
2024-08-16 13:05:57 -04:00
a5a8291ad1
Fix tests ( #32649 )
...
* skip failing tests
* [no-filter]
* [no-filter]
* fix wording catch in FA2 test
* [no-filter]
* trigger normal CI without filtering
2024-08-13 09:46:21 +01:00
f1c8542ff7
"to be not" -> "not to be" ( #32636 )
...
* "to be not" -> "not to be"
* Update sam.md
* Update trainer.py
* Update modeling_utils.py
* Update test_modeling_utils.py
* Update test_modeling_utils.py
2024-08-12 20:20:17 +01:00
7e5d46ded4
Respect the config's attn_implementation if set ( #32383 )
...
* Respect the config's attn if set
* Update test - can override in from_config
* Fix
2024-08-05 16:33:19 +01:00
df6eee9201
Follow up for #31973 ( #32025 )
...
* fix
* [test_all] trigger full CI
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-07-25 16:12:23 +02:00
817a676bd7
Don't default to other weights file when use_safetensors=True ( #31874 )
...
* Don't default to other weights file when use_safetensors=True
* Add tests
* Update tests/utils/test_modeling_utils.py
* Add clarifying comments to tests
* Update tests/utils/test_modeling_utils.py
* Update tests/utils/test_modeling_utils.py
2024-07-22 18:29:50 +01:00
693cb828ff
Fix bad test about slower init ( #32002 )
...
Bronked main
2024-07-16 10:33:05 -04:00
e0dfd7bcaf
Speedup model init on CPU (by 10x+ for llama-3-8B as one example) ( #31771 )
...
* 1,100%!
* Clean
* Don't touch DS
* Experiment with dtype allocation
* skip test_load_save_without_tied_weights test
* A little faster
* Include proper upscaling?
* Fixup tests
* Potentially skip?
* Let's see if this fixes git history
* Maintain new dtype
* Fin
* Rm hook idea for now
* New approach, see what breaks
* stage
* Clean
* Stash
* Should be fin now, just need to mark failing models
* Clean up
* Simplify
* Deal with weird models
* Enc/Dec
* Skip w/ reason
* Adjust test
* Fix test
* one more test
* Keep experimenting
* Fix ref
* TO REMOVE: testing feedback CI
* Right push
* Update tests/utils/test_modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* disable
* Add new func
* Test nits from Amy
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Adjust comment
* Adjust comment on skip
* make private
* Fin
* Should be a not flag
* Clarify and rename test
---------
Co-authored-by: Marc Sun <marc@huggingface.co >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-07-16 09:32:01 -04:00
a1a34657d4
Avoid race condition ( #31973 )
...
* [test_all] hub
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* remove delete
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
* [test_all]
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-07-15 17:56:24 +02:00
1499a55008
Add warning message for beta and gamma parameters ( #31654 )
...
* Add warning message for beta and gamma parameters
* Fix when the warning is raised
* Formatting changes
* Improve testing and remove duplicated warning from _fix_key
2024-07-11 13:01:47 +01:00
4c2538b863
Test loading generation config with safetensor weights ( #31550 )
...
fix test
2024-07-09 16:22:43 +02:00
8c5c180de0
Fix serialization for offloaded model ( #31727 )
...
* Fix serialization
* style
* add test
2024-07-05 08:07:07 +02:00
93cd94b79d
Move some test files (tests/test_xxx_utils.py) to tests/utils ( #31730 )
...
* move
* move
* move
* move
* Update tests/utils/test_image_processing_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-07-02 13:46:03 +02:00