c7bd5350f0
Fix fsdp for generic-task models (#40191)
2025-08-18 14:44:16 +02:00
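For context: a minimal sketch of wrapping a generic-task (non-causal-LM) transformers model in PyTorch FSDP, assuming a torchrun launch; the checkpoint and decoder-layer class are placeholder choices, not what the PR itself touches.

    # Hedged sketch: FSDP around a sequence-classification model.
    import functools
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
    from transformers import AutoModelForSequenceClassification
    from transformers.models.llama.modeling_llama import LlamaDecoderLayer

    dist.init_process_group("nccl")
    model = AutoModelForSequenceClassification.from_pretrained("meta-llama/Llama-3.2-1B")
    policy = functools.partial(
        transformer_auto_wrap_policy, transformer_layer_cls={LlamaDecoderLayer}
    )
    model = FSDP(model, auto_wrap_policy=policy)  # shards parameters across ranks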
20ce210ab7
Revert "remove dtensors, not explicit (#39840)" (#39912)
* Revert "remove dtensors, not explicit (#39840)"
This did not work with generation (lm_head needs extra care!)
This reverts commit 6dfd561d9cd722dfc09f702355518c6d09b9b4e3.
* update
* style?
2025-08-05 15:12:14 +02:00
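The parenthetical about lm_head is the crux of the revert: generate() runs lm_head at every decoding step, so a sharded-weights change must keep that path working. A hedged sketch of the tensor-parallel generation flow involved (checkpoint is a placeholder):

    # Run with: torchrun --nproc_per_node 2 generate_tp.py
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.2-1B"  # placeholder
    model = AutoModelForCausalLM.from_pretrained(
        model_id, tp_plan="auto", torch_dtype=torch.bfloat16
    )
    tok = AutoTokenizer.from_pretrained(model_id)
    inputs = tok("Hello", return_tensors="pt").to(model.device)
    # This is the path that broke: lm_head output must be gathered
    # across ranks before sampling the next token.
    out = model.generate(**inputs, max_new_tokens=16)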
6dfd561d9c
remove dtensors, not explicit (#39840)
* remove dtensors, not explicit
Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
* style
* fix test
* update
* as we broke saving try to fix
* output layouts should exist
* nit
* devicemesh exists if it was distributed
* use _device_mesh of self
* update
* lol
* fix
* nit
* update
* fix!
* this???
* grumble grumble
* ?
* fuck me
---------
Co-authored-by: 3outeille <3outeille@users.noreply.github.com>
2025-08-01 22:02:47 +02:00
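The device-mesh and output-layout bullets refer to PyTorch's DTensor machinery. As a reference point, a minimal DTensor round trip looks roughly like this (CPU mesh and gloo chosen so it runs anywhere under torchrun):

    # Run with: torchrun --nproc_per_node 2 dtensor_demo.py
    import torch
    import torch.distributed as dist
    from torch.distributed.device_mesh import init_device_mesh
    from torch.distributed.tensor import Shard, distribute_tensor

    dist.init_process_group("gloo")
    mesh = init_device_mesh("cpu", (2,))                   # the device mesh
    weight = torch.randn(8, 8)
    sharded = distribute_tensor(weight, mesh, [Shard(0)])  # row-sharded DTensor
    full = sharded.full_tensor()                           # gather back, e.g. for saving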
300d42a43e
Add ep (#39501)
* EP + updates
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* update
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update
---------
Co-authored-by: Nouamane Tazi <NouamaneTazi@users.noreply.github.com>
Co-authored-by: drbh <drbh@users.noreply.github.com>
2025-07-25 19:46:17 +02:00
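EP here is expert parallelism: MoE expert weights are split across ranks, and tokens are dispatched (an all_to_all in practice) to whichever rank owns their routed expert. A toy sketch of the placement arithmetic; the names are illustrative, not the transformers API:

    # Toy expert-parallel placement: experts split evenly across EP ranks.
    num_experts, ep_size = 16, 4
    experts_per_rank = num_experts // ep_size

    def owner_rank(expert_idx: int) -> int:
        """Rank holding a given expert's weights."""
        return expert_idx // experts_per_rank

    # Tokens routed to expert 9 are sent to rank 2, run through the
    # expert MLP there, then sent back to their source rank.
    assert owner_rank(9) == 2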
aff7df8436
enable static cache on TP model (#39164)
* enable static cache on TP model
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* check tp size before init kv cache
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix docstring
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* add tp tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix other cache head size
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-07-09 21:14:45 +00:00
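The "fix other cache head size" bullet is the key detail: under tensor parallelism each rank holds num_key_value_heads / tp_size KV heads, so the static cache must be allocated with the per-rank head count. A hedged usage sketch (checkpoint is a placeholder):

    # Run with: torchrun --nproc_per_node 2 static_cache_tp.py
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.2-1B"  # placeholder
    model = AutoModelForCausalLM.from_pretrained(
        model_id, tp_plan="auto", torch_dtype=torch.bfloat16
    )
    tok = AutoTokenizer.from_pretrained(model_id)
    inputs = tok("Hello", return_tensors="pt").to(model.device)
    # The static KV cache is sized from the per-rank head count.
    out = model.generate(**inputs, max_new_tokens=8, cache_implementation="static")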
2100ee6545
fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8 (#39116)
* fix UT failures on XPU w/ stock PyTorch 2.7 & 2.8
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* zamba2
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* xx
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* internvl
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* tp cases
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-30 11:49:03 +02:00
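Fixes like these mostly replace CUDA-only assumptions with device-agnostic ones; the underlying pattern is a runtime device probe along these lines (a sketch, not the exact test code):

    import torch

    # Prefer whichever accelerator is present; fall back to CPU.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        device = "xpu"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"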
caf708da1b
[TP] Change command in tests to python3 (#38555)
* Fix: change to `python3`
* update
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-06-03 11:03:33 +00:00
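The change itself is small: the multi-process TP tests spawn the distributed launcher in a subprocess, and on systems where plain `python` is not on PATH the interpreter has to be invoked as `python3`. Roughly (the script name is a placeholder):

    import subprocess

    # Launch a 2-process TP test; run_tp.py is a placeholder script.
    cmd = ["python3", "-m", "torch.distributed.run", "--nproc_per_node", "2", "run_tp.py"]
    subprocess.run(cmd, check=True)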
3bd1c20149
enable misc cases on XPU & use device agnostic APIs for cases in tests (#38192)
* use device agnostic APIs in tests
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* more
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* add reset_peak_memory_stats API
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* update
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2025-05-20 10:09:01 +02:00
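The device-agnostic helpers live in transformers.testing_utils; a typical replacement for a direct torch.cuda.* call looks like this (the reset_peak_memory_stats helper named above presumably follows the same backend_* pattern):

    from transformers.testing_utils import backend_empty_cache, torch_device

    # torch_device resolves to "cuda", "xpu", ... on the host machine;
    # backend_empty_cache dispatches to the matching torch backend.
    backend_empty_cache(torch_device)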
46a4b7c909
Feat: save_pretrained for tensor parallel (and other parallelisms) models (#37919)
* tmp: initial save pretrained with dtensors
* Feat: add correctness tests
* Refactor: version checks
* Temp: 1:1 checkpoint llama4
* refactor
* Tests
* Feat: works
* Style
* Feat: version checks + minor fixes
* Style
* Fix: version checks in tests
* Feat: move more stuff into tensor_parallel.py
2025-05-19 18:16:21 +00:00
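The feature in one picture: a model loaded with a tensor-parallel plan holds sharded DTensor parameters, and save_pretrained gathers them back into full tensors so the checkpoint on disk is an ordinary one. A hedged end-to-end sketch (checkpoint and path are placeholders):

    # Run with: torchrun --nproc_per_node 2 save_tp.py
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # placeholder
        tp_plan="auto",
        torch_dtype=torch.bfloat16,
    )
    # Sharded weights are gathered to full tensors before writing, so the
    # result can be reloaded later without any parallelism.
    model.save_pretrained("./tp_checkpoint")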
286393fbb1
enable tp on CPU (#36299)
* enable tp on CPU
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* get rank from cpu
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* update
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* enable TP tests
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix comment
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* em print
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix model id
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix conflict
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix index and add doc
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-03-31 10:55:47 +02:00
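TP on CPU means the same tp_plan machinery runs over the gloo backend instead of NCCL, so the tests work on machines without accelerators. A sketch (checkpoint is a placeholder):

    # Run with: torchrun --nproc_per_node 2 tp_cpu.py
    import torch.distributed as dist
    from transformers import AutoModelForCausalLM

    dist.init_process_group("gloo")  # CPU-friendly collective backend
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-3.2-1B",  # placeholder
        tp_plan="auto",
    )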
7f5077e536
fix typos in the tests directory (#36717)
2025-03-17 17:45:57 +00:00
1c4b62b219
Refactor some core stuff (#36539)
* some config changes
* update
* current state
* update
* update
* updates and cleanup
* something that works
* fixup
* fixes
* nits
* nit
* nits and fix
* Update src/transformers/integrations/tensor_parallel.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update src/transformers/integrations/tensor_parallel.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* cleanup
* style
* safe import
* fix
* updates
* rename stuff and clean
* style
* small updates
* ups
* oups
* nit
* protect imports
* update tp
* rofl
* arf
* turbo nit on init
* fix import error
* frumble gumbgle
* try to fix the import error
* should fix the non model test
* update keep in float32
* update
* fix
* nits
* fix subconfigs
* test was weird
* nit
* fix failing test
* fix instruct blip
* fixes
* style
* x.com
* fix overwrite
* ok last bit of failing test
---------
Co-authored-by: Lysandre Debut <hi@lysand.re>
2025-03-11 09:26:28 +01:00
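Much of this refactor consolidated the tensor-parallel logic into src/transformers/integrations/tensor_parallel.py, where a model's plan is a mapping from parameter-name patterns to partitioning strategies. Schematically, in the "colwise"/"rowwise" style the integration uses (keys follow the Llama naming as an example):

    # Schematic tp_plan; keys follow the Llama module naming as an example.
    tp_plan = {
        "model.layers.*.self_attn.q_proj": "colwise",
        "model.layers.*.self_attn.k_proj": "colwise",
        "model.layers.*.self_attn.v_proj": "colwise",
        "model.layers.*.self_attn.o_proj": "rowwise",
        "model.layers.*.mlp.gate_proj": "colwise",
        "model.layers.*.mlp.up_proj": "colwise",
        "model.layers.*.mlp.down_proj": "rowwise",
    }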