5998f8625b
refactor: nit change for get_parameters_from_modules (code debt) ( #3815 )
...
* refactor: nit change for get_parameters_from_modules
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* fix: quality check
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
---------
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
2025-10-14 14:11:32 +02:00
f0313a64a2
Fix tracking swanlab ( #3810 )
...
* py310 and some changes
* fix
* Revert "py310 and some changes"
This reverts commit 0434d2929285d2a17c5c2e014c9c7c6cd06f0d9a.
* fix
2025-10-10 18:42:52 +02:00
df0c1870d9
Bump to python3.10 + update linter ( #3809 )
...
* py310 and some changes
* fix
* better
2025-10-10 18:22:51 +02:00
bc2478a472
fix ( #3808 )
2025-10-08 15:32:18 +02:00
057edec226
fix (skip) cache flush when original device is cpu and offloaded to disk meta ( #3796 )
2025-10-08 11:48:04 +02:00
14383311c2
Remove deprecated FindTiedParametersResult ( #3786 )
...
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-09-19 15:00:44 +02:00
a737437c8a
Revert "fix: correct dictionary unpacking in recursively_apply function ( #3766 )" ( #3787 )
...
This reverts commit 3db9fb6991a296d0535e97d765f53da6b7246ff3.
2025-09-19 12:50:53 +02:00
6997855ace
rm mlflow ( #3783 )
2025-09-19 11:32:37 +02:00
401075ffff
Add optional typing ( #3769 )
...
* Fix typing
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
* Format code
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
---------
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com >
2025-09-18 18:08:54 +02:00
8031e24e84
refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for more elegant style. ( #3767 )
...
* refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for more elegant style.
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-18 15:27:12 +02:00
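A minimal usage sketch of the style change described in #3767 above, with `Accelerator.autocast()` used as an ordinary context manager; `model`, `inputs`, `targets`, and `loss_fn` are placeholder names, not part of the commit:

```python
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")

# The `with` statement replaces manual __enter__()/__exit__() calls.
with accelerator.autocast():
    outputs = model(inputs)           # placeholder model/inputs
    loss = loss_fn(outputs, targets)  # placeholder loss_fn/targets
```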
3db9fb6991
fix: correct dictionary unpacking in recursively_apply function ( #3766 )
2025-09-18 15:18:28 +02:00
fe795fd324
switch XPU ccl backend to torch-builtin xccl in test_zero3_integration ( #3773 )
...
* switch XPU ccl backend to torch-builtin xccl in test_zero3_integration
remove xpu workaround in RegressionModel, we are OK now
rename test_multigpu to test_multidevice to reflect the fact that it covers multiple device types
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* fix ci issues
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* xx
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
2025-09-18 11:50:32 +02:00
409b356f45
Lower complexity of get_balanced_memory by adding a set ( #3776 )
...
* Lower complexity by adding a set
* Push vibe coded eval script
* Clean
2025-09-17 18:30:55 +02:00
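A generic illustration (not the actual patch) of the complexity reduction described in #3776: membership checks against a `set` are O(1) on average, versus O(n) for a list scan, which matters when `get_balanced_memory` iterates over many module names:

```python
# Illustrative only; the names below do not come from the accelerate source.
tied_param_names = ["encoder.embed.weight", "decoder.embed.weight"]

# Before: repeated `name in tied_param_names` scans the list each time (O(n)).
# After: build a set once, then every lookup is O(1) on average.
tied_param_set = set(tied_param_names)
print("decoder.embed.weight" in tied_param_set)  # True, constant-time lookup
```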
1b50d93999
enable 2 model hook ut cases on XPU ( #3774 )
...
* enable 2 model hooks tests on XPU
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
* xx
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
---------
Signed-off-by: Yao, Matrix <matrix.yao@intel.com >
2025-09-17 01:32:59 +02:00
e79f383625
Added Tip for better rendering ( #3781 )
2025-09-15 16:22:56 +02:00
0cb1a33475
fix multi-node CUDA error: invalid device ordinal #3775 ( #3779 )
2025-09-13 15:32:47 +02:00
dfdc219018
use reset_peak_memory_stats on xpu ( #3772 )
...
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
2025-09-12 15:05:31 +02:00
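A hedged, device-agnostic sketch of what #3772 refers to: resetting peak memory statistics through the XPU backend when CUDA is not the active accelerator; the exact helper used inside accelerate may differ:

```python
import torch

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
elif hasattr(torch, "xpu") and torch.xpu.is_available():
    # Assumes a PyTorch build with XPU support that exposes this API.
    torch.xpu.reset_peak_memory_stats()
```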
45959d7b96
fix FSDP2 test case failure on XPU ( #3771 )
...
* fix FSDP2 test case failure on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
2025-09-12 15:05:05 +02:00
8b493524c8
Fix: typo makes tests fail ( #3765 )
2025-09-09 12:06:05 +02:00
9ead94e556
fix: torch_npu import error ( #3764 )
2025-09-09 11:38:57 +02:00
a0bc36e8ed
feat: allow mixed precision policy as dtype ( #3751 )
...
* feat: allow mixed precision as dtype
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* feat: allow mixed precision as dtype
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* feat: allow mixed precision as dtype
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* test: extend test for MP as str dtype
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* Fix: style
---------
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com >
2025-09-08 23:29:20 +02:00
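A hedged sketch of the feature in #3751: the FSDP2 mixed precision policy can be given directly as a dtype (or its string alias) instead of a full `MixedPrecisionPolicy` object; the exact set of accepted values is an assumption here:

```python
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy=torch.bfloat16,  # assumed to be coerced into a policy internally
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```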
8830e58a91
Fix typos ( #3753 )
...
* Fix typos
Signed-off-by: cyy <cyyever@outlook.com >
* Fix: style
---------
Signed-off-by: cyy <cyyever@outlook.com >
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com >
2025-09-08 13:33:18 +02:00
40ebb4bea3
make torch_native_parallelism examples device agnostic ( #3759 )
...
* make torch_native_parallelism examples device agnostic
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
* xxx
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
* xxx
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
* Style + deprecation warning
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com >
2025-09-08 12:16:56 +02:00
ec92b1af7a
fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 ( #3762 )
...
* fix: `model.set_requires_gradient_sync(False)` should be called to turn off gradient synchronization in FSDP2.
* fix: remove trailing whitespace
2025-09-06 23:57:46 +02:00
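A hedged sketch of the FSDP2 pattern referenced in #3762: gradient synchronization is toggled with `set_requires_gradient_sync` rather than a DDP-style `no_sync()` context; `model` (FSDP2-wrapped), `dataloader`, `optimizer`, and `gradient_accumulation_steps` are assumed to exist:

```python
for step, batch in enumerate(dataloader):
    is_sync_step = (step + 1) % gradient_accumulation_steps == 0
    # False skips gradient reduction on backward; True restores normal syncing.
    model.set_requires_gradient_sync(is_sync_step)
    loss = model(**batch).loss / gradient_accumulation_steps
    loss.backward()
    if is_sync_step:
        optimizer.step()
        optimizer.zero_grad()
```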
62ede1ed2a
CP docs typos fixed ( #3761 )
2025-09-05 12:23:33 +02:00
9f9c490c6b
fix: specify device for process_tensor in example usage ( #3755 )
2025-09-03 11:05:24 +02:00
8b55e62b2c
xpu INT64 all_gather issue fixed in 2.9 ( #3756 )
...
* xpu gather issue fixed in 2.9 and validated config_yamls on XPU
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
* xxx
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com >
2025-09-03 10:56:14 +02:00
0e4419b347
Add bf16/fp16 support for amp with mps device ( #3373 )
...
* Fix tests
* format
* amp mps support for fp16/bf16
* add error
* revert
* revert
* fix
* ruff
2025-08-28 14:20:56 +02:00
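A short, hedged usage note for #3373: after this change, requesting automatic mixed precision in fp16 or bf16 on an Apple Silicon (`mps`) device should work instead of erroring out; assumes an mps-enabled PyTorch build:

```python
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")
print(accelerator.device)  # mps on Apple Silicon when available
```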
3b67c21696
Add support for TE MXFP8 recipe in accelerate ( #3688 )
...
* Add support for MXFP8 recipe in accelerate
* ruff reformat
* add and fix test for deepspeed / fp8 from config
* minor lints
Signed-off-by: Peter St. John <pstjohn@nvidia.com >
---------
Signed-off-by: Peter St. John <pstjohn@nvidia.com >
2025-08-27 14:08:34 +02:00
7b981788ca
[ND Parallel] Update examples, cleanup ( #3737 )
...
* Fix: update cp example
* Feat: add rename examples
* WIP: Cleanup with_trainer
* Feat: more cleanup
* Feat: more refactor + better readme + more configs
* Fin
2025-08-26 14:41:14 +02:00
c4460e33ef
fix: specify device_ids in torch.distributed.barrier for PartialState ( #3744 )
2025-08-26 14:05:33 +02:00
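A hedged illustration of the fix in #3744: passing `device_ids` to `torch.distributed.barrier` binds the collective to the local device instead of letting NCCL default to GPU 0; assumes an already-initialized process group and a CUDA backend:

```python
import os

import torch
import torch.distributed as dist

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)
dist.barrier(device_ids=[local_rank])  # pin the barrier to this process's device
```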
5dd3d0b690
Protect import for device_mesh ( #3742 )
2025-08-22 15:44:56 +02:00
5fe4460ccd
Feat: add to_json ( #3743 )
2025-08-22 15:25:38 +02:00
979d81e4a9
fix: cpu ram efficient loading for nd or hsdp parallelisms ( #3740 )
...
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
2025-08-21 13:40:06 +02:00
7c25f696b8
Fix convert LayerNorm without bias to fp8 ( #3725 )
2025-08-18 22:28:48 +02:00
a7d6f28f99
feat: add ignored_params support for fsdp2 ( #3731 )
...
* feat: add ignored_params support for fsdp2
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* feat: add ignored_params support for fsdp2
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* feat: add ignored_params support for fsdp2
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* feat: add ignored_params support for fsdp2
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* test: update testcase for fsdp2 ignored_params
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* fix: add defensive use of ignored params
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
* fix: styling errors
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
---------
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com >
2025-08-18 14:31:19 +02:00
23cf4ef8a3
Fix tests ( #3722 )
...
* fix tests
* fix skorch tests
* fix deepspeed
* pin torch as compile tests don't pass and create segmentation fault
* skip compile tests
* fix
* forgot v ...
* style
2025-08-07 16:59:29 +02:00
ff872f5f71
bump to 1.11.0dev0
2025-08-07 12:58:08 +02:00
2941a6b0fb
remove ( #3721 )
2025-08-07 12:48:11 +02:00
c0a3aefea8
feature: CpuOffload pre_forward don't attempt to move if already on device ( #3695 )
...
* feature: added an optimisation to not attempt the move if the module is already on the target device. This is more noticeable over many step iterations in diffusion loops, where the pre_forward hook can get called many times
* fix: linting
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-06 19:46:13 +02:00
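A hedged sketch (not the actual hook code) of the optimisation in #3695: the pre-forward hook returns early when the module's parameters already sit on the execution device, so repeated calls in a diffusion loop avoid redundant `.to()` moves:

```python
import torch

def pre_forward(module: torch.nn.Module, execution_device: torch.device) -> None:
    # Illustrative helper; the real CpuOffload hook has a different signature.
    param = next(module.parameters(), None)
    if param is not None and param.device == execution_device:
        return  # already on the right device, skip the costly move
    module.to(execution_device)
```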
42fdda1c1f
Remove ParallelismConfig from PartialState ( #3720 )
...
* remove
* style
* fix
* valueerror instead
* add device_mesh
2025-08-06 19:00:26 +02:00
e23b004b30
TST Add test for FSDP ignored_modules as str ( #3719 )
...
Follow up to #3698 .
2025-08-06 18:05:54 +02:00
898cad39e8
Fix: tp size wouldn't read from env ( #3716 )
2025-08-06 15:08:55 +02:00
24c8157bba
Set parallelism_config in constructor due to Trainer reset of State ( #3713 )
2025-08-06 13:47:49 +02:00
6891c57072
Feat: context parallel v2.0 ( #3700 )
...
* Cleanup: context parallel
* Feat: cleanup
* Feat: concept guide
* Fix: rename + version check
* Style
* Fix: add to namespace in a test
* Fix: add skip_if on dataclass tests
* Fix: proper version for version check
* Feat: add tests and cleanup
* Fix: properly version check added tests
* Feat: address comments
* Fix: add both shift_labels and labels to make the model.forward calculate loss
* Fix: remove import, improve comment
* Fix: final checks
* Fix: style
* Fix: style
2025-08-05 16:17:13 +02:00
24e48f3d20
ENH: Allow FSDP ignored modules to be regex ( #3698 )
...
* ENH: Allow FSDP ignored modules to be regex
Description
For FSDP, there is an option to indicate ignored_modules, which should
be a list of modules that are ignored by FSDP. Even though this argument was
supported in accelerate, it was not very usable:
1. Listing all modules can be tricky, especially with something like PEFT,
where the whole model is wrapped and thus the module structure changes.
2. When configuring this argument, accelerate takes a detour via
environment variables. These can only be strings. Therefore, passing a
list of modules is not feasible.
Moreover, I noticed that the environment variable for ignored_modules
was not even set, so configuring this argument didn't even work.
Status
This PR is lacking tests. I would be happy for pointers on how to add
those.
Context
When using PEFT with LoRA and the target_parameters feature, I ran into
an issue training such a model with FSDP. The only working fix I found
was to ignore the layers targeted by LoRA. However, I could not
configure accelerate to do that. With this PR, it is possible. I could
successfully train such a PEFT model that targets q_proj and v_proj by
setting fsdp_ignored_modules: '.*\.(q_proj$|v_proj$)'.
* Fix type annotation
* Fix failing test
2025-08-05 14:23:14 +02:00
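A hedged sketch of the option described in #3698: `ignored_modules` may be given as a regex string matched against module names, which is handy with PEFT where listing concrete modules is impractical; using the plugin field directly (rather than the `fsdp_ignored_modules` config key from the commit message) is an assumption here:

```python
from accelerate import FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    ignored_modules=r".*\.(q_proj$|v_proj$)",  # regex instead of a list of modules
)
```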
6640ff415c
Fix: Ensure environment variable values are case-insensitive in Accelerate ( #3712 )
...
* Add: lower
* apply ruff
2025-08-05 13:22:00 +02:00
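A hedged one-liner illustrating #3712: environment flag values are lowercased before comparison, so `TRUE`, `True`, and `true` are all accepted; the variable name is illustrative:

```python
import os

use_cpu = os.environ.get("ACCELERATE_USE_CPU", "false").strip().lower() in ("1", "true", "yes")
```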
c173b4fdd6
Fix: prepare works even if nothing except tp specified (rare) ( #3707 )
2025-08-05 13:07:37 +02:00
cb343c63d7
Add Parallelism getter property to Accelerator class ( #3703 )
...
* Add rank property to Accelerator class
Signed-off-by: WoosungMyung <dntjd517@naver.com >
* Raise errors when parallelism configuration is not enabled
Signed-off-by: WoosungMyung <dntjd517@naver.com >
* Fix: PR feedback
Signed-off-by: WoosungMyung <dntjd517@naver.com >
* Fix: style
---------
Signed-off-by: WoosungMyung <dntjd517@naver.com >
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com >
2025-08-02 18:20:08 +02:00
9359a0194f
Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) ( #3682 )
...
* Feat: init
* Feat: add validation + init from kwargs
* Fix: minor fixes
* Feat: more cleanup
* Minor refactor
* remove import
* adding support for pre-configured device mesh
* adding device mesh to fsdp2
* moving mesh dim defn to ParallelismConfig
* tests
* WIP device mesh/accelerator validation
* WIP more tests
* Test Driven Development (TDD)
* fixing build_device_mesh
* FSDP dim names
* adding example
* WIP
* fixing HSDP
* Feat: add back old options
* working example
* debugging
* adding parallelism config to partialstate
* Feat: revert ddp changes
* Revert DDP
* Feat: (untested) update mesh dims and some minor tweaks
* adding dp_cp dims
* updating comments
* WIP
* wip 2
* reverting
* storing state in accelerator rather than acceleratorstate
* Fix: minor tweaks
* wip example update
* Fixes for non-fsdp2 case
* Feat: ensure ddp/tp only works
* updating example
* updating example
* updating examples, fixing state
* fixed state
* comments
* fixing partial state check
* linting
* comments
* removing fn
* WIP: fix tp
* comments
* removing return
* reverting upcast
* add guards
* guards for empty self.parallelism_config
* use len on tuple to check if empty
* Feat: cleanup example
* Feat: some cleanup of example
* Feat: add trackio
* Fix: improve trackio
* Feat: TP works
* Feat: some fsdp2 improv
* Feat: working examples
* handle clipping for tensor parallel
* Implicit replicate
* Refactor: move to separate file + cleanup + basic comments
* Fix: add unadded files, fix circular import
* Feat: better readme
* Feat: add blog + ultrascale links
* Tmp: should_save_model now returns only true
* Fix: remove implicit_replication and style
* Fix: remove optional
* add guard on parallelism_config.tp_enabled
* fix import
* fixing empty parallelism_config
* fix import path for test patch
* fixing patch
---------
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com >
Co-authored-by: Salman Mohammadi <salman.mohammadi@outlook.com>
Co-authored-by: Wing Lian <wing@axolotl.ai >
2025-07-30 21:03:13 +02:00
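A hedged sketch of the API introduced in #3682: a `ParallelismConfig` describing the replicate/shard/tensor-parallel layout is handed to the `Accelerator`, which builds (or reuses a user-provided) device mesh; the import path and field names follow the commit bullets but may differ in detail:

```python
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig

pc = ParallelismConfig(dp_replicate_size=2, dp_shard_size=2, tp_size=2)  # 8 processes total
accelerator = Accelerator(parallelism_config=pc)
```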