1863 Commits

Author SHA1 Message Date
5998f8625b refactor: nit change for get_parameters_from_modules (code debt) (#3815)
* refactor: nit change for get_parameters_from_modules

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: quality check

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-10-14 14:11:32 +02:00
f0313a64a2 Fix tracking swanlab (#3810)
* py310 and some changes

* fix

* Revert "py310 and some changes"

This reverts commit 0434d2929285d2a17c5c2e014c9c7c6cd06f0d9a.

* fix
2025-10-10 18:42:52 +02:00
df0c1870d9 Bump to python3.10 + update linter (#3809)
* py310 and some changes

* fix

* better
2025-10-10 18:22:51 +02:00
bc2478a472 fix (#3808) 2025-10-08 15:32:18 +02:00
057edec226 fix (skip) cache flush when original device is cpu and offloaded to disk meta (#3796) 2025-10-08 11:48:04 +02:00
14383311c2 Remove deprecated FindTiedParametersResult (#3786)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 15:00:44 +02:00
a737437c8a Revert "fix: correct dictionary unpacking in recursively_apply function (#3766)" (#3787)
This reverts commit 3db9fb6991a296d0535e97d765f53da6b7246ff3.
2025-09-19 12:50:53 +02:00
6997855ace rm mlflow (#3783) 2025-09-19 11:32:37 +02:00
401075ffff Add optional typing (#3769)
* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 18:08:54 +02:00
8031e24e84 refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for a more elegant style. (#3767)
* refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for a more elegant style.

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-18 15:27:12 +02:00
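For context, `Accelerator.autocast()` is an ordinary context manager, so the `with` form is the natural spelling; a minimal sketch of that usage with a toy model (any recent accelerate release assumed):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(4, 8, device=accelerator.device)

# The with-statement manages __enter__/__exit__ of the autocast context for us.
with accelerator.autocast():
    loss = model(x).sum()
accelerator.backward(loss)
optimizer.step()
```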
3db9fb6991 fix: correct dictionary unpacking in recursively_apply function (#3766) 2025-09-18 15:18:28 +02:00
fe795fd324 switch XPU ccl backend to torch-builtin xccl in test_zero3_integration (#3773)
* switch XPU ccl backend to torch-builtin xccl in test_zero3_integration
remove xpu workaround in RegressionModel, we are OK now
rename test_multigpu to test_multidevice to reflect that fact

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix ci issues

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-18 11:50:32 +02:00
409b356f45 Lower complexity of get_balanced_memory by adding a set (#3776)
* Lower complexity by adding a set

* Push vibe coded eval script

* Clean
2025-09-17 18:30:55 +02:00
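The speedup here is the usual list-versus-set membership trade-off; the snippet below only illustrates that general idea and is not the actual `get_balanced_memory` code:

```python
# Illustrative only -- membership tests against a list are O(n), against a set O(1).
module_names = [f"block.{i}.linear" for i in range(10_000)]

slow = [name for name in module_names if name in module_names]  # O(n^2) overall

name_set = set(module_names)                                    # built once
fast = [name for name in module_names if name in name_set]      # O(n) overall

assert slow == fast
```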
1b50d93999 enable 2 model hook ut cases on XPU (#3774)
* enable 2 model hooks tests on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-17 01:32:59 +02:00
e79f383625 Added Tip for better rendering (#3781) 2025-09-15 16:22:56 +02:00
0cb1a33475 fix Multi node CUDA error: invalid device ordinal #3775 (#3779) 2025-09-13 15:32:47 +02:00
dfdc219018 use reset_peak_memory_stats on xpu (#3772)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-12 15:05:31 +02:00
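A hedged sketch of the device-agnostic pattern this points at, resetting peak-memory stats on whichever backend is present (assumes the backend module exposes `reset_peak_memory_stats()`, as `torch.cuda` and recent `torch.xpu` do):

```python
import torch

def reset_peak_memory_stats() -> None:
    # Pick the available backend; torch.cuda and recent torch.xpu share this API.
    for backend_name in ("xpu", "cuda"):
        backend = getattr(torch, backend_name, None)
        if backend is not None and backend.is_available() and hasattr(backend, "reset_peak_memory_stats"):
            backend.reset_peak_memory_stats()
            return

reset_peak_memory_stats()  # no-op on CPU-only machines
```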
45959d7b96 fix FSDP2 test case failure on XPU (#3771)
* fix FSDP2 test case failure on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-12 15:05:05 +02:00
8b493524c8 Fix: typo makes tests fail (#3765) 2025-09-09 12:06:05 +02:00
9ead94e556 fix: torch_npu import error (#3764) 2025-09-09 11:38:57 +02:00
a0bc36e8ed feat: allow mixed precision policy as dtype (#3751)
* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: extend test for MP as str dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* Fix: style

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 23:29:20 +02:00
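A hedged sketch of what the commit enables: passing the FSDP2 mixed-precision policy as a plain dtype string rather than a `MixedPrecisionPolicy` object. The exact accepted values are an assumption drawn from the commit title:

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Assumption based on the commit title: mixed_precision_policy may now be a dtype
# string such as "bf16" instead of a torch MixedPrecisionPolicy instance.
fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy="bf16",
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```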
8830e58a91 Fix typos (#3753)
* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* Fix: style

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 13:33:18 +02:00
40ebb4bea3 make torch_native_parallelism examples device agnostic (#3759)
* make torch_native_parallelism examples device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Style + deprecation warning

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 12:16:56 +02:00
ec92b1af7a fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 (#3762)
* fix: `model.set_requires_gradient_sync(False)` should be called to turn off gradient synchronization in FSDP2.

* fix: remove trailing whitespace
2025-09-06 23:57:46 +02:00
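`set_requires_gradient_sync()` comes from torch's FSDP2 (`fully_shard`) module API; a sketch of how a gradient-accumulation loop might toggle it, with the surrounding training-loop details assumed:

```python
def accumulate_then_step(model, optimizer, micro_batches, accumulation_steps: int):
    """Sketch: sync (reduce-scatter) gradients only on the last micro-batch.

    Assumes `model` was sharded with torch.distributed.fsdp.fully_shard (FSDP2),
    which is what provides set_requires_gradient_sync().
    """
    for step, batch in enumerate(micro_batches):
        sync_now = (step + 1) % accumulation_steps == 0
        model.set_requires_gradient_sync(sync_now)  # False -> skip gradient sync
        loss = model(batch).sum() / accumulation_steps
        loss.backward()
        if sync_now:
            optimizer.step()
            optimizer.zero_grad()
```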
62ede1ed2a CP docs typos fixed (#3761) 2025-09-05 12:23:33 +02:00
9f9c490c6b fix: specify device for process_tensor in example usage (#3755) 2025-09-03 11:05:24 +02:00
8b55e62b2c xpu INT64 all_gather issue fixed in 2.9 (#3756)
* xpu gather issue fixed in 2.9 and validated config_yamls on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-03 10:56:14 +02:00
0e4419b347 Add bf16/fp16 support for amp with mps device (#3373)
* Fix tests

* format

* amp mps support for fp16/bf16

* add error

* revert

* revert

* fix

* ruff
2025-08-28 14:20:56 +02:00
3b67c21696 Add support for TE MXFP8 recipe in accelerate (#3688)
* Add support for MXFP8 recipe in accelerate

* ruff reformat

* add and fix test for deepspeed / fp8 from config

* minor lints

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-08-27 14:08:34 +02:00
7b981788ca [ND Parallel] Update examples, cleanup (#3737)
* Fix: update cp example

* Feat: add rename examples

* WIP: Cleanup with_trainer

* Feat: more cleanup

* Feat: more refactor + better readme + more configs

* Fin
2025-08-26 14:41:14 +02:00
c4460e33ef fix: specify device_ids in torch.distributed.barrier for PartialState (#3744) 2025-08-26 14:05:33 +02:00
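The fix boils down to telling NCCL which device the barrier should run on; a minimal sketch of that call (real `torch.distributed` API, process-group initialisation assumed to have happened elsewhere):

```python
import torch
import torch.distributed as dist

if dist.is_initialized():
    if torch.cuda.is_available():
        # Pin the barrier to this process's GPU instead of letting NCCL guess,
        # which avoids "invalid device ordinal"-style mismatches.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()  # gloo/CPU backends take no device_ids
```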
5dd3d0b690 Protect import for device_mesh (#3742) 2025-08-22 15:44:56 +02:00
5fe4460ccd Feat: add to_json (#3743) 2025-08-22 15:25:38 +02:00
979d81e4a9 fix: cpu ram efficient loading for nd or hsdp parallelisms (#3740)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-08-21 13:40:06 +02:00
7c25f696b8 Fix convert LayerNorm without bias to fp8 (#3725) 2025-08-18 22:28:48 +02:00
a7d6f28f99 feat: add ignored_params support for fsdp2 (#3731)
* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: update testcase for fsdp2 ignored_params

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: add defensive use of ignored params

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: styling errors

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-08-18 14:31:19 +02:00
23cf4ef8a3 Fix tests (#3722)
* fix tests

* fix skorch tests

* fix deepspeed

* pin torch as compile tests don't pass and create segmentation fault

* skip compile tests

* fix

* forgot v ...

* style
2025-08-07 16:59:29 +02:00
ff872f5f71 bump to 1.11.0dev0 2025-08-07 12:58:08 +02:00
2941a6b0fb remove (#3721) 2025-08-07 12:48:11 +02:00
c0a3aefea8 feature: CpuOffload pre_forward don't attempt to move if already on device (#3695)
* feature: added an optimisation to not attempt a device move if already on that device. This is more noticeable in large-step iterations on diffusion loops, where pre_forward can get called many times.

* fix: linting

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-06 19:46:13 +02:00
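The optimisation is essentially an "already on the right device?" guard before moving anything; an illustrative stand-alone version (not the actual hook code):

```python
import torch

def maybe_move(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Skip the .to() call entirely when the tensor already lives on the target
    # device -- the saving that matters when a pre-forward hook fires every step.
    if tensor.device == device:
        return tensor
    return tensor.to(device)

x = torch.randn(2, 2)
assert maybe_move(x, torch.device("cpu")) is x  # no copy when nothing to do
```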
42fdda1c1f Remove ParallelismConfig from PartialState (#3720)
* remove

* style

* fix

* valueerror instead

* add device_mesh
2025-08-06 19:00:26 +02:00
e23b004b30 TST Add test for FSDP ignored_modules as str (#3719)
Follow up to #3698.
2025-08-06 18:05:54 +02:00
898cad39e8 Fix: tp size wouldn't read from env (#3716) 2025-08-06 15:08:55 +02:00
24c8157bba Set parallelism_config in constructor due to Trainer reset of State (#3713) 2025-08-06 13:47:49 +02:00
6891c57072 Feat: context parallel v2.0 (#3700)
* Cleanup: context parallel

* Feat: cleanup

* Feat: concept guide

* Fix: rename + version check

* Style

* Fix: add to namespace in a test

* Fix: add skip_if on dataclass tests

* Fix: proper version for version check

* Feat: add tests and cleanup

* Fix: properly version check added tests

* Feat: address comments

* Fix: add both shift_labels and labels to make the model.forward calculate loss

* Fix: remove import, improve comment

* Fix: final checks

* Fix: style

* Fix: style
2025-08-05 16:17:13 +02:00
24e48f3d20 ENH: Allow FSDP ignored modules to be regex (#3698)
* ENH: Allow FSDP ignored modules to be regex

Description

For FSDP, there is an option to indicate ignored_modules, which should
be a list of modules that are ignored by FSDP. Even though this argument was
supported in accelerate, it was not very usable:

1. Listing all modules can be tricky, especially with something like PEFT,
where the whole model is wrapped and thus the module structure changes.
2. When configuring this argument, accelerate takes a detour via
environment variables. These can only be strings. Therefore, passing a
list of modules is not feasible.

Moreover, I noticed that the environment variable for ignored_modules
was not even set, so configuring this argument didn't even work.

Status

This PR is lacking tests. I would be happy for pointers on how to add
those.

Context

When using PEFT with LoRA and the target_parameters feature, I ran into
an issue training such a model with FSDP. The only working fix I found
was to ignore the layers targeted by LoRA. However, I could not
configure accelerate to do that. With this PR, it is possible. I could
successfully trained such a PEFT model that targets q_proj and v_proj by
setting fsdp_ignored_modules: '.*\.(q_proj$|v_proj$)'.

* Fix type annotation

* Fix failing test
2025-08-05 14:23:14 +02:00
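An illustrative helper showing how such a regex could be resolved against module names (whether accelerate matches with fullmatch or search internally is not shown here, so treat the details as assumptions):

```python
import re
import torch.nn as nn

def find_ignored_modules(model: nn.Module, pattern: str) -> list[nn.Module]:
    """Illustrative: collect modules whose qualified name matches `pattern`."""
    regex = re.compile(pattern)
    return [module for name, module in model.named_modules() if regex.fullmatch(name)]

model = nn.ModuleDict(
    {"attn": nn.ModuleDict({"q_proj": nn.Linear(8, 8), "v_proj": nn.Linear(8, 8), "o_proj": nn.Linear(8, 8)})}
)
ignored = find_ignored_modules(model, r".*\.(q_proj$|v_proj$)")
assert len(ignored) == 2  # q_proj and v_proj are ignored, o_proj stays with FSDP
```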
6640ff415c Fix: Ensure environment variable values are case-insensitive in Accelerate (#3712)
* Add: lower

* apply ruff
2025-08-05 13:22:00 +02:00
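A sketch of the normalisation pattern the fix refers to, lower-casing the environment value before comparing (the variable name below is hypothetical):

```python
import os

def env_flag_enabled(name: str, default: str = "false") -> bool:
    # .lower() makes "True", "TRUE" and "true" (or "1"/"YES") all count as enabled.
    return os.environ.get(name, default).lower() in ("1", "true", "yes")

os.environ["ACCELERATE_EXAMPLE_FLAG"] = "TRUE"  # hypothetical variable name
assert env_flag_enabled("ACCELERATE_EXAMPLE_FLAG")
```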
c173b4fdd6 Fix: prepare works even if nothing except tp specified (rare) (#3707) 2025-08-05 13:07:37 +02:00
cb343c63d7 Add Parallelism getter property to Accelerator class (#3703)
* Add rank property to Accelerator class

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Raise errors when parallelism configuration is not enabled

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Fix: PR feedback

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Fix: style

---------

Signed-off-by: WoosungMyung <dntjd517@naver.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-08-02 18:20:08 +02:00
9359a0194f Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) (#3682)
* Feat: init

* Feat: add validation + init from kwargs

* Fix: minor fixes

* Feat: more cleanup

* Minor refactor

* remove import

* adding support for pre-configured device mesh

* adding device mesh to fsdp2

* moving mesh dim definition to ParallelismConfig

* tests

* WIP device mesh/accelerator validation

* WIP more tests

* Test Driven Development (TDD)

* fixing build_device_mesh

* FSDP dim names

* adding example

* WIP

* fixing HSDP

* Feat: add back old options

* working example

* debugging

* adding parallelism config to partialstate

* Feat: revert ddp changes

* Revert DDP

* Feat: (untested) update mesh dims and some minor tweaks

* adding dp_cp dims

* updating comments

* WIP

* wip 2

* reverting

* storing state in accelerator rather than acceleratorstate

* Fix: minor tweaks

* wip example update

* Fixes for non-fsdp2 case

* Feat: ensure ddp/tp only works

* updating example

* updating example

* updating examples, fixing state

* fixed state

* comments

* fixing partial state check

* linting

* comments

* removing fn

* WIP: fix tp

* comments

* removing return

* reverting upcast

* add guards

* guards for empty self.parallelism_config

* use len on tuple to check if empty

* Feat: cleanup example

* Feat: some cleanup of example

* Feat: add trackio

* Fix: improve trackio

* Feat: TP works

* Feat: some fsdp2 improv

* Feat: working examples

* handle clipping for tensor parallel

* Implicit replicate

* Refactor: move to separate file + cleanup + basic comments

* Fix: add unadded files, fix circular import

* Feat: better readme

* Feat: add blog + ultrascale links

* Tmp: should_save_model now returns only true

* Fix: remove implicit_replication and style

* Fix: remove optional

* add guard on parallelism_config.tp_enabled

* fix import

* fixing empty parallelism_config

* fix import path for test patch

* fixing patch

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Salman Mohammadi <“salman.mohammadi@outlook.com”>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-30 21:03:13 +02:00
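A rough sketch of how the new parallelism configuration might be passed to `Accelerator`; the import path and field names below are assumptions inferred from the commit bullets, not a confirmed API:

```python
import torch
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig  # import path assumed

# Field names assumed from the PR (dp replicate/shard + tp). With 8 processes this
# would describe HSDP (2-way replicate x 2-way shard) combined with 2-way TP.
pc = ParallelismConfig(dp_replicate_size=2, dp_shard_size=2, tp_size=2)
accelerator = Accelerator(parallelism_config=pc)

model = accelerator.prepare(torch.nn.Linear(16, 16))  # laid out over the device mesh
```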