Commit Graph

  • 5998f8625b refactor: nit change for get_parameters_from_modules (code debt) (#3815) main Mehant Kammakomati 2025-10-14 17:41:32 +05:30
  • f0313a64a2 Fix tracking swanlab (#3810) Marc Sun 2025-10-10 18:42:52 +02:00
  • df0c1870d9 Bump to python3.10 + update linter (#3809) Marc Sun 2025-10-10 18:22:51 +02:00
  • bc2478a472 fix (#3808) Marc Sun 2025-10-08 15:32:18 +02:00
  • 057edec226 fix (skip) cache flush when original device is cpu and offloaded to disk meta (#3796) Qubitium-ModelCloud 2025-10-08 17:48:04 +08:00
  • bc2f43d9e0 Update build_pr_documentation.yml mishig25-patch-2 Mishig 2025-09-30 13:12:29 +02:00
  • d581eb0599 Change workflow reference to strip_to_docstrings mishig25-patch-1 Mishig 2025-09-30 10:57:40 +02:00
  • 7eb413b8af [wip docbuilder] Mishig 2025-09-30 10:54:06 +02:00
  • db02e4dfd8 WIP low-bit-fsdp2 S1ro1 2025-09-24 12:16:47 +00:00
  • a98a6f0c98 Fix: truth value of tensor with more than 1 variable is ambiguous fix-grad-norm S1ro1 2025-09-23 12:17:17 +00:00
  • 14383311c2 Remove deprecated FindTiedParametersResult (#3786) Yuanyuan Chen 2025-09-19 21:00:44 +08:00
  • a737437c8a Revert "fix: correct dictionary unpacking in recursively_apply function (#3766)" (#3787) Marc Sun 2025-09-19 12:50:53 +02:00
  • 6997855ace rm mlflow (#3783) Marc Sun 2025-09-19 11:32:37 +02:00
  • 401075ffff Add optional typing (#3769) Yuanyuan Chen 2025-09-19 00:08:54 +08:00
  • 8031e24e84 refactor: Use with in Accelerator.autocast() instead of __enter__() and __exit__() for more elegant style. (#3767) Walker 2025-09-18 21:27:12 +08:00
  • 3db9fb6991 fix: correct dictionary unpacking in recursively_apply function (#3766) Quentin Gallouédec 2025-09-18 07:18:28 -06:00
  • fe795fd324 switch XPU ccl backend to torch-builtin xccl in test_zero3_integration (#3773) Yao Matrix 2025-09-18 02:50:32 -07:00
  • 409b356f45 Lower complexity of get_balanced_memory by adding a set (#3776) Samuel Barry 2025-09-17 09:30:55 -07:00
  • 1b50d93999 enable 2 model hook ut cases on XPU (#3774) Yao Matrix 2025-09-16 16:32:59 -07:00
  • b260de260e Feat: initial impl ulysses-sp S1ro1 2025-09-16 14:26:41 +00:00
  • 50042518db Some stuff feat/async-checkpointing S1ro1 2025-09-16 12:08:49 +00:00
  • e79f383625 Added Tip for better rendering (#3781) Sergio Paniego Blanco 2025-09-15 16:22:56 +02:00
  • 571ca0200d Merge branch 'main' into feat/async-checkpointing S1ro1 2025-09-13 14:16:07 +00:00
  • 0cb1a33475 fix Multi node CUDA error: invalid device ordinal #3775 (#3779) Ricardo Dominguez-Olmedo 2025-09-13 15:32:47 +02:00
  • dfdc219018 use reset_peak_memory_stats on xpu (#3772) Yao Matrix 2025-09-12 06:05:31 -07:00
  • 45959d7b96 fix FSDP2 test case failure on XPU (#3771) Yao Matrix 2025-09-12 06:05:05 -07:00
  • 8b493524c8 Fix: typo makes tests fail (#3765) Matej Sirovatka 2025-09-09 12:06:05 +02:00
  • 9ead94e556 fix: torch_npu import error (#3764) Ju4tCode 2025-09-09 17:38:57 +08:00
  • a0bc36e8ed feat: allow mixed precision policy as dtype (#3751) Mehant Kammakomati 2025-09-09 02:59:20 +05:30
  • 8830e58a91 Fix typos (#3753) Yuanyuan Chen 2025-09-08 19:33:18 +08:00
  • 40ebb4bea3 make torch_native_parallelism examples device agnostic (#3759) Yao Matrix 2025-09-08 03:16:56 -07:00
  • ec92b1af7a fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 (#3762) Walker 2025-09-07 05:57:46 +08:00
  • 62ede1ed2a CP docs typos fixed (#3761) Sergio Paniego Blanco 2025-09-05 12:23:33 +02:00
  • 9f9c490c6b fix: specify device for process_tensor in example usage (#3755) Quentin Gallouédec 2025-09-03 03:05:24 -06:00
  • 8b55e62b2c xpu INT64 all_gather issue fixed in 2.9 (#3756) Yao Matrix 2025-09-03 01:56:14 -07:00
  • 0e4419b347 Add bf16/fp16 support for amp with mps device (#3373) Marc Sun 2025-08-28 14:20:56 +02:00
  • 3b67c21696 Add support for TE MXFP8 recipe in accelerate (#3688) Peter St. John 2025-08-27 06:08:34 -06:00
  • 7b981788ca [ND Parallel] Update examples, cleanup (#3737) Matej Sirovatka 2025-08-26 14:41:14 +02:00
  • c4460e33ef fix: specify device_ids in torch.distributed.barrier for PartialState (#3744) Quentin Gallouédec 2025-08-26 05:05:33 -07:00
  • 623cc0ba58 Release: v1.10.1 v1.10.1 v1.10.0-release Marc Sun 2025-08-25 15:40:49 +02:00
  • d73e921ba3 Protect import for device_mesh (#3742) Marc Sun 2025-08-22 15:44:56 +02:00
  • f4593e36fc Feat: add to_json (#3743) Matej Sirovatka 2025-08-22 15:25:38 +02:00
  • a3f8d23402 Merge updated examples context-parallel-flex-attn S1ro1 2025-08-23 15:42:00 +00:00
  • 8ecadce10a Feat: cleanup S1ro1 2025-08-23 15:34:55 +00:00
  • 91985ab9d7 Feat: first version S1ro1 2025-08-23 15:03:28 +00:00
  • 5dd3d0b690 Protect import for device_mesh (#3742) Marc Sun 2025-08-22 15:44:56 +02:00
  • 5fe4460ccd Feat: add to_json (#3743) Matej Sirovatka 2025-08-22 15:25:38 +02:00
  • 979d81e4a9 fix: cpu ram efficient loading for nd or hsdp parallelisms (#3740) Mehant Kammakomati 2025-08-21 17:10:06 +05:30
  • 7c25f696b8 Fix convert LayerNorm without bias to fp8 (#3725) Junya Morioka 2025-08-19 05:28:48 +09:00
  • a7d6f28f99 feat: add ignored_params support for fsdp2 (#3731) Mehant Kammakomati 2025-08-18 18:01:19 +05:30
  • 6918a5ab78 Feat: rename fsdp2 examples transformers-nd-parallel S1ro1 2025-08-13 17:52:53 +00:00
  • 23cf4ef8a3 Fix tests (#3722) Marc Sun 2025-08-07 16:59:29 +02:00
  • ff872f5f71 bump to 1.11.0dev0 Marc Sun 2025-08-07 12:58:08 +02:00
  • 5dcd16e789 Release: v1.10.0 v1.10.0 Marc Sun 2025-08-07 12:50:34 +02:00
  • 2941a6b0fb remove (#3721) Marc Sun 2025-08-07 12:48:11 +02:00
  • c0a3aefea8 feature: CpuOffload pre_forward don't attempt to move if already on device (#3695) Joe Gaffney 2025-08-06 18:46:13 +01:00
  • 42fdda1c1f Remove ParallelismConfig from PartialState (#3720) Marc Sun 2025-08-06 19:00:26 +02:00
  • e23b004b30 TST Add test for FSDP ignored_modules as str (#3719) Benjamin Bossan 2025-08-06 18:05:54 +02:00
  • 898cad39e8 Fix: tp size wouldn't read from env (#3716) Matej Sirovatka 2025-08-06 15:08:55 +02:00
  • 24c8157bba Set parallelism_config in constructor due to Trainer reset of State (#3713) Wing Lian 2025-08-06 07:47:49 -04:00
  • 6891c57072 Feat: context parallel v2.0 (#3700) Matej Sirovatka 2025-08-05 16:17:13 +02:00
  • 24e48f3d20 ENH: Allow FSDP ignored modules to be regex (#3698) Benjamin Bossan 2025-08-05 14:23:14 +02:00
  • 6640ff415c Fix: Ensure environment variable values are case-insensitive in Accelerate (#3712) jp 2025-08-05 20:22:00 +09:00
  • c173b4fdd6 Fix: prepare works even if nothing except tp specified (rare) (#3707) Matej Sirovatka 2025-08-05 13:07:37 +02:00
  • 6c624c4a6b WIP wip-from-pretrained S1ro1 2025-08-04 00:03:01 +00:00
  • cb343c63d7 Add Parallelism getter property to Accelerator class (#3703) WoosungMyung 2025-08-03 01:20:08 +09:00
  • 354b0b5da3 WIP: very much wip but works (probably) S1ro1 2025-08-01 01:28:49 +00:00
  • 9359a0194f Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) (#3682) salman 2025-07-30 20:03:13 +01:00
  • aafde25cfc Feat: working examples fsdp2-tp S1ro1 2025-07-28 11:11:46 +00:00
  • d21ff9f245 Feat: some fsdp2 improv S1ro1 2025-07-27 22:56:20 +00:00
  • 00dd4af6ec Feat: TP works S1ro1 2025-07-27 14:53:56 +00:00
  • 9fdc320d2b Fix: improve trackio S1ro1 2025-07-27 01:49:02 +00:00
  • 7ddb3abb82 Feat: add trackio S1ro1 2025-07-27 01:14:02 +00:00
  • 36a1234cf1 Merge branch 'main' into device_mesh_parallelism_config S1ro1 2025-07-27 00:59:09 +00:00
  • 1017752ab4 Feat: some cleanup of example S1ro1 2025-07-27 00:58:10 +00:00
  • 235d29ff8c Feat: cleanup example S1ro1 2025-07-26 18:28:52 +00:00
  • a402faff9f use len on tuple to check if empty Wing Lian 2025-07-25 00:05:39 -04:00
  • e8963dc19d guards for empty self.parallelism_config Wing Lian 2025-07-24 21:39:28 -04:00
  • 76a546fd68 add guards Wing Lian 2025-07-24 21:01:06 -04:00
  • 168b520279 reverting upcast Salman Mohammadi 2025-07-24 18:43:31 +00:00
  • 379daa0b06 removing return Salman Mohammadi 2025-07-24 18:41:02 +00:00
  • 74009ea783 comments Salman Mohammadi 2025-07-24 18:37:50 +00:00
  • 133ef5f7da WIP: fix tp S1ro1 2025-07-23 21:20:04 +00:00
  • 52c178fe4f merging Salman Mohammadi 2025-07-23 16:48:39 +00:00
  • 80deb7ee32 removing fn Salman Mohammadi 2025-07-23 16:41:18 +00:00
  • a6feca96db comments Salman Mohammadi 2025-07-23 16:39:55 +00:00
  • f274b35401 linting Salman Mohammadi 2025-07-23 16:28:28 +00:00
  • 07bf2b3ba9 fixing partial state check Salman Mohammadi 2025-07-23 15:48:56 +00:00
  • 1a49c164b0 comments Salman Mohammadi 2025-07-23 15:47:43 +00:00
  • f21547f3c7 fixed state Salman Mohammadi 2025-07-23 15:45:41 +00:00
  • dc145c2bc9 updating examples, fixing state Salman Mohammadi 2025-07-23 15:30:07 +00:00
  • 4a2dd58fd8 updating example Salman Mohammadi 2025-07-23 11:58:58 +00:00
  • 7f243e0997 updating example Salman Mohammadi 2025-07-23 11:40:23 +00:00
  • dd894525c8 Feat: ensure ddp/tp only works S1ro1 2025-07-22 23:54:38 +00:00
  • f96fea3cb8 Fixes for non-fsdp2 case S1ro1 2025-07-22 17:32:02 +00:00
  • 61868c29d3 merging Salman Mohammadi 2025-07-22 17:25:49 +00:00
  • 3d235cb4bb wip example update Salman Mohammadi 2025-07-22 17:25:01 +00:00
  • 4e99b9cd7a Fix: minor tweaks S1ro1 2025-07-22 17:04:20 +00:00
  • aa745766c5 storing state in accelerator rather than acceleratorstate Salman Mohammadi 2025-07-22 16:55:58 +00:00
  • aa749ad364 reverting Salman Mohammadi 2025-07-22 15:43:05 +00:00