1863 Commits

Author SHA1 Message Date
5998f8625b refactor: nit change for get_parameters_from_modules (code debt) (#3815)
* refactor: nit change for get_parameters_from_modules

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: quality check

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-10-14 14:11:32 +02:00
f0313a64a2 Fix tracking swanlab (#3810)
* py310 and some changes

* fix

* Revert "py310 and some changes"

This reverts commit 0434d2929285d2a17c5c2e014c9c7c6cd06f0d9a.

* fix
2025-10-10 18:42:52 +02:00
df0c1870d9 Bump to python3.10 + update linter (#3809)
* py310 and some changes

* fix

* better
2025-10-10 18:22:51 +02:00
bc2478a472 fix (#3808) 2025-10-08 15:32:18 +02:00
057edec226 fix (skip) cache flush when original device is cpu and offloaded to disk meta (#3796) 2025-10-08 11:48:04 +02:00
14383311c2 Remove deprecated FindTiedParametersResult (#3786)
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-19 15:00:44 +02:00
a737437c8a Revert "fix: correct dictionary unpacking in recursively_apply function (#3766)" (#3787)
This reverts commit 3db9fb6991a296d0535e97d765f53da6b7246ff3.
2025-09-19 12:50:53 +02:00
6997855ace rm mlflow (#3783) 2025-09-19 11:32:37 +02:00
401075ffff Add optional typing (#3769)
* Fix typing

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

* Format code

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>

---------

Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
2025-09-18 18:08:54 +02:00
8031e24e84 refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for a more elegant style. (#3767)
* refactor: Use `with` in `Accelerator.autocast()` instead of `__enter__()` and `__exit__()` for a more elegant style.

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-09-18 15:27:12 +02:00
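For context, `Accelerator.autocast()` is an ordinary context manager, so the `with` form is the natural spelling; a minimal sketch of that usage with a toy model (any recent accelerate release assumed):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="bf16")
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(4, 8, device=accelerator.device)

# The with-statement manages __enter__/__exit__ of the autocast context for us.
with accelerator.autocast():
    loss = model(x).sum()
accelerator.backward(loss)
optimizer.step()
```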
3db9fb6991 fix: correct dictionary unpacking in recursively_apply function (#3766) 2025-09-18 15:18:28 +02:00
fe795fd324 switch XPU ccl backend to torch-builtin xccl in test_zero3_integration (#3773)
* switch XPU ccl backend to torch-builtin xccl in test_zero3_integration
remove xpu workaround in RegressionModel, we are OK now
rename test_multigpu to test_multidevice to reflect that fact

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix ci issues

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-18 11:50:32 +02:00
409b356f45 Lower complexity of get_balanced_memory by adding a set (#3776)
* Lower complexity by adding a set

* Push vibe coded eval script

* Clean
2025-09-17 18:30:55 +02:00
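The speedup here is the usual list-versus-set membership trade-off; the snippet below only illustrates that general idea and is not the actual `get_balanced_memory` code:

```python
# Illustrative only -- membership tests against a list are O(n), against a set O(1).
module_names = [f"block.{i}.linear" for i in range(10_000)]

slow = [name for name in module_names if name in module_names]  # O(n^2) overall

name_set = set(module_names)                                    # built once
fast = [name for name in module_names if name in name_set]      # O(n) overall

assert slow == fast
```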
1b50d93999 enable 2 model hook ut cases on XPU (#3774)
* enable 2 model hooks tests on XPU

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* xx

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
2025-09-17 01:32:59 +02:00
e79f383625 Added Tip for better rendering (#3781) 2025-09-15 16:22:56 +02:00
0cb1a33475 fix Multi node CUDA error: invalid device ordinal #3775 (#3779) 2025-09-13 15:32:47 +02:00
dfdc219018 use reset_peak_memory_stats on xpu (#3772)
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-12 15:05:31 +02:00
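A hedged sketch of the device-agnostic pattern this points at, resetting peak-memory stats on whichever backend is present (assumes the backend module exposes `reset_peak_memory_stats()`, as `torch.cuda` and recent `torch.xpu` do):

```python
import torch

def reset_peak_memory_stats() -> None:
    # Pick the available backend; torch.cuda and recent torch.xpu share this API.
    for backend_name in ("xpu", "cuda"):
        backend = getattr(torch, backend_name, None)
        if backend is not None and backend.is_available() and hasattr(backend, "reset_peak_memory_stats"):
            backend.reset_peak_memory_stats()
            return

reset_peak_memory_stats()  # no-op on CPU-only machines
```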
45959d7b96 fix FSDP2 test case failure on XPU (#3771)
* fix FSDP2 test case failure on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-12 15:05:05 +02:00
8b493524c8 Fix: typo makes tests fail (#3765) 2025-09-09 12:06:05 +02:00
9ead94e556 fix: torch_npu import error (#3764) 2025-09-09 11:38:57 +02:00
a0bc36e8ed feat: allow mixed precision policy as dtype (#3751)
* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: allow mixed precision as dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: extend test for MP as str dtype

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* Fix: style

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 23:29:20 +02:00
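A hedged sketch of what the commit enables: passing the FSDP2 mixed-precision policy as a plain dtype string rather than a `MixedPrecisionPolicy` object. The exact accepted values are an assumption drawn from the commit title:

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Assumption based on the commit title: mixed_precision_policy may now be a dtype
# string such as "bf16" instead of a torch MixedPrecisionPolicy instance.
fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    mixed_precision_policy="bf16",
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```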
8830e58a91 Fix typos (#3753)
* Fix typos

Signed-off-by: cyy <cyyever@outlook.com>

* Fix: style

---------

Signed-off-by: cyy <cyyever@outlook.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 13:33:18 +02:00
40ebb4bea3 make torch_native_parallelism examples device agnostic (#3759)
* make torch_native_parallelism examples device agnostic

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Style + deprecation warning

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-09-08 12:16:56 +02:00
ec92b1af7a fix: model.set_requires_gradient_sync(False) should be called to turn off gradient synchronization in FSDP2 (#3762)
* fix: `model.set_requires_gradient_sync(False)` should be called to turn off gradient synchronization in FSDP2.

* fix: remove trailing whitespace
2025-09-06 23:57:46 +02:00
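`set_requires_gradient_sync()` comes from torch's FSDP2 (`fully_shard`) module API; a sketch of how a gradient-accumulation loop might toggle it, with the surrounding training-loop details assumed:

```python
def accumulate_then_step(model, optimizer, micro_batches, accumulation_steps: int):
    """Sketch: sync (reduce-scatter) gradients only on the last micro-batch.

    Assumes `model` was sharded with torch.distributed.fsdp.fully_shard (FSDP2),
    which is what provides set_requires_gradient_sync().
    """
    for step, batch in enumerate(micro_batches):
        sync_now = (step + 1) % accumulation_steps == 0
        model.set_requires_gradient_sync(sync_now)  # False -> skip gradient sync
        loss = model(batch).sum() / accumulation_steps
        loss.backward()
        if sync_now:
            optimizer.step()
            optimizer.zero_grad()
```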
62ede1ed2a CP docs typos fixed (#3761) 2025-09-05 12:23:33 +02:00
9f9c490c6b fix: specify device for process_tensor in example usage (#3755) 2025-09-03 11:05:24 +02:00
8b55e62b2c xpu INT64 all_gather issue fixed in 2.9 (#3756)
* xpu gather issue fixed in 2.9 and validated config_yamls on XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* xxx

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-09-03 10:56:14 +02:00
0e4419b347 Add bf16/fp16 support for amp with mps device (#3373)
* Fix tests

* format

* amp mps support for fp16/bf16

* add error

* revert

* revert

* fix

* ruff
2025-08-28 14:20:56 +02:00
3b67c21696 Add support for TE MXFP8 recipe in accelerate (#3688)
* Add support for MXFP8 recipe in accelerate

* ruff reformat

* add and fix test for deepspeed / fp8 from config

* minor lints

Signed-off-by: Peter St. John <pstjohn@nvidia.com>

---------

Signed-off-by: Peter St. John <pstjohn@nvidia.com>
2025-08-27 14:08:34 +02:00
7b981788ca [ND Parallel] Update examples, cleanup (#3737)
* Fix: update cp example

* Feat: add rename examples

* WIP: Cleanup with_trainer

* Feat: more cleanup

* Feat: more refactor + better readme + more configs

* Fin
2025-08-26 14:41:14 +02:00
c4460e33ef fix: specify device_ids in torch.distributed.barrier for PartialState (#3744) 2025-08-26 14:05:33 +02:00
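The fix boils down to telling NCCL which device the barrier should run on; a minimal sketch of that call (real `torch.distributed` API, process-group initialisation assumed to have happened elsewhere):

```python
import torch
import torch.distributed as dist

if dist.is_initialized():
    if torch.cuda.is_available():
        # Pin the barrier to this process's GPU instead of letting NCCL guess,
        # which avoids "invalid device ordinal"-style mismatches.
        dist.barrier(device_ids=[torch.cuda.current_device()])
    else:
        dist.barrier()  # gloo/CPU backends take no device_ids
```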
5dd3d0b690 Protect import for device_mesh (#3742) 2025-08-22 15:44:56 +02:00
5fe4460ccd Feat: add to_json (#3743) 2025-08-22 15:25:38 +02:00
979d81e4a9 fix: cpu ram efficient loading for nd or hsdp parallelisms (#3740)
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-08-21 13:40:06 +02:00
7c25f696b8 Fix convert LayerNorm without bias to fp8 (#3725) 2025-08-18 22:28:48 +02:00
a7d6f28f99 feat: add ignored_params support for fsdp2 (#3731)
* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* feat: add ignored_params support for fsdp2

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* test: update testcase for fsdp2 ignored_params

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: add defensive use of ignored params

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

* fix: styling errors

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
2025-08-18 14:31:19 +02:00
23cf4ef8a3 Fix tests (#3722)
* fix tests

* fix skorch tests

* fix deepspeed

* pin torch as compile tests don't pass and create segmentation fault

* skip compile tests

* fix

* forgot v ...

* style
2025-08-07 16:59:29 +02:00
ff872f5f71 bump to 1.11.0dev0 2025-08-07 12:58:08 +02:00
2941a6b0fb remove (#3721) 2025-08-07 12:48:11 +02:00
c0a3aefea8 feature: CpuOffload pre_forward don't attempt to move if already on device (#3695)
* feature: added an optimisation to not attempt a device move if already on that device. This is more noticeable in large-step iterations on diffusion loops, where pre_forward can get called many times.

* fix: linting

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-08-06 19:46:13 +02:00
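The optimisation is essentially an "already on the right device?" guard before moving anything; an illustrative stand-alone version (not the actual hook code):

```python
import torch

def maybe_move(tensor: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Skip the .to() call entirely when the tensor already lives on the target
    # device -- the saving that matters when a pre-forward hook fires every step.
    if tensor.device == device:
        return tensor
    return tensor.to(device)

x = torch.randn(2, 2)
assert maybe_move(x, torch.device("cpu")) is x  # no copy when nothing to do
```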
42fdda1c1f Remove ParallelismConfig from PartialState (#3720)
* remove

* style

* fix

* valueerror instead

* add device_mesh
2025-08-06 19:00:26 +02:00
e23b004b30 TST Add test for FSDP ignored_modules as str (#3719)
Follow up to #3698.
2025-08-06 18:05:54 +02:00
898cad39e8 Fix: tp size wouldn't read from env (#3716) 2025-08-06 15:08:55 +02:00
24c8157bba Set parallelism_config in constructor due to Trainer reset of State (#3713) 2025-08-06 13:47:49 +02:00
6891c57072 Feat: context parallel v2.0 (#3700)
* Cleanup: context parallel

* Feat: cleanup

* Feat: concept guide

* Fix: rename + version check

* Style

* Fix: add to namespace in a test

* Fix: add skip_if on dataclass tests

* Fix: proper version for version check

* Feat: add tests and cleanup

* Fix: properly version check added tests

* Feat: address comments

* Fix: add both shift_labels and labels to make the model.forward calculate loss

* Fix: remove import, improve comment

* Fix: final checks

* Fix: style

* Fix: style
2025-08-05 16:17:13 +02:00
24e48f3d20 ENH: Allow FSDP ignored modules to be regex (#3698)
* ENH: Allow FSDP ignored modules to be regex

Description

For FSDP, there is an option to indicate ignored_modules, which should
be a list of modules that are ignored by FSDP. Even though this argument was
supported in accelerate, it was not very usable:

1. Listing all modules can be tricky, especially with something like PEFT,
where the whole model is wrapped and thus the module structure changes.
2. When configuring this argument, accelerate takes a detour via
environment variables. These can only be strings. Therefore, passing a
list of modules is not feasible.

Moreover, I noticed that the environment variable for ignored_modules
was not even set, so configuring this argument didn't even work.

Status

This PR is lacking tests. I would be happy for pointers on how to add
those.

Context

When using PEFT with LoRA and the target_parameters feature, I ran into
an issue training such a model with FSDP. The only working fix I found
was to ignore the layers targeted by LoRA. However, I could not
configure accelerate to do that. With this PR, it is possible. I could
successfully trained such a PEFT model that targets q_proj and v_proj by
setting fsdp_ignored_modules: '.*\.(q_proj$|v_proj$)'.

* Fix type annotation

* Fix failing test
2025-08-05 14:23:14 +02:00
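An illustrative helper showing how such a regex could be resolved against module names (whether accelerate matches with fullmatch or search internally is not shown here, so treat the details as assumptions):

```python
import re
import torch.nn as nn

def find_ignored_modules(model: nn.Module, pattern: str) -> list[nn.Module]:
    """Illustrative: collect modules whose qualified name matches `pattern`."""
    regex = re.compile(pattern)
    return [module for name, module in model.named_modules() if regex.fullmatch(name)]

model = nn.ModuleDict(
    {"attn": nn.ModuleDict({"q_proj": nn.Linear(8, 8), "v_proj": nn.Linear(8, 8), "o_proj": nn.Linear(8, 8)})}
)
ignored = find_ignored_modules(model, r".*\.(q_proj$|v_proj$)")
assert len(ignored) == 2  # q_proj and v_proj are ignored, o_proj stays with FSDP
```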
6640ff415c Fix: Ensure environment variable values are case-insensitive in Accelerate (#3712)
* Add: lower

* apply ruff
2025-08-05 13:22:00 +02:00
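A sketch of the normalisation pattern the fix refers to, lower-casing the environment value before comparing (the variable name below is hypothetical):

```python
import os

def env_flag_enabled(name: str, default: str = "false") -> bool:
    # .lower() makes "True", "TRUE" and "true" (or "1"/"YES") all count as enabled.
    return os.environ.get(name, default).lower() in ("1", "true", "yes")

os.environ["ACCELERATE_EXAMPLE_FLAG"] = "TRUE"  # hypothetical variable name
assert env_flag_enabled("ACCELERATE_EXAMPLE_FLAG")
```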
c173b4fdd6 Fix: prepare works even if nothing except tp specified (rare) (#3707) 2025-08-05 13:07:37 +02:00
cb343c63d7 Add Parallelism getter property to Accelerator class (#3703)
* Add rank property to Accelerator class

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Raise errors when parallelism configuration is not enabled

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Fix: PR feedback

Signed-off-by: WoosungMyung <dntjd517@naver.com>

* Fix: style

---------

Signed-off-by: WoosungMyung <dntjd517@naver.com>
Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
2025-08-02 18:20:08 +02:00
9359a0194f Parallelism config + TP + HSDP + BYODM (Bring Your Own Device Mesh) (#3682)
* Feat: init

* Feat: add validation + init from kwargs

* Fix: minor fixes

* Feat: more cleanup

* Minor refactor

* remove import

* adding support for pre-configured device mesh

* adding device mesh to fsdp2

* moving mesh dim definition to ParallelismConfig

* tests

* WIP device mesh/accelerator validation

* WIP more tests

* Test Driven Development (TDD)

* fixing build_device_mesh

* FSDP dim names

* adding example

* WIP

* fixing HSDP

* Feat: add back old options

* working example

* debugging

* adding parallelism config to partialstate

* Feat: revert ddp changes

* Revert DDP

* Feat: (untested) update mesh dims and some minor tweaks

* adding dp_cp dims

* updating comments

* WIP

* wip 2

* reverting

* storing state in accelerator rather than acceleratorstate

* Fix: minor tweaks

* wip example update

* Fixes for non-fsdp2 case

* Feat: ensure ddp/tp only works

* updating example

* updating example

* updating examples, fixing state

* fixed state

* comments

* fixing partial state check

* linting

* comments

* removing fn

* WIP: fix tp

* comments

* removing return

* reverting upcast

* add guards

* guards for empty self.parallelism_config

* use len on tuple to check if empty

* Feat: cleanup example

* Feat: some cleanup of example

* Feat: add trackio

* Fix: improve trackio

* Feat: TP works

* Feat: some fsdp2 improv

* Feat: working examples

* handle clipping for tensor parallel

* Implicit replicate

* Refactor: move to separate file + cleanup + basic comments

* Fix: add unadded files, fix circular import

* Feat: better readme

* Feat: add blog + ultrascale links

* Tmp: should_save_model now returns only true

* Fix: remove implicit_replication and style

* Fix: remove optional

* add guard on parallelism_config.tp_enabled

* fix import

* fixing empty parallelism_config

* fix import path for test patch

* fixing patch

---------

Co-authored-by: S1ro1 <matej.sirovatka@gmail.com>
Co-authored-by: Salman Mohammadi <“salman.mohammadi@outlook.com”>
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-07-30 21:03:13 +02:00
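A rough sketch of how the new parallelism configuration might be passed to `Accelerator`; the import path and field names below are assumptions inferred from the commit bullets, not a confirmed API:

```python
import torch
from accelerate import Accelerator
from accelerate.parallelism_config import ParallelismConfig  # import path assumed

# Field names assumed from the PR (dp replicate/shard + tp). With 8 processes this
# would describe HSDP (2-way replicate x 2-way shard) combined with 2-way TP.
pc = ParallelismConfig(dp_replicate_size=2, dp_shard_size=2, tp_size=2)
accelerator = Accelerator(parallelism_config=pc)

model = accelerator.prepare(torch.nn.Linear(16, 16))  # laid out over the device mesh
```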