Commit Graph

1803 Commits

SHA1 Message Date
13a00e2877 Lock datasets 2025-07-10 11:56:37 +00:00
b1498c7c5c Revert "Bunch of FSDP improvements (#3671)"
This reverts commit d6c986c3f2dd8417c8689967a2d139576d617925.
2025-07-10 11:38:40 +00:00
d6c986c3f2 Bunch of FSDP improvements (#3671)
* Feat: split tests

* Feat: finito

* Fix

* Final, tests pass
2025-07-09 16:05:22 +02:00
1ac8643df7 xpu enablement on left cases (#3654)
* 1. enable XPU for the launcher 2. expand CUDA-only DeepSpeed unit tests to XPU 3. expand the profiler example to XPU

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* rename

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update profiler.py

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-07 18:10:53 +02:00
07ce74868c Fix: properly error when DDP + Dtensor model (#3629)
* Feat: add check

* Refactor: nits
2025-06-27 01:33:45 +02:00
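A minimal sketch of the guard this fix describes, assuming it simply refuses to wrap a DTensor-sharded model in DDP; `_has_dtensor_params` and `prepare_ddp` are illustrative names, not Accelerate's actual code:

```python
import torch
from torch.distributed.tensor import DTensor  # public path in torch >= 2.5


def _has_dtensor_params(model: torch.nn.Module) -> bool:
    # DDP expects plain local tensors; DTensor parameters indicate the model
    # is already sharded (e.g. by tensor parallelism).
    return any(isinstance(p, DTensor) for p in model.parameters())


def prepare_ddp(model: torch.nn.Module):
    if _has_dtensor_params(model):
        raise ValueError(
            "Model contains DTensor parameters and cannot be wrapped in DDP. "
            "Use a parallelism strategy that understands DTensor instead."
        )
    return torch.nn.parallel.DistributedDataParallel(model)
```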
175fe91589 Added a check in the no_sync() function to avoid errors when using deepspeed zero2/3. (#3656) 2025-06-26 14:39:04 +02:00
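A sketch of that check, under the assumption that it degrades to a no-op context rather than erroring; `safe_no_sync` and the `zero_stage` argument are illustrative, not the actual Accelerate code:

```python
import contextlib

from accelerate.utils import DistributedType


def safe_no_sync(accelerator, model, zero_stage: int):
    # ZeRO-2/3 partition gradients across ranks, so DDP-style skipping of
    # gradient synchronization does not apply; use a no-op context instead.
    if accelerator.distributed_type == DistributedType.DEEPSPEED and zero_stage >= 2:
        return contextlib.nullcontext()
    return accelerator.no_sync(model)
```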
fe16ce8bce Fix fsdp2 example (#3657) 2025-06-26 14:08:51 +02:00
5987d79a53 Update gradient_accumulation.md (#3649) 2025-06-23 11:58:31 +02:00
31af8d4e8e shards (#3645) 2025-06-20 11:24:20 +02:00
b7493a82b1 Add support for e5e2 and default to hybrid when launcher is used (#3640)
* add support for e5e2 and default to hybrid when launcher is used

* style
2025-06-20 11:11:32 +02:00
a16d2bb3c1 bump to v1.9.0dev 2025-06-19 15:13:41 +02:00
cac22ed980 fix grad acc deepspeed (#3638)
* fix grad acc deepspeed

* style
2025-06-19 12:06:21 +02:00
be826a6b7b Fix: correct labels (#3637) 2025-06-19 11:01:56 +02:00
5939640829 Feat: add cpu offload (#3636) 2025-06-18 18:13:45 +02:00
7f9c8cbe34 [DeepSpeed] sync gradient accum steps from deepspeed plugin (#3632)
* sync steps

* add a debug log when overriding

* make grad accum always consistent

* remove debug
2025-06-18 16:45:57 +02:00
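Roughly what "sync steps" means here, sketched as an illustrative helper (not the actual implementation): read the accumulation steps from the DeepSpeed plugin's config and prefer them over the Accelerator's own value.

```python
import logging

logger = logging.getLogger(__name__)


def sync_grad_accum_steps(accelerator, deepspeed_plugin) -> int:
    ds_steps = deepspeed_plugin.deepspeed_config.get("gradient_accumulation_steps", 1)
    if accelerator.gradient_accumulation_steps != ds_steps:
        logger.debug(
            "gradient_accumulation_steps=%s differs from the DeepSpeed config (%s); "
            "using the DeepSpeed value so both sides stay consistent.",
            accelerator.gradient_accumulation_steps,
            ds_steps,
        )
    return ds_steps
```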
9888c7ed23 feat: use datasets.IterableDataset shard if possible (#3635)
* feat: use datasets.IterableDataset shard if possible.

When `accelerator.prepare` is called on a
`datasets.IterableDataset`, use the `shard` method to
split the dataset across the available processes. This
allows for more efficient data loading and processing,
without the load-and-slice overhead of `IterableDatasetShard`.

* dataset

* remove unused import

* style

---------

Co-authored-by: wuwenxu.01 <wuwenxu.01@bytedance.com>
2025-06-18 16:45:17 +02:00
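A sketch of the underlying `shard` call from the user's side; the dataset name is just an example:

```python
from datasets import load_dataset

from accelerate import PartialState

state = PartialState()
# streaming=True yields a datasets.IterableDataset
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
# each process reads only its own shards, with no load-and-slice overhead
ds_rank = ds.shard(num_shards=state.num_processes, index=state.process_index)
```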
42a68c30dc Fix Typos in Documentation and Comments (#3621)
* Update state.py

* Update tracking.py
2025-06-18 15:53:02 +02:00
6597dae780 Integrate SwanLab for offline/online experiment tracking for Accelerate (#3605)
* add support for SwanLabTracker and update related documentation

* add emoji in FRAMEWORK

* apply the style corrections and quality control

* add support for SwanLabTracker in tests

* fix bug in test_tracking
2025-06-18 15:42:29 +02:00
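A usage sketch, assuming the tracker registers under the name "swanlab" like the other built-in trackers:

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="swanlab")
accelerator.init_trackers(project_name="my_project")
accelerator.log({"train_loss": 0.42}, step=1)
accelerator.end_training()
```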
8878d93745 remove hardcoded cuda from fsdpv2 (#3631) 2025-06-17 14:32:10 +02:00
2eaf5cdbbc remove ipex.optimize in accelerate (#3608)
* remove ipex.optimize in accelerate

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix mis-style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update intel_cpu.md

* Update launch.py

* fix comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* add logging

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update launch.py

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-17 11:08:19 +02:00
23c1d8db89 [Deepspeed] deepspeed auto grad accum (#3630)
* deepspeed auto grad accum

* add tests for grad accum

* use tiny-random-gpt2

* Update tests/deepspeed/test_deepspeed_gradient_accumulation.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix redundant code

* set_gradient_accumulation_boundary is always there

* remove unused helper

* no need for this

* full revert

* Apply style fixes

* get_global_grad_norm is always there

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-16 16:28:24 +02:00
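The core idea, sketched: tell the engine explicitly when a micro-step is an accumulation boundary via DeepSpeed's `set_gradient_accumulation_boundary`, instead of relying on its internal step counting. The loop below is illustrative and assumes the model returns its loss:

```python
def train(engine, dataloader, grad_accum_steps: int = 4):
    # `engine` is the model engine returned by deepspeed.initialize(...)
    for step, batch in enumerate(dataloader):
        is_boundary = (step + 1) % grad_accum_steps == 0
        # inform the engine explicitly, rather than relying on its counters
        engine.set_gradient_accumulation_boundary(is_boundary)
        loss = engine(batch)  # assumes the model returns its loss
        engine.backward(loss)
        engine.step()         # the engine applies the update only at a boundary
```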
0af621bbec add xpu support in TorchTensorParallelPlugin (#3627)
* add xpu support in TorchTensorParallelPlugin

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix typo

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-13 17:45:51 +02:00
bee04f1b01 Add fp8_e5m2 support in dtype_byte_size (#3625)
* float8_e5m2 device_map

* remove prints
2025-06-12 16:27:32 +02:00
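A simplified sketch (not Accelerate's exact implementation) of why this matters for `device_map` planning: both torch fp8 variants occupy a single byte per element.

```python
import torch

FP8_DTYPES = {torch.float8_e4m3fn, torch.float8_e5m2}  # torch >= 2.1


def dtype_byte_size(dtype: torch.dtype) -> float:
    if dtype == torch.bool:
        return 1 / 8
    if dtype in FP8_DTYPES:
        return 1
    # complex dtypes omitted for brevity
    info = torch.finfo(dtype) if dtype.is_floating_point else torch.iinfo(dtype)
    return info.bits / 8
```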
8a953f08c6 fix xpu 8bit value loading (#3623)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-12 14:55:14 +02:00
3518c03584 small fix (#3619) 2025-06-11 14:02:45 +02:00
2f8fd72e51 Remove device_count (#3587) 2025-06-10 14:50:34 +02:00
d2e6b0313d [FSDP2] Refactor + FP8 (#3585)
* Fix double wrap

* Clocking off, ~equal to torch baseline

* works?

* Working version

* Partial rewrite

* FSDP2 path works

* Fix back prepare

* Almost done, proper AC left

* Feat: should work, cleanup + test more benchmarks left

* Style+quality

* Feat: fp8 example

* Feat: better example

* Feat: add readme

* Docs + should be done

* Fix: typos

* Fix: protect imports

* Feat: address comments

* Feat: add flops image
2025-06-10 14:26:48 +02:00
b9fee48c85 better handle FP8 with and without deepspeed (#3611)
* use the state mixed precision which has undergone all preprocessing

* Update src/accelerate/accelerator.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/accelerate/accelerator.py

* accelerator state sets the mixed precision for deepspeed and fp8_enabled

* fix

* fix

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-10 14:24:43 +02:00
3a82b056cf Fix bf16 training with TP (#3610)
* fix

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-10 11:29:59 +02:00
6b61a373a2 fix deepspeed regional compilation (#3609) 2025-06-06 14:48:43 +02:00
682691deac Update Gaudi Runners (#3593)
* test

* fix

* push

* in the morning

* fix backend

* run first

* set habana modules

* dynamo backend

* trigger

* remove on pr

* remove on file change
2025-06-03 12:36:56 +02:00
791055b484 Fix: list object has no attribute keys (#3603) 2025-06-03 12:24:20 +02:00
16bf1d8901 enable torchao and pippy test cases on XPU (#3599)
* enable torchao and pippy test cases on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-30 17:36:34 +02:00
ab3c604e48 enable big_model_inference on xpu (#3595)
* enable big_model_inference on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix quality

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-30 17:23:26 +02:00
273799c85d enable fsdp2 benchmark on XPU (#3590)
* enable fsdp2 benchmark on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* add deterministic

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-27 14:08:59 +02:00
43526c5c08 add device-agnostic GradScaler (#3588)
* add device-agnostic GradScaler

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix bug

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix review comments

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* format

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-27 11:44:50 +02:00
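The gist, sketched: `torch.amp.GradScaler` (torch >= 2.3) takes the device type as an argument, replacing the CUDA-only `torch.cuda.amp.GradScaler`; which device types it accepts varies with the torch version.

```python
import torch

from accelerate import PartialState

device_type = PartialState().device.type  # "cuda", "xpu", "cpu", ...
scaler = torch.amp.GradScaler(device_type)
```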
07f2392f40 change to use torch.device (#3594)
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-27 11:17:18 +02:00
ee2f48c2c3 [docs] no hard-coded cuda in the ddp documentation (#3589)
* make device-agnostic

* refactor
2025-05-27 11:16:42 +02:00
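The device-agnostic pattern the docs switched to, in brief:

```python
import torch

from accelerate import Accelerator

accelerator = Accelerator()
# derive the device instead of hard-coding "cuda"
model = torch.nn.Linear(8, 2).to(accelerator.device)
```

(`accelerator.prepare(model)` handles the placement as well.)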
4f3abb73a7 Set ccl and KMP param in simple launch (#3575)
* Even a single-CPU machine can run multiple processes

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ccl and KMP param setting

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* set master addr only when processes > 1

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix num process check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

* fix ccl args check

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>

---------

Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-26 15:55:10 +02:00
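An illustrative sketch of the kind of environment the simple launcher sets for CPU runs; the exact variables and values Accelerate chooses may differ, and `nproc` here is a stand-in for the launcher's process count:

```python
import os

nproc = 2  # stand-in for the launcher's process count
os.environ.setdefault("KMP_AFFINITY", "granularity=fine,compact,1,0")
os.environ.setdefault("KMP_BLOCKTIME", "1")
os.environ.setdefault("CCL_WORKER_COUNT", "1")
if nproc > 1:
    # per the bullets above, the master address is only set for multi-process runs
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
```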
db536cbfeb Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)
* Fix tracker initialize distributed before InitProcessGroupKwargs

* Fix tracker initialize distributed before InitProcessGroupKwargs

* Add test for bug #3550

* Improve test for #3550

* Remove redundant code

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix style

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-26 15:08:13 +02:00
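The usage this fix protects, sketched: `kwargs_handlers` must take effect before any tracker touches distributed state, so tracker construction is deferred until `init_trackers`.

```python
from datetime import timedelta

from accelerate import Accelerator
from accelerate.utils import InitProcessGroupKwargs

accelerator = Accelerator(
    log_with="tensorboard",
    kwargs_handlers=[InitProcessGroupKwargs(timeout=timedelta(minutes=30))],
)
accelerator.init_trackers("my_project")
```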
4e9d0deba6 enable regional_compilation benchmark on xpu (#3592)
* enable regional_compilation benchmark on xpu

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-26 15:05:42 +02:00
8cb3ace894 Add kwargs to optimizer, scheduler and dataloader using function accelerator().load_state() (#3540)
* Added artifacts and figure tracking at MLFlow tracker

* Added `log_artifact` to the MLFlowTracker

* Remove changes

* Added kwargs when loading state.

* added doc string

* Adjusted correct default types of kwargs

* Changed the load kwargs to a single one

* removed None value from kwargs

* fix kwargs for loading the model

* removed load_kwargs from optimizer state dict

* make load_kwargs a dictionary

* revert last changes

* reverted load_kwargs

* fix docstring

* added dict initiation

* Fix quality error during PR
2025-05-22 17:21:54 +02:00
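A hedged usage sketch: the bullets suggest `load_state` ends up taking a single `load_kwargs` dictionary forwarded to the underlying loaders; the parameter name and accepted keys below are inferred from those bullets, not verified against the final API.

```python
from accelerate import Accelerator

accelerator = Accelerator()
accelerator.load_state(
    "checkpoints/step_500",
    load_kwargs={"map_location": "cpu"},  # name/keys inferred from the bullets above
)
```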
b6d97cb856 Resolve logger warnings (#3582)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-22 16:26:31 +02:00
33967d4733 Add support for standalone mode when default port is occupied on single node (#3576)
* add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection

* address review feedback: warn on port conflict only for single-node; raise error for multi-node

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-20 12:29:53 +02:00
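A sketch of the single-node check described above; the real launcher logic is more involved, and the probe below is just a standard connect test against torch distributed's default port:

```python
import socket
import warnings


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0


if port_in_use(29500):  # torch distributed's default MASTER_PORT
    warnings.warn(
        "Port 29500 is in use; falling back to standalone mode so a free "
        "port is chosen automatically (single node only; multi-node raises)."
    )
```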
5b1fcda371 enable test_cli & test_example cases on XPU (#3578)
* enable test_cli & test_example cases on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* remove print

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix ci issue

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-05-20 12:04:24 +02:00
f55f0533b5 goodbye torch_ccl (#3580)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-20 12:02:14 +02:00
1ec99f0b58 enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)
* enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* Update test_load_checkpoint_and_dispatch_with_broadcast.py

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-19 11:27:40 +02:00
417bc52965 bump to v1.8.0dev 2025-05-15 12:02:44 +02:00
97c93c4809 enable test_dispatch_model_tied_weights_memory_with_nested_offload_cpu on xpu (#3569)
* enable test_dispatch_model_tied_weights_memory_with_nested_offload_cpu
case on XPU

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* replace hard-coded torch.cuda with device-dependent calls

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* use device agnostic clear_device_cache

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix Yao <matrix.yao@intel.com>

---------

Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 11:40:55 +02:00
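A minimal usage sketch of the `clear_device_cache` helper these commits switch to, replacing hard-coded `torch.cuda.empty_cache()` calls:

```python
from accelerate.utils.memory import clear_device_cache

# frees cached blocks on whichever accelerator backend is active
clear_device_cache(garbage_collection=True)
```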
cd37bbb629 set backend correctly for CUDA+FSDP2+cpu-offload (#3574)
* set backend correctly for CUDA+FSDP2+cpu-offload

* offload

* format

---------

Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-15 11:38:53 +02:00