* add support for SwanLabTracker and update related documentation
* add emoji in FRAMEWORK
* apply the style corrections and quality control
* add support for SwanLabTracker in tests
* fix bug in test_tracking
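For context on the SwanLabTracker commits above, here is a minimal sketch of how such a tracker is typically driven through Accelerate's generic logging API. The `"swanlab"` identifier and the metric names are assumptions, not confirmed by the log.

```python
# Minimal sketch: driving the new SwanLabTracker through Accelerate's generic
# tracking API. The "swanlab" tracker name is an assumption based on how other
# trackers ("wandb", "tensorboard", ...) are selected via log_with.
from accelerate import Accelerator

accelerator = Accelerator(log_with="swanlab")  # assumed tracker identifier
accelerator.init_trackers(project_name="demo-project", config={"lr": 3e-4})

for step in range(10):
    loss = 1.0 / (step + 1)  # placeholder metric
    accelerator.log({"train_loss": loss}, step=step)

accelerator.end_training()
```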
* init
* style
* is_hpu_available
* fix
* import habana_frameworks.torch.distributed.hccl
* style
* test
* initialize dist proc group
* revert
* set backend to hccl only if hccl initialization sets a local rank
* force backend hccl and multi_hpu type when sure of distributed launch
* style
* pass accelerator tests
* pass big modeling tests with bigger atol/rtol for accelerators
* fix hpu device count and skip tests requiring hpu:x
* hpu autocast
* hpu rng_state
* hpu launch
* hpu special device placement
* hpu launch
* rng state
* distributed data loop tests
* enforce non-contiguity after device memory allocation
* pass fsdp tests
* enforce pt_hpu_lazy_mode=0 when fsdp testing
* pass cli tests
* pass and document grad sync tests
* pass kwargs handler and autocast tests
* memory utils
* found source of int64 errors
* skip some modeling utils tests
* enable int64
* skip optimizer tests
* pass checkpointing tests
* pass accelerator tests with safetensors main
* more hpu stuff
* style
* remove PT_HPU_LAZY_MODE and PT_ENABLE_INT64_SUPPORT as they should be in the testing environment
* start testing on gaudi2
* support fp16 on gaudi2
* add testing order
* custom hpu fsdp env dict
* fix torch trace malloc
* test ddp half precision comm hooks
* fix
* fix
* remove lower bound for hpu
* use 0.72 as lower bound
* lower the lower bound
* order deepspeed tests
* fix
* deepspeed_use_hpu
* assert non-lazy mode with offloaded optimizer
* make patching torch with habana frameworks the default
* less of require_non_hpu
* skip test_multi_device_merge_fsdp_weights for now as it halts
* skip another flaky test
* format
* use habana_visible_modules
* patch torch hpu device count
* avoid setting HABANA_VISIBLE_MODULES
* don't play with habana visible devices/modules
* only with hpu
* fixes and skips
* skip
* fix device ids and add some todos
* skip offloading with generate()
* fix
* reduced atol/rtol for hpu
* fix
* tag deepspeed tests that should run first
* enable a test path that was skipped
* revert a test that was customized for gaudi1
* some patching to enable HABANA_VISIBLE_MODULES
* fix zero3 test
* misc
* test DTensor TP
* remove gaudi1
* test
* style
* comment
* pass pad_across_processes
* require_fp16
* pass memory utils test
* test_ddp_comm_hook
* skip half precision comm hooks on hpu
* fix
* is_fp16_available
* fp16
* tp as part of integration tests
* fix
* write_basic_config
* safetensors
* local sgd and masked_fill_fwd_i64
* fix num_processes in test_load_states_by_steps
* fp8 support
* test
* fix
* add a workflow
* Update src/accelerate/accelerator.py
* review comments
* ci
* style
* comments
* test
* habana_frameworks.torch
* patch device count
* fix
* fix
* require_fp8
* fix
* fix
* gaudi 1
* remove unnecessary
* fixed masked_fill error in transformers
* style
* balanced_memory pass on hpu
* remove for now
* run first
* Apply suggestions from code review
* style after merge
* Update src/accelerate/accelerator.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/accelerate/utils/transformer_engine.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* empty cache review comments
* test_script.py error messages
* AccelerateTestCase for accelerator state cleanup
* test
* add gaudi1 workflow
* fp8 availability
* fix
* reduce batch size
* concurrency
* check cuda as well
* nits and comments
* mark fsdp tests that require_fp16
* style
* mark deepspeed fp16 tests
* update image
* fix
* updated
* better msgs
* skip pippy
* test
* test on 2 devices
* support up to 1% relative error in test_accelerate
* skip hpu fp16
* allow for 1 byte difference
* revert torch_device change
* style
* skip memory release since it's flaky
* add accelerator state cleanup to fixture
* fix
* atol
* fix
* more rtol
* equal grad test
* revert
* pass pippy on gaudi2 and skip on gaudi1
* enable sd 1.5 test with require fp16
* added warning on memory release
* don't log warning in memory release as it requires PartialState to be initialized
* Apply suggestions from code review
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
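The HPU commits above mostly touch tests and device plumbing; as a rough sketch of what they enable on the user side, device selection stays implicit. The `is_hpu_available` import path is an assumption (the log only confirms the helper's name).

```python
# Hedged sketch of HPU support from the user side: Accelerate detects the device
# and places modules on it; the import path of is_hpu_available is an assumption
# (the commits above only confirm the helper's name).
import torch
from accelerate import Accelerator

try:
    from accelerate.utils import is_hpu_available  # assumed location
except ImportError:
    def is_hpu_available():
        return False

accelerator = Accelerator()
model = accelerator.prepare(torch.nn.Linear(8, 2))
print(f"HPU available: {is_hpu_available()}, model placed on: {accelerator.device}")
```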
* Add cross-entropy example in the gradient accumulation docs
* add example of logs
* correct skeleton code
* replace gather_for_metrics with gather
* batch_size -> per_device_batch_size
* remove main_process_only=True
* add autoregressive example in examples/
* Update docs/source/usage_guides/gradient_accumulation.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* ruff format
* add grad accum test
* update docs
* Update examples/by_feature/gradient_accumulation_for_autoregressive_models.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* update tests
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
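The gradient-accumulation commits above revolve around the `accumulate()` pattern. A minimal sketch of that pattern follows; the toy model and data are illustrative, and the full by_feature example additionally normalizes the cross-entropy loss over tokens for autoregressive models.

```python
# Minimal sketch of the gradient-accumulation pattern the docs above describe.
# The toy model/data are illustrative; the real example also rescales the
# cross-entropy loss across accumulation steps for autoregressive training.
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
dataset = torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(0, 4, (64,)))
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)  # per-device batch size

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    with accelerator.accumulate(model):  # skips gradient sync until the 4th micro-step
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```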
* v1
* More testing, need to try on H100
* Bigger batch for h100 test
* test tweak
* Fixup all tests!
* Bookmark
* Fix issues, working now
* rm num samples
* Uncomment
* Give stateful dl end of dl
* Make skip DL stateful
* Migrate to update_state_dict
* try/finally
* Add comments to test
* rm comment
* Document
* refactor out for eventual override
* Doc nit
* Brute force it
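A rough sketch of the stateful-DataLoader behavior these commits iterate on, assuming it is switched on through `DataLoaderConfiguration(use_stateful_dataloader=True)` (which requires `torchdata`). The snapshot/resume calls below are how I read the `update_state_dict` commits, not a verbatim API reference.

```python
# Hedged sketch: with use_stateful_dataloader enabled, the prepared loader is
# expected to expose state_dict()/load_state_dict() so iteration can resume
# mid-epoch. Requires torchdata's StatefulDataLoader; toy data is illustrative.
import torch
from accelerate import Accelerator
from accelerate.utils import DataLoaderConfiguration

accelerator = Accelerator(dataloader_config=DataLoaderConfiguration(use_stateful_dataloader=True))
dataloader = accelerator.prepare(torch.utils.data.DataLoader(torch.arange(32).float(), batch_size=4))

saved_state = None
for i, batch in enumerate(dataloader):
    if i == 2:
        saved_state = dataloader.state_dict()  # snapshot mid-epoch
        break

dataloader.load_state_dict(saved_state)  # resume from the snapshot on restart
```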
* Add ddp comm hook
* Fix dataclass order
* Merge ddp grad hook to ddp kwargs handler
* Reset ddp kwargs key
* Add test
* Fix test case
* Split ddp grad test
* Fix test case
* Enhance docstring
* Minor
* Use naive BaseEnum for ddp comm hook type
* Add by feature example
* Add multi device deco
* Add user guide
* Update examples/by_feature/ddp_comm_hook.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update examples/by_feature/ddp_comm_hook.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Add wrapper and state option details
* Update toctree
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Mv ddp comm hook index
* Fix ddp comm hook user guide
* Del empty line
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
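The ddp_comm_hook commits above merge hook selection into the DDP kwargs handler; a hedged sketch of that usage follows. The enum name `DDPCommunicationHookType` and the `comm_hook` field are inferred from the commit messages; the user guide added above is the authoritative reference.

```python
# Hedged sketch of selecting a DDP communication hook through the kwargs handler.
# DDPCommunicationHookType and the comm_hook field are assumptions drawn from the
# commits above; consult the ddp_comm_hook user guide for the exact names.
import torch
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.utils import DDPCommunicationHookType  # assumed import path

ddp_kwargs = DistributedDataParallelKwargs(comm_hook=DDPCommunicationHookType.FP16)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

model = accelerator.prepare(torch.nn.Linear(8, 2))  # hook registers when DDP wraps the model
```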
* Let's try it out
* Let's try this out
* Some more cases
* String
* Require hub online for estimator
* Add CI checker to alert on hub status
* Format
* Oops death by ctrl z
* Fix import
* Fix tests
* Fixup tests
* Fix test
* Actually cast to string!
* Fixup deepspeed
* fsdp and deepspeed fix
* Since we're doing this, may as well get it all
* Stragglers
* Split only if we require config_file
* Make list
* Only convert if it's a path
* type
* Other func
* rm parenth
* early stopping
* Fix tests
* Works on multi-gpu, uncomment
* Rm reset
* Check for >=1
* equal
* Trigger
* Fix test
* Update docs/source/concept_guides/deferring_execution.md
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Explicit example loop
* Set to zero, not None
* rename test
* Check again to ensure it's been reset
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
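The early-stopping commits above are about keeping processes in sync when one of them decides to stop. Below is a minimal sketch of the trigger pattern from the deferring-execution guide, assuming `set_trigger`/`check_trigger` are the methods this work settles on.

```python
# Minimal sketch of cross-process early stopping: one rank sets a trigger, every
# rank checks it, so all processes leave the loop together. set_trigger/check_trigger
# are taken from the deferring-execution guide referenced above.
from accelerate import Accelerator

accelerator = Accelerator()

for step in range(100):
    loss = 1.0 / (step + 1)  # placeholder metric
    if accelerator.is_main_process and loss < 0.05:
        accelerator.set_trigger()      # only the monitoring rank flips the flag
    if accelerator.check_trigger():    # every rank checks, keeping them in sync
        break
```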
* Megatron-LM integration
* add code and resolve comment
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* add code
* add code
* fix many 🐛
* add code
* add code and reverting tracker processes
* updating logging utilities, fixing Pipeline Parallelism and dataset/dataloader 🐛s
1. Fixing bugs related to Pipeline Parallelism.
2. Fixing bugs related to dataloaders/datasets.
3. Fixing logging utilities so that all logging and tracking happen on the last process when using Megatron.
* addressing comments
* resolving comments
* update code
* refactoring and adding code to support custom implementations of the `AbstractTrainStep` class
* minor change
* Many fixes for supporting custom TrainStep and Megatron Indexed Datasets
* Add code, 🐛 fixes and an initial doc file with headings
* fixing a big 🐛 related to loading checkpoints
* adding doc and an example
* example test CI
* docs
* more docs
* more doc changes
* more doc changes
* docs
* more docs
* doc fixing
* trying whether we can directly import megatronlm utils
* doc fixing and throwing error if megatron isn't available.
* resolving comments
* fixes to bert and t5 and more docs
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
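As a rough orientation for the Megatron-LM commits above: the training script keeps the standard Accelerate shape while tensor/pipeline parallel degrees come from a plugin, normally filled in through `accelerate config`. The plugin fields and keyword below are assumptions based on the commit messages, not a verified signature.

```python
# Hedged sketch of the Megatron-LM integration from the script side. The
# MegatronLMPlugin field names (tp_degree, pp_degree) and the Accelerator kwarg
# are assumptions; a custom AbstractTrainStep can reportedly be plugged in as well.
from accelerate import Accelerator
from accelerate.utils import MegatronLMPlugin  # assumed import path

megatron_plugin = MegatronLMPlugin(tp_degree=2, pp_degree=2)  # assumed fields
accelerator = Accelerator(megatron_lm_plugin=megatron_plugin)  # assumed kwarg

# model, optimizer, scheduler and dataloaders then go through accelerator.prepare()
# as usual; per the commits above, logging and tracking happen on the last process.
```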
* deepspeed revamp
* Update dataclasses.py
* Update deepspeed.py
* quality
* fixing code
* quality
* Fix imports
* saving 16bit model in zero stage 3
1. Saving 16bit model in zero stage 3
2. zero init in stage 3 support using HFDeepSpeedConfig
* quality
* adding test and fixing bugs
* update makefile for deepspeed tests
* Update test.yml
* adding `deepspeed` as requirement for tests
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* quality
* addressing comments
* add example and minor updates
1. Add example to show the usage of a config file with revamped deepspeed support.
2. Update required deepspeed version to 0.6.5.
3. Revert `reinit` change as it is not required.
4. Raise an exception when using `clip_grad_value` with DeepSpeed/FSDP.
* Documentation and Zero-3 Inference Support
1. Changes to support ZeRO Stage-3 inference.
2. Minor bug fixes.
3. Documentation.
* doc fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* update doc to address comments and bug fixes
1. Update tests and add a new one testing the autofill functionality of the `prepare` method.
2. Fix a ZeRO-3 init bug related to HFDeepSpeedConfig.
3. Update documentation addressing comments.
* removing image and hosting it on `documentation-images` dataset
* check for hidden_size for zero_opt heuristics
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
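A hedged sketch of the revamped DeepSpeed path these commits describe, assuming DeepSpeed is installed and the script is started with `accelerate launch`; the plugin field names reflect my reading of the commits and should be checked against the docs.

```python
# Hedged sketch of the revamped DeepSpeed support: configure via a plugin (or a
# DeepSpeed JSON config through `accelerate config`) and let prepare() autofill
# the rest, as exercised by the autofill test mentioned above. Field names are
# assumptions; requires DeepSpeed and an `accelerate launch` start.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(zero_stage=3, gradient_accumulation_steps=2)
accelerator = Accelerator(deepspeed_plugin=ds_plugin, mixed_precision="fp16")

# model/optimizer/dataloaders pass through accelerator.prepare(...) as usual; under
# ZeRO stage 3 the 16-bit weights are gathered via accelerator.get_state_dict(model)
# before saving, per the "saving 16bit model in zero stage 3" commit above.
```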
* Introduce nightly builds
* Fixup docker images slightly
* Make the device-count-specific test use `torch.cuda.device_count()` rather than `Accelerator.num_processes` to avoid a bug.
* Create peak_memory_uasge_tracker.py
Adding a by_feature example for tracking peak GPU memory usage. One use case is to track the peak memory reduction when using FSDP.
* fixing the typo in the file name
* reformatting
* exclude peak_memory_usage_tracker.py from tests
* renaming and highlighting proper usage
* Update test_examples.py
😅
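For the peak-memory example above, the repo's by_feature script defines its own `TorchTracemalloc` helper; the following is a simplified stand-in built only on standard `torch.cuda` counters, to show the idea.

```python
# Simplified stand-in for the example's TorchTracemalloc helper: track the peak
# GPU memory delta around a block using standard torch.cuda counters.
import contextlib
import torch


@contextlib.contextmanager
def track_peak_memory(device=0):
    torch.cuda.reset_peak_memory_stats(device)
    start = torch.cuda.memory_allocated(device)
    yield
    peak = torch.cuda.max_memory_allocated(device)
    print(f"Peak memory delta: {(peak - start) / 2**20:.1f} MiB")


if torch.cuda.is_available():
    with track_peak_memory():
        x = torch.randn(1024, 1024, device="cuda")
        y = x @ x  # some work whose peak usage we want to measure
```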