* Cleanup: context parallel
* Feat: cleanup
* Feat: concept guide
* Fix: rename + version check
* Style
* Fix: add to namespace in a test
* Fix: add skip_if on dataclass tests
* Fix: proper version for version check
* Feat: add tests and cleanup
* Fix: properly version check added tests
* Feat: address comments
* Fix: add both shift_labels and labels to make the model.forward calculate loss
* Fix: remove import, improve comment
* Fix: final checks
* Fix: style
* Fix: style
* add support for SwanLabTracker and update related documentation
* add emoji in FRAMEWORK
* apply the style corrections and quality control
* add support for SwanLabTracker in tests
* fix bug in test_tracking
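For context, a tracker added this way is normally enabled through `log_with`; a minimal sketch, assuming the tracker is registered under the name `"swanlab"` and the swanlab package is installed (project name and metrics below are placeholders, not values from these commits):

```python
from accelerate import Accelerator

# Minimal sketch: enable the SwanLab tracker by name (assumes it is
# registered as "swanlab" and that swanlab itself is installed).
accelerator = Accelerator(log_with="swanlab")
accelerator.init_trackers(project_name="my_project", config={"lr": 3e-4})

accelerator.log({"train_loss": 0.42}, step=1)
accelerator.end_training()
```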
* Fix double wrap
* Clocking off, ~equal to torch baseline
* works?
* Working version
* Partial rewrite
* FSDP2 path works
* Fix back prepare
* Almost done, proper AC left
* Feat: should work, cleanup + test more benchmarks left
* Style+quality
* Feat: fp8 example
* Feat: better example
* Feat: add readme
* Docs + should be done
* Fix: typos
* Fix: protect imports
* Feat: address comments
* Feat: add flops image
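For context, the fp8 example mentioned above builds on Accelerate's mixed-precision switch; a minimal sketch, assuming supported hardware and an fp8 backend such as TransformerEngine are available (the model, optimizer, and data below are placeholders, not the benchmark from this PR):

```python
import torch
from accelerate import Accelerator

# Minimal sketch: request fp8 mixed precision and run one training step.
accelerator = Accelerator(mixed_precision="fp8")

model = torch.nn.Linear(128, 128)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

x = torch.randn(8, 128, device=accelerator.device)
loss = model(x).mean()
accelerator.backward(loss)
optimizer.step()
optimizer.zero_grad()
```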
* init
* style
* is_hpu_available
* fix
* import habana_frameworks.torch.distributed.hccl
* style
* test
* initialize dist proc group
* revert
* set backend to hccl only if hccl initialization sets a local rank
* force backend hccl and multi_hpu type when sure of distributed launch
* style
* pass accelerator tests
* pass big modeling tests with bigger atol/rtol for accelerators
* fix hpu device count and skip tests requiring hpu:x
* hpu autocast
* hpu rng_state
* hpu launch
* hpu special device placement
* hpu launch
* rng state
* distributed data loop tests
* enforce non-contiguity after device memory allocation
* pass fsdp tests
* enforce pt_hpu_lazy_mode=0 when fsdp testing
* pass cli tests
* pass and document grad sync tests
* pass kwargs handler and autocast tests
* memory utils
* found source of int64 errors
* skip some modeling utils tests
* enable int64
* skip optimizer tests
* pass checkpointing tests
* pass accelerator tests with safetensors main
* more hpu stuff
* style
* remove PT_HPU_LAZY_MODE and PT_ENABLE_INT64_SUPPORT as they should be in the testing environment
* start testing on gaudi2
* support fp16 on gaudi2
* add testing order
* custom hpu fsdp env dict
* fix torch trace malloc
* test ddp half precision comm hooks
* fix
* fix
* remove lower bound for hpu
* use 0.72 as lower bound
* lower the lower bound
* order deepspeed tests
* fix
* deepspeed_use_hpu
* assert non lazy mode with offloaded optimizer
* make patching torch with habana frameworks the default
* less of require_non_hpu
* skip test_multi_device_merge_fsdp_weights for now as it halts
* skip another flaky test
* format
* use habana_visible_modules
* patch torch hpu device count
* avoid setting HABANA_VISIBLE_MODULES
* don't play with habana visible devices/modules
* only with hpu
* fixes and skips
* skip
* fix device ids and add some todos
* skip offloading with generate()
* fix
* reduced atol/rtol for hpu
* fix
* tag deepspeed tests that should run first
* enable a test path that was skipped
* revert a test that was customized for gaudi1
* some patching to enable HABANA_VISIBLE_MODULES
* fix zero3 test
* misc
* test DTensor TP
* remove gaudi1
* test
* style
* comment
* pass pad_across_processes
* require_fp16
* pass memory utils test
* test_ddp_comm_hook
* skip half precision comm hooks on hpu
* fix
* is_fp16_available
* fp16
* tp as part of integration tests
* fix
* write_basic_config
* safetensors
* local sgd and masked_fill_fwd_i64
* fix num_processes in test_load_states_by_steps
* fp8 support
* test
* fix
* add a workflow
* Update src/accelerate/accelerator.py
* review comments
* ci
* style
* comments
* test
* habana_frameworks.torch
* patch device count
* fix
* fix
* require_fp8
* fix
* fix
* gaudi 1
* remove unnecessary
* fixed masked fill error in transformers
* style
* balanced_memory pass on hpu
* remove for now
* run first
* Apply suggestions from code review
* style after merge
* Update src/accelerate/accelerator.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/accelerate/utils/transformer_engine.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* empty cache review comments
* test_script.py error messages
* AccelerateTestCase for accelerator state cleanup
* test
* add gaudi1 workflow
* fp8 availability
* fix
* reduce batch size
* concurrency
* check cuda as well
* nits and comments
* mark fsdp tests that require_fp16
* style
* mark deepspeed fp16 tests
* update image
* fix
* updated
* better msgs
* skip pippy
* test
* test on 2 devices
* support up to 1% relative error in test_accelerate
* skip hpu fp16
* allow for 1 byte difference
* revert torch_device change
* style
* skip memory release since it's flaky
* add accelerator state cleanup to fixture
* fix
* atol
* fix
* more rtol
* equal grad test
* revert
* pass pippy on gaudi2 and skip on gaudi1
* enable sd 1.5 test with require fp16
* added warning on memory release
* don't log warning in memory release as it requires PartialState to be initialized
* Apply suggestions from code review
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Add cross-entropy example in the gradient accumulation docs
* add example of logs
* correct skeleton code
* replace gather_for_metrics with gather
* batch_size -> per_device_batch_size
* remove main_process_only=True
* add autoregressive example in examples/
* Update docs/source/usage_guides/gradient_accumulation.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* ruff format
* add grad accum test
* update docs
* Update examples/by_feature/gradient_accumulation_for_autoregressive_models.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* update tests
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
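For context, the gradient-accumulation pattern these docs and tests exercise generally looks like the following minimal sketch (the model, optimizer, and dataloader are placeholders, not the cross-entropy or autoregressive example added in the PR):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

# Minimal sketch of gradient accumulation with Accelerate.
accelerator = Accelerator(gradient_accumulation_steps=4)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataloader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=8,  # per-device batch size
)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, targets in dataloader:
    # Gradients are synchronized and stepped only every
    # `gradient_accumulation_steps` batches.
    with accelerator.accumulate(model):
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```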
* skeleton code
* fix some errors for downloading the model
* fix some tqdm errors
* fix some errors
* fix some gpu errors with torch
* fix some gpu errors with torch
* testing simple way
* testing simple way
* testing simple way
* testing simple way
* actual code
* actual code
* final testing with serialization
* add multi_gpu speech generation
* fix some comments
* fix some style and quality
* MNT Upgrade ruff to 0.6.4
Currently used version, 0.2.1, is quite old at this point.
Not a lot needed to be changed:
- Change ruff version in setup.py
- Remove deprecated ignore-init-module-imports option for ruff
- Type comparison should use is and not ==
- Use f-string instead of % formatting
- Some line wrapping and empty lines
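A small illustration of the two lint rules called out above (hypothetical snippets, not code from this PR):

```python
# Type comparison: use `is` rather than `==` (ruff E721).
value = 3
if type(value) is int:  # before: `if type(value) == int:`
    print("int")

# Prefer f-strings over %-formatting (pyupgrade-style ruff rules).
name, count = "accelerate", 2
message = f"{name} found {count} issues"  # before: "%s found %d issues" % (name, count)
```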
* Oops
* rm warning
* Take 3
* Take 4
* Annotate
* Take 6
* Updated
* Spec
* Last fix
* Don't pad input
* Finished
* Continue refactor
* Rm comment
* Adjust the err
* Start adjustment
* GPT2 works, T5 does not
* llama too now I think
* Flag the t5 example
* v1
* More testing, need to try on H100
* Bigger batch for h100 test
* test tweak
* Fixup all tests!
* Bookmark
* Fix issues, working now
* rm num samples
* Uncomment
* Give stateful dl end of dl
* Make skip DL stateful
* Migrate to update_state_dict
* try/finally
* Add comments to test
* rm comment
* Document
* refactor out for eventual override
* Doc nit
* Brute force it
* Add ddp comm hook
* Fix dataclass order
* Merge ddp grad hook to ddp kwargs handler
* Reset ddp kwargs key
* Add test
* Fix test case
* Split ddp grad test
* Fix test case
* Enhance docstring
* Minor
* Use naive baseenum for ddp comm hook type
* Add by feature example
* Add multi device deco
* Add user guide
* Update examples/by_feature/ddp_comm_hook.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update examples/by_feature/ddp_comm_hook.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Add wrapper and state option details
* Update toctree
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Update docs/source/usage_guides/ddp_comm_hook.md
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Mv ddp comm hook index
* Fix ddp comm hook user guide
* Del empty line
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
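For context, the comm-hook option added here is driven through the DDP kwargs handler; a minimal sketch, with the enum member name taken as an assumption (see the ddp_comm_hook user guide added in this PR for the exact options):

```python
import torch
from accelerate import Accelerator, DistributedDataParallelKwargs
from accelerate.utils import DDPCommunicationHookType

# Minimal sketch: compress the DDP gradient all-reduce to fp16 via a comm hook.
ddp_kwargs = DistributedDataParallelKwargs(comm_hook=DDPCommunicationHookType.FP16)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])

model = torch.nn.Linear(16, 16)
# The hook is attached when prepare() wraps the model in DDP
# (i.e. under a multi-process launch).
model = accelerator.prepare(model)
```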
* Update accelerate config and launch to abstract out mpirun
* Fix var
* Documentation updates, updating the launch script to work with other MPI programs, and fixing the nlp example when using IPEX
* Style fixes
* Add a test
* Style fixes
* Formatting fix
* Updates based on review feedback.
* Remove model.train()
* Doc update
* Update doc regarding the accelerate config with the old method of mpirun and accelerate
* Fix typo in comment
* Quality and test updates
* Updates based on review feedback
* Quality fix
* Fix mock patch path
* Updates based on review feedback
* Quality fixes
* Make torch xla available on GPU
* format code
* fix documentation build error
* update according to the comments
* Replace DistributedType.TPU with DistributedType.XLA
* make all unit tests pass
* format code
* update comments
* skip test
* format code
* skip FSDPPluginIntegration for torchxla
* bring back custom_sampler_check
* fix unit tests
* format code
* format code
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>