* init
* style
* is_hpu_available
* fix
* import habana_frameworks.torch.distributed.hccl (see the init sketch after this list)
* style
* test
* initialize dist proc group
* revert
* set backend to hccl only if hccl initialization sets a local rank
* force backend hccl and multi_hpu type when sure of distributed launch
* style
* pass accelerator tests
* pass big modeling tests with bigger atol/rtol for accelerators
* fix hpu device count and skip tests requiring hpu:x
* hpu autocast
* hpu rng_state
* hpu launch
* hpu special device placement
* hpu launch
* rng state
* distributed data loop tests
* enforce non-contiguity after device memory allocation
* pass fsdp tests
* enforce PT_HPU_LAZY_MODE=0 when FSDP testing
* pass cli tests
* pass and document grad sync tests
* pass kwargs handler and autocast tests
* memory utils
* found source of int64 errors
* skip some modeling utils tests
* enable int64
* skip optimizer tests
* pass checkpointing tests
* pass accelerator tests with safetensors main
* more hpu stuff
* style
* remove PT_HPU_LAZY_MODE and PT_ENABLE_INT64_SUPPORT as they should be set in the testing environment (see the env sketch after this list)
* start testing on gaudi2
* support fp16 on gaudi2
* add testing order
* custom hpu fsdp env dict
* fix torch trace malloc
* test ddp half precision comm hooks
* fix
* fix
* remove lower bound for hpu
* use 0.72 as lower bound
* lower the lower bound
* order deepspeed tests
* fix
* deepspeed_use_hpu
* assert non-lazy mode with offloaded optimizer
* make patching torch with habana frameworks the default
* fewer uses of require_non_hpu
* skip test_multi_device_merge_fsdp_weights for now as it halts
* skip another flaky test
* format
* use HABANA_VISIBLE_MODULES
* patch torch hpu device count (see the device-count sketch after this list)
* avoid setting HABANA_VISIBLE_MODULES
* don't play with habana visible devices/modules
* only with hpu
* fixes and skips
* skip
* fix device ids and add some todos
* skip offloading with generate()
* fix
* reduced atol/rtol for hpu
* fix
* tag deepspeed tests that should run first
* enable a test path that was skipped
* revert a test that was customized for gaudi1
* some patching to enable HABANA_VISIBLE_MODULES
* fix zero3 test
* misc
* test DTensor TP
* remove gaudi1
* test
* style
* comment
* pass pad_across_processes
* require_fp16
* pass memory utils test
* test_ddp_comm_hook
* skip half precision comm hooks on hpu
* fix
* is_fp16_available
* fp16
* tp as part of integration tests
* fix
* write_basic_config
* safetensors
* local sgd and masked_fill_fwd_i64
* fix num_processes in test_load_states_by_steps
* fp8 support
* test
* fix
* add a workflow
* Update src/accelerate/accelerator.py
* review comments
* ci
* style
* comments
* test
* habana_frameworks.torch
* patch device count
* fix
* fix
* require_fp8
* fix
* fix
* gaudi1
* remove unnecessary
* fixed masked fill error in transformers
* style
* balanced_memory pass on hpu
* remove for now
* run first
* Apply suggestions from code review
* style after merge
* Update src/accelerate/accelerator.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/accelerate/utils/transformer_engine.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* empty cache review comments
* test_script.py error messages
* AccelerateTestCase for accelerator state cleanup
* test
* add gaudi1 workflow
* fp8 availability
* fix
* reduce batch size
* concurrency
* check cuda as well
* nits and comments
* mark fsdp tests that require_fp16
* style
* mark deepspeed fp16 tests
* update image
* fix
* updated
* better msgs
* skip pippy
* test
* test on 2 devices
* support up to 1% relative error in test_accelerate
* skip hpu fp16
* allow for 1 byte difference
* revert torch_device change
* style
* skip memory release since it's flaky
* add accelerator state cleanup to fixture
* fix
* atol
* fix
* more rtol
* equal grad test
* revert
* pass pippy on gaudi2 and skip on gaudi1
* enable sd 1.5 test with require fp16
* added warning on memory release
* don't log warning in memory release as it requires PartialState to be initialized
* Apply suggestions from code review
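A minimal sketch of the hccl-backed initialization these commits converge on, assuming a Gaudi host with habana_frameworks installed (the exact wiring inside Accelerate may differ):

```python
import torch.distributed as dist

# Importing this module registers the "hccl" backend with
# torch.distributed as a side effect; without the import,
# init_process_group(backend="hccl") fails.
import habana_frameworks.torch.distributed.hccl  # noqa: F401

# Rank and world size come from the usual launcher env vars
# (RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT).
dist.init_process_group(backend="hccl")
```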
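And a sketch of the environment the FSDP/int64 commits assume is prepared outside the test suite; the flags are Habana's documented ones, set here via os.environ purely for illustration:

```python
import os

# FSDP on HPU requires eager (non-lazy) mode; this must be set
# before habana_frameworks.torch is imported.
os.environ["PT_HPU_LAZY_MODE"] = "0"

# Enable native int64 support, avoiding the masked_fill/int64
# errors mentioned above.
os.environ["PT_ENABLE_INT64_SUPPORT"] = "1"
```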
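Finally, a sketch of the patched device count; visible_hpu_count is a hypothetical helper, not Accelerate's actual implementation:

```python
import os

def visible_hpu_count() -> int:
    # HABANA_VISIBLE_MODULES is a comma-separated list of module
    # ids, analogous to CUDA_VISIBLE_DEVICES.
    modules = os.environ.get("HABANA_VISIBLE_MODULES")
    if modules is None:
        import habana_frameworks.torch.hpu as hthpu
        return hthpu.device_count()
    return len([m for m in modules.split(",") if m])
```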
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Make torch xla available on GPU
* format code
* fix documentation build error
* update according to the comments
* Replace DistributedType.TPU with DistributedType.XLA (see the sketch after this list)
* make all unit tests pass
* format code
* update comments
* skip test
* format code
* skip FSDPPluginIntegration for torchxla
* bring back custom_sampler_check
* fix unit tests
* format code
* format code
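A sketch of what the rename means for downstream checks; DistributedType.XLA now covers torch_xla on GPU as well as TPU:

```python
from accelerate import PartialState
from accelerate.utils import DistributedType

state = PartialState()

# Formerly compared against DistributedType.TPU.
if state.distributed_type == DistributedType.XLA:
    print(f"running under XLA on {state.device}")
```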
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>