320 Commits

SHA1 Message Date
62ede1ed2a CP docs typos fixed (#3761) 2025-09-05 12:23:33 +02:00
6891c57072 Feat: context parallel v2.0 (#3700)
* Cleanup: context parallel

* Feat: cleanup

* Feat: concept guide

* Fix: rename + version check

* Style

* Fix: add to namespace in a test

* Fix: add skip_if on dataclass tests

* Fix: proper version for version check

* Feat: add tests and cleanup

* Fix: properly version check added tests

* Feat: address comments

* Fix: add both shift_labels and labels to make the model.forward calculate loss

* Fix: remove import, improve comment

* Fix: final checks

* Fix: style

* Fix: style
2025-08-05 16:17:13 +02:00
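One fix above passes both `labels` and pre-shifted `shift_labels` into `model.forward` so the loss is computed inside the model on each sequence shard. A minimal sketch of that call pattern, assuming a prepared transformers-style causal LM `model` whose loss function accepts a `shift_labels` override (all tensors here are illustrative):

```python
import torch

input_ids = torch.randint(0, 100, (1, 8))  # toy batch
labels = input_ids.clone()

# Pre-shift once on the full sequence: position i predicts token i+1,
# with the final position ignored (-100 is the ignore index).
shift_labels = torch.full_like(labels, -100)
shift_labels[:, :-1] = labels[:, 1:]

# With context parallelism the sequence dimension is sharded, so passing
# both variants lets model.forward compute the loss locally per shard.
outputs = model(input_ids=input_ids, labels=labels, shift_labels=shift_labels)
loss = outputs.loss
```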
e2cc537db8 trackio (#3669)
* trackio

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>

* seven -> eight

* Add trackio as a real tracker instead

* Sort

* Style

* Style

* Remove step

* Disable trackio on Python < 3.10

* Update src/accelerate/tracking.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* More style

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Abubakar Abid <abubakar@huggingface.co>
2025-07-15 17:17:49 +02:00
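Since trackio is registered as a regular tracker here, the standard Accelerate tracking flow applies. A short sketch, assuming "trackio" is the tracker's string identifier (project name and metrics are placeholders; per the commits above it is disabled on Python < 3.10):

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="trackio")  # tracker added by this PR
accelerator.init_trackers("my-project")        # placeholder project name

for step in range(10):
    loss = 1.0 / (step + 1)                    # placeholder metric
    accelerator.log({"loss": loss}, step=step)

accelerator.end_training()
```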
5987d79a53 Update gradient_accumulation.md (#3649) 2025-06-23 11:58:31 +02:00
6597dae780 Integrate SwanLab for offline/online experiment tracking for Accelerate (#3605)
* add support for SwanLabTracker and update related documentation

* add emoji in FRAMEWORK

* apply the style corrections and quality control

* add support for SwanLabTracker in tests

* fix bug in test_tracking
2025-06-18 15:42:29 +02:00
2eaf5cdbbc remove ipex.optimize in accelerate (#3608)
* remove ipex.optimize in accelerate

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix mis-style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update intel_cpu.md

* Update launch.py

* fix comments

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* add logging

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* Update launch.py

* Apply style fixes

---------

Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-17 11:08:19 +02:00
ee2f48c2c3 [docs] no hard-coded cuda in the ddp documentation (#3589)
* make device-agnostic

* refactor
2025-05-27 11:16:42 +02:00
e6e717589e Add regional compilation to cli tools and env vars (#3572)
* add regional compilation to cli tools and env vars

* added seq parallel to gaudi docs

* explain that lm_head is also compiled separately

* style

* docstring

* style
2025-05-15 11:30:27 +02:00
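For context, regional compilation applies `torch.compile` to each repeated block of the model separately instead of to the whole graph, which cuts cold-start compilation time. A sketch using the `compile_regions` helper introduced by this line of work (exact API assumed from the commit messages):

```python
from accelerate.utils import compile_regions
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Each repeated transformer block is compiled once and the artifact is
# reused; per the commit above, lm_head is compiled as its own region.
model = compile_regions(model)
```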
32874257f3 Add Gaudi doc (#3537)
* Add Gaudi doc

* Address comment from review

* Remove point about region compilation

---------

Co-authored-by: Ilyas Moutawwakil <57442720+IlyasMoutawwakil@users.noreply.github.com>
2025-05-13 18:27:33 +02:00
3524a504c8 update path (#3561) 2025-05-13 13:57:29 +02:00
7b5774ac55 Dynamo regional compilation (#3529) 2025-05-12 09:49:29 +02:00
7013365791 fix typos (#3549) 2025-05-08 14:10:12 +02:00
d02e51cc21 Update big_modeling.md for layerwise casting (#3548)
* Update big_modeling.md for layerwise casting

* doc fix
2025-05-06 09:50:53 +02:00
0af45bf1e8 Fix logic in accelerator.prepare + IPEX for 2+ nn.Models and/or optim.Optimizers (#3517)
* Fix logic in _prepare_ipex

* Add caution about prepare in IPEX docs

* Add suggested workaround to IPEX docs

* Revert unnecessary change

* Update docs/source/usage_guides/ipex.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Remove double space

* Simplify logical checks for IPEX availability

* Revert unnecessary change

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-04-25 17:31:36 +02:00
8fb073536a [FSDP2] Enable FULL_STATE_DICT (#3527)
* Feat: enable FULL_STATE_DICT in config

* Feat: support FSDP2 FULL_STATE_DICT

* Refactor: remove deprecated save/load_state_dict

* Docs: add FULL_STATE_DICT as supported to docs

* Feat: update tests

* Feat: change Accelerator.get_state_dict() to use new api
2025-04-23 18:03:45 +02:00
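With FULL_STATE_DICT supported under FSDP2, the updated `Accelerator.get_state_dict()` gathers the sharded parameters into one unsharded state dict. A minimal sketch, assuming an existing `accelerator` and a `model` prepared with an FSDP2 plugin:

```python
import torch

# Collects the full, unsharded state dict from the FSDP2-wrapped model.
state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    torch.save(state_dict, "model_state.pt")
```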
4f35cf713c Solve link error in internal_mechanism documentation (#3506) (#3507)
* Solve link error in internal_mechanism (#3506)

* Link correctly to documentation (#3506)
2025-04-23 17:47:25 +02:00
8c0a29626d Update low_precision_training.md (#3488) 2025-04-08 11:39:58 +02:00
67a768be07 remove use_xpu to fix ut issues, we don't need this since XPU is OOB supported now (#3460)
* remove use_xpu to fix ut issues, we don't need this since XPU is OOB supported now

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* fix style

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>

* add deprecate warnings

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

* fix

Signed-off-by: YAO Matrix <matrix.yao@intel.com>

---------

Signed-off-by: Yao, Matrix <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-04-01 11:55:37 +02:00
d7c741a6bc Initial FSDP2 support (#3394)
* Feat: initial conversion tool draft

* Feat: add value mapping to conversion tool

* Refactor: move from os to pathlib

* Feat: add first tests

* Feat: more tests

* Feat: minor fixes + dataclass conversions

* Feat: more remapping

* Fix: namespace has no attribute version + style

* Fix: offload params behavior

* Feat: add option to only rename keys in the config file

* Fix: wrong attr name

* Fix: partially resolve comments

* Feat: work on config command + minor fixes to reflect changes

* Refactor: style + quality

* Feat: fsdp2 initial work

* Feat: some cleanups and first running fsdp2

* Fix: version checks + mixed precision policy

* Refactor: style + quality

* Remove obsolete todos

* Feat: grad norm clipping

* Fix: tests + rename attrs

* Refactor: style + quality

* Fix: None object is not iterable

* Fix: default cpu_offload for fsdp2

* Fix: cpu offload now behaves correctly

* Feat: apply_activation_checkpointing

* Fix: append to models

* Feat: start on concept guide

* wip: concept guide

* Fix: toctree

* cleanup of the concept guide

* Fix: minor fixes + mp

* Fix: quality + | to union

* Feat: backwards compatibility + args cleanup

* Fix: style + quality

* Feat: enable dropping refs when getting named params

* Fix: memory footprint with fsdp2

* Feat: cpu ram efficient loading

* Fix: mp

* Fix: don't warn about sync_modules if fsdp version is 1

* Refactor: minor changes

* Small fixes + refactors

* Feat: docs + cleanup

* Feat: saving works (not sure about optim)

* More loading/saving work

* Feat: disable local_state_dict for fsdp2

* Fix: fsdp2 convergence

* Feat: working comparison script

* Feat: memory tracking fsdp2

* Feat: memory visualizer

* Feat: more work on benchmark

* Fix: raise error if model+optimizer aren't prepared together

* Minor fixes

* Style

* More warnings

* Fix: reshard_after_forward vs sharding_strategy conflict

* Refactor: clean up accelerator

* Feat: more testing in fsdp2 benchmark

* Fix: memory visualizer

* Untested: support load/save_state

* Feat: concept guide improvements

* Refactor: concept guide

* Feat: benchmark works

* Feat: more work on fsdp2 benchmark

* Fix: note syntax

* Fix: small fixes + make original tests work

* Fix: grad scaling

* Feat: reshard after forward tests

* Feat: backward prefetch tests

* Feat: tests for fsdp2

* Refactor: minor fixes

* Feat: fsdp_utils docstrings

* Feat: autodoc fsdp.md

* Docs: get_module_children_bottom_up

* Fix: remove unused images

* Refactor: benchmark cleanup

* Fix: docs

* Feat: final doc changes

* Fix: torch.distributed has no attribute tensor

* Fix: style

* Feat: tests include version in failures

* Fix: benchmark force model to load in fp32

* Fix: rename runs

* Feat: last minor fixes

* Feat: new benchmark images
2025-03-27 15:01:18 -04:00
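A sketch of opting into FSDP2 programmatically, assuming the `fsdp_version` field on the plugin is the switch this work introduces (config-file users would set the equivalent key; `model` and `optimizer` are placeholders):

```python
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# fsdp_version=2 selects the new code path; note from the commits above
# that reshard_after_forward replaces FSDP1's sharding_strategy.
fsdp_plugin = FullyShardedDataParallelPlugin(fsdp_version=2)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

# Per "raise error if model+optimizer aren't prepared together" above,
# both objects go through a single prepare() call.
model, optimizer = accelerator.prepare(model, optimizer)
```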
6de900e10a feat: Add no_ssh and slurm multinode launcher options for deepspeed (#3329)
* feat: Add no_ssh multinode launcher option for deepspeed

* fix: Add CLI hints and brief documentation, add slurm launcher, and ensure that deepspeed version 0.14.5 is used for nossh
2025-03-20 10:33:00 -04:00
ac3749dc11 Add Tecorigin SDAA accelerator support (#3330)
Co-authored-by: siqi <siqi@tecorigin.com>
2025-03-05 10:11:21 +01:00
90f81986b9 minor doc fixes (#3365) 2025-02-25 15:52:26 +01:00
fa26dc6156 add missing import (#3396) 2025-02-25 11:07:14 +01:00
8039158d71 Torchao float8 training (#3348)
* Bookmark

* bookmark

* Add torchao base example

* Currently broken

* Clean

* DDP variant working

* FSDP as well

* Works for all but zero3

* Bookmark: currently zero3 is underperforming

* Bookmark

* Another diff

* Fin

* Fin

* Add req huggingface suite

* update tests for fp8/torchao/ddp

* Log FP8 backend used and adjust typing

* add documentation for convert_to_float8_training

* Rename to convert_model_to_fp8_ao

* Call super init

* Add types

* Clean

* Use filter_first_and_last_linear_layers

* Update usage guide docs

* Actually loop through the zero stages

* Clean
2025-02-17 11:51:47 -05:00
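The work above builds on torchao's float8 training conversion, with helpers to keep the first and last linear layers (which are most precision-sensitive) out of float8. A rough sketch of the underlying torchao call, with the filter logic simplified for illustration:

```python
import torch
from torchao.float8 import convert_to_float8_training

model = torch.nn.Sequential(torch.nn.Linear(8, 8))  # placeholder model

def module_filter_fn(module, fqn):
    # Convert only nn.Linear layers, and keep lm_head (the last linear
    # layer in a causal LM) in higher precision; an illustrative stand-in
    # for accelerate's filter_first_and_last_linear_layers helper.
    return isinstance(module, torch.nn.Linear) and "lm_head" not in fqn

convert_to_float8_training(model, module_filter_fn=module_filter_fn)
```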
e34db4d0d2 enable xpu (#3397) 2025-02-17 17:41:50 +01:00
5cc99e6e02 fix: typos in documentation files (#3388)
* Update test_scheduler.py

* Update test_big_modeling.py

* Update test_state_checkpointing.py

* Update test_script.py

* Update cli.md

* Update quicktour.md
2025-02-10 13:11:50 -05:00
ce63623421 works for fp8 with deepspeed (#3361)
* works for fp8 with deepspeed

* Add tests

2025-02-10 09:31:15 -05:00
f076495580 deepspeed github repo move (#3376) 2025-02-03 13:52:08 -05:00
b13aadcb67 Bye bye torch <2 (#3331)
* Bye bye torch <2

* Add 2.6.0 dl args

* Rm require fsdp

* Adjust imports + 2.0 specific modeling code

* Bring back is_bf16
2025-01-09 12:11:08 -05:00
acfbf72a7f Give example on how to handle gradient accumulation with cross-entropy (#3193)
* Add cross-entropy example in the gradient accumulation docs

* add example of logs

* correct skeleton code

* replace gather_for_metrics with gather

* batch_size -> per_device_batch_size

* remove main_process_only=True

* add autoregressive example in examples/

* Update docs/source/usage_guides/gradient_accumulation.md

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* ruff format

* add grad accum test

* update docs

* Update examples/by_feature/gradient_accumulation_for_autoregressive_models.py

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* update tests

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-12-24 12:26:45 +01:00
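The crux of the example added here: with token-level cross-entropy, letting Accelerate average the loss per micro-batch and divide by the number of accumulation steps over-weights micro-batches that contain fewer real tokens, so the example sums the loss and normalizes by the total token count instead. A condensed sketch (`batches`, `model`, `optimizer`, and `accelerator` are assumed to already exist; -100 marks padding):

```python
import torch

loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")  # sum, not mean

# Total number of next-token targets across the accumulated micro-batches
# (the full example also gathers this count across processes).
num_items = sum((b["labels"][..., 1:] != -100).sum() for b in batches)

for batch in batches:
    with accelerator.accumulate(model):
        logits = model(batch["input_ids"]).logits
        shift_logits = logits[..., :-1, :].contiguous()
        shift_labels = batch["labels"][..., 1:].contiguous()
        loss = loss_fn(shift_logits.view(-1, shift_logits.size(-1)),
                       shift_labels.view(-1)) / num_items
        # backward() divides by gradient_accumulation_steps internally,
        # so the example multiplies it back out to keep the sum intact.
        accelerator.backward(loss * accelerator.gradient_accumulation_steps)
        optimizer.step()
        optimizer.zero_grad()
```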
3e62fbb09c [docs] no hard-coding cuda (#3270)
* no hard-coding cuda

* Update docs/source/usage_guides/big_modeling.md

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* update device_type

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
2024-12-10 21:32:10 -05:00
cb8b7c637a Fixed typos for Tutorials and Guides docs (#3274) 2024-12-06 10:39:45 -05:00
aa16d69561 [docs] use real path for checkpoint (#3220)
* fix bug

* update
2024-12-06 10:39:29 -05:00
51fd482d6e [docs] update set-seed (#3228)
* update set-seed

* update comment
2024-12-06 10:38:59 -05:00
dd68af886a Update troubleshooting.md (#3259)
I think set_breakpoint and check_breakpoint have been renamed to set_trigger and check_trigger
2024-12-02 13:41:10 -05:00
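For reference, the renamed pair implements cross-process early stopping: any single process can set the trigger, and every process subsequently observes it. A minimal sketch with a placeholder training loop:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

for batch in dataloader:              # placeholder dataloader
    loss = training_step(batch)       # placeholder step returning a tensor
    if torch.isnan(loss):             # may fire on one process only
        accelerator.set_trigger()
    if accelerator.check_trigger():   # True on every process once set
        break
```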
1f508a6df6 Update deferring_execution.md (#3262) 2024-12-02 13:40:33 -05:00
29be478862 [WIP] FEAT Decorator to purge accelerate env vars (#3252)
* [WIP] FEAT Decorator to purge accelerate env vars

In some circumstances, calling certain classes or functions can result
in accelerate env vars being set and not being cleaned up afterwards. As
an example, when calling:

TrainingArguments(fp16=True, ...)

The following env var will be set:

ACCELERATE_MIXED_PRECISION=fp16

This can affect subsequent code, since the env var takes precedence over
TrainingArguments(fp16=False). This is especially relevant for unit
testing, where we want to prevent individual tests from having side
effects on one another. Decorate the unit test function or whole class
with this decorator to ensure that after each test, the env vars are
cleaned up. This works for both unittest.TestCase and normal
classes (pytest); it also works when decorating the parent class.

In its current state, this PR adds the new decorator and tests it, but
the decorator is not yet applied to potentially problematic functions or
classes.

* Linter

* Refactor code to be more readable

2024-11-25 12:04:56 -05:00
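A minimal sketch of what such a decorator can look like (illustrative only: the name `purge_accelerate_env_vars` is hypothetical and the utility actually added by the PR may differ in details):

```python
import functools
import os

PREFIX = "ACCELERATE_"

def purge_accelerate_env_vars(func):
    """Hypothetical: snapshot ACCELERATE_* env vars before the wrapped
    test runs and restore exactly that snapshot afterwards."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        saved = {k: v for k, v in os.environ.items() if k.startswith(PREFIX)}
        try:
            return func(*args, **kwargs)
        finally:
            # Drop anything the test added, then restore the snapshot.
            for key in [k for k in os.environ if k.startswith(PREFIX)]:
                del os.environ[key]
            os.environ.update(saved)
    return wrapper
```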
069743775e [docs] add instruction to install bnb on non-cuda devices (#3227)
* add bnb installation link

* add period

* add xpu comment and fix some bugs

* style fix
2024-11-20 16:58:46 -05:00
8ad2b3b8e7 [docs] update code in tracking documentation (#3235)
* update example code

* revert
2024-11-20 10:04:07 -05:00
cf169a1ae6 enable find_executable_batch_size on XPU (#3236)
* enable on XPU

* Update src/accelerate/utils/memory.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-11-19 12:29:05 -05:00
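`find_executable_batch_size` retries a function with a halved batch size whenever it hits an out-of-memory error; this change extends the OOM detection to XPU. Typical usage (training internals elided):

```python
from accelerate.utils import find_executable_batch_size

@find_executable_batch_size(starting_batch_size=128)
def train(batch_size):
    # On OOM (now detected on XPU as well as CUDA), the decorator frees
    # memory, halves batch_size, and calls this function again.
    ...

train()  # batch_size is injected by the decorator; don't pass it yourself
```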
bf4572b6ce [Utils] align_module_device (#3204)
* implement align_module

* add docs

* move to modeling utils, integrate into existing source code

* update source, expose through utils

* Suggested docstring

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Rewrite for readability, add try finally

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* Use try-finally when aligning with hook

Co-authored-by: Zach Mueller <muellerzr@gmail.com>

* apply style

* improve get_state_dict_from_offload readability

* Update docstring

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* rename to align_module_device, update docstring

---------

Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
2024-11-01 09:05:50 -04:00
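`align_module_device` is a context manager that temporarily moves a module's parameters onto a given execution device (handling hook-offloaded weights too) and moves them back on exit, using try/finally as the review comments note. A brief sketch with a placeholder module:

```python
import torch
from accelerate.utils import align_module_device

module = torch.nn.Linear(4, 4)  # placeholder module
inputs = torch.randn(1, 4)

# Parameters live on the requested device only inside this block.
with align_module_device(module, execution_device="cpu"):
    outputs = module(inputs)
```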
497eb3cf86 fix bug (#3166) 2024-10-31 09:08:20 -04:00
4dda5797bd [docs] use nn.Module instead of tensor as model (#3157)
* use nn.Module instead of tensor

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>

* fix neptune

---------

Signed-off-by: Lin, Fanli <fanli.lin@intel.com>
2024-10-23 12:23:16 -04:00
c809f8e45c [docs] update neptune API (#3181) 2024-10-23 12:14:52 -04:00
735dfa3018 [Utils] has_offloaded_params (#3188)
* implement has_offloaded_params

* update docstring

* expose to utils

* add docs

* apply style, quality

* add tests
2024-10-23 16:44:02 +02:00
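`has_offloaded_params` reports whether a module's weights are managed by an offloading AlignDevicesHook, which is worth checking before reading or mutating weights directly. A tiny sketch:

```python
import torch
from accelerate.utils import has_offloaded_params

module = torch.nn.Linear(4, 4)

# False here: no offloading hook is attached to this plain module.
print(has_offloaded_params(module))
```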
6f79b63b86 Trigger weights_only=True by default for all compatible objects (#3036)
* rebase

* Update torch v

* Rename

* Prop to docs

* Actually reverse states

* Rebase fully

* Restore old state

* Keep as load()

* No need for explicit anymore

* Check numpy version, dtypes was added in 1.25

* Clean up diff

* Fix hang
2024-10-10 14:08:24 -04:00
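The change mirrors PyTorch's safer deserialization option: restrict `torch.load` to tensors and plain containers instead of running arbitrary unpickling. In plain PyTorch terms, this is the behavior Accelerate's `load()` now defaults to for compatible objects:

```python
import torch

torch.save({"weights": torch.ones(2)}, "checkpoint.pt")

# weights_only=True refuses to unpickle arbitrary Python objects.
state = torch.load("checkpoint.pt", weights_only=True)
```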
fb68cb9d0e Refactor scaler to util (#3142)
* Refactor scaler to util

* Document

* Use the distributed_type directly
2024-10-08 11:07:01 -04:00
d4d6b6e7f5 fix tip brackets typo (#3129) 2024-10-07 09:34:24 -04:00
e9e5a73fcc POC: multiple model/configuration DeepSpeed support (#3097)
* Bookmark

* Migratory

* Uncomment

* Rm name to model for now

* Rm container

* Left: test

* Allow only wrapping one model

* Add warning but only ref once

* Refine

* Update src/accelerate/accelerator.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Finish stas nits

* Clean

* Fixup test + test writing

* Fully working

* Fin

* Nit

* Quality

* Update src/accelerate/accelerator.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Actionable error

* Make note of when its enabled

* Apply suggestions from code review

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Merge tests

* Merge

* Add currently broken test script

* Push the working implementation

* Fin

* Add guards for user behavior

* Test nits

* TODO: finish knowledge distillation example

* Update tests/deepspeed/test_deepspeed_multiple_model.py

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Allow for dict-like interface

* Get rid of disable

* Uncomment

* Complete rewrite to force a dict to be used

* Working tests/fin

* Use name as stas suggestion

* Clean

* docnit

* toctree

* toctree

* Missing ref

* Put in break

* Smaller diff

* Make note on how to use zeroinit

* Make note about accelerator ds plugin

* More docnits

* Apply suggestions from code review

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

* Limit users to not pass in another ds plugin to another accelerator

* not implemented err + Make a note about why no params

* Apply suggestions from code review from Stas

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>

* Add deepspeed_plugins arg + update doc

* Plugin -> plugins

* Change enable() -> select()

* Update ref properly + test

* Be consistent, model1,model2...

* first_, second_

* A few more auto values

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>

---------

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
2024-09-13 07:28:06 -04:00
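Putting the bullet points together: the final interface takes a dict of named DeepSpeed plugins and a way to select the active one before preparing each model. A sketch under that reading (the names "student"/"teacher", the ZeRO stages, and the exact select call are taken or inferred from the commit messages, so verify against the docs):

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

# One named plugin per model/configuration, per the dict-interface commits.
deepspeed_plugins = {
    "student": DeepSpeedPlugin(zero_stage=2),
    "teacher": DeepSpeedPlugin(zero_stage=3),
}
accelerator = Accelerator(deepspeed_plugins=deepspeed_plugins)

# Select the active plugin before preparing the corresponding model
# (the "enable() -> select()" rename above; assumed to live on state).
accelerator.state.select_deepspeed_plugin("student")
student = accelerator.prepare(student_model)  # student_model is a placeholder
```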
79a8426416 🚨🚨🚨 The Great Deprecation 🚨🚨🚨 (#3098)
* The great purge

* Clean

* Some more fixings

* Some more deprecations Benjamin found

* Fix kwarghandler test
2024-09-12 21:12:32 -04:00