* add support for SwanLabTracker and update related documentation
* add emoji in FRAMEWORK
* apply the style corrections and quality control
* add support for SwanLabTracker in tests
* fix bug in test_tracking
* Feat: initial conversion tool draft
* Feat: add value mapping to conversion tool
* Refactor: move from os to pathlib
* Feat: add first tests
* Feat: more tests
* Feat: minor fixes + dataclass conversions
* Feat: more remapping
* Fix: namespace has no attribute version + style
* Fix: offload params behavior
* Feat: add option to only rename keys in the config file to
* Fix: wrong attr name
* Fix: partially resolve comments
* Feat: work on config command + minor fixes to reflect changes
* Refactor: style + quality
* Feat: fsdp2 initial work
* Feat: some cleanups and first running fsdp2
* Fix: version checks + mixed precision policy
* Refactor: style + quality
* Remove obsolete todos
* Feat: grad norm clipping
* Fix: tests + rename attrs
* Refactor: style + quality
* Fix: None object is not iterable
* Fix: default cpu_offload for fsdp2
* Fix: cpu offload now behaves correctly
* Feat: apply_activation_checkpointing
* Fix: append to models
* Feat: start on concept guide
* wip: concept guide
* Fix: toctree
* cleanup of the concept guide
* Fix: minor fixes + mp
* Fix: quality + | to union
* Feat: backwards compatibility + args cleanup
* Fix: style + quality
* Feat: enable dropping refs when getting named params
* Fix: memory footprint with fsdp2
* Feat: cpu ram efficient loading
* Fix: mp
* Fix: do not warn about sync_modules if fsdp version is 1
* Refactor: minor changes
* Small fixes + refactors
* Feat: docs + cleanup
* Feat: saving works (not sure about optim)
* More loading/saving work
* Feat: disable local_state_dict for fsdp2
* Fix: fsdp2 convergence
* Feat: working comparison script
* Feat: memory tracking fsdp2
* Feat: memory visualizer
* Feat: more work on benchmark
* Fix: raise error if model+optimizer aren't prepared together
* Minor fixes
* Style
* More warnings
* Fix: reshard_after_forward vs sharding_strategy conflict
* Refactor: clean up accelerator
* Feat: more testing in fsdp2 benchmark
* Fix: memory visualizer
* Untested: support load/save_state
* Feat: concept guide improvements
* Refactor: concept guide
* Feat: benchmark works
* Feat: more work on fsdp2 benchmark
* Fix: note syntax
* Fix: small fixes + make original tests work
* Fix: grad scaling
* Feat: reshard after forward tests
* Feat: backward prefetch tests
* Feat: tests for fsdp2
* Refactor: minor fixes
* Feat: fsdp_utils docstrings
* Feat: autodoc fsdp.md
* Docs: get_module_children_bottom_up
* Fix: remove unused images
* Refactor: benchmark cleanup
* Fix: docs
* Feat: final doc changes
* Fix: torch.distributed has no attribute tensor
* Fix: style
* Feat: tests include version in failures
* Fix: benchmark force model to load in fp32
* Fix: rename runs
* Feat: last minor fixes
* Feat: new benchmark images
* [WIP] FEAT Decorator to purge accelerate env vars
In some circumstances, calling certain classes or functions can result
in accelerate env vars being set and not being cleaned up afterwards. As
an example, when calling:
TrainingArguments(fp16=True, ...)
The following env var will be set:
ACCELERATE_MIXED_PRECISION=fp16
This can affect subsequent code, since the env var takes precedence over
TrainingArguments(fp16=False). This is especially relevant for unit
testing, where we want to avoid individual tests having side
effects on one another. Decorate the unit test function or whole class
with this decorator to ensure that after each test, the env vars are
cleaned up. This works for both unittest.TestCase and normal
classes (pytest); it also works when decorating the parent class.
In its current state, this PR adds the new decorator and tests it, but
the decorator is not yet applied to potentially problematic functions or
classes.
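A minimal sketch of what such a purge decorator could look like for a single test function (the name `purge_accelerate_env_vars` and the exact save/restore semantics are assumptions for illustration, not necessarily the implementation in this PR):

```python
import functools
import os


def purge_accelerate_env_vars(test_func):
    """Illustrative sketch: undo any ACCELERATE_* env var changes made by a test."""

    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        # Snapshot the ACCELERATE_* variables present before the test runs.
        saved = {k: v for k, v in os.environ.items() if k.startswith("ACCELERATE_")}
        try:
            return test_func(*args, **kwargs)
        finally:
            # Drop any ACCELERATE_* variables the test introduced ...
            for key in [k for k in os.environ if k.startswith("ACCELERATE_")]:
                del os.environ[key]
            # ... and restore the original values.
            os.environ.update(saved)

    return wrapper
```

The real decorator additionally supports decorating whole unittest.TestCase or pytest-style classes (including parent classes), as described above.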
* Linter
* Refactor code to be more readable
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* rebase
* Update torch v
* Rename
* Prop to docs
* Actually reverse states
* Rebase fully
* Restore old state
* Keep as load()
* No need for explicit anymore
* Check numpy version, dtypes was added in 1.25
* Clean up diff
* Fix hang
* Bookmark
* Migratory
* Uncomment
* Rm name to model for now
* Rm container
* Left: test
* Allow only wrapping one model
* Add warning but only ref once
* Refine
* Update src/accelerate/accelerator.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Finish Stas' nits
* Clean
* Fixup test + test writing
* Fully working
* Fin
* Nit
* Quality
* Update src/accelerate/accelerator.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Actionable error
* Make note of when its enabled
* Apply suggestions from code review
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Merge tests
* Merge
* Add currently broken test script
* Push the working implementation
* Fin
* Add guards for user behavior
* Test nits
* TODO: finish knowledge distillation example
* Update tests/deepspeed/test_deepspeed_multiple_model.py
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Allow for dict-like interface
* Get rid of disable
* Uncomment
* Complete rewrite to force a dict to be used
* Working tests/fin
* Use name as per Stas' suggestion
* Clean
* docnit
* toctree
* toctree
* Missing ref
* Put in break
* Smaller diff
* Make note on how to use zeroinit
* Make note about accelerator ds plugin
* More docnits
* Apply suggestions from code review
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Prevent users from passing in another ds plugin to another accelerator
* not implemented err + Make a note about why no params
* Apply suggestions from code review from Stas
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Add deepspeed_plugins arg + update doc
* Plugin -> plugins
* Change enable() -> select()
* Update ref properly + test
* Be consistent, model1, model2, ...
* first_, second_
* A few more auto values
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: lewtun <lewis.c.tunstall@gmail.com>
* Working version rebased from main
* kwargs
* Clean
* Fix more nits
* Fin
* Delay autocast flag
* Enable FP8 autocast during eval only if specified
* Fin
* Rm comment
* All done
* Zero3 works!
* Let the wrapper come off during unwrap_model
* Add import check
* Migrate all to benchmarks folder and make TE import check work
* Add readme
* Add README to benchmarks folder
* Update CLI to now include fp8 args
* Add test config for 0_34
* Finish adding to config yaml
* Write docs
* Expound docs w/ FP8
* Add to toctree
* Initial commit
* Now to test
* Store false
* Slight tweaks
* Fix naming
* Got it all working with tests
* Use not for safetensors arg
* rm change
* Add docs
* Adjust based on Marc's feedback
* Specify just weights
* Update tests to include CLI and swap namings
* Fin
* Rm unused
* Rm again
* address part of Stas' comments
* automatically set sync_module_states if low_cpu_mem is set
* Apply suggestions from @stas00
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* add links from fsdp and deepspeed docs. fix deepspeed imports
* replace raise in accelerate.launch
---------
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* DOC Fixes to Accelerator docstring
- Add more links to accelerator classes where applicable
- Fix a typo: KwargHandler => KwargsHandler
* Fix syntax issues
Not sure how to add a link when the type is `list[SomeType]`, so just
removed it for now.
* Fixing link for KwargsHandler
* Add KwargsHandler to API docs
* Also add doc entry to kwargs.md
* Deprecate and introduce dataloader_config
* Update docs
* Doc nits
* More tests, adjust based on PR review
* Fixup tests
* Nits
* Update docs/source/quicktour.md
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Clean
* Actually create one
* Forgot to change one
* Use pytest
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Make torch xla available on GPU
* format code
* fix documentation build error
* update according to the comments
* Replace DistributedType.TPU with DistributedType.XLA
* make all ut pass
* format code
* update comments
* skip test
* format code
* skip FSDPPluginIntegration for torchxla
* bring back custom_sampler_check
* fix ut
* format code
* format code
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Broken version
* Timing I would expect
* Working version!
* Use MethodType
* working test
* Tests
* Use no split module classes explicitly
* Put split_points in pipeline
* Store split points in hf_split_points
* fix case num_process=1
* Allow for dynamic batch padding (#2352)
* Allow for dynamic batch padding
* Fix test
* Update src/accelerate/inference.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Break early after the first valid bs is found
* Less slicy-dicy
* Test cv model
* Start, need to test
* Use dataloader-like logic
* Refactor to utils
* With tests
* Update the source
* Clean
* bs=1 case
* Add test
* add some failing test
* Almost working version
* Much cleaner implementation
* Use pad_input_tensor
* All tests passing!
* Do it at tracing too
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Marc Sun <marc@huggingface.co>
* Rm literal
* Allow users to pass in max_memory
* Note about recursion
* Document, document, document
* Right import check
* Fix bug, add tests to multigpu runners
* Change default to None
* Start of docs
* Try again?
* Try again x2
* Trailing comma
* Move import
* Clean
* typehint
* typo
* From code review
* Use num_chunks
* Update tests/test_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Bad copy/paste
* hf_split_points
---------
Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Redo with new version
* Store
* Working version
* Separate for now
* Min diff
* check if available
* Better docstring
* Check for multiple models and optimizers
* Check for TE and MSAMP args separately
* String clarity
* Better docstring and types
* Quality
* Simplify a bunch for fp8
* Convert literals to type alias
* Better err
* Docs
* toc typo
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
* Address doc nits
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Maria Khalusova <kafooster@gmail.com>
* first take at troubleshooting guide
* logging moved to the troubleshooting guide
* TOC updates and guide edits
* minor edits
* moved to tutorials
* feedback addressed
* batch size clarifications
* typo
* kernel, early stopping hanging, feedback
* add clearml tracker
* fix style in tracking.py
* run ruff --fix
* run ruff fix on src/accelerate/utils/__init__.py as well
* properly run make style
* add tests
* modify code based on code review
* changes based on code review
* quote data_frame
* fix docs
* remove pandas req in log_table
* style changes
* add tracker to docs
* Estimator
* Right err
* Fixup tests
* trust remote code
* Print output for debugging purposes
* trust_remote_code
* Address some comments
* change doc to req arg
* Properly check for _no_split_modules in transformer models
* Note on transformer models
* Check/handle petabytes
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* Tests are passing locally again, better handle for no_split
* Adjust setup?
* Let's see if the cleaner version works
* Refactor and clean up for testing
* Specify in comments
* Better error handling
* A million tests later
* More tests + err handling
* Require hub
* More with remote code
* Clean up
* Add a test for no_split
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Docstring
* Address some comments
* rm einops
* Let it err out
* Adjust errs
* Tests
* Reduce test repeats
* Clean up borders
* Tip on 20%
---------
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* fix KwargsHandler.to_kwargs not working with os.environ initialization in __post_init__
* fix test_torch_dynamo_plugin so that it does not change os.environ permanently
* move clear_os_environ func to utils/other and rename it
* reformat code in order to pass ci quality check
* modify the comment of utils.other.clear_environment