* init
* style
* is_hpu_available
* fix
* import habana_frameworks.torch.distributed.hccl
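A minimal sketch of the availability probe these commits point at, assuming the Habana bridge registers a `torch.hpu` module when imported (the function body is illustrative, not the shipped implementation):

```python
import importlib.util

def is_hpu_available() -> bool:
    # Probe for the Habana PyTorch bridge; importing it has the side
    # effect of registering the "hpu" device with torch.
    if importlib.util.find_spec("habana_frameworks") is None:
        return False
    import torch
    import habana_frameworks.torch  # noqa: F401
    return hasattr(torch, "hpu") and torch.hpu.is_available()
```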
* style
* test
* initialize dist proc group
* revert
* set backend to hccl only if hccl initialization sets a local rank
* force backend hccl and multi_hpu type when sure of distributed launch
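The backend selection described above might look like this sketch, assuming a torchrun-style launcher that exports `WORLD_SIZE`/`RANK` (a Gaudi machine with the Habana bridge installed is required for the import):

```python
import os

import torch.distributed as dist
import habana_frameworks.torch.distributed.hccl  # noqa: F401  registers the hccl backend

# Force hccl only when a distributed launch is certain, i.e. the launcher
# exported world-size information (env var names are the torchrun ones).
if int(os.environ.get("WORLD_SIZE", "1")) > 1:
    dist.init_process_group(backend="hccl")  # env:// rendezvous via RANK/MASTER_ADDR/...
```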
* style
* pass accelerator tests
* pass big modeling tests with bigger atol/rtol for accelerators
* fix hpu device count and skip tests requiring hpu:x
* hpu autocast
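For illustration, bf16 autocast on HPU goes through the standard `torch.autocast` API; the model and tensors below are placeholders:

```python
import torch
import torch.nn as nn
import habana_frameworks.torch  # noqa: F401  registers the "hpu" device

model = nn.Linear(8, 8).to("hpu")
x = torch.randn(2, 8, device="hpu")
# bf16 autocast on HPU mirrors the CUDA autocast API.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)
```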
* hpu rng_state
* hpu launch
* hpu special device placement
* hpu launch
* rng state
* distributed data loop tests
* enforce non contiguity after device memory allocation
* pass fsdp tests
* enforce PT_HPU_LAZY_MODE=0 when fsdp testing
* pass cli tests
* pass and document grad sync tests
* pass kwargs handler and autocast tests
* memory utils
* found source of int64 errors
* skip some modeling utils tests
* enable int64
* skip optimizer tests
* pass checkpointing tests
* pass accelerator tests with safetensors main
* more hpu stuff
* style
* remove PT_HPU_LAZY_MODE and PT_ENABLE_INT64_SUPPORT as they should be in the testing environment
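For reference, the environment the tests expect could be sketched as follows (both variables appear in the commit messages above; in CI they are set outside the test code):

```python
import os

# Expected test environment, set before habana_frameworks is imported:
os.environ.setdefault("PT_HPU_LAZY_MODE", "0")          # eager mode, required for FSDP
os.environ.setdefault("PT_ENABLE_INT64_SUPPORT", "1")   # avoid int64 op errors on Gaudi
```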
* start testing on gaudi2
* support fp16 on gaudi2
* add testing order
* custom hpu fsdp env dict
* fix torch trace malloc
* test ddp half precision comm hooks
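A sketch of the half-precision comm hook under test, using the stock PyTorch hook (the module is a placeholder and a process group is assumed to be initialized already):

```python
import torch.nn as nn
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes torch.distributed is already initialized (see the hccl sketch above).
ddp_model = DDP(nn.Linear(8, 8))
# Compress gradients to fp16 for the all-reduce, then decompress.
ddp_model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
```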
* fix
* fix
* remove lower bound for hpu
* use 0.72 as lower bound
* lower the lower bound
* order deepspeed tests
* fix
* deepspeed_use_hpu
* assert non lazy mode with offloaded optimizer
* make patching torch with habana frameworks the default
* less of require_non_hpu
* skip test_multi_device_merge_fsdp_weights for now as it halts
* skip another flaky test
* format
* use habana_visible_modules
* patch torch hpu device count
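The device-count patch could be sketched like this hypothetical helper, assuming `HABANA_VISIBLE_MODULES` narrows visibility the way `CUDA_VISIBLE_DEVICES` does:

```python
import os

def visible_hpu_count(default_count: int) -> int:
    # Hypothetical helper: HABANA_VISIBLE_MODULES="0,1" limits which Gaudi
    # modules are visible, so the patched count should reflect it.
    modules = os.environ.get("HABANA_VISIBLE_MODULES")
    return len(modules.split(",")) if modules else default_count
```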
* avoid setting HABANA_VISIBLE_MODULES
* don't play with habana visible devices/modules
* only with hpu
* fixes and skips
* skip
* fix device ids and add some todos
* skip offloading with generate()
* fix
* reduced atol/rtol for hpu
* fix
* tag deepspeed tests that should run first
* enable a test path that was skipped
* revert a test that was customized for gaudi1
* some patching to enable HABANA_VISIBLE_MODULES
* fix zero3 test
* misc
* test DTensor TP
* remove gaudi1
* test
* style
* comment
* pass pad_across_processes
* require_fp16
* pass memory utils test
* test_ddp_comm_hook
* skip half precision comm hooks on hpu
* fix
* is_fp16_available
* fp16
* tp as part of integration tests
* fix
* write_basic_config
* safetensors
* local sgd and masked_fill_fwd_i64
* fix num_processes in test_load_states_by_steps
* fp8 support
* test
* fix
* add a workflow
* Update src/accelerate/accelerator.py
* review comments
* ci
* style
* comments
* test
* habana_frameworks.torch
* patch device count
* fix
* fix
* require_fp8
* fix
* fix
* gaudi 1
* remove unnecessary
* fixed masked fill error in transformers
* style
* balanced_memory pass on hpu
* remove for now
* run first
* Apply suggestions from code review
* style after merge
* Update src/accelerate/accelerator.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Update src/accelerate/utils/transformer_engine.py
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* empty cache review comments
* test_script.py error messages
* AccelerateTestCase for accelerator state cleanup
* test
* add gaudi1 workflow
* fp8 availability
* fix
* reduce batch size
* concurrency
* check cuda as well
* nits and comments
* mark fsdp tests that require_fp16
* style
* mark deepspeed fp16 tests
* update image
* fix
* updated
* better msgs
* skip pippy
* test
* test on 2 device
* support up to 1% relative error in test_accelerate
* skip hpu fp16
* allow for 1 byte difference
* revert torch_device change
* style
* skip memory release since it's flaky
* add accelerator state cleanup to fixture
* fix
* atol
* fix
* more rtol
* equal grad test
* revert
* pass pippy on gaudi2 and skip on gaudi1
* enable sd 1.5 test with require fp16
* added warning on memory release
* don't log warning in memory release as it requires PartialState to be initialized
* Apply suggestions from code review
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
* Let's try it out
* Let's try this out
* Some more cases
* String
* Require hub online for estimator
* Add CI checker to alert on hub status
* Format
* Oops death by ctrl z
* Fix import
* Update log_reports to send to slack
* REVERT this change, just for testing!
* Add slack_sdk dep
* Second one
* Try now?
* Remove len
* Need secret
* Try with new version
* Right boldface
* Fix import
* New format, use tabulate
* Add tabulate to yml
* Quality
* Purposefully fail
* Working updater, now to test
* Int
* Print payload
* Append
* Change maxcolwidth
* Offset
* More offset
* Context
* No max width
* gh format
* max-col-width
* Reduce max
* Non-working tables
* Rm md report
* Try now
* Try with just count
* Use table
* New version
* Use table
* Try with thread
* Should be working now
* Clean
* Fixup test reports fully
* Revert workflow
* Keep tabulate in workflow ci
* Update other workflows
* Use blocks for better formatting
* One more test
* Works as expected
* checkpointing enhancements and fixes for FSDP and DeepSpeed
* resolving comments
1. Adding deprecation args and warnings in launcher for FSDP
2. Handling old configs to work with new launcher args wrt FSDP.
3. Reverting changes to public methods in `checkpointing.py` and handling it in `Accelerator`
4. Explicitly writing the defaults of various FSDP options in `dataclasses` for readability.
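The deprecation handling in item 1 might follow this pattern (the flag names are illustrative, not necessarily the launcher's actual arguments):

```python
import warnings
from argparse import Namespace

def handle_deprecated_fsdp_args(args: Namespace) -> None:
    # Illustrative: map an old launcher flag onto its `fsdp_`-prefixed
    # replacement while warning the user.
    if getattr(args, "offload_params", None) is not None:
        warnings.warn(
            "`--offload_params` is deprecated; use `--fsdp_offload_params` instead.",
            FutureWarning,
        )
        args.fsdp_offload_params = args.offload_params
```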
* fixes
1. FSDP-wrapped model being added to `_models`.
2. Not passing the env variables when args are None.
* resolving comments
* adding FSDP for all the collective operations
* adding deepspeed and fsdp tests
1. Removes mrpc data files and directly relies on HF datasets, as it was throwing a `file not found` error when running from within the `tests` folder. Updating `mocked_dataloaders` as a result.
2. adding `test_performance.py`, `test_memory.py` and `test_checkpointing.py` for multi-gpu FSDP and DeepSpeed tests
* reverting `mocked_dataloader` changes
* adding FSDP tests
* data files revert
* excluding fsdp tests from `tests_core`
* try 2
* adding a time delay to avoid `torchrun` crashing at times, which was causing flaky behaviour
* reducing the time of tests
* fixes
* fix
* fixes and reduce time further
* reduce time further and minor fixes
* adding a deepspeed basic e2e test for single gpu setup
* deepspeed revamp
* Update dataclasses.py
* Update deepspeed.py
* quality
* fixing code
* quality
* FIx imports
* saving 16bit model in zero stage 3
1. Saving 16bit model in zero stage 3
2. zero init in stage 3 support using HFDeepSpeedConfig
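A minimal sketch of the stage-3 16-bit save path, assuming a single-process run and using DeepSpeed's `save_16bit_model` engine method (the module and config values are placeholders):

```python
import deepspeed
import torch.nn as nn

model = nn.Linear(8, 8)  # placeholder module
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        # Gather the sharded 16-bit weights into one file on save:
        "stage3_gather_16bit_weights_on_model_save": True,
    },
}
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
engine.save_16bit_model("checkpoints", "pytorch_model.bin")
```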
* quality
* adding test and fixing bugs
* update makefile for deepspeed tests
* Update test.yml
* adding `deepspeed` as requirement for tests
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* quality
* addressing comments
* add example and minor updates
1. Add example to show the usage of config file with revamped deepspeed support.
2. update required deepspeed version to 0.6.5
3. reverting `reinit` change as it is not required.
4. raising Exception when using `clip_grad_value` with DeepSpeed/FSDP.
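The guard on `clip_grad_value` might look like this sketch of an `Accelerator` method (the message text is illustrative):

```python
import torch
from accelerate.utils import DistributedType

def clip_grad_value_(self, parameters, clip_value):
    # Sketch: value clipping is unsupported with sharded gradients, so
    # raise instead of silently clipping the wrong thing.
    if self.distributed_type in (DistributedType.DEEPSPEED, DistributedType.FSDP):
        raise Exception(
            "DeepSpeed and FSDP do not support `clip_grad_value_`; "
            "use `clip_grad_norm_` instead."
        )
    torch.nn.utils.clip_grad_value_(parameters, clip_value)
```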
* Documentation and Zero-3 Inference Support
1. Changes to support ZeRO Stage-3 inference.
2. minor bug fixes.
3. Documentation.
* doc fix
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* addressing comments
* update doc to address comments and bug fixes
1. update tests and add a new one testing the autofill functionality of the `prepare` method.
2. fix a bug in zero-3 init related to HFDeepSpeedConfig
3. Update documentation addressing comments.
* removing image and hosting it on `documentation-images` dataset
* check for hidden_size for zero_opt heuristics
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Added experiment tracking API, and support for Weights and Biases, TensorBoard, and CometML + Tests
* Added `tensorflow` to a new dependency list to be used during tests
* Added three new functions in `Accelerator` to interact with the API
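A minimal usage sketch of the tracking API described above (parameter names follow the released Accelerate API, which may postdate these notes):

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="tensorboard", project_dir="runs")
# The three new entry points: start trackers, log values, finish cleanly.
accelerator.init_trackers("my_experiment", config={"lr": 3e-4})
accelerator.log({"train_loss": 0.42}, step=1)
accelerator.end_training()
```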