13a00e2877
Lock datasets
2025-07-10 11:56:37 +00:00
b1498c7c5c
Revert "Bunch of FSDP improvements (#3671)"
This reverts commit d6c986c3f2dd8417c8689967a2d139576d617925.
2025-07-10 11:38:40 +00:00
d6c986c3f2
Bunch of FSDP improvements (#3671)
* Feat: split tests
* Feat: finito
* Fix
* Final, tests pass
2025-07-09 16:05:22 +02:00
1ac8643df7
xpu enablement on left cases (#3654)
* 1. enable xpu for launcher 2. expand cuda only ds uts to xpu 3. expand profiler example to xpu
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* rename
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update profiler.py
* Apply style fixes
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-07-07 18:10:53 +02:00
07ce74868c
Fix: properly error when DDP + Dtensor model (#3629)
* Feat: add check
* Refactor: nits
2025-06-27 01:33:45 +02:00
175fe91589
Added a check in the no_sync() function to avoid errors when using deepspeed zero2/3. (#3656)
2025-06-26 14:39:04 +02:00
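The fix above can be sketched as a guard: under DeepSpeed ZeRO stage 2/3 the engine manages gradient reduction itself, so a `no_sync()` helper should degrade to a no-op instead of calling the wrapped model's `no_sync()` and erroring. A minimal illustrative sketch, with all names hypothetical rather than Accelerate's actual implementation:

```python
from contextlib import contextmanager


@contextmanager
def no_sync(model, distributed_type: str, zero_stage: int = 0):
    """Illustrative guard in the spirit of the fix above (names hypothetical).

    DeepSpeed ZeRO-2/3 handles gradient synchronization internally, so
    disabling sync on the wrapped model is invalid there; fall back to a
    no-op context instead of raising.
    """
    if distributed_type == "DEEPSPEED" and zero_stage >= 2:
        yield  # no-op: let DeepSpeed decide when gradients are reduced
    else:
        with model.no_sync():  # e.g. DDP's gradient-sync suppression
            yield
```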
fe16ce8bce
Fix fsdp2 example (#3657)
2025-06-26 14:08:51 +02:00
5987d79a53
Update gradient_accumulation.md (#3649)
2025-06-23 11:58:31 +02:00
31af8d4e8e
shards (#3645)
2025-06-20 11:24:20 +02:00
b7493a82b1
Add support for e5e2 and default to hybrid when launcher is used (#3640)
* add support for e5e2 and default to hybrid when launcher is used
* style
2025-06-20 11:11:32 +02:00
a16d2bb3c1
bump to v1.9.0dev
2025-06-19 15:13:41 +02:00
cac22ed980
fix grad acc deepspeed (#3638)
* fix grad acc deepspeed
* style
2025-06-19 12:06:21 +02:00
be826a6b7b
Fix: correct labels (#3637)
2025-06-19 11:01:56 +02:00
5939640829
Feat: add cpu offload (#3636)
2025-06-18 18:13:45 +02:00
7f9c8cbe34
[DeepSpeed] sync gradient accum steps from deepspeed plugin (#3632)
* sync steps
* add a debug log when overriding
* make grad accum always consistent
* remove debug
2025-06-18 16:45:57 +02:00
9888c7ed23
feat: use datasets.IterableDataset shard if possible (#3635)
* feat: use datasets.IterableDataset shard if possible.
When `accelerator.prepare` is called on a
`datasets.IterableDataset`, use the `shard` method to
split the dataset across the available processes. This
allows for more efficient data loading and processing,
without the load-and-slice overhead of `IterableDatasetShard`.
* dataset
* remove unused import
* style
---------
Co-authored-by: wuwenxu.01 <wuwenxu.01@bytedance.com>
2025-06-18 16:45:17 +02:00
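The commit above splits a `datasets.IterableDataset` across processes with its `shard` method rather than wrapping it. The core idea can be illustrated in plain Python: each of N processes keeps only the portion of the stream assigned to its index, so no process loads and then discards the whole stream. This sketch is illustrative only (per-example round-robin; `datasets`' own `shard` partitions by shard index) and the names are hypothetical:

```python
# Illustrative sketch of process-based sharding: worker `process_index`
# of `num_processes` keeps every num_processes-th example. Names are
# hypothetical, not Accelerate's or datasets' actual internals.
from typing import Iterable, Iterator


def shard_iterable(data: Iterable, num_processes: int, process_index: int) -> Iterator:
    """Yield only the examples belonging to `process_index`."""
    for i, example in enumerate(data):
        if i % num_processes == process_index:
            yield example


if __name__ == "__main__":
    stream = range(10)
    # Process 0 of 2 sees the even-indexed examples.
    print(list(shard_iterable(stream, num_processes=2, process_index=0)))
    # -> [0, 2, 4, 6, 8]
```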
42a68c30dc
Fix Typos in Documentation and Comments (#3621)
* Update state.py
* Update tracking.py
2025-06-18 15:53:02 +02:00
6597dae780
Integrate SwanLab for offline/online experiment tracking for Accelerate (#3605)
* add support for SwanLabTracker and update related documentation
* add emoji in FRAMEWORK
* apply the style corrections and quality control
* add support for SwanLabTracker in tests
* fix bug in test_tracking
2025-06-18 15:42:29 +02:00
8878d93745
remove hardcoded cuda from fsdpv2 (#3631)
2025-06-17 14:32:10 +02:00
2eaf5cdbbc
remove ipex.optimize in accelerate (#3608)
* remove ipex.optimize in accelerate
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix mis-style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update intel_cpu.md
* Update launch.py
* fix comments
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix style
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* add logging
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* Update launch.py
* Apply style fixes
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-17 11:08:19 +02:00
23c1d8db89
[Deepspeed] deepspeed auto grad accum (#3630)
* deepspeed auto grad accum
* add tests for grad accum
* use tiny-random-gpt2
* Update tests/deepspeed/test_deepspeed_gradient_accumulation.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix redundant code
* set_gradient_accumulation_boundary is always there
* remove unused helper
* no need for this
* full revert
* Apply style fixes
* get_global_grad_norm is always there
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-16 16:28:24 +02:00
0af621bbec
add xpu support in TorchTensorParallelPlugin (#3627)
* add xpu support in TorchTensorParallelPlugin
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
* fix typo
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-06-13 17:45:51 +02:00
bee04f1b01
Add fp8_e5m2 support in dtype_byte_size (#3625)
* float8_e5m2 device_map
* remove prints
2025-06-12 16:27:32 +02:00
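The commit above teaches the dtype-to-bytes helper about `float8_e5m2` (an 8-bit float with 5 exponent and 2 mantissa bits, i.e. 1 byte per element, relevant when computing `device_map` memory budgets). A rough sketch of the idea, working on dtype name strings rather than real torch dtypes (illustrative only, not Accelerate's actual code):

```python
import re


def dtype_byte_size(dtype_name: str) -> float:
    """Illustrative sketch: bytes per element from a dtype name string.

    The real helper operates on torch dtypes; the point shown here is
    that float8 variants such as float8_e5m2 carry their bit width in
    the name and resolve to 1 byte.
    """
    if dtype_name == "bool":
        return 1 / 8  # packed bit
    # Match the trailing bit width, allowing a float8 _eXmY suffix.
    match = re.search(r"[^\d](\d+)(_e\d+m\d+)?$", dtype_name)
    if match is None:
        raise ValueError(f"`dtype` is not a valid dtype: {dtype_name}.")
    return int(match.group(1)) / 8


print(dtype_byte_size("float8_e5m2"))  # 1.0
print(dtype_byte_size("float16"))      # 2.0
```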
8a953f08c6
fix xpu 8bit value loading (#3623)
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-06-12 14:55:14 +02:00
3518c03584
small fix (#3619)
2025-06-11 14:02:45 +02:00
2f8fd72e51
Remove device_count (#3587)
2025-06-10 14:50:34 +02:00
d2e6b0313d
[FSDP2] Refactor + FP8 (#3585)
* Fix double wrap
* Clocking off, ~equal to torch baseline
* works?
* Working version
* Partial rewrite
* FSDP2 path works
* Fix back prepare
* Almost done, proper AC left
* Feat: should work, cleanup + test more benchmarks left
* Style+quality
* Feat: fp8 example
* Feat: better example
* Feat: add readme
* Docs + should be done
* Fix: typos
* Fix: protect imports
* Feat: address comments
* Feat: add flops image
2025-06-10 14:26:48 +02:00
b9fee48c85
better handle FP8 with and without deepspeed (#3611)
* use the state mixed precision which has undergone all preprocessing
* Update src/accelerate/accelerator.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/accelerate/accelerator.py
* accelerator state sets the mixed precision for deepspeed and fp8_enabled
* fix
* fix
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-06-10 14:24:43 +02:00
3a82b056cf
Fix bf16 training with TP (#3610)
* fix
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-06-10 11:29:59 +02:00
6b61a373a2
fix deepspeed regional compilation (#3609)
2025-06-06 14:48:43 +02:00
682691deac
Update Gaudi Runners (#3593)
* test
* fix
* push
* in the morning
* fix backend
* run first
* set habana modules
* dynamo backend
* trigger
* remove on pr
* remove on file change
2025-06-03 12:36:56 +02:00
791055b484
Fix: list object has no attribute keys (#3603)
2025-06-03 12:24:20 +02:00
16bf1d8901
enable torchao and pippy test cases on XPU (#3599)
* enable torchao and pippy test cases on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-30 17:36:34 +02:00
ab3c604e48
enable big_model_inference on xpu (#3595)
* enable big_model_inference on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix quality
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-30 17:23:26 +02:00
273799c85d
enable fsdp2 benchmark on XPU (#3590)
* enable fsdp2 benchmark on XPU
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* add deterministic
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-27 14:08:59 +02:00
43526c5c08
add device-agnostic GradScaler (#3588)
* add device-agnostic GradScaler
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix bug
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix review comments
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* fix
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* format
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* Apply style fixes
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-27 11:44:50 +02:00
07f2392f40
change to use torch.device (#3594)
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-27 11:17:18 +02:00
ee2f48c2c3
[docs] no hard-coded cuda in the ddp documentation (#3589)
* make device-agnostic
* refactor
2025-05-27 11:16:42 +02:00
4f3abb73a7
Set ccl and KMP param in simple launch (#3575)
* Even 1 CPU machine can also run multi process
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ccl and KMP param setting
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* set master addr only when processes > 1
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix num process check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
* fix ccl args check
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
---------
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
2025-05-26 15:55:10 +02:00
db536cbfeb
Fix: Defer Tracker Initialization to Prevent Premature Distributed Setup (#3581)
* Fix tracker initialize distributed before InitProcessGroupKwargs
* Fix tracker initialize distributed before InitProcessGroupKwargs
* Add test for bug #3550
* Improve test for #3550
* Remove redundant code
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* fix style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2025-05-26 15:08:13 +02:00
4e9d0deba6
enable regional_compilation benchmark on xpu (#3592)
* enable regional_compilation benchmark on xpu
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
* Apply style fixes
---------
Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-26 15:05:42 +02:00
8cb3ace894
Add kwargs to optimizer, scheduler and dataloader using function accelerator().load_state() (#3540)
* Added artifacts and figure tracking at MLFlow tracker
* Added `log_artifact` to the MLFlowTracker
* Remove changes
* Added kwargs when loading state.
* added doc string
* Adjusted correct default types of kwargs
* Changed the load kwargs to a single one
* removed None value from kwargs
* fix kwargs for loading the model
* removed load_kwargs from optimizer state dict
* make load_kwargs a dictionary
* revert last changes
* reverted load_kwargs
* fix docstring
* added dict initiation
* Fix quality error during PR
2025-05-22 17:21:54 +02:00
b6d97cb856
Resolve logger warnings (#3582)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-22 16:26:31 +02:00
33967d4733
Add support for standalone mode when default port is occupied on single node (#3576)
* add standalone mode and replace ConnectionError with a warning when the main process port is in use, allowing for automatic port selection
* address review feedback: warn on port conflict only for single-node; raise error for multi-node
* Apply style fixes
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-20 12:29:53 +02:00
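The single-node behavior described above (warn and pick a free port instead of raising when the default main-process port is taken) can be sketched with the standard library: try to bind the preferred port, and on failure fall back to binding port 0 so the OS chooses an ephemeral one. All names here are hypothetical, not Accelerate's API:

```python
# Illustrative sketch of the port-fallback logic (names hypothetical).
import socket
import warnings


def resolve_main_process_port(preferred: int = 29500) -> int:
    """Return `preferred` if it is free, else an OS-chosen free port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind(("127.0.0.1", preferred))
            return preferred
        except OSError:
            warnings.warn(
                f"Port {preferred} is in use; falling back to an OS-chosen port."
            )
    # Bind port 0 to let the OS assign a free ephemeral port. Note the
    # port is released on close, so a race with other processes remains
    # possible; this only sketches the idea.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

In a multi-node setup the same conflict should still raise, since every node must agree on the rendezvous port.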
5b1fcda371
enable test_cli & test_example cases on XPU (#3578)
* enable test_cli & test_example cases on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* remove print
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix ci issue
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
Signed-off-by: YAO Matrix <matrix.yao@intel.com>
2025-05-20 12:04:24 +02:00
f55f0533b5
goodbye torch_ccl (#3580)
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-20 12:02:14 +02:00
1ec99f0b58
enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU (#3579)
* enable test_load_checkpoint_and_dispatch_with_broadcast cases on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* Update test_load_checkpoint_and_dispatch_with_broadcast.py
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-19 11:27:40 +02:00
417bc52965
bump to v1.8.0dev
2025-05-15 12:02:44 +02:00
97c93c4809
enable test_dispatch_model_tied_weights_memory_with_nested_offload_cpu on xpu (#3569)
* enable test_dispatch_model_tied_weights_memory_with_nested_offload_cpu case on XPU
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* replace hard-coded torch.cuda w/ device-dependent callings
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* use device agnostic clear_device_cache
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
* fix style
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
---------
Signed-off-by: Matrix Yao <matrix.yao@intel.com>
2025-05-15 11:40:55 +02:00
cd37bbb629
set backend correctly for CUDA+FSDP2+cpu-offload (#3574)
* set backend correctly for CUDA+FSDP2+cpu-offload
* offload
* format
---------
Co-authored-by: Wing Lian <wing@axolotl.ai>
2025-05-15 11:38:53 +02:00