19 Commits

SHA1 Message Date
847ae58c74 Fix FP8 tests, enable FP8 to be used without direct Accelerator() configuring (#3677)
* single-gpu tests passing

* install deepspeed in fp8 container

* revert mixed_precision check
2025-07-15 15:20:57 +02:00
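For context on what "without direct Accelerator() configuring" means in practice, a minimal sketch of the two FP8 entry points (assuming a machine with an FP8-capable backend such as Transformer Engine installed):

```python
import torch
from accelerate import Accelerator

# Explicit path: request FP8 directly when constructing the Accelerator.
accelerator = Accelerator(mixed_precision="fp8")

# Implicit path this PR enables: with FP8 selected via `accelerate config`
# or ACCELERATE_MIXED_PRECISION=fp8, a bare Accelerator() picks it up
# without any direct configuring.
# accelerator = Accelerator()

model = torch.nn.Linear(64, 64)
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)
```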
ab3c604e48 enable big_model_inference on xpu (#3595)
* enable big_model_inference on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix style

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* fix quality

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-30 17:23:26 +02:00
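A minimal sketch of the big-model-inference path this PR routes to XPU; the checkpoint path is a placeholder:

```python
import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Build the model skeleton on the meta device, without allocating weights.
with init_empty_weights():
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.Linear(1024, 1024),
    )

# Dispatch shards across available accelerators; with this change,
# device_map="auto" can resolve to Intel XPU devices.
model = load_checkpoint_and_dispatch(
    model, "path/to/checkpoint", device_map="auto"
)
```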
273799c85d enable fsdp2 benchmark on XPU (#3590)
* enable fsdp2 benchmark on XPU

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* add deterministic

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
2025-05-27 14:08:59 +02:00
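The "add deterministic" commit is about reproducible benchmark numbers; a sketch of the usual accelerate idiom (assuming set_seed's deterministic flag, which additionally switches torch to deterministic algorithms):

```python
from accelerate.utils import set_seed

# Seed python/numpy/torch in one call; deterministic=True trades some
# speed for run-to-run reproducibility in the benchmark.
set_seed(42, deterministic=True)
```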
4e9d0deba6 enable regional_compilation benchmark on xpu (#3592)
* enable regional_compilation benchmark on xpu

Signed-off-by: Matrix YAO <matrix.yao@intel.com>

* Apply style fixes

---------

Signed-off-by: Matrix YAO <matrix.yao@intel.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-05-26 15:05:42 +02:00
7b5774ac55 Dynamo regional compilation (#3529) 2025-05-12 09:49:29 +02:00
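Regional compilation compiles a model's repeated blocks individually rather than the whole graph, trading a little peak speed for much faster cold starts. A sketch, assuming the compile_regions utility this PR introduces:

```python
import torch
from accelerate.utils import compile_regions

# Repeated, structurally identical blocks are the target: each region is
# compiled once and the compiled artifact is reused across repeats.
model = torch.nn.Sequential(*[torch.nn.Linear(64, 64) for _ in range(8)])
compiled_model = compile_regions(model)
```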
583b26db3c Add FP8 runners + tweak building FP8 image (#3493)
* Initial test

* Try on push

* Only wf dispatch now

* keep trying

* Try again

* Try again

* source activate?

* Force bash

* Source activate accelerate to make it get the env properly

* try using nightly docker

* Try this?

* Try this?

* Try this, proper output

* Try this, proper output

* Try via full conda activate(?)

* rm conda

* te fp8 tests

* add ao

* ao in setup too

* actually include fp8 deps

* FP8 docker image, use newer version

* Update docker image to take in input

* Test

* prior month

* igpu?

* Use only last 2 digits of year

* Build rest

* Apply style fixes

---------

Co-authored-by: Zachary Mueller <muellerzr@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-04-15 11:39:43 +02:00
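Since this PR wires both Transformer Engine and torchao into the image, an in-container smoke test along these lines (assuming accelerate's availability helpers) verifies the FP8 deps actually landed:

```python
from accelerate.utils import is_torchao_available, is_transformer_engine_available

assert is_transformer_engine_available(), "Transformer Engine missing from FP8 image"
assert is_torchao_available(), "torchao missing from FP8 image"
```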
3169339f5b Bump ruff to 0.11.2 (#3471)
* ruff format

* Bump ruff to 0.11.2
2025-04-01 11:57:06 +02:00
d7c741a6bc Initial FSDP2 support (#3394)
* Feat: initial conversion tool draft

* Feat: add value mapping to conversion tool

* Refactor: move from os to pathlib

* Feat: add first tests

* Feat: more tests

* Feat: minor fixes + dataclass conversions

* Feat: more remapping

* Fix: namespace has no attribute version + style

* Fix: offload params behavior

* Feat: add option to only rename keys in the config file

* Fix: wrong attr name

* Fix: partially resolve comments

* Feat: work on config command + minor fixes to reflect changes

* Refactor: style + quality

* Feat: fsdp2 initial work

* Feat: some cleanups and first running fsdp2

* Fix: version checks + mixed precision policy

* Refactor: style + quality

* Remove obsolete todos

* Feat: grad norm clipping

* Fix: tests + rename attrs

* Refactor: style + quality

* Fix: None object is not iterable

* Fix: default cpu_offload for fsdp2

* Fix: cpu offload now behaves correctly

* Feat: apply_activation_checkpointing

* Fix: append to models

* Feat: start on concept guide

* wip: concept guide

* Fix: toctree

* cleanup of the concept guide

* Fix: minor fixes + mp

* Fix: quality + | to union

* Feat: backwards compatibility + args cleanup

* Fix: style + quality

* Feat: enable dropping refs when getting named params

* Fix: memory footprint with fsdp2

* Feat: cpu ram efficient loading

* Fix: mp

* Fix: not warn about sync_modules if fsdp version is 1

* Refactor: minor changes

* Small fixes + refactors

* Feat: docs + cleanup

* Feat: saving works (not sure about optim)

* More loading/saving work

* Feat: disable local_state_dict for fsdp2

* Fix: fsdp2 convergence

* Feat: working comparison script

* Feat: memory tracking fsdp2

* Feat: memory visualizer

* Feat: more work on benchmark

* Fix: raise error if model+optimizer aren't prepared together

* Minor fixes

* Style

* More warnings

* Fix: reshard_after_forward vs sharding_strategy conflict

* Refactor: clean up accelerator

* Feat: more testing in fsdp2 benchmark

* Fix: memory visualizer

* Untested: support load/save_state

* Feat: concept guide improvements

* Refactor: concept guide

* Feat: benchmark works

* Feat: more work on fsdp2 benchmark

* Fix: note syntax

* Fix: small fixes + make original tests work

* Fix: grad scaling

* Feat: reshard after forward tests

* Feat: backward prefetch tests

* Feat: tests for fsdp2

* Refactor: minor fixes

* Feat: fsdp_utils docstrings

* Feat: autodoc fsdp.md

* Docs: get_module_children_bottom_up

* Fix: remove unused images

* Refactor: benchmark cleanup

* Fix: docs

* Feat: final doc changes

* Fix: torch.distributed has no attribute tensor

* Fix: style

* Feat: tests include version in failures

* Fix: benchmark force model to load in fp32

* Fix: rename runs

* Feat: last minor fixes

* Feat: new benchmark images
2025-03-27 15:01:18 -04:00
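A minimal FSDP2 setup against the surface this PR adds (argument names per the PR: fsdp_version selects the new implementation, reshard_after_forward replaces FSDP1's sharding_strategy):

```python
import torch
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(
    fsdp_version=2,
    reshard_after_forward=True,      # FSDP2's analogue of sharding_strategy
    cpu_ram_efficient_loading=True,  # per the "cpu ram efficient loading" commit
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)

model = torch.nn.Linear(64, 64)
optimizer = torch.optim.AdamW(model.parameters())
# Per the "raise error if model+optimizer aren't prepared together" commit,
# FSDP2 requires a single joint prepare() call.
model, optimizer = accelerator.prepare(model, optimizer)
```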
8039158d71 Torchao float8 training (#3348)
* Bookmark

* bookmark

* Add torchao base example

* Currently broken

* Clean

* DDP variant working

* FSDP as well

* Works for all but zero3

* Bookmark: currently zero3 is underperforming

* Bookmark

* Another diff

* Fin

* Fin

* Add req huggingface suite

* update tests for fp8/torchao/ddp

* Log FP8 backend used and adjust typing

* add documentation for convert_to_float8_training

* Rename to convert_model_to_fp8_ao

* Call super init

* Add types

* Clean

* Use filter_first_and_last_linear_layers

* Update usage guide docs

* Actually loop through the zero stages

* Clean
2025-02-17 11:51:47 -05:00
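The conversion entry point named in the log is torchao's convert_to_float8_training; a sketch of how the first/last-layer filtering mentioned above plays out (module names here are illustrative):

```python
import torch
from torchao.float8 import convert_to_float8_training

model = torch.nn.Sequential(*[torch.nn.Linear(1024, 1024) for _ in range(4)])

# Mirror the filter_first_and_last_linear_layers idea: FP8 matmuls are
# most fragile at the input/output projections, so keep those in
# higher precision and convert only the inner layers.
def module_filter_fn(module: torch.nn.Module, fqn: str) -> bool:
    return fqn not in ("0", "3")

convert_to_float8_training(model, module_filter_fn=module_filter_fn)
```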
fb68cb9d0e Refactor scaler to util (#3142)
* Refactor scaler to util

* Document

* Use the distributed_type directly
2024-10-08 11:07:01 -04:00
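After this refactor, scaler construction is centralized in a utility keyed on the distributed type rather than duplicated inside the Accelerator; roughly (assuming the helper is exported as get_grad_scaler):

```python
from accelerate.utils import get_grad_scaler

# Picks the right GradScaler flavor for the current distributed setup.
scaler = get_grad_scaler()
```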
e93b056687 fix deprecated torch.cuda.amp.GradScaler FutureWarning for pytorch 2.4+ (#3132)
* fix deprecated FutureWarning for pytorch 2.4+

* perform `make style` and `make quality`

* try to fix `Quality Check` on `actions/workflows/quality.yml`

* undo changes for `src/accelerate/utils/memory.py`

* adapt scaler for pytorch.__version__

* fix scaler warning for npu device depending on pytorch 2.4 version check

* fallback to default npu scaler

* fallback to default `GradScaler` doc
2024-10-07 09:26:59 -04:00
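The warning being fixed comes from PyTorch 2.4 deprecating torch.cuda.amp.GradScaler in favor of the device-generic torch.amp.GradScaler; the version-gated construction looks roughly like:

```python
import torch
from packaging import version

if version.parse(torch.__version__) >= version.parse("2.4"):
    scaler = torch.amp.GradScaler("cuda")  # device-generic API, no FutureWarning
else:
    scaler = torch.cuda.amp.GradScaler()   # pre-2.4 fallback
```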
d5b7b70e06 MS-AMP support (w/o FSDP) (#3093)
* MS-AMP support sans FSDP

* Fix import

* Fixings

* Last Benjamin nit

* New ruff version cleaning

* Update src/accelerate/accelerator.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-09-10 12:25:45 -04:00
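MS-AMP plugs in as an FP8 backend through the recipe kwargs handler; a sketch (opt_level="O2" extends low-precision handling to optimizer state, per MS-AMP's optimization levels):

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

kwargs = FP8RecipeKwargs(backend="msamp", opt_level="O2")
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=[kwargs])
```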
3fd02e60dc MAINT: Upgrade ruff to v0.6.4 (#3095)
* MNT Upgrade ruff to 0.6.4

Currently used version, 0.2.1, is quite old at this point.

Not a lot needed to be changed:

- Change ruff version in setup.py
- Remove deprecated ignore-init-module-imports option for ruff
- Type comparison should use is and not ==
- Use f-string instead of % formatting
- Some line wrapping and empty lines

* Oops
2024-09-10 10:43:37 -04:00
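For the lint rule cited above ("type comparison should use is and not =="), the change ruff enforces is:

```python
x = 3
if type(x) == int:  # flagged by ruff (E721)
    ...
if type(x) is int:  # preferred: exact-type identity check
    ...
```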
c2120927b0 Add FP8 docker images (#3048)
* Add fp8 docker images

* Add more docker images

* Rv

* bring back ds

* Less diffy

* No need for sep tag
2024-08-26 12:12:34 -04:00
c0cf860dc6 Fix fp8 benchmark on single GPU (#3032) 2024-08-22 16:54:32 -04:00
a452327e8e Enable FSDP & Deepspeed + FP8 (#2983)
* Working version rebased from main

* kwargs

* Clean

* Fix more nits

* Fin

* Delay autocast flag

* Enable FP8 autocast during eval only if specified

* Fin

* Rm comment

* All done

* Zero3 works!

* Let the wrapper come off during unwrap_model

* Add import check

* Migrate all to benchmarks folder and make TE import check work

* Add readme

* Add README to benchmarks folder

* Update CLI to now include fp8 args

* Add test config for 0_34

* Finish adding to config yaml

* Write docs

* Expound docs w/ FP8

* Add to toctree
2024-08-14 14:57:01 -04:00
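The combination this PR unlocks, sketched with the Transformer Engine backend plus DeepSpeed ZeRO-3 (ZeRO-2 and FSDP configure analogously):

```python
from accelerate import Accelerator, DeepSpeedPlugin
from accelerate.utils import FP8RecipeKwargs

ds_plugin = DeepSpeedPlugin(zero_stage=3)
accelerator = Accelerator(
    deepspeed_plugin=ds_plugin,
    mixed_precision="fp8",
    kwargs_handlers=[FP8RecipeKwargs(backend="te")],
)
```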
7a2feecad4 Add copyright + some ruff lint things (#2523)
* Copyright and ruff stuff

* lol
2024-03-04 09:14:31 -05:00
5002e56704 Update quality tools to 2023 (#1046)
* Setup 2023 tooling for quality

* Result of styling

* Simplify inits and remove isort and flake8 from doc

* Puts back isort skip flag
2023-02-07 13:34:05 -05:00
ea0d5368bd Add benchmarks (#506)
* Add benchmarks

* Oops! Forgot one file

* Update benchmarks/README.md

Co-authored-by: Zachary Mueller <muellerzr@gmail.com>

Co-authored-by: Zachary Mueller <muellerzr@gmail.com>
2022-07-12 15:16:45 -04:00