* Initial test
* Try on push
* Only wf dispatch now
* keep trying
* Try again
* Try again
* source activate?
* Force bash
* Source activate accelerate to make it get the env properly
* try using nightly docker
* Try this?
* Try this?
* Try this, proper output
* Try this, proper output
* Try via full conda activate(?)
* rm conda
* TE FP8 tests
* add ao
* ao in setup too
* actually include fp8 deps
* FP8 docker image, use newer version
* Update docker image to take in input
* Test
* prior month
* igpu?
* Use only last 2 digits of year
* Build rest
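The docker-tag commits above ("prior month", "Use only last 2 digits of year") point at a CalVer-style image tag derived from the date. A minimal sketch of that computation; the function name and `YY.MM` tag format are assumptions for illustration, not the actual CI code:

```python
from datetime import date

def nightly_tag(today: date) -> str:
    """Build a zero-padded YY.MM tag from the month before `today`.

    Hypothetical helper: the real workflow may compute the tag differently.
    """
    year, month = today.year, today.month - 1
    if month == 0:  # January rolls back to December of the prior year
        year, month = year - 1, 12
    return f"{year % 100:02d}.{month:02d}"

print(nightly_tag(date(2024, 1, 15)))  # prior month of Jan 2024 -> 23.12
```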
* Apply style fixes
---------
Co-authored-by: Zach Mueller <muellerzr@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Feat: initial conversion tool draft
* Feat: add value mapping to conversion tool
* Refactor: move from os to pathlib
* Feat: add first tests
* Feat: more tests
* Feat: minor fixes + dataclass conversions
* Feat: more remapping
* Fix: namespace has no attribute version + style
* Fix: offload params behavior
* Feat: add option to only rename keys in the config file to
* Fix: wrong attr name
* Fix: partially resolve comments
* Feat: work on config command + minor fixes to reflect changes
* Refactor: style + quality
* Feat: fsdp2 initial work
* Feat: some cleanups and first running fsdp2
* Fix: version checks + mixed precision policy
* Refactor: style + quality
* Remove obsolete todos
* Feat: grad norm clipping
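Grad norm clipping scales every gradient by `min(1, max_norm / total_norm)` so the global L2 norm never exceeds the cap. A framework-free sketch of just that arithmetic, not the FSDP2-aware implementation the commit refers to:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Clip a flat list of gradient values by their global L2 norm.

    Illustrative only: real implementations operate on (sharded) tensors
    and must reduce the norm across ranks.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return grads, total_norm
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm

clipped, norm = clip_grad_norm([3.0, 4.0], 1.0)
print(norm)      # 5.0
print(clipped)   # roughly [0.6, 0.8]
```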
* Fix: tests + rename attrs
* Refactor: style + quality
* Fix: None object is not iterable
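The "'NoneType' object is not iterable" error class usually comes from looping over an attribute that defaults to `None`. A common defensive pattern, shown here as a generic illustration rather than the actual fix in this codebase:

```python
def gather_modules(models=None):
    # `for model in models:` raises TypeError when models is None;
    # falling back to an empty list turns the loop into a no-op.
    collected = []
    for model in (models or []):
        collected.append(model)
    return collected

print(gather_modules())        # []
print(gather_modules(["m1"]))  # ['m1']
```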
* Fix: default cpu_offload for fsdp2
* Fix: cpu offload now behaves correctly
* Feat: apply_activation_checkpointing
* Fix: append to models
* Feat: start on concept guide
* wip: concept guide
* Fix: toctree
* cleanup of the concept guide
* Fix: minor fixes + mp
* Fix: quality + | to union
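The "| to union" change reflects that `X | Y` annotation syntax only parses on Python 3.10+; older interpreters need the `typing` module spellings. A minimal sketch (the function here is invented for illustration):

```python
from typing import Optional, Union

# `def clip_value(value: int | float, limit: float | None = None)` is a
# SyntaxError before Python 3.10; these spellings work on older versions too.
def clip_value(value: Union[int, float], limit: Optional[float] = None) -> float:
    if limit is None:
        return float(value)
    return float(min(value, limit))

print(clip_value(5, 3.0))  # 3.0
```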
* Feat: backwards compatibility + args cleanup
* Fix: style + quality
* Feat: enable dropping refs when getting named params
* Fix: memory footprint with fsdp2
* Feat: cpu ram efficient loading
* Fix: mp
* Fix: don't warn about sync_modules if FSDP version is 1
* Refactor: minor changes
* Small fixes + refactors
* Feat: docs + cleanup
* Feat: saving works (not sure about optim)
* More loading/saving work
* Feat: disable local_state_dict for fsdp2
* Fix: fsdp2 convergence
* Feat: working comparison script
* Feat: memory tracking fsdp2
* Feat: memory visualizer
* Feat: more work on benchmark
* Fix: raise error if model + optimizer aren't prepared together
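The rationale for the check above: FSDP2 shards parameters in place, so an optimizer built before sharding holds references to parameters that no longer exist. A hypothetical sketch of such a guard; the function, error type, and argument encoding are all invented for illustration:

```python
class PreparationError(RuntimeError):
    pass

def check_prepared_together(args):
    """Hypothetical guard: a model and its optimizer must be prepared in
    the same call, since sharding replaces the parameters the optimizer
    would otherwise keep referencing. `args` is a list of (kind, obj)."""
    has_model = any(kind == "model" for kind, _ in args)
    has_optimizer = any(kind == "optimizer" for kind, _ in args)
    if has_model != has_optimizer:
        raise PreparationError(
            "FSDP2 requires preparing the model and optimizer together."
        )

check_prepared_together([("model", "m"), ("optimizer", "o")])  # ok
```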
* Minor fixes
* Style
* More warnings
* Fix: reshard_after_forward vs sharding_strategy conflict
* Refactor: clean up accelerator
* Feat: more testing in fsdp2 benchmark
* Fix: memory visualizer
* Untested: support load/save_state
* Feat: concept guide improvements
* Refactor: concept guide
* Feat: benchmark works
* Feat: more work on fsdp2 benchmark
* Fix: note syntax
* Fix: small fixes + make original tests work
* Fix: grad scaling
* Feat: reshard after forward tests
* Feat: backward prefetch tests
* Feat: tests for fsdp2
* Refactor: minor fixes
* Feat: fsdp_utils docstrings
* Feat: autodoc fsdp.md
* Docs: get_module_children_bottom_up
* Fix: remove unused images
* Refactor: benchmark cleanup
* Fix: docs
* Feat: final doc changes
* Fix: torch.distributed has no attribute tensor
* Fix: style
* Feat: tests include version in failures
* Fix: benchmark force model to load in fp32
* Fix: rename runs
* Feat: last minor fixes
* Feat: new benchmark images
* Bookmark
* bookmark
* Add torchao base example
* Currently broken
* Clean
* DDP variant working
* FSDP as well
* Works for all but zero3
* Bookmark: currently zero3 is underperforming
* Bookmark
* Another diff
* Fin
* Fin
* Add req huggingface suite
* update tests for fp8/torchao/ddp
* Log FP8 backend used and adjust typing
* add documentation for convert_to_float8_training
* Rename to convert_model_to_fp8_ao
* Call super init
* Add types
* Clean
* Use filter_first_and_last_linear_layers
* Update usage guide docs
* Actually loop through the zero stages
* Clean
* MNT Upgrade ruff to 0.6.4
Currently used version, 0.2.1, is quite old at this point.
Not a lot needed to be changed:
- Change ruff version in setup.py
- Remove deprecated ignore-init-module-imports option for ruff
- Type comparison should use is and not ==
- Use f-string instead of % formatting
- Some line wrapping and empty lines
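Two of the listed ruff fixes can be shown in a minimal snippet (illustrative, not the actual diffs from this PR):

```python
value = 3

# Type comparison: ruff flags `type(value) == int`; identity (`is`) is the
# idiomatic check, since a class object is a singleton.
assert type(value) is int

# Formatting: f-strings replace %-style formatting.
message = f"value={value}"
assert message == "value=%d" % value
print(message)  # value=3
```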
* Oops
* Working version rebased from main
* kwargs
* Clean
* Fix more nits
* Fin
* Delay autocast flag
* Enable FP8 autocast during eval only if specified
* Fin
* Rm comment
* All done
* Zero3 works!
* Let the wrapper come off during unwrap_model
* Add import check
* Migrate all to benchmarks folder and make TE import check work
* Add readme
* Add README to benchmarks folder
* Update CLI to now include fp8 args
* Add test config for 0_34
* Finish adding to config yaml
* Write docs
* Expound docs w/ FP8
* Add to toctree