* Bookmark
* bookmark
* Add torchao base example
* Currently broken
* Clean
* DDP variant working
* FSDP as well
* Works for all but zero3
* Bookmark: currently zero3 is underperforming
* Bookmark
* Another diff
* Fin
* Fin
* Add req huggingface suite
* Update tests for fp8/torchao/ddp
* Log FP8 backend used and adjust typing
* Add documentation for convert_to_float8_training
* Rename to convert_model_to_fp8_ao
* Call super init
* Add types
* Clean
* Use filter_first_and_last_linear_layers
* Update usage guide docs
* Actually loop through the zero stages
* Clean