Mirror of https://github.com/pytorch/pytorch.git, synced 2025-10-20 21:14:14 +08:00
Summary: Add FP8 support. For now, FP8 only supports the fast_accum path.

Test Plan:
```
Experiment group: _scaled_mm (8192x8192, 8192x8192) torch.float8_e4m3fn
+-----------------------+--------------------+--------------------+----------------------+--------------------+
| name                  | forward_time (us)  | teraflops (TFLOPS) | compilation_time (s) | perf_over_aten (%) |
+-----------------------+--------------------+--------------------+----------------------+--------------------+
| aten                  | 967.1226739883423  | 1136.8895149998868 | 1.219131228979677    | NA                 |
| triton                | 1764.6185159683228 | 623.08743664783    | 20.373826419003308   | 82.46067054670186  |
| triton_persistent_tma | 1769.0335512161255 | 621.5323768280928  | 20.48663099599071    | 82.91718297956578  |
| cutlass_lvl_default   | 790.5075550079346  | 1390.8932568835019 | 13.788519630907103   | -18.26191482535096 |
| cutlass_lvl_3332      | 803.7384748458862  | 1367.996757884245  | 226.81587297911756   | -16.89384434227684 |
+-----------------------+--------------------+--------------------+----------------------+--------------------+
```

Rollback Plan:

Differential Revision: D76310809

Pull Request resolved: https://github.com/pytorch/pytorch/pull/155507
Approved by: https://github.com/ColinPeppler
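The op being benchmarked is `torch._scaled_mm` on `torch.float8_e4m3fn` inputs with `use_fast_accum=True` (the only FP8 mode this change supports). A minimal sketch of such a call is below; the per-tensor scaling scheme and the helper names `per_tensor_scale` / `fp8_scaled_mm` are illustrative assumptions, not code from this PR, and the matmul itself needs a CUDA device with FP8 support:

```python
import torch

F8_E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn


def per_tensor_scale(t: torch.Tensor) -> torch.Tensor:
    """Pick a scale so that t / scale fits the float8_e4m3fn range."""
    # clamp avoids a zero scale (and a divide-by-zero) for all-zero tensors
    return t.abs().max().clamp(min=1e-12) / F8_E4M3_MAX


def fp8_scaled_mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Sketch of an FP8 GEMM via torch._scaled_mm (illustrative, not the PR's code).

    a: (M, K) row-major, b: (N, K) row-major; returns roughly a @ b.t() in bf16.
    """
    scale_a = per_tensor_scale(a)
    scale_b = per_tensor_scale(b)
    a_fp8 = (a / scale_a).to(torch.float8_e4m3fn)
    b_fp8 = (b / scale_b).to(torch.float8_e4m3fn)
    # _scaled_mm dequantizes as (a_fp8 * scale_a) @ (mat2 * scale_b);
    # passing b_fp8.t() gives the column-major second operand it expects.
    return torch._scaled_mm(
        a_fp8,
        b_fp8.t(),
        scale_a=scale_a,
        scale_b=scale_b,
        out_dtype=torch.bfloat16,
        use_fast_accum=True,  # the only FP8 mode enabled by this change
    )
```

The scale on each side is what lets an 8-bit format with a max value of about 448 represent tensors of arbitrary magnitude: values are divided down into FP8 range before the matmul and multiplied back out inside the kernel.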