Files
pytorch/test/quantization
Amadeusz Skrzypczak 107f944f22 Support fp8 quantization (#123161)
This commit enables float8_e5m2 and float8_e4m3fn dtypes in fx quantization and PT2E.

Motivation for using fp8 quantization instead of int8:
- it works better to run inference with the same datatype the model was trained with,
- fp8 can handle outliers better, which is one of the problems in LLMs activations.

The numerical recipe we want to use it for is fp8 inference:
- bgemms/gemms running in float8_e4m3fn,
- Per-Tensor-Quantization/Scaling,
- amax observer for measurement with input_backoff and weight_backoff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/123161
Approved by: https://github.com/jgong5, https://github.com/jerryzh168
2024-04-23 13:35:27 +00:00
..
2024-04-23 13:35:27 +00:00
2020-07-17 17:19:47 -07:00