Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
TODO:
- Refactor duplicated code
- Cleanup unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA size
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
Proposal of two float8 variants - e5m2 and e4m3 - based on https://arxiv.org/pdf/2209.05433.pdf
Hide all Float8 operator implementations behind `#if !defined(C10_MOBILE)` guard to keep Android build size almost unchanged
TODO:
- Refactor duplicated code
- Cleanup unbalanced pragma pop in dtype utils
- Add native implementation on the CUDA size
Co-authored-by: Nikita Shulga <nshulga@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104242
Approved by: https://github.com/albanD
With ufmt in place https://github.com/pytorch/pytorch/pull/81157, we can now use it to gradually format all files. I'm breaking this down into multiple smaller batches to avoid too many merge conflicts later on.
This batch (as copied from the current BLACK linter config):
* `tools/**/*.py`
Upcoming batchs:
* `torchgen/**/*.py`
* `torch/package/**/*.py`
* `torch/onnx/**/*.py`
* `torch/_refs/**/*.py`
* `torch/_prims/**/*.py`
* `torch/_meta_registrations.py`
* `torch/_decomp/**/*.py`
* `test/onnx/**/*.py`
Once they are all formatted, BLACK linter will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81285
Approved by: https://github.com/suo
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71114
`include-what-you-use` or `iwyu` is a clang-based tool that looks at
the code's AST to figure out which symbols need to be included and
with the help of user-defined mappings it suggests the include
files that are actually needed.
This is very nice for the per-operator headers build because it give
you a list of exactly the `ATen/ops` headers needed by the file. You
still need to manually write the include-guards etc. but at least this
automates the most tedious part.
The header mappings aren't perfect yet so it will still suggest you
include basic c10 components everywhere instead of taking it
transitively from `TensorBase.h`. However, this does provide some
useful mappings and removes bad include paths from the build system
that were causing bad suggestions.
Test Plan: Imported from OSS
Reviewed By: ngimel
Differential Revision: D33949901
Pulled By: malfet
fbshipit-source-id: d5b015ef9e168bee4b8717b8e87ccc0608da62a1
(cherry picked from commit ecb2ffb35a5b1509a1275834fbe5c25e60ea1b79)