* Automatically applies ruff rule PERF401, turning for loops that build a list into equivalent list comprehensions, which are faster and do not leak the loop variable into the enclosing scope.
* List comprehensions not only often have better typing, but are also 50+% faster than for loops in terms of loop overhead. They also preserve length information and are easier for the interpreter to optimize. (A before/after sketch follows this list.)
* Manually went back and made mypy happy after the change.
* Also fixed style lints in files covered by flake8 but not by pyfmt
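For illustration, here is a minimal before/after sketch of the kind of rewrite this rule performs (the function and variable names are hypothetical, not taken from the changed files):

```python
# Before: building a list with an explicit loop; the loop variable `tag`
# remains bound after the loop finishes.
def upper_tags(tags):
    result = []
    for tag in tags:
        result.append(tag.upper())
    return result


# After: the equivalent list comprehension the autofix produces conceptually.
def upper_tags(tags):
    return [tag.upper() for tag in tags]
```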
Pull Request resolved: https://github.com/pytorch/pytorch/pull/140980
Approved by: https://github.com/justinchuby, https://github.com/malfet
This is a lot of files changed! Don't panic! Here's how it works:
* Previously, we set `follow_imports = silent` in our mypy.ini configuration. Per https://mypy.readthedocs.io/en/stable/running_mypy.html#follow-imports, this means that whenever we import a module which is not listed as a file to be typechecked by mypy, the module is typechecked as normal but all errors that occurred in that file are suppressed.
* When mypy is run inside lintrunner, the list of files is precisely the files covered by the glob in lintrunner.toml, but with files in excludes excluded.
* The top-level directive `# mypy: ignore-errors` instructs mypy to typecheck the file as normal, but ignore all errors.
* Therefore, it should be equivalent to set `follow_imports = normal`, if we put `# mypy: ignore-errors` on all files that were previously excluded from the file list.
* Having done this, we can remove the exclude list from .lintrunner.toml, since excluding a file from typechecking is baked into the files themselves.
* torch/_dynamo and torch/_inductor were previously in the exclude list, because they were covered by MYPYINDUCTOR. It is not OK to mark these as `# mypy: ignore-errors` as this will impede typechecking on the alternate configuration. So they are temporarily being checked twice, but I am suppressing the errors in these files as the configurations are not quite the same. I plan to unify the configurations so this is only a temporary state.
* There were some straggler type errors after these changes somehow, so I fixed them as needed. There weren't that many.
In the future, to start type checking a file, just remove the ignore-errors directive from the top of the file.
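As a rough sketch of the end state (the module path below is hypothetical), an opted-out file simply starts with the directive, while mypy.ini carries `follow_imports = normal`:

```python
# mypy: ignore-errors
# Hypothetical top of a file that is not yet type checked
# (e.g. torch/_some_untyped_module.py). With `follow_imports = normal`,
# mypy still follows imports into this file, but the directive above
# suppresses every error it would otherwise report here.

import torch


def helper(x: torch.Tensor) -> torch.Tensor:
    return x + 1
```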
The codemod was done with this script authored by GPT-4:
```
import glob

exclude_patterns = [
    ...
]

# Prepend the ignore-errors directive to every Python file that used to be
# excluded from mypy's file list.
for pattern in exclude_patterns:
    for filepath in glob.glob(pattern, recursive=True):
        if filepath.endswith('.py'):
            with open(filepath, 'r+') as f:
                content = f.read()
                f.seek(0, 0)
                f.write('# mypy: ignore-errors\n\n' + content)
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/118414
Approved by: https://github.com/thiagocrepaldi, https://github.com/albanD
Summary:
This PR adds a new quantization backend, ONEDNN, with quantized conv and linear kernels in the same code path as the FBGEMM backend.
The ONEDNN backend is an alternative to the FBGEMM and QNNPACK backends. It takes advantage of features of the latest Intel® CPU products: VNNI on Cascade Lake, and the AMX instruction set available on Sapphire Rapids, which has 8x the int8 peak TOPS of VNNI.
ONEDNN demonstrates better performance than FBGEMM on the conv kernels of popular CNN models. It also supports more fused ops, such as convolution-add-ReLU, than FBGEMM and QNNPACK.
To use this backend, users only need to set the quantization backend to 'onednn' before any computation, without changing their models:
```python
torch.backends.quantized.engine = 'onednn'
```
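For context, a minimal eager-mode static quantization sketch using the new backend might look like the following. The toy model is made up, and the sketch assumes `get_default_qconfig` accepts 'onednn' after the qconfig.py change and that the build has the onednn engine available:

```python
import torch

# Select the new backend before any quantization work.
torch.backends.quantized.engine = 'onednn'

# Hypothetical toy model; only the backend/qconfig selection is specific to this PR.
model = torch.nn.Sequential(
    torch.ao.quantization.QuantStub(),
    torch.nn.Conv2d(3, 16, 3),
    torch.nn.ReLU(),
    torch.ao.quantization.DeQuantStub(),
).eval()

model.qconfig = torch.ao.quantization.get_default_qconfig('onednn')
prepared = torch.ao.quantization.prepare(model)
prepared(torch.randn(1, 3, 32, 32))          # calibration pass
quantized = torch.ao.quantization.convert(prepared)
print(quantized(torch.randn(1, 3, 32, 32)).shape)
```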
## Design docs
- https://github.com/pytorch/pytorch/issues/21120#issuecomment-562371983
- https://github.com/pytorch/pytorch/pull/67177#issuecomment-963787096
## File changes
**Add ONEDNN to qengine list**
- aten/src/ATen/Context.cpp
- c10/core/QEngine.h
- torch/ao/quantization/qconfig.py
- torch/backends/quantized/\_\_init\_\_.py
**Implement qconv & qlinear for ONEDNN backend**
- aten/src/ATen/native/quantized/cpu/conv_serialization.h
- aten/src/ATen/native/quantized/cpu/fbgemm_utils.cpp
- aten/src/ATen/native/quantized/cpu/onednn_utils.h
- aten/src/ATen/native/quantized/cpu/qconv.cpp
- aten/src/ATen/native/quantized/cpu/qconv_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qconv_unpack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp
- aten/src/ATen/native/quantized/cpu/qlinear_unpack.cpp
**Skip tests that are not supported by ONEDNN**
- test/ao/sparsity/test_kernels.py
- test/quantization/core/test_quantized_module.py
- test/quantization/core/test_quantized_op.py
## Validation results
This PR has passed `test_quantization.py` and `test_mkldnn.py`.
Below are performance data for the int8 2d convolution and linear operators on the Cascade Lake Xeon® platform:
(Note: tested with a single instance on a single core, using the latest oneDNN library.)
**Table 1. Performance comparison of int8 2d convolution operator**
|No.| Shape| FBGEMM| ONEDNN| Gain|
|-|-|-|-|-|
|1| IC=128, OC=128, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 668.310us| 535.630us| 24.8%|
|2| IC=128, OC=128, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 290.630us| 281.810us| 3.1%|
|3| IC=128, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.045ms| 893.010us| 17.0%|
|4| IC=128, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 385.320us| 373.720us| 3.1%|
|5| IC=256, OC=256, kernel=3, stride=1, N=4, H=32, W=32, G=1, pad=0| 1.876ms| 1.641ms| 14.3%|
|6| IC=256, OC=256, kernel=3, stride=2, N=4, H=32, W=32, G=1, pad=0| 660.460us| 638.470us| 3.4%|
**Table 2. Performance comparison of int8 linear operator**
|No.| Shape (m, n, k)| FBGEMM| ONEDNN| Gap (ONEDNN slower by)|
|-|-|-|-|-|
|1| 64, 800, 320| 80.550us| 96.770us| 20.10%|
|2| 64, 768, 512| 101.230us| 130.720us| 29.10%|
|3| 16, 256, 512| 30.230us| 51.450us| 70.20%|
|4| 128, 128, 128| 33.810us| 50.480us| 49.30%|
|5| 256, 512, 256| 154.490us| 195.050us| 26.30%|
|6| 1024, 1024, 1024| 3.134ms| 3.514ms| 12.10%|
ONEDNN showed advantages over FBGEMM for convolution. However, it still has a performance gap relative to FBGEMM for linear ops. The gap is a known issue, and further optimization is in progress in the oneDNN library. On the latest platforms, ONEDNN achieves better performance for both conv and linear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69820
Reviewed By: HDCharles
Differential Revision: D33716039
Pulled By: jerryzh168
fbshipit-source-id: 6f7bb807e85798142dfcffccfca8b8bd652fb3dd
(cherry picked from commit 91526b373560f42ba0ad307f9cccfc0eb5218b1f)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70106
Some of the quantization tests had log spew like
```
UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
```
This PR cleans up the root cause in the utils. Some other tests may still hit this warning from other places.
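The offending pattern and its recommended replacement look roughly like this (a generic sketch, not the exact lines changed in the utils):

```python
import torch

src = torch.randn(4)

# Triggers the UserWarning: constructing a tensor from an existing tensor.
bad = torch.tensor(src)

# Recommended replacements, as suggested by the warning itself.
good = src.clone().detach()
good_with_grad = src.clone().detach().requires_grad_(True)
```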
Test Plan:
```
python test/test_quantization.py TestFakeQuantizeOps
```
this particular warning no longer appears
Reviewed By: soulitzer
Differential Revision: D33187925
Pulled By: vkuzo
fbshipit-source-id: bd1acd77fd72a10dad0c254f9f9f32e513c8a89a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58963
Some tests are used to check the op-level numerics of the fake quantize operations.
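As an illustration of what checking op-level numerics means here, the fake quantize op can be compared against a hand-written quantize-dequantize reference (a sketch, not the actual test code):

```python
import torch

x = torch.randn(16)
scale, zero_point, quant_min, quant_max = 0.05, 10, 0, 255

# Reference: quantize then dequantize, written out by hand.
ref = (torch.clamp(torch.round(x / scale) + zero_point,
                   quant_min, quant_max) - zero_point) * scale

# Op under test.
out = torch.fake_quantize_per_tensor_affine(x, scale, zero_point,
                                            quant_min, quant_max)
assert torch.allclose(out, ref)
```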
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D28696599
fbshipit-source-id: 98f9b0c993dd43050176125461ddd5288142989b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52377
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and given that the 1x4 sparse kernels are implemented in both SSE and ARM, LinearDynamic at the moment defaults to the 1x4 pattern.
The plan is to add another diff that will allow a global override for the 8x1 pattern so that the prepare/convert flow can work for exporting models for mobile.
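To make the 1x4 pattern concrete, here is an illustrative sketch (not code from this diff, and the threshold is hypothetical) of zeroing weights in 1x4 blocks, i.e. one output row by four consecutive input channels, which is the unit the sparse kernels can skip:

```python
import torch

out_features, in_features = 8, 16
weight = torch.randn(out_features, in_features)

# View the weight as (rows, groups of 4 input channels, 4) and drop whole
# 1x4 blocks whose magnitude is small, so the sparse kernel skips them as a unit.
blocks = weight.reshape(out_features, in_features // 4, 4)
mask = blocks.abs().sum(dim=-1, keepdim=True) > 1.0   # hypothetical threshold
sparse_weight = (blocks * mask).reshape(out_features, in_features)
```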
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26491944
fbshipit-source-id: b98839b4c62664e1fabbb0cbeb2e5c1bd5903b4d
Summary:
Add QNNPACK specific packed params for sparse linear.
Add sparse linear dynamic op with appropriate registration.
Add python side LinearDynamic module for sparsity.
Add tests to validate sparse linear qnnpack kernels.
Note that since these tests are mostly run on x86 platforms, and given that the 1x4 sparse kernels are implemented in both SSE and ARM, LinearDynamic at the moment defaults to the 1x4 pattern.
The plan is to add another diff that will allow a global override for the 8x1 pattern so that the prepare/convert flow can work for exporting models for mobile.
Test Plan: buck run caffe2/torch/fb/model_optimization:sparsity_test
Reviewed By: z-a-f
Differential Revision: D26263480
fbshipit-source-id: 04ab60aec624d1ecce8cfb38b79c7e94f501cdf6
Summary:
This is causing type hint test errors on the latest numpy:
```
torch/testing/_internal/common_quantized.py:38: error: Module has no attribute "float"; maybe "float_", "cfloat", or "float64"? [attr-defined]
torch/testing/_internal/common_methods_invocations.py:758: error: Module has no attribute "bool"; maybe "bool_" or "bool8"? [attr-defined]
```
Runtime-wise, there's also a deprecation warning:
```
__main__:1: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
```
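The fix is simply to stop using the deprecated alias; a sketch of the pattern (not the exact lines in the two files):

```python
import numpy as np

x = [1.0, 2.0, 3.0]

# Before (deprecated in NumPy 1.20 and later removed):
# arr = np.array(x, dtype=np.float)

# After: use the builtin, or the explicit numpy scalar type.
arr = np.array(x, dtype=float)         # same behavior as the old np.float
arr64 = np.array(x, dtype=np.float64)  # if the numpy scalar type was intended
```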
Fixes #{issue number}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52103
Reviewed By: suo
Differential Revision: D26401210
Pulled By: albanD
fbshipit-source-id: a7cc12ca402c6645473c98cfc82caccf161160c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49671
- Introduces the `torch.nn.quantizable` namespace
- Adds the `torch.nn.quantizable.LSTM` module
The point of the `quantizable` namespace is to separate the purely quantized modules from the modules that could be quantized through a normal quantization flow but are not using the quantized kernels explicitly.
That means the quantizable modules are functionally and numerically equivalent to the FP ones and can be used instead of the FP ones without any loss.
The main difference between `torch.nn.LSTM` and `torch.nn.quantizable.LSTM` is that the former does not support observation of its linear layers, because all the computation is internal to the `aten` namespace.
The `torch.nn.quantizable.LSTM`, however, uses explicit linear layers that can be observed for further quantization.
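A hedged usage sketch (the sizes are arbitrary): the quantizable module is constructed like `torch.nn.LSTM` and called the same way, while exposing its internal linear layers so observers can be attached for further quantization:

```python
import torch

seq_len, batch, input_size, hidden_size = 5, 2, 8, 16
x = torch.randn(seq_len, batch, input_size)

# Drop-in replacement for torch.nn.LSTM; the gates are explicit Linear
# submodules, so they can be observed during a quantization flow.
lstm = torch.nn.quantizable.LSTM(input_size, hidden_size)
output, hidden = lstm(x)   # same call signature as torch.nn.LSTM
print(output.shape)
```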
Test Plan: Imported from OSS
Differential Revision: D25663870
Reviewed By: vkuzo
Pulled By: z-a-f
fbshipit-source-id: 70ff5463bd759b9a7922571a5712d3409dfdfa06
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39677
Test Plan:
Moved a test class suite between files; since this is a simple code refactor that should preserve functionality, I tested to make sure the test output was the same before and after the refactor.
The image below shows the output of TestGraphModePostTrainingStatic before the refactor:
{F239676498}
This image shows the output of TestQuantizeScript (renamed version that is in test_quantize_script.py instead of test_quantize.py)
{F239676509}
Differential Revision: D21940638
Pulled By: edmundw314
fbshipit-source-id: 54160a5151aadf3a34bdac2bcaeb52904e6653ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39041
The reduce_range option restricts the activation tensor to 7 bits instead of 8.
This is necessary to enable per-channel quantization for RNNs and LSTMs.
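A small sketch of the effect, using `MinMaxObserver` as an example observer (illustrative, not the code in this PR):

```python
import torch
from torch.quantization import MinMaxObserver

x = torch.randn(100) * 3

obs_8bit = MinMaxObserver(reduce_range=False)
obs_7bit = MinMaxObserver(reduce_range=True)   # activations restricted to 7 bits
obs_8bit(x)
obs_7bit(x)

# The 7-bit observer roughly doubles the scale, leaving headroom so the
# intermediate accumulation in the x86 kernels is less likely to overflow.
print(obs_8bit.calculate_qparams())
print(obs_7bit.calculate_qparams())
```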
Test Plan:
python test/test_quantization.py TestDynamicQuantizedLinear
Imported from OSS
Reviewed By: akinh
Differential Revision: D21769691
fbshipit-source-id: ef0e9873367f3c1b34091b0b3af788233ef60c6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32479
Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented for the linear operator.
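A hedged sketch of how this is exercised from Python, on a build where the qnnpack engine is available (the toy model is made up; the engine selection and `quantize_dynamic` call are the standard APIs):

```python
import torch

# Select the mobile-oriented backend.
torch.backends.quantized.engine = 'qnnpack'

model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU()).eval()

# Dynamic quantization: weights are quantized ahead of time, activations are
# quantized on the fly at runtime. Only Linear is covered so far.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(qmodel(torch.randn(1, 64)))
```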
Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear
Imported from OSS
Differential Revision: D19542980
fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe2/test for better management of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606