This PR introduces 3 BC changes:
First, this PR propagates the `BUILD_CAFFE2` flag to `libtorch` and `libtorch_python`, which is necessary for non-Caffe2 ONNX runtimes when using the `ONNX_ATEN_FALLBACK` operator export type.
Second, as a complement to https://github.com/pytorch/pytorch/pull/68490, this PR refactors Caffe2's ATen op symbolics so that they consider not only the `operator_export_type` (aka `ONNX_ATEN_FALLBACK`) when emitting Caffe2 ATen ops, but also whether `BUILD_CAFFE2` (exposed in Python as `torch.onnx._CAFFE2_ATEN_FALLBACK`) is set.
Lastly, it renames `onnx::ATen` to `aten::ATen` for ONNX spec consistency, in a backward-compatible fashion.
ONNX does not have an `ATen` op in its spec, but the PyTorch ONNX converter emits one anyway. Non-Caffe2 backend engines would be misled by such an operator's name/domain. A non-ideal workaround would be to handle ATen ops based on their name and ignore the (non-compliant) domain. Moreover, users could incorrectly file bugs against ONNX or ONNX Runtime when they inspect the model and notice the presence of an unspecified ONNX operator.
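For context, a minimal export sketch that exercises this operator export type (the toy model and output file name below are placeholders, not part of this PR):
```
import torch

model = torch.nn.Linear(3, 3)
# With ONNX_ATEN_FALLBACK, any op lacking a native ONNX mapping is emitted as
# an ATen fallback node instead of failing the export.
torch.onnx.export(
    model,
    torch.randn(2, 3),
    "model_with_aten_fallback.onnx",
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK,
)
```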
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73954
Approved by: https://github.com/BowenBao, https://github.com/malfet, https://github.com/garymm, https://github.com/jiafatom
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74507
* These are the default symmetric QAT qconfigs for qnnpack (a sketch follows this list).
* Support for symmetric quantization is not available from other backends.
* Observers mirror the symmetric PTQ qconfigs for qnnpack.
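A rough sketch of how such a qconfig is assembled; the exact observer arguments of the real qnnpack symmetric QAT qconfigs may differ, so treat the values below as illustrative:
```
import torch
from torch.ao.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# Illustrative symmetric QAT qconfig in the spirit of the qnnpack defaults:
# unsigned 8-bit affine activations, signed 8-bit symmetric weights.
symmetric_qat_qconfig = QConfig(
    activation=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0,
        quant_max=255,
        dtype=torch.quint8,
        qscheme=torch.per_tensor_affine,
        reduce_range=False,
    ),
    weight=FakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-127,
        quant_max=127,
        dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric,
        reduce_range=False,
    ),
)
```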
Reviewed By: jerryzh168
Differential Revision: D34804808
fbshipit-source-id: 22c11b89242a98f54029ac195f7b984e42809164
(cherry picked from commit ea751ded1174ba2c2f061bafc81573faaf248a9a)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73863
This PR fully aligns the convert function with the design: https://github.com/pytorch/rfcs/blob/master/RFC-0019-Extending-PyTorch-Quantization-to-Custom-Backends.md
and simplifies the implementation of convert by always producing a reference quantized model (with reference patterns) first,
and then lowering that model to a quantized model that is runnable with the PyTorch native backend (fbgemm/qnnpack).
This makes convert.py much easier to understand than the previous implementation, and lets us remove the majority of the code
in quantization_patterns.py as well (in follow-up PRs).
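For orientation, a minimal FX graph mode flow that goes through this convert path (toy model and calibration data; newer releases also require an `example_inputs` argument to `prepare_fx`):
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

float_model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}

prepared = prepare_fx(float_model, qconfig_dict)
prepared(torch.randn(8, 4))        # calibration
quantized = convert_fx(prepared)   # reference model is produced internally, then lowered
```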
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
and other internal/oss regression tests
Imported from OSS
Reviewed By: andrewor14
Differential Revision: D34778506
fbshipit-source-id: 0678b66addf736039a8749b352f6f569caca962b
(cherry picked from commit 33ec9caf23f3ab373d827117efbd9db0668b2437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/71427
This commit adds a lowering path for the LinearReLU modules
in static quantization mode. This includes torch.nn.qat.Linear,
torch.nn.intrinsic.LinearReLU, and torch.nn.intrinsic.qat.LinearReLU.
Future commits will add support for dynamic quantization and functional
LinearReLU.
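As an illustration of the module pair being lowered, a small eager-mode fusion sketch (the Sequential and its child module names are placeholders):
```
import torch
import torch.nn.intrinsic as nni
from torch.ao.quantization import fuse_modules

m = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
fused = fuse_modules(m, [["0", "1"]])
# the fused pair is what convert can lower to a single quantized LinearReLU
assert isinstance(fused[0], nni.LinearReLU)
```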
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_module
Imported from OSS
Reviewed By: george-qi
Differential Revision: D33694742
fbshipit-source-id: 19af11f82b1ad8ade0c307498971c29a3f776036
(cherry picked from commit b3f607de439f2ba7c0a03ad1ac494127685cbf4e)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70009
Currently we rely on `module.training` to decide whether to do QAT fusion or PTQ fusion. This is
not ideal since the training flag has nothing to do with quantization, so this PR introduces an extra flag, `is_qat`,
to control it.
Note: we still have the constraint that when `is_qat` is True the modules must be in training mode; we
can relax this constraint later.
Test Plan:
```
python test/test_quantization.py TestFuseFx
python test/test_quantization.py TestFusion
```
Imported from OSS
**Static Docs Preview: classyvision**
|[Full Site](https://our.intern.facebook.com/intern/staticdocs/eph/D33178977/V36/classyvision/)|
|**Modified Pages**|
Reviewed By: mruberry
Differential Revision: D33178977
fbshipit-source-id: 0c1499c45526971140d9ad58e2994d1edf5ad770
(cherry picked from commit 2d51f9fb28967f1c5aab260d84b8d32d838f4f26)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70022
Add support for fusing ConvTranspose{1,2,3}d with BatchNorm{1,2,3}d. This re-uses the existing fusion logic but adds a "transpose" flag to the fusing function which, when enabled, uses the appropriate reshape for ConvTranspose's transposed weights.
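A small sketch of the folding this enables, assuming the new `transpose` flag is exposed on `torch.nn.utils.fusion.fuse_conv_bn_weights` as described above:
```
import torch
from torch.nn.utils.fusion import fuse_conv_bn_weights

ct = torch.nn.ConvTranspose2d(3, 6, kernel_size=2)
bn = torch.nn.BatchNorm2d(6).eval()

# transpose=True applies the reshape needed for ConvTranspose's
# (in_channels, out_channels, ...) weight layout when folding BN statistics.
fused_w, fused_b = fuse_conv_bn_weights(
    ct.weight, ct.bias,
    bn.running_mean, bn.running_var, bn.eps, bn.weight, bn.bias,
    transpose=True,
)
```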
Test Plan: `buck test mode/dev //caffe2/test:quantization -- -r quantization.eager.test_fusion.TestFusion`
Reviewed By: jerryzh168
Differential Revision: D33074405
fbshipit-source-id: 5e9eff1a06d8f98d117e7d18e80da8e842e973b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69864
As titled; a follow-up PR will remove QConfigDynamic from the API.
Test Plan:
regression tests
```
python test/test_quantization.py TestPostTrainingStatic
python test/test_quantization.py TestPostTrainingDynamic
python test/test_quantization.py TestQuantizeFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D33073235
fbshipit-source-id: 6c1a1647032453803c55cdad7c04154502f085db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/69644
This PR cleans up the init of ModuleReLUFuseHandler and moves all `module - relu`
fusion patterns to use this handler.
It also temporarily disables the additional_fuser_method argument; we will re-enable it
after we bring back the simple pattern format.
Test Plan:
```
python test/test_quantize_fx.py TestFuseFx
```
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D32974906
fbshipit-source-id: 23483ea4293d569cb3cec6dadfefd4d9f30921a7
Summary:
**Summary:** This commit adds the `torch.nn.qat.dynamic.modules.Linear`
module, the dynamic counterpart to `torch.nn.qat.modules.Linear`.
Functionally these are very similar, except the dynamic version
expects a memoryless observer and is converted into a dynamically
quantized module before inference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67325
Test Plan:
`python3 test/test_quantization.py TestQuantizationAwareTraining.test_dynamic_qat_linear`
**Reviewers:** Charles David Hernandez, Jerry Zhang
**Subscribers:** Charles David Hernandez, Supriya Rao, Yining Lu
**Tasks:** 99696812
**Tags:** pytorch
Reviewed By: malfet, jerryzh168
Differential Revision: D32178739
Pulled By: andrewor14
fbshipit-source-id: 5051bdd7e06071a011e4e7d9cc7769db8d38fd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/65674
Before this PR, users had to use the eager mode static quantization APIs to quantize Embedding/EmbeddingBag modules.
With this PR they can use either the static or dynamic quantization APIs for Embedding quantization.
The only qconfig supported for embedding quantization is float_qparams_weight_only_qconfig, which is currently enforced in the from_float
method of the quantized Embedding/EmbeddingBag modules.
To combine embedding quantization with Linear dynamic quantization, users can use the qconfig_dict to specify a different qconfig for each module type.
The prepare/convert APIs can still be used to quantize Embeddings, with the caveat that users need to ensure the inputs to Embedding ops are FP32.
Addresses Issue #65185
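A sketch of the combined flow described above, using the eager-mode dynamic API with a per-type qconfig spec (the toy model is illustrative):
```
import torch
from torch.ao.quantization import (
    default_dynamic_qconfig,
    float_qparams_weight_only_qconfig,
    quantize_dynamic,
)

class EmbeddingWithLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.EmbeddingBag(1000, 16)
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, indices, offsets):
        return self.fc(self.emb(indices, offsets))

# weight-only qconfig for the embedding, dynamic qconfig for the linear
qconfig_spec = {
    torch.nn.EmbeddingBag: float_qparams_weight_only_qconfig,
    torch.nn.Linear: default_dynamic_qconfig,
}
quantized = quantize_dynamic(EmbeddingWithLinear().eval(), qconfig_spec)
```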
ghstack-source-id: 139935419
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: gchanan
Differential Revision: D31211199
fbshipit-source-id: 8c747881caee5ccbf8b93c6704b08d132049dea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64916
AO Team is migrating the existing torch.quantization into torch.ao.quantization. We are doing it one file at a time to make sure that the internal callsites are updated properly.
This migrates the quant_type.py from torch.quantization to torch.ao.quantization.
At this point both locations will be supported. Eventually the torch.quantization will be deprecated.
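During the migration window both import paths resolve to the same objects, along the lines of what the TestAOMigrationQuantization tests check:
```
from torch.ao.quantization.quant_type import QuantType as AOQuantType
from torch.quantization.quant_type import QuantType as LegacyQuantType

# the legacy location simply re-exports the torch.ao.quantization implementation
assert AOQuantType is LegacyQuantType
```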
Test Plan: `buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization`
Reviewed By: vkuzo
Differential Revision: D30898422
fbshipit-source-id: 3e6126b49f0565a4136d6928cea9eb25368927ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63799
Add a new module that can be used for module swap with the nni.LinearReLU module in the convert function.
Supports INT8 currently (since the FP16 op doesn't have relu fusion yet).
Fixes #55393
Test Plan:
python test/test_quantization.py test_dynamic_fusion
Imported from OSS
Reviewed By: heitorschueroff
Differential Revision: D30502812
fbshipit-source-id: 3668e4f001a0626d469e17ac323acf582ee28a51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62487
checkGraphModeFxOp is our utility test function that quantizes a given model with FX Graph Mode Quantization
and checks whether the resulting model contains the expected ops. Previously it only returned the result of running the
quantized model on the sample data; this PR changes it to return the prepared, quantized, and quantized_reference models together with the result
for the quantized model.
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: iramazanli
Differential Revision: D30053981
fbshipit-source-id: 31fbce48d138261d0b00ba24e1427fd0c6208990
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62277
This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: ejguan
Differential Revision: D29941079
fbshipit-source-id: 84bdfc0bb872c34fc345875e545c8b323e77c41e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61892
This PR changes is_reference=True for linear to produce a pattern consisting of dequant - float linear - quant instead of a reference linear module. This is useful for future transformations to custom backends, and it also helps simplify the implementation of
convert in the future.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29810657
fbshipit-source-id: 949615bbc017bc454d81c8a6b2bdec53badaab19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62041
Before this PR, weights of conv and linear modules were extracted
as lists, in order to match the signature of LSTM weights.
After this PR, weight extraction preserves the type of the weights,
so extracted weights of conv and linear have a different type
from LSTM weights. The comparison util functions are updated to
handle the LSTM weight type of `List[tensor]`.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29853626
fbshipit-source-id: 93da5b9b0b174679c61528d02b6b902cb064444e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60476
# Context
Add tests for Lite modules that are quantized using fx API
Read these posts for details about why we need a test bench for quantized Lite modules:
https://fb.workplace.com/groups/2322282031156145/permalink/4289792691071726/
https://github.com/pytorch/pytorch/pull/60226#discussion_r654615851
Moved common code to `caffe2/torch/testing/_internal/common_quantization.py`.
ghstack-source-id: 133144292
Test Plan:
```
[~/fbsource/fbcode] buck test caffe2/test:fx_quantization_lite
Downloaded 0/2 artifacts, 0.00 bytes, 100.0% cache miss
Building: finished in 8.3 sec (100%) 11892/11892 jobs, 2 updated
Total time: 8.6 sec
More details at https://www.internalfb.com/intern/buck/build/ffb7d517-d85e-4c8f-9531-5e5d9ca1d34c
Tpx test run coordinator for Facebook. See https://fburl.com/tpx for details.
Running with tpx session id: d79a5713-bd29-4bbf-ae76-33a413869a09
Trace available for this run at /tmp/tpx-20210630-105547.675980/trace.log
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
✓ ListingSuccess: caffe2/test:fx_quantization_lite - main (9.423)
✓ Pass: caffe2/test:fx_quantization_lite - test_embedding (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (10.630)
✓ Pass: caffe2/test:fx_quantization_lite - test_submodule (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.464)
✓ Pass: caffe2/test:fx_quantization_lite - test_conv2d (mobile.test_quantize_fx_lite_script_module.TestFuseFx) (12.728)
Summary
Pass: 3
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/3096224749578707
```
Reviewed By: iseeyuan
Differential Revision: D29306402
fbshipit-source-id: aa481e0f696b7e9b04b9dcc6516e8a390f7dc1be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60779
When we do fusion, we replace certain modules (such as Linear + ReLU) with fused versions (such as LinearReLU) by calling `_fuse_fx` in prepare_fx. However when we try to look up using the fused module type in qconfig_dict, we cannot find a match anymore since the qconfig dict contains the original module types. An example is here [N882873](https://fburl.com/anp/azenjx3v).
So we will now update the qconfig_dict to include the fused modules mapping to the qconfigs used for the modules that make up the fused modules. If the modules are not mapped to the same qconfig, then we will raise an error.
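For concreteness, a sketch of the qconfig_dict situation being fixed (toy model; Linear and ReLU must map to the same qconfig so the LinearReLU produced by `_fuse_fx` can still be looked up; newer releases also pass `example_inputs` to `prepare_fx`):
```
import torch
from torch.ao.quantization import get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx

qconfig = get_default_qconfig("fbgemm")
qconfig_dict = {
    "": None,
    "object_type": [
        (torch.nn.Linear, qconfig),
        (torch.nn.ReLU, qconfig),  # must agree with the Linear qconfig, otherwise an error is raised
    ],
}
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()
prepared = prepare_fx(model, qconfig_dict)
```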
Test Plan:
`python test/test_quantization.py TestFuseFx.test_qconfig_fused_module`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406941
fbshipit-source-id: 74b5db89f4998aeb02b2bf7c37bf97326580c654
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60378
Created the following unit-tests to check that our equalization algorithm is as expected:
- Check the equalization scales calculated and stored in the graph are as expected
- Check the scaled weights and biases are as expected
- Check that the min/max values in the quantization observers are as expected
- Check that the graphs with equalization are structured in the same way as graphs without equalization (except that equalized graphs have additional equalization scale and mul nodes) before and after quantization
Test Plan:
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_equalization_scales`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_weights_bias`
`python test/test_quantization TestEqualizeFx.test_input_activation_values`
`python test/test_quantization TestEqualizeFx.test_input_weight_equalization_graphs`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29406942
fbshipit-source-id: 518208546ae5835c1ebb2af217507e90af66fbe4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/45687
The fix changes the input size check for `InstanceNorm*d` to be more restrictive and correctly reject sizes with only a single spatial element, regardless of batch size, to avoid infinite variance.
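A quick illustration of the tightened check (the exact error type and message may vary across versions):
```
import torch

inorm = torch.nn.InstanceNorm2d(3)
inorm(torch.randn(4, 3, 5, 5))  # fine: 25 spatial elements per channel

try:
    inorm(torch.randn(4, 3, 1, 1))  # now rejected: one spatial element gives undefined variance
except (ValueError, RuntimeError) as err:  # error type may differ across versions
    print(err)
```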
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56659
Reviewed By: pbelevich
Differential Revision: D27948060
Pulled By: jbschlosser
fbshipit-source-id: 21cfea391a609c0774568b89fd241efea72516bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54813
Previously we had a cat that takes a list of Tensors with different qparams, dequantizes them,
concatenates them, and requantizes with the output qparams. This adds unnecessary overhead from dequantizing
and requantizing Tensors.
This PR adds an optimization for the cat operator: we make sure the inputs and output of cat
use the same observer/fake_quant and produce a cat that does not do rescaling.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D27408377
fbshipit-source-id: 6a4bdcfd15e57ea1fe0f7e72d1e1288eb3ece4db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56194
Enables the NS graph matcher to also match `call_method` nodes.
These are useful for ops such as `torch.sigmoid`.
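For reference, the kind of node this refers to: tensor-method calls trace to `call_method` nodes in FX, as opposed to the `call_function` node produced by `torch.sigmoid`:
```
import torch
import torch.fx

class M(torch.nn.Module):
    def forward(self, x):
        return x.sigmoid()  # traced as a call_method node with target "sigmoid"

traced = torch.fx.symbolic_trace(M())
print([(node.op, node.target) for node in traced.graph.nodes])
# e.g. [('placeholder', 'x'), ('call_method', 'sigmoid'), ('output', 'output')]
```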
Test Plan:
```
python test/test_quantization.py TestFXGraphMatcher.test_methods
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27805333
fbshipit-source-id: 509ae283db6b245671f11e3eb6b7fcb3a5735ef5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54253
Creates an `NSSubgraph` type for representing a subgraph instance,
and modifies the NS code to use it. This will enable us to add
more information to the subgraph instance definition without
having to change all the callsites.
Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXGraphMatcher
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D27158198
fbshipit-source-id: 548785dd90144e2da256c23af990620c778e7cfe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53779
Moves the test case for LSTM activation matching to new NS APIs.
This requires adding the ability to log non-Tensor types.
Since we need Loggers to be scriptable and TorchScript does
not support `Union`, we collect statistics in a separate collector
if we have an RNN. Note: this can scale to a small N of
return types, but not to a large N. If the N becomes large in
the future, we will solve it then.
Test Plan:
```
python test/test_quantization.py TestFXNumericSuiteCoreAPIsModels
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26967110
fbshipit-source-id: afe60b44fdec28a328813b4f342cf4fe04820baa
Summary:
This PR implements the option to log inputs for FX Numeric Suite. The user-facing API looks like
```
def prepare_model_outputs(..., should_log_inputs : bool = False)
def prepare_model_with_stubs(..., should_log_inputs : bool = False)
```
The output data now looks like
```
{
"layer1": {
"node_inputs": {
"model1": [{
"values": ...,
...,
}],
},
"node_outputs": {
...,
}
},
... // other layers
}
```
One key design decision taken here is that an input logger logs the output of previous nodes, instead of logging the input of the current node. This matters for a signature such as `cat([x1, x2, x3])`. We are inserting three input loggers here (for x1, x2, and x3), instead of a single input logger for `[x1, x2, x3]`. This was chosen in order to preserve the structure of the original graph as much as possible and keep flexibility for future optimizations.
Test Plan:
TODO: fill out
Imported from OSS
Differential Revision: D26931225
Reviewed By: hx89
Pulled By: vkuzo
fbshipit-source-id: dd692bfb5ddaaf5554f80c25e2f40b21762e4fc3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52534
Currently linear_dynamic_fp16 has a signature that's tied to fbgemm/qnnpack.
We'll need to produce a pattern equivalent to linear_dynamic_fp16 to support extensions
to other backends.
Test Plan:
python test/test_quantization.py TestQuantizeFxOps.test_linear_dynamic_fp16
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26557726
fbshipit-source-id: 270c9f781f73c79416a092b7831294cabca84b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52779
1. makes the return type of the weight comparison APIs match the return
type of the activation comparison APIs:
```
# before
{layer_name: {model_name: weight_tensor}}
{layer_name: {model_name: [activation_tensor]}}
# after
{layer_name: {model_name: [weight_tensor]}}
{layer_name: {model_name: [activation_tensor]}}
```
2. makes a type alias for the type, so future changes are easier
Test Plan:
```
mypy torch/quantization
python test/test_quantization.py TestFXNumericSuiteCoreAPIs
```
Imported from OSS
Reviewed By: hx89
Differential Revision: D26652639
fbshipit-source-id: eb1f04d6913cedf88d628f362468875ae9ced928
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52179
Rename debug to reference. We'll use this to produce a reference quantized model
that can be used as a common interface between PyTorch quantized models and backends.
Test Plan:
python test/test_quantization.py TestQuantizeFx
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D26424656
fbshipit-source-id: a0299b023f6ba7d98f5750724c517b0ecb987b35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52130
We have patterns like (F.linear, F.relu) which need to match
to (toq.linear_relu). So, we need to match subgraphs.
This PR does the following:
* defines a "subgraph" as (start_node, end_node). The current assumption
is that subgraphs are simple, there is always a path from start_node to
end_node, and we can ignore any non-input args/kwargs of these nodes
for the purposes of matching and copying things. An example one node
subgraph is (F.linear, F.linear). An example two node subgraph
is (F.linear, F.relu).
* changes the matching logic to iterate over subgraphs instead of nodes
* changes the NS core APIs to use subgraph pairs instead of node pairs:
1. for weights, we match on the start node
2. for unshadowed activations, we observe the end nodes
3. for shadowed activations, we copy the subgraph of a to graph c
TODO(before review) write up better, not ready for review yet
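To make the subgraph definition concrete, a toy traced graph in which (F.linear, F.relu) forms one two-node subgraph, with the linear call as start_node and the relu call as end_node:
```
import torch
import torch.fx
import torch.nn.functional as F

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))
        self.bias = torch.nn.Parameter(torch.randn(4))

    def forward(self, x):
        return F.relu(F.linear(x, self.weight, self.bias))

traced = torch.fx.symbolic_trace(M())
print(traced.graph)  # the linear node feeds directly into the relu node
```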
Test Plan:
TODO before land: better test plan
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D26403092
fbshipit-source-id: e49aaad4b02b8d60589435848bee422b8f41937a