Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637
The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative, and users had to read
the definitions in order to understand what these observers are. The new
naming convention encodes the range covered by each observer.
Analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`.
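A minimal sketch of what the two observers pin down, assuming the conventional fixed qparams for the 0-to-1 and -1-to-1 output ranges (the scale/zero_point values below are my reading of the usual defaults, not copied from this PR):
```
import torch
from torch.ao.quantization.observer import FixedQParamsObserver

# old name: default_affine_fixed_qparams_observer     -> output range [0, 1)
# old name: default_symmetric_fixed_qparams_observer  -> output range [-1, 1)
# The new names encode these ranges directly.
obs_range_0to1 = FixedQParamsObserver.with_args(
    scale=1.0 / 256.0, zero_point=0, dtype=torch.quint8, quant_min=0, quant_max=255)
obs_range_neg1to1 = FixedQParamsObserver.with_args(
    scale=2.0 / 256.0, zero_point=128, dtype=torch.quint8, quant_min=0, quant_max=255)
```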
Test Plan:
```
python test/test_quantization.py
```
Differential Revision: D36054169
Reviewed By: vkuzo
Pulled By: dzdang
fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814
1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```
2. create a new file in the old location and copy the imports
3. fix all callsites inside `torch`
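Step 2 typically amounts to a small re-export shim; a sketch of what the file left behind at the old path could look like (assuming a star re-import, which may differ from the exact shim in the PR):
```
# torch/quantization/fake_quantize.py -- kept only for backward compatibility.
# The implementation now lives in torch/ao/quantization/fake_quantize.py.
from torch.ao.quantization.fake_quantize import *  # noqa: F401,F403
```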
Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```
Reviewed By: z-a-f
Differential Revision: D30866792
fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043
In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.
Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attributes for the observer/fake_quant modules.
There may be some numerical differences between the old observer code in observer.py and the new fused module,
which was re-written in C++/CUDA to perform observe + fake_quantize in one step.
This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization, and the test is updated accordingly.
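A sketch of what "version 1" means in practice; the import paths follow today's `torch.ao.quantization` namespace (older releases exposed the same names under `torch.quantization`):
```
import torch
from torch.ao.quantization import get_default_qat_qconfig
from torch.ao.quantization.fake_quantize import FusedMovingAvgObsFakeQuantize

qconfig = get_default_qat_qconfig("fbgemm", version=1)
act_fake_quant = qconfig.activation()  # instantiate the activation fake-quant factory
print(isinstance(act_fake_quant, FusedMovingAvgObsFakeQuantize))  # expected: True
```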
Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30232222
fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62702
Expose the qconfig to the user to speed up training by leveraging the fused module.
The module currently supports per-tensor and per-channel moving-average observers combined with fake-quantize.
For details on perf benefits, refer to https://github.com/pytorch/pytorch/pull/61691
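A sketch of wiring the fused module into a custom QConfig via `with_args`; the particular quant ranges and observer choices below are illustrative, not the exact defaults exposed by this PR:
```
import torch
from torch.ao.quantization import QConfig, MovingAverageMinMaxObserver
from torch.ao.quantization.fake_quantize import FusedMovingAvgObsFakeQuantize

fused_qconfig = QConfig(
    activation=FusedMovingAvgObsFakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=0, quant_max=255, dtype=torch.quint8),
    weight=FusedMovingAvgObsFakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver,
        quant_min=-128, quant_max=127, dtype=torch.qint8,
        qscheme=torch.per_tensor_symmetric),
)
```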
Test Plan: Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30093719
fbshipit-source-id: b78deb7810f5b597474b9b9a0395d361d04eb46a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863
To make this consistent with other observers, add a `reduce_range` option that can be used to update quant_min/quant_max.
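A sketch of the usual `reduce_range` convention (halving the quantized range to leave one bit of headroom); the exact arithmetic inside the fused module may differ:
```
# reduce_range keeps quantized values to 7 bits so downstream int8 kernels
# that accumulate into narrower intermediates do not overflow.
quant_min, quant_max = 0, 255          # full quint8 range
reduce_range = True
if reduce_range:
    quant_min, quant_max = quant_min // 2, quant_max // 2   # 0..127
print(quant_min, quant_max)
```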
Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30146602
fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62346
Update the operator code to resize the min/max tensors if per-channel quant is selected. We need to do this because, by default, the observer creates empty tensors for the min/max and scale/zero_point values when per-channel quantization is enabled.
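A rough sketch of the resize-before-write pattern this refers to, with illustrative names (the real change lives in the fused C++/CUDA operator):
```
import torch

min_val = torch.empty(0)   # per-channel observers start with empty buffers
max_val = torch.empty(0)

x = torch.randn(8, 4)      # pretend dim 0 is the channel axis
if min_val.numel() == 0:   # first batch: size the buffers to the channel count
    min_val.resize_(x.shape[0])
    max_val.resize_(x.shape[0])
torch.amin(x, dim=1, out=min_val)
torch.amax(x, dim=1, out=max_val)
print(min_val.shape)       # torch.Size([8])
```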
Test Plan:
python test/test_quantization.py test_fused_mod_per_channel
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D30003835
fbshipit-source-id: b5ec80261cb50ee543f21191a887e979dcde4667
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386
During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`.
For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29271905
fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51265
This PR is the cleanup after #51159. At a high level, we make the new
definition of per-channel fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper so the user-facing API stays the same.
In detail:
1. point `fake_quantize_per_channel_affine`'s implementation at `fake_quantize_per_channel_affine_cachemask` (sketched below)
2. delete the `fake_quantize_per_channel_affine` backward; autograd will automatically use the cachemask backward
3. delete all the `fake_quantize_per_channel_affine` kernels, since they are no longer used by anything
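A Python-level sketch of the wrapper pattern in step 1 (the actual change is in the native op registrations); it assumes the cachemask variant is callable from `torch.*` and returns `(output, mask)` with the same argument order as the public op:
```
import torch

def fake_quantize_per_channel_affine_sketch(x, scale, zero_point, axis,
                                            quant_min, quant_max):
    # Forward to the cachemask variant and drop the mask; only autograd's
    # backward needs it.
    out, _mask = torch.fake_quantize_per_channel_affine_cachemask(
        x, scale, zero_point, axis, quant_min, quant_max)
    return out
```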
Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26120957
fbshipit-source-id: 264426435fabd925decf6d1f0aa79275977ea29b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51255
This is the same as #50561, but for per-channel fake_quant.
TODO before land write up better
Memory and performance impact (MobileNetV2): TODO
Performance impact (microbenchmarks): https://gist.github.com/vkuzo/fbe1968d2bbb79b3f6dd776309fbcffc
* forward pass on cpu: 512ms -> 750ms (+46%)
* forward pass on cuda: 99ms -> 128ms (+30%)
* note: the overall performance impact to training jobs should be minimal, because this is used for weights, and relative importance of fq is dominated by fq'ing the activations
* note: we can optimize the perf in a future PR by reading once and writing twice
Test Plan:
```
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cuda
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cuda
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26117721
fbshipit-source-id: 798b59316dff8188a1d0948e69adf9e5509e414c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51159
This PR is the cleanup after #50561. At a high level, we make the new
definition of fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper so the user-facing API stays the same.
In detail:
1. point `fake_quantize_per_tensor_affine`'s implementation at `fake_quantize_per_tensor_affine_cachemask`
2. delete the `fake_quantize_per_tensor_affine` backward; autograd will automatically use the cachemask backward
3. delete all the `fake_quantize_per_tensor_affine` kernels, since they are no longer used by anything
Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```
performance testing was done in the previous PR.
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D26090869
fbshipit-source-id: fda042881f77a993a9d15dafabea7cfaf9dc7c9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50561
Not for review yet, a bunch of TODOs need finalizing.
tl;dr: add an alternative implementation of `fake_quantize` which saves
a mask during the forward pass and uses it to calculate the backward.
There are two benefits:
1. the backward function no longer needs the input Tensor, and it can be
gc'ed earlier by autograd. On MobileNetV2, this reduces QAT overhead
by ~15% (TODO: link, and absolute numbers). We add an additional mask Tensor
to pass around, but its size is 4x smaller than the input tensor. A
future optimization would be to pack the mask bitwise and unpack in the
backward.
2. the computation of `qval` can be done only once in the forward and
reused in the backward. No perf change observed; TODO: verify with better
metrics.
TODO: describe in more detail
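A minimal autograd sketch of the cachemask idea, assuming a straight-through estimator gated by the saved mask (not the actual C++/CUDA kernels):
```
import torch

class FakeQuantCachemaskSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale, zero_point, quant_min, quant_max):
        q = torch.round(x / scale) + zero_point
        # Save only a boolean mask of "values inside the clamp range"; the
        # input tensor itself is no longer needed for the backward pass.
        mask = (q >= quant_min) & (q <= quant_max)
        ctx.save_for_backward(mask)
        q = torch.clamp(q, quant_min, quant_max)
        return (q - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_output):
        (mask,) = ctx.saved_tensors
        return grad_output * mask, None, None, None, None
```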
Test Plan:
OSS / torchvision / MobileNetV2
```
python references/classification/train_quantization.py \
  --print-freq 1 \
  --data-path /data/local/packages/ai-group.imagenet-256-smallest-side/prod/ \
  --output-dir ~/nfs/pytorch_vision_tests/ \
  --backend qnnpack \
  --epochs 5
TODO paste results here
```
TODO more
Imported from OSS
Reviewed By: ngimel
Differential Revision: D25918519
fbshipit-source-id: ec544ca063f984de0f765bf833f205c99d6c18b6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50868
Ensures that `FakeQuantize` respects device affinity when loading from
state_dict, and knows how to resize scale and zero_point values
(which is necessary for FQ classes wrapping per channel observers).
This is same as https://github.com/pytorch/pytorch/pull/44537, but for
`FakeQuantize`.
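A sketch of the load pattern this enables (a hypothetical helper, not the exact `_load_from_state_dict` override in the PR):
```
import torch

def load_buffer(module, name, state_dict, prefix):
    key = prefix + name
    if key in state_dict:
        src = state_dict[key]
        dst = getattr(module, name)
        if dst.shape != src.shape:
            dst.resize_(src.shape)   # per-channel scale/zero_point may start empty
        # copy_ writes into the existing buffer, so the module keeps its own
        # device rather than inheriting the checkpoint's.
        dst.copy_(src)
```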
Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D25991570
fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47423
Since the dtype of this fake_quant is `quint8`, the output range should be
from 0 to 255. Fixing. This should address the numerical inaccuracies with
sigmoid and hardsigmoid with `FixedQParamsFakeQuantize` attached compared
to their quantized counterparts.
In a future PR, it might be safer to also make the activation functions
that use `FixedQParamsFakeQuantize` explicitly specify their expected
output range and zero_point. Leaving that for later, as this bugfix
should be landed urgently.
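A sketch of the corrected range for a quint8 output like sigmoid's, using the public per-tensor fake-quantize op (the scale/zero_point values are the conventional 0-to-1 mapping, assumed rather than copied from the PR):
```
import torch

x = torch.sigmoid(torch.randn(4))
# quint8 covers 0..255; with scale=1/256 and zero_point=0 the simulated output
# spans [0, 255/256], matching the quantized sigmoid/hardsigmoid kernels.
y = torch.fake_quantize_per_tensor_affine(x, 1.0 / 256, 0, 0, 255)
print(y.min().item(), y.max().item())
```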
Test Plan:
Manual script which gives low SQNR before this PR and high SQNR after
this PR: https://gist.github.com/vkuzo/9906bae29223da72b10d6b6aafadba42
https://github.com/pytorch/pytorch/pull/47376, which can be landed after
this, adds a proper test.
Imported from OSS
Reviewed By: ayush29feb, jerryzh168
Differential Revision: D24751497
fbshipit-source-id: 4c32e22a30116caaceeedb4cd47146d066054a89
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657
This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.
Test Plan:
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24451406
fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538
This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24004795
fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773
The model is created and prepared using the fx APIs and then scripted for training.
In order to test QAT on the scripted model we need to be able to disable/enable the fake_quant
and observer modules on it.
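A sketch of toggling these modules on a scripted model; the function locations follow current `torch.ao.quantization` (older releases exposed them under `torch.quantization`), and `prepared_model` is assumed to be the output of `prepare_qat_fx`:
```
import torch
from torch.ao.quantization import disable_fake_quant, enable_fake_quant, disable_observer

scripted = torch.jit.script(prepared_model)   # prepared_model: assumed output of prepare_qat_fx
scripted.apply(disable_observer)              # freeze the observed ranges
scripted.apply(disable_fake_quant)            # run without simulated quantization
scripted.apply(enable_fake_quant)             # ...then turn it back on later in training
```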
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23741354
fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749
Ensure fx module is scriptable after calling prepare_qat on it
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23718380
fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39750
Add a test to make the default QAT qconfig scriptable, and fix
all the errors.
Test Plan:
```
python test/test_quantization.py TestQATScript.fake_quant_scriptable
```
Imported from OSS
Differential Revision: D21975879
fbshipit-source-id: 8c48ad9f24b2c941d2267cb53eb70ebecd103744
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38368
Some users need to enable/disable the fake-quant and observer flags
in the middle of QAT. To make this work properly with DDP,
we need to implement them using buffers so that they are replicated
properly to all the nodes.
This should solve issue https://github.com/pytorch/pytorch/issues/38081
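A sketch of the buffer-based flags; the buffer names follow the ones `FakeQuantize` uses, but treat the details as illustrative:
```
import torch
import torch.nn as nn

class FakeQuantSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers (not plain Python attributes) so DDP broadcasts them with module state.
        self.register_buffer("fake_quant_enabled", torch.tensor([1], dtype=torch.uint8))
        self.register_buffer("observer_enabled", torch.tensor([1], dtype=torch.uint8))

    def enable_fake_quant(self, enabled: bool = True) -> None:
        self.fake_quant_enabled[0] = 1 if enabled else 0

    def enable_observer(self, enabled: bool = True) -> None:
        self.observer_enabled[0] = 1 if enabled else 0
```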
Test Plan:
CI
Imported from OSS
Differential Revision: D21537607
fbshipit-source-id: 8c9da022beb7aaa44c658268f02f99dd5aee93fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032
DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.
This diff updates (some quantized modules) (TBD: all quantized modules? determine a good cut
point) to do their parameter update in-place. This will enable static
quant and QAT to work correctly with DataParallel.
TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.
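A toy observer showing the in-place rule the diff applies (illustrative only, not the actual module code):
```
import torch
import torch.nn as nn

class MinMaxSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("min_val", torch.tensor(float("inf")))
        self.register_buffer("max_val", torch.tensor(float("-inf")))

    def forward(self, x):
        # Wrong for DataParallel: `self.min_val = x.min()` rebinds the attribute
        # on the replica, so the update never reaches the master copy.
        # Right: write into the registered buffer in place.
        self.min_val.copy_(torch.minimum(self.min_val, x.min()))
        self.max_val.copy_(torch.maximum(self.max_val, x.max()))
        return x
```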
Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40
TODO before land: add integration testing
Imported from OSS
Differential Revision: D21206454
fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626
For DDP we require the attributes to be registered as buffers. By doing this, the values are broadcast from one device to the rest.
Test Plan:
Tested on actual model on GPU
Imported from OSS
Differential Revision: D20038839
fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and try to save the model, it will error out.
fixes: https://github.com/pytorch/pytorch/issues/32082
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318
Differential Revision: D19434801
Pulled By: jerryzh168
fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357
Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814
Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.
Differential Revision: D18668517
fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494
`calculate_qparams` for per-channel quantization should also return the axis; this
PR adds that and the corresponding support in graph mode.
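A sketch of per-channel qparams and the axis information; in current releases the axis is exposed as `ch_axis` on the observer, while this PR's change was about making the axis available to graph mode alongside scale/zero_point:
```
import torch
from torch.ao.quantization.observer import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver(ch_axis=0)
obs(torch.randn(4, 16))                      # observe one batch
scale, zero_point = obs.calculate_qparams()  # one entry per channel along ch_axis
print(scale.shape, zero_point.shape, obs.ch_axis)
```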
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D18580905
fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396
Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than min/max observers that track extremal values across batches.
ghstack-source-id: 91369018
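A sketch of the update rule the observer applies per batch, where `averaging_constant` is its smoothing knob:
```
import torch
from torch.ao.quantization.observer import MovingAverageMinMaxObserver

# min_val <- min_val + c * (batch_min - min_val), and likewise for max_val.
obs = MovingAverageMinMaxObserver(averaging_constant=0.01)
for _ in range(10):
    obs(torch.randn(32, 16))     # each call folds the batch min/max into the running values
scale, zero_point = obs.calculate_qparams()
print(scale.item(), zero_point.item())
```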
Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details
Differential Revision: D17727213
fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113
Fix a bug in fake quant control of observer and fake-quantize operations.
Add a test to ensure the features work as expected.
ghstack-source-id: 91071181
Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control
Differential Revision: D17678875
fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516
Integrate per-channel support into conv and linear modules.
ghstack-source-id: 90982010
Test Plan:
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details
buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details
Differential Revision: D17342622
fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520
Add hooks that can be used with model.apply() to control observer and fake-quant modules during QAT.
ghstack-source-id: 90897063
Test Plan: buck test caffe2/test:quantization -- --print-passing-details
Differential Revision: D17491155
fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492
Previous definition of observers was quite clumsy - with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. To override default arguments, either `functools.partial` can be used or the convenient wrapper `MyObserver.with_args(x=1)` is provided.
Also rename `QConfig_dynamic` to `QConfigDynamic` because the former violates the naming convention.
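A sketch of the two equivalent spellings for overriding observer defaults (the observer choices here are just examples):
```
import torch
from functools import partial
from torch.ao.quantization import QConfig
from torch.ao.quantization.observer import MinMaxObserver

act = MinMaxObserver.with_args(dtype=torch.quint8, reduce_range=True)
wt = partial(MinMaxObserver, dtype=torch.qint8, qscheme=torch.per_tensor_symmetric)
qconfig = QConfig(activation=act, weight=wt)
observer_instance = qconfig.activation()   # the factories are called at prepare time
```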
Test Plan: Imported from OSS
Differential Revision: D17521265
Pulled By: dzhulgakov
fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e