116 Commits

5cedc5a0ff [BE][PYFMT] migrate PYFMT for torch/[p-z]*/ to ruff format (#144552)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/144552
Approved by: https://github.com/ezyang
2025-08-07 00:09:56 +00:00
3bf922a6ce Apply UFMT to low-traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
f309f8fbd4 [quant] ao migration of observer and qconfig (#64982)
Summary:
(Had to recreate this diff so it wasn't dependent on the stack)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/64982

Migrate qconfig.py and observer.py to torch/ao/quantization, using the new test format
ghstack-source-id: 138215256

Test Plan:
buck test mode/opt //caffe2/test:quantization

https://www.internalfb.com/intern/testinfra/testconsole/testrun/8444249354294701/

buck test mode/dev //caffe2/test:quantization -- TestAOMigrationQuantization

https://www.internalfb.com/intern/testinfra/testrun/3940649742829796

Reviewed By: z-a-f

Differential Revision: D30982534

fbshipit-source-id: 48d08969b1984311ceb036eac0877c811cd6add9
2021-09-16 10:33:16 -07:00
37bcefa248 [quant] Removing hardcoded "torch.quantization.observer" for migration (#64981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981

This would have caused errors when observer.py was moved to ao.

see: D30391189
ghstack-source-id: 138118430

Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'

buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'

Reviewed By: supriyar

Differential Revision: D30432008

fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
2021-09-15 15:22:19 -07:00
336aa9cd85 change with_callable_args to return a fresh _PartialWrapper (#63374)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326

Currently `with_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, all sorts of lifetime issues crop up. (Details in the linked issue.) As far as I can tell, we only need a constructor that is module-aware (and, by extension, device-aware), so returning a fresh wrapper should have the same effect without leaking the last call's module.
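A minimal pure-Python sketch of the copy-instead-of-mutate pattern follows. `PartialWrapper` and `DummyObserver` are hypothetical simplifications, not PyTorch's `_PartialWrapper`; the point is only that `with_callable_args` returns a new object and leaves the shared default untouched.

```python
import functools


class PartialWrapper:
    """Hypothetical, simplified stand-in for a partial-constructor wrapper.

    `p` is a functools.partial; `callable_args` maps keyword names to
    zero-argument functions that are evaluated at construction time.
    """

    def __init__(self, p):
        self.p = p
        self.callable_args = {}

    def __call__(self, *args, **kwargs):
        # Evaluate the callable args now; explicit kwargs take precedence.
        for name, fn in self.callable_args.items():
            if name not in kwargs:
                kwargs[name] = fn()
        return self.p(*args, **kwargs)

    def with_callable_args(self, **callable_kwargs):
        # Return a fresh wrapper instead of mutating self, so a shared
        # global default never keeps a reference to the last caller's state.
        fresh = PartialWrapper(self.p)
        fresh.callable_args = {**self.callable_args, **callable_kwargs}
        return fresh


class DummyObserver:
    def __init__(self, device="cpu"):
        self.device = device


default_ctr = PartialWrapper(functools.partial(DummyObserver))
# Bind a device-aware callable on a copy; default_ctr stays unchanged.
module_ctr = default_ctr.with_callable_args(device=lambda: "cuda:0")
```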

Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374

Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.

Reviewed By: HDCharles

Differential Revision: D30359360

Pulled By: robieta

fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
2021-08-17 09:11:38 -07:00
08d1a12d69 [quant] add reduce_range option to FusedMovingAvgFakeQuantize module (#62863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863

For consistency with other observers, add a reduce_range option that can be used to update quant_min/quant_max.
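As a rough illustration (an assumption based on the convention other observers follow, not this module's actual code), a reduce_range flag halves the integer range. `qmin_qmax` is a hypothetical helper that uses string dtype names instead of torch dtypes:

```python
def qmin_qmax(dtype, reduce_range=False, quant_min=None, quant_max=None):
    """Hypothetical helper returning the integer range an observer targets."""
    if quant_min is not None and quant_max is not None:
        # Custom range: reduce_range halves both ends.
        qmin, qmax = quant_min, quant_max
        if reduce_range:
            qmin, qmax = qmin // 2, qmax // 2
        return qmin, qmax
    # Default ranges for 8-bit dtypes.
    if dtype == "qint8":
        return (-64, 63) if reduce_range else (-128, 127)
    if dtype == "quint8":
        return (0, 127) if reduce_range else (0, 255)
    raise ValueError(f"unsupported dtype: {dtype}")
```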

Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30146602

fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
2021-08-10 09:27:01 -07:00
cfd0f5ebc9 [quant] update per-channel observer min/max_val attribute names (#62345)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345

This PR renames the attributes min_vals/max_vals to min_val/max_val. The motivation is to keep the attribute names consistent with the per-tensor observers, so that consumers (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes.

It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can be loaded depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.
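A sketch of the version-gated rename, assuming legacy checkpoints are identified by a state_dict version below 3. The exact threshold and the helper name `upgrade_observer_state_dict` are assumptions for illustration:

```python
def upgrade_observer_state_dict(state_dict, prefix, version):
    """Hypothetical BC shim: rename legacy per-channel observer keys in place.

    Checkpoints written before the rename store `min_vals`/`max_vals`;
    newer code expects `min_val`/`max_val`.
    """
    if version is None or version < 3:  # threshold is an assumption
        for old, new in (("min_vals", "min_val"), ("max_vals", "max_val")):
            old_key, new_key = prefix + old, prefix + new
            if old_key in state_dict:
                state_dict[new_key] = state_dict.pop(old_key)
    return state_dict
```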

Test Plan:
python test/test_quantization.py TestSerialization

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003700

fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
2021-07-29 22:28:53 -07:00
32d0c3e8ee Support for reference convert_fx working on gpu
Summary:
This PR enables GPU-only quantization, best used with is_reference since
there are not many GPU kernels for ops as of now.

This PR mainly changes how qconfigs and their observer constructors behave
once they are set as a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original
qconfig and configures them so that, when invoked, the created observers will
be on whatever device the module occupies. (Once observers are created,
module.to(device) is already set up to move any observers.) To do this,
a new method and a few small changes were added to the _PartialWrapper class that
our observers already use to create constructors (without changing the
existing functionality). These changes work in
concert with changes to the prepare flow, so that when the qconfigs are
propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr.

Ideally this would work on other models, but is_reference support for
many modules isn't there yet; those tests should be added in a
future PR.

Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic

python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert

python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert

python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence

Reviewed By: vkuzo

Differential Revision: D29684114

fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
2021-07-23 10:30:38 -07:00
99848c7269 [quant] Add tensor_qparam variant to fake_quantize_per_tensor (#61317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317

Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as tensor inputs. The reasons for this are:

* It is required for the fused observer + fake_quant operator on GPU, where scale/zero_point are calculated by the observer on device. Passing tensor inputs lets the CUDA kernel access the scale/zero_point values directly, avoiding extra copies/mallocs.
* It enables passing float as the scale dtype and int32 as the zero_point dtype (consistent with what the quantize call actually uses): https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* The overload is consistent with `quantize_per_tensor.tensor_qparams`.
ghstack-source-id: 133370216
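For reference, the element-wise arithmetic of per-tensor affine fake quantization is sketched below in pure Python; the tensor_qparams overload presumably changes how scale/zero_point are passed, not this math. The function is a hypothetical simplification operating on lists, not the ATen kernel:

```python
def fake_quantize_per_tensor(x, scale, zero_point, quant_min, quant_max):
    """Pure-Python sketch of per-tensor affine fake quantization.

    Quantize (round, shift, clamp) and immediately dequantize, so the
    output stays floating point but carries the quantization error.
    """
    out = []
    for v in x:
        q = round(v / scale) + zero_point        # quantize
        q = min(max(q, quant_min), quant_max)    # clamp to the integer range
        out.append((q - zero_point) * scale)     # dequantize
    return out
```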

Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask

Reviewed By: raghuramank100

Differential Revision: D29552727

fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
2021-07-10 19:41:55 -07:00
dabadd7e20 [quant] Added reset_min_max_vals() function to observers (#60883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883

As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I added a `reset_min_max_vals()` method to the observers, which will be called during input-weight equalization. This way the equalization code does not depend on the observers' implementation details.
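A minimal sketch of such a reset method on a hypothetical running min/max observer, using plain Python floats instead of tensor buffers:

```python
import math


class MinMaxObserverSketch:
    """Hypothetical running min/max observer over plain Python floats."""

    def __init__(self):
        self.reset_min_max_vals()

    def reset_min_max_vals(self):
        # +inf / -inf are the identity elements for min / max, so the next
        # observed batch fully determines the range.
        self.min_val = math.inf
        self.max_val = -math.inf

    def forward(self, values):
        if values:
            self.min_val = min(self.min_val, min(values))
            self.max_val = max(self.max_val, max(values))
        return values
```

Callers such as the equalization pass only need `reset_min_max_vals()`; they never touch `min_val`/`max_val` directly.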

Test Plan:
`python test/test_quantization.py TestEqualizeFx`

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D29491848

fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
2021-06-30 14:22:08 -07:00
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter the following error with scripted models:
`RuntimeError: cannot resize variables that require grad`

In the per-tensor case some buffers never need to be resized, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
c0b7c59e55 [quant] Equalization Observer modifications (#59953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953

The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with
`calculate_scaled_minmax()` since we will need to return the scaled
min/max values to update the following input quantization observer
- [WeightEqualizationObserver] We no longer need a row observer since
this will be taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following from the previous point, we no
longer need to calculate the scaled qparam values. Instead, we will use
the equalization scale to scale the weights later, and the qparams will
be taken care of by the weight quantization observer.
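The input-weight equalization math can be sketched as follows. This assumes the common choice of a per-column scale s_j = sqrt(weight_range_j / input_range_j), with inputs multiplied and weights divided by s_j so the layer output is unchanged; the helper names are hypothetical and this is not PyTorch's implementation:

```python
import math


def equalization_scale(input_min, input_max, weight_min, weight_max):
    """Per-column scale s_j = sqrt(weight_range_j / input_range_j) (assumed)."""
    return [
        math.sqrt((wmax - wmin) / (imax - imin))
        for imin, imax, wmin, wmax in zip(input_min, input_max, weight_min, weight_max)
    ]


def scaled_input_minmax(input_min, input_max, scale):
    """Scale each input column by s_j, then reduce to one per-tensor range.

    This is the range the following input quantization observer would see.
    """
    scaled_min = min(lo * s for lo, s in zip(input_min, scale))
    scaled_max = max(hi * s for hi, s in zip(input_max, scale))
    return scaled_min, scaled_max
```

For example, input column ranges (2, 8) and weight column ranges (8, 2) give s = (2.0, 0.5); after scaling, every input and weight column has range 4.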

Test Plan:
`python test/test_quantization.py TestEqualizeFx.test_input_weight_eq_observer`

Imported from OSS

Reviewed By: supriyar

Differential Revision: D29135332

fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
2021-06-16 22:32:30 -07:00
61965abad7 Move _PartialWrapper to module scope (#59660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59660

Context https://github.com/pytorch/pytorch/issues/57352

Test Plan: PyTorch CI tests

Reviewed By: vkuzo

Differential Revision: D28972991

fbshipit-source-id: efc9dd3e90e18e1cdf27d5ef0f168abd8169bc42
2021-06-09 11:55:04 -07:00
1aa14fcb14 Fix the "tensors to be on the same device" error in HistogramObserver (#59234)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/59075

This PR fixes the "tensors to be on the same device" error in `HistogramObserver`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/59234

Reviewed By: jbschlosser

Differential Revision: D28837572

Pulled By: vkuzo

fbshipit-source-id: ff7c3229ced7de2cdd8f76d526f0fd33ac643216
2021-06-03 13:30:56 -07:00
a1806134a7 [QAT] Fix the runtime error "cannot resize variables that require grad" (#57068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57068

When training with the histogram observer enabled, we got this runtime error:
```
torch/quantization/observer.py", line 942, in forward
                    self.bins)

            self.histogram.resize_(combined_histogram.shape)
            ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            self.histogram.copy_(combined_histogram)
            self.min_val.resize_(combined_min.shape)
RuntimeError: cannot resize variables that require grad
```

Since the histogram observer only collects histogram statistics, it does not need gradients, so we turn off grad with the `detach_()` method before resizing.

Test Plan:
- arc lint
- Trained with the histogram observer turned on; training finished successfully

f264139727

Reviewed By: supriyar

Differential Revision: D27147212

fbshipit-source-id: abed5b9c4570ffc6bb60e58e64791cfce66856cd
2021-05-27 09:12:06 -07:00
25ac647f64 [QAT] Auto format torch/quantization/observer.py (#57067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57067

Auto-format the code.

Test Plan: lint

Reviewed By: jerryzh168

Differential Revision: D27147213

fbshipit-source-id: 008871d276c8891b2411549e17617e5c27d16ee3
2021-05-27 09:10:34 -07:00
febff45900 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: albanD

Differential Revision: D27939544

Pulled By: jbschlosser

fbshipit-source-id: 4bf517e5f74f093e27ca38a85e732da65e44d805
2021-04-22 16:16:53 -07:00
12b2bc94d7 Revert D27909732: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27909732 (5a09def9b0)

Original commit changeset: d8684b2403ab

fbshipit-source-id: d00d69fae4fa4ed58d9e97e70b27a06a0dcb39e4
2021-04-21 13:44:03 -07:00
5a09def9b0 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: malfet

Differential Revision: D27909732

Pulled By: jbschlosser

fbshipit-source-id: d8684b2403ab7eb336371d118799146a2520bd76
2021-04-21 13:20:11 -07:00
92d24e3060 Revert D27855386: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27855386 (40483acc51)

Original commit changeset: dabd505d2a04

fbshipit-source-id: f5bf3120d87861b30a8e1bf11977ad7d27cd8500
2021-04-19 20:07:20 -07:00
40483acc51 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: bdhirsh

Differential Revision: D27855386

Pulled By: jbschlosser

fbshipit-source-id: dabd505d2a04208e74b158570fb2859c736eea2c
2021-04-19 12:24:58 -07:00
d05e7c163f Revert D27600457: [pytorch][PR] Support factory kwargs in torch.nn modules
Test Plan: revert-hammer

Differential Revision:
D27600457 (1077f87269)

Original commit changeset: b58bfee61c39

fbshipit-source-id: 19d5bfc5133a3880383731d0332503ca1f3bce0c
2021-04-19 07:47:24 -07:00
1077f87269 Support factory kwargs in torch.nn modules (#54508)
Summary:
Continuation of https://github.com/pytorch/pytorch/pull/53144

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54508

Reviewed By: mrshenli

Differential Revision: D27600457

Pulled By: jbschlosser

fbshipit-source-id: b58bfee61c3917524b4622f63ef216c27a588eb1
2021-04-19 06:58:40 -07:00
ada916675f update HistogramObserver to be scriptable (#51081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51081

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51001

Fix tests in TestQuantizeJitOps.

Test Plan:
Imported from OSS
python test/test_quantization.py

Reviewed By: raghuramank100

Differential Revision: D26038759

Pulled By: lyoka

fbshipit-source-id: 0977ba7b8b26a9f654f20f5c698a7a20ec078c35
2021-01-27 07:27:03 -08:00
72306378b4 quant: ensure observers do not crash for empty Tensors (#49800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49800

Ensures that having a Tensor with 0 elements does not crash observers.
Note: it's illegal to pass Tensors with 0 elements to reductions such
as min and max, so we gate this out before the logic hits min/max.

This should not be hit often in practice, but it came up
while debugging some RCNN models with test inputs.
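The gating can be sketched as an early return on empty input. `observe_min_max` is a hypothetical pure-Python helper, with Python's `min()`/`max()` standing in for the tensor reductions (both raise on empty input):

```python
def observe_min_max(values, min_val, max_val):
    """Update a running (min, max); return it unchanged for an empty batch."""
    if len(values) == 0:
        # Reductions over zero elements are illegal, so skip them entirely.
        return min_val, max_val
    return min(min_val, min(values)), max(max_val, max(values))
```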

Test Plan:
```
python test/test_quantization.py TestObserver.test_zero_numel
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25693230

fbshipit-source-id: d737559697c98bd923356edacba895835060bb38
2021-01-05 09:35:47 -08:00
14edc726d9 Clean up some type annotations in caffe2/torch/quantization (#49942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49942

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: vkuzo

Differential Revision: D25717551

fbshipit-source-id: 1b63dc485ecf6641641b05f7ce095ae1d2d87346
2020-12-29 15:43:50 -08:00
576fa09157 [quant][fix] Fix quant type classification for float_qparam qconfig (#48069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48069

Also renames float_qparam_dynamic_qconfig to float_qparam_weight_only_qconfig.
It's not used in user code yet, so only the tests need to be updated.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D25010175

fbshipit-source-id: caa3eaa5358a8bc5c808bf5f64e6ebff3e0b61e8
2020-11-18 18:22:08 -08:00
6bb18b24fb [quant][qat] Ensure observer respects device affinity (#47514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47514

Previously the scale and zero_point were returned on the CPU even if
the input tensor was on the GPU.
This is because `copy_()` writes into the destination tensor on the destination's device rather than following the source tensor's device.

Also fixed a bug in the calculate_qparams function where the device was always set
to 'cuda', irrespective of the device id.

Test Plan:
python test/test_quantization.py TestObserver.test_observer_qparams_respects_device_affinity

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24800495

fbshipit-source-id: d7a76c59569842ed69029d0eb4fa9df63f87e28c
2020-11-10 08:43:52 -08:00
11c32611d7 [quant] Support 4-bit embedding_bag operators using the dtype quint4x2 (#45752)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45752

Use the torch.quint4x2 dtype from the previous PR to create 4-bit packed tensors.
These packed tensors can be consumed directly by the operator.
Serialization of the packed tensors is supported using a torchbind custom class.
Module support will follow in a later PR.
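A sketch of two-per-byte 4-bit packing in pure Python. The low-nibble-first order and zero padding for odd lengths are assumptions for illustration; the actual quint4x2 memory layout is not described in this commit:

```python
def pack_uint4(values):
    """Pack unsigned 4-bit ints two per byte (low nibble first, assumed)."""
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("values must fit in 4 bits")
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad odd lengths with a zero nibble
    return bytes(lo | (hi << 4) for lo, hi in zip(values[0::2], values[1::2]))


def unpack_uint4(packed, n):
    """Inverse of pack_uint4; `n` drops any padding nibble."""
    out = []
    for b in packed:
        out.extend((b & 0x0F, b >> 4))
    return out[:n]
```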

Test Plan:
python test/test_quantization.py TestEmbeddingBagOps

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D24120996

fbshipit-source-id: 2639353b3343ebc69e058b5ba237d3fc56728e1c
2020-10-06 21:11:49 -07:00
322855e380 type check for torch.quantization.observer (#45630)
Summary:
Add type checking for the observer module.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45630

Reviewed By: malfet

Differential Revision: D24058304

Pulled By: walterddr

fbshipit-source-id: ac1c0f5ff0d34b0445bd1364653fc5c9d7571b05
2020-10-02 13:25:41 -07:00
9d5607fcd9 [quant] Use PlaceholderObserver as default dynamic quant observer (#45343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45343

The current default dynamic quant observer is not correct, since for dynamic quantization we don't accumulate
min/max and we don't need to calculate qparams.

Test Plan: Imported from OSS

Reviewed By: supriyar

Differential Revision: D23933995

fbshipit-source-id: 3ff497c9f5f74c687e8e343ab9948d05ccbba09b
2020-09-30 19:01:18 -07:00
489af4ddcb [quant] Add quant APIs to save/load observer state_dict (#44846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846

The save function traverses the model state dict to pick out the observer stats;
the load function traverses the module hierarchy to load the state dict into module attributes depending on the observer type.
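A minimal sketch of the "pick out the observer stats" step as a key filter over a flat state dict. It assumes observer entries can be recognized by `observer` or `activation_post_process` appearing in their keys; the function name is hypothetical:

```python
from collections import OrderedDict


def get_observer_state_dict_sketch(model_state_dict):
    """Keep only state-dict entries that appear to belong to observers."""
    return OrderedDict(
        (k, v)
        for k, v in model_state_dict.items()
        if "observer" in k or "activation_post_process" in k
    )
```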

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23746821

fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
2020-09-29 01:52:42 -07:00
2163d31016 histogram observer: ensure buffer shape consistency (#44956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956

Makes HistogramObserver buffers have the
same shapes in the uninitialized and initialized states.

This is useful because the detectron2 checkpointer assumes
that these shapes stay the same, so this removes the
need for manual hacks around the shapes changing.

Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23785382

fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
2020-09-19 09:29:39 -07:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure fx module is scriptable after calling prepare_qat on it

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
70dfeb44bd MinMax based observers: respect device affinity for state_dict (#44537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537

Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers.  They had custom
state_dict save/load code to ensure their state was saved.

At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B and load model A's state_dict into it
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)

In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices

This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special-case `PerChannelMinMaxObserver` and its
children to allow loading buffers of different sizes, which is
normal.

There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D23644493

fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
2020-09-11 14:48:56 -07:00
fd8e2064e0 quant: switch observers to use min_max (#42957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42957

Switches observers to use the new min_max function to calculate
min and max at the same time.  We see around 45-50% speedup on
representative input shapes on the microbenchmarks for all observers except `HistogramObserver`.
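The idea behind the speedup is reading the input once to get both extremes instead of running two separate reductions. A pure-Python sketch of a single-pass min/max (in current PyTorch the fused tensor reduction is available as `torch.aminmax`):

```python
def min_max(values):
    """Return (min, max) in a single pass over the data."""
    it = iter(values)
    try:
        lo = hi = next(it)
    except StopIteration:
        raise ValueError("min_max() of an empty sequence") from None
    for v in it:
        if v < lo:
            lo = v
        elif v > hi:  # v < lo implies v <= hi, so elif is safe
            hi = v
    return lo, hi
```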

Test Plan:
CI for correctness

performance:
```
cd benchmarks/operator_benchmark
# repeat (before diff, after diff) x (cpu, cuda)
python -m pt.qobserver_test --tag_filter all --device cpu
# before, cpu: https://our.intern.facebook.com/intern/paste/P138633280/
# before, cuda: https://our.intern.facebook.com/intern/paste/P138639473/
# after, cpu: https://our.intern.facebook.com/intern/paste/P138635458/
# after, cuda: https://our.intern.facebook.com/intern/paste/P138636344/
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23093995

fbshipit-source-id: 9f416d144109b5b80baf089eb4bcfabe8fe358d5
2020-09-08 11:39:44 -07:00
f73ba88946 Avoid resizing in MinMaxObserver (#43789)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43789

Since min_val/max_val hold a single element, resizing them is unnecessary, and in some cases the
buffers cannot be resized.

Test Plan: unit tests

Reviewed By: supriyar

Differential Revision: D23393108

fbshipit-source-id: 46cd7f73ed42a05093662213978a01ee726433eb
2020-08-31 17:41:39 -07:00
3293fdfa80 [quant] Enable from_float for quantized Embedding_Bag (#43176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43176

Convert floating point nn.EmbeddingBag module to
nn.quantized.dynamic.EmbeddingBag module

Test Plan:
python test/test_quantization.py TestDynamicQuantizedModule.test_embedding_bag_api
python test/test_quantization.py TestPostTrainingDynamic.test_embedding_quantization

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23200196

fbshipit-source-id: 090f47dbf7aceab9c719cbf282fad20fe3e5a983
2020-08-21 11:46:03 -07:00
57af1ec145 observers: use torch.all to check for valid min and max values (#43151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43151

Use `torch.all` instead of `torch.sum` plus a length check.
It's unclear whether the increase in perf (~5% for small inputs) is
real, but should be a net benefit, especially for larger channel inputs.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170426

fbshipit-source-id: ee5c25eb93cee1430661128ac9458a9c525df8e5
2020-08-17 17:08:57 -07:00
3264ba065c observers: use clamp instead of min/max in calculate_qparams (#43150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43150

The current logic was expensive because it created tensors on CUDA.
Switch to clamp, since it works without creating new tensors.
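A simplified pure-Python sketch of per-tensor affine qparams, assuming the usual formulas: the range is widened to include 0, scale = (max - min) / (quant_max - quant_min) clamped below by `eps`, and the zero point is clamped into [quant_min, quant_max]. The default `eps` is float32 machine epsilon; the function is illustrative, not the observer's code:

```python
def calculate_qparams(min_val, max_val, quant_min=0, quant_max=255,
                      eps=1.1920928955078125e-07):
    """Per-tensor affine (scale, zero_point) from an observed range."""
    # Widen the range to include 0 so that 0.0 is exactly representable.
    min_neg = min(min_val, 0.0)
    max_pos = max(max_val, 0.0)
    scale = (max_pos - min_neg) / (quant_max - quant_min)
    scale = max(scale, eps)  # clamp from below instead of comparing tensors
    zero_point = quant_min - round(min_neg / scale)
    zero_point = min(max(zero_point, quant_min), quant_max)
    return scale, zero_point
```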

Test Plan:
benchmarks

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170427

fbshipit-source-id: 6fe3a728e737aca9f6c2c4d518c6376738577e21
2020-08-17 17:08:54 -07:00
a5dfba0a6e observers: make eps a buffer (#43149)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43149

This value doesn't change, so make it a buffer to pay
the cost of creating the tensor only once.

Test Plan: Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23170428

fbshipit-source-id: 6b963951a573efcc5b5a57649c814590b448dd72
2020-08-17 17:08:51 -07:00
b992a927a9 Clearer Semantics and Naming for Customized Quantization Range Initialization in Observer (#42602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42602

In this diff, clearer semantics and naming are introduced by splitting the original `init_dynamic_qrange` into 2 separate `Optional[int]` parameters, `qmin` and `qmax`, to avoid confusing these parameters with dynamic quantization.

The `qmin` and `qmax` parameters allow users to specify their own custom quantization range, enabling specific use cases such as lower-bit quantization.

Test Plan:
To assert the correctness and compatibility of the changes with existing observers, on a devvm, execute the following command to run the unit tests:

`buck test //caffe2/test:quantization -- observer`

Reviewed By: vkuzo, raghuramank100

Differential Revision: D22948334

fbshipit-source-id: 275bc8c9b5db4ba76fc2e79ed938376ea4f5a37c
2020-08-13 21:15:23 -07:00
816d37b1d8 [quant] Make PerChannel Observer work with float qparams (#42690)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42690

Add an implementation for the new qscheme per_channel_affine_float_qparams in the observer
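A sketch of how float qparams might be computed per channel, assuming scale = (max - min) / (quant_max - quant_min) with a fallback of 1.0 for degenerate channels, and a floating-point zero point of -min / scale. These formulas and the helper name are assumptions for illustration:

```python
def float_qparams(min_vals, max_vals, quant_min=0, quant_max=255,
                  eps=1.1920928955078125e-07):
    """Per-channel (scales, zero_points) with float zero points (sketch)."""
    scales, zero_points = [], []
    for lo, hi in zip(min_vals, max_vals):
        scale = (hi - lo) / (quant_max - quant_min)
        if scale <= eps:
            scale = 1.0  # degenerate (constant) channel
        scales.append(scale)
        # Not rounded to an integer, unlike the affine schemes.
        zero_points.append(-lo / scale)
    return scales, zero_points
```

With these qparams, q = 0 dequantizes via (q - zero_point) * scale to the channel minimum: min 1.0 and max 2.0 over [0, 4] give scale 0.25 and zero_point -4.0, and (0 - (-4.0)) * 0.25 = 1.0.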

Test Plan:
python test/test_quantization.py TestObserver.test_per_channel_observers

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23070633

fbshipit-source-id: 84d348b0ad91e9214770131a72f7adfd3970349c
2020-08-13 11:22:19 -07:00
7332c21f7a Speed up HistogramObserver by vectorizing critical path (#41041)
Summary:
22x speedup over the code this replaces. Tested on ResNet18 on a devvm using CPU only, using default parameters for HistogramObserver (i.e. 2048 bins).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/41041

Test Plan:
To run the test against the reference (old) implementation, you can use `python test/test_quantization.py TestRecordHistogramObserver.test_histogram_observer_against_reference`.

To run the benchmark, while in the folder `benchmarks/operator_benchmark`, you can use `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.

Benchmark results before speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 185818.566

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 165325.916
```

Benchmark results after speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 12242.241

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 12655.354
```

Reviewed By: raghuramank100

Differential Revision: D22400755

Pulled By: durumu

fbshipit-source-id: 639ac796a554710a33c8a930c1feae95a1148718
2020-08-07 12:29:23 -07:00
38bf5be24f [quant] Use PlaceholderObserver instead of Fp16Observer and NoopObserver (#42348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42348

Use the dtype info in PlaceholderObserver to decide which ops to insert in the graph.
In the next PR we can delete NoopObserver.

Test Plan:
python test/test_quantization.py

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22859457

fbshipit-source-id: a5c618f22315534ebd9a2df77b14a0aece196989
2020-07-31 12:33:56 -07:00
8c5bf10264 [quant] Add FP16Observer for fp16 quant support (#42221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42221

Adds a new observer that emits a warning if the tensor's range exceeds the fp16 range. This will later be used in graph mode quantization to insert cast-to-fp16 ops in the graph.
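A minimal sketch of the warning behavior, using the largest finite fp16 value (65504). `FP16ObserverSketch` is hypothetical and tracks a running range over plain Python floats:

```python
import warnings

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


class FP16ObserverSketch:
    """Hypothetical observer: tracks min/max and warns outside the fp16 range."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def out_of_range(self):
        return self.min_val < -FP16_MAX or self.max_val > FP16_MAX

    def forward(self, values):
        if values:
            self.min_val = min(self.min_val, min(values))
            self.max_val = max(self.max_val, max(values))
            if self.out_of_range():
                warnings.warn("observed values exceed the fp16 range")
        return values
```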

Test Plan:
python test/test_quantization.py TestObserver.test_fp16_observer

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D22849222

fbshipit-source-id: a301281ce38ba4d4e7a009308400d34a08c113d2
2020-07-31 12:33:51 -07:00
36fb14b68b [quant] Add Graph Mode Passes to quantize EmbeddingBag operators (#41612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41612

This change adds preliminary support to quantize the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization+packing of the weights.

To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4bit or 8bit) we call the corresponding quantization functions.
Refer to the test plan for how to invoke the qconfig for the embedding_bag ops.

Future versions of this will support 4-bit and 2-bit qtensors with native support to observe and quantize it.

NB - This version assumes that the weights in the EmbeddingBag Module reside on the same device.

Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag

Imported from OSS

Reviewed By: vkuzo, jerryzh168

Differential Revision: D22609342

fbshipit-source-id: 23e33f44a451c26719e6e283e87fbf09b584c0e6
2020-07-23 18:54:59 -07:00
9e0c746b15 Augmenting Concrete Observer Constructors to Support Dynamic Quantization Range; Modifying Utility Functions in _LearnableFakeQuantize Module for Better Logging and Baseline Construction. (#41815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41815

**All are minor changes to enable better simulations.**

The constructors of MinMaxObserver, MovingAverageMinMaxObserver, PerChannelMinMaxObserver, and MovingAveragePerChannelMinMaxObserver are augmented so they can utilize the dynamic quantization range support in the _ObserverBase class.

In addition, minor adjustments are made to the enable_static_observation function so that the observer can update its parameters without fake-quantizing the output (for constructing a baseline).
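The flag combination can be sketched with a hypothetical fake-quant wrapper that toggles observation and fake quantization independently; the class and attribute names are illustrative only:

```python
class FakeQuantizeSketch:
    """Hypothetical fake-quant module with independent observe/quantize flags."""

    def __init__(self, observer, fake_quant_fn):
        self.observer = observer
        self.fake_quant_fn = fake_quant_fn
        self.observer_enabled = True
        self.fake_quant_enabled = True

    def enable_static_observation(self):
        # Keep collecting statistics but pass the input through unchanged,
        # e.g. to build a floating-point baseline.
        self.observer_enabled = True
        self.fake_quant_enabled = False

    def __call__(self, values):
        if self.observer_enabled:
            self.observer(values)
        if self.fake_quant_enabled:
            return self.fake_quant_fn(values)
        return values
```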

Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
```
buck test //caffe2/test:quantization -- observer
```

Reviewed By: z-a-f

Differential Revision: D22649128

fbshipit-source-id: 32393b706f9b69579dc2f644fb4859924d1f3773
2020-07-21 17:59:40 -07:00
16dde6e3a0 Augmenting Observers to Support Dynamic Quantization Range (#41113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41113

In this diff, the `ObserverBase` class is augmented with 2 additional optional arguments, qmin and qmax. Correspondingly, the calculation of qmin and qmax and of the related quantization parameters is modified to accommodate quantization with fewer than 8 bits (the default).

Additional logic in the base class `_calculate_qparams` function has also been modified to provide support for dynamic quantization range.

Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:

`buck test //caffe2/test:quantization -- observer`

This modified observer script can be tested within the experiments for lower-bit fake quantization. Please see the following diffs for reference.
- Single Fake Quantizer: D22337447
- Single Conv Layer: D22338532

Reviewed By: z-a-f

Differential Revision: D22427134

fbshipit-source-id: f405e633289322078b0f4a417f54b684adff2549
2020-07-20 08:51:31 -07:00