Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64981
This would have caused errors when observer.py was moved to ao.
see: D30391189
ghstack-source-id: 138118430
Test Plan:
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_dynamic_quant_multi_uses (quantization.jit.test_quantize_jit.TestQuantizeDynamicJitPasses)'
buck test mode/opt //caffe2/test:quantization -- --exact 'caffe2/test:quantization - test_save_load_state_dict_script (quantization.core.test_workflow_module.TestObserver)'
Reviewed By: supriyar
Differential Revision: D30432008
fbshipit-source-id: 754727a89c78f6ceada6f8ff92c304f3953f38fc
Summary:
Fixes https://github.com/pytorch/pytorch/issues/63326
Currently `get_callable_args` has the side effect of mutating the input _PartialWrapper. When that input is one of the global defaults, there are all sorts of lifetime issues that crop up. (Details in the linked issue.) So far as I can tell, we only need to make a constructor which is module (and by extension device) aware, so making a fresh one should have the same effect without leaking the last call's module.
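A minimal sketch of the idea, simplified from the `_PartialWrapper`/`with_args` machinery in observer.py:
```
import functools

class _PartialWrapper:
    # simplified: observers use this to hand out factory-style constructors
    def __init__(self, p):
        self.p = p

    def __call__(self, *args, **keywords):
        return self.p(*args, **keywords)

def _with_args(cls_or_self, **kwargs):
    # return a fresh wrapper on every call instead of mutating a shared
    # global default, so the last caller's module/device never leaks
    return _PartialWrapper(functools.partial(cls_or_self, **kwargs))
```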
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63374
Test Plan: the repro in https://github.com/pytorch/pytorch/issues/63326 now reports no leaked Tensors, and all quantization tests pass locally.
Reviewed By: HDCharles
Differential Revision: D30359360
Pulled By: robieta
fbshipit-source-id: aef33261ac49952d8d90da868a57ab063dfc456e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863
To make this consistent with other observers, add a reduce_range option that can be used to update quant_min/quant_max.
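For illustration, reduce_range conventionally drops one bit of the quantization range; a sketch assuming unsigned 8-bit defaults:
```
# sketch: effect of reduce_range on a quint8-style range
quant_min, quant_max = 0, 255
reduce_range = True
if reduce_range:
    # one fewer bit of range, guarding against overflow in some backends
    quant_min, quant_max = quant_min // 2, quant_max // 2  # -> 0, 127
```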
Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D30146602
fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62345
This PR updates the attribute names from min_vals to min_val. The motivation is to keep the attribute names consistent with the per-tensor observers, so that dependencies (like FusedMovingAvgObsFakeQuantize) don't need to differentiate between the two observer types to access the attributes.
It also adds some BC tests to make sure that observers saved earlier with min_vals/max_vals can still be loaded, depending on the state_dict version.
Note: Scriptability of the observers isn't fully supported yet, so we aren't testing for that in this PR.
Test Plan:
python test/test_quantization.py TestSerialization
Imported from OSS
Reviewed By: HDCharles
Differential Revision: D30003700
fbshipit-source-id: 20e673f1bb15e2b209551b6b9d5f8f3be3f85c0a
Summary:
This PR enables GPU-only quantization, best used with is_reference since there are not many GPU kernels for quantized ops as of now.
This PR mainly changes how qconfigs and their observer constructors operate once they are attached to a module's qconfig. The function add_module_to_qconfig_obs_ctr takes the observer constructors on the original qconfig and configures them so that, when invoked, the created observers will be on whatever device the module occupies. (Once observers are created, module.to(device) is already set up so that it moves any observers.) To do this, a new method and a few small changes were added to the _PartialWrapper class that our observers already use to create constructors, without changing the existing functionality. These changes work in concert with changes to the prepare flow, so that when the qconfigs are propagated to the modules (in quantize.py and qconfig_utils.py) they are configured using add_module_to_qconfig_obs_ctr; a rough sketch follows below.
Ideally this would work on other models too, but the is_reference support for a lot of modules isn't there yet; those tests should be added in a future PR.
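A rough sketch of the wiring described above; apart from the add_module_to_qconfig_obs_ctr name and the with_args convention, the internals here are assumptions:
```
import torch
from torch.quantization import QConfig

def add_module_to_qconfig_obs_ctr(qconfig, module):
    # rebuild the qconfig's observer constructors so that, when invoked
    # during prepare, the created observers land on the module's device
    if qconfig is None or module is None:
        return qconfig
    devices = {p.device for p in module.parameters()}
    device = next(iter(devices)) if devices else None

    def configure(obs_ctr):
        if device is None:
            return obs_ctr
        return obs_ctr.with_args(factory_kwargs={"device": device})

    return QConfig(activation=configure(qconfig.activation),
                   weight=configure(qconfig.weight))
```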
Test Plan:
python test/test_quantization.py TestQuantizeFxModels.test_static_gpu_convert_basic
python test/test_quantization.py TestQuantizeFxModels.test_switch_device_prepare_convert
python test/test_quantization.py TestQuantizeFxModels.test_prepare_serialize_switch_device_convert
python test/test_quantization.py TestQuantizeFx.test_qconfig_precedence
Reviewed By: vkuzo
Differential Revision: D29684114
fbshipit-source-id: 19fefb8e1998eaf212723e836276ccf39467f2e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61317
Add an overload to fake_quantize_per_tensor that accepts scale/zero_point as input. The reasons to do this are
* required for the fused observer + fake_quant operator on GPU, where the scale/zero_point will be calculated by the observer on device. Passing tensor inputs lets us access the scale/zero_point values directly in the CUDA kernel and avoid extra copies/mallocs (usage sketch below)
* enables us to pass in float as scale dtype and int32 as zero_point dtype (which is consistent with what the quantize call actually uses) https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/affine_quantizer_base.cpp#L52-L53
* overload consistent with `quantize_per_tensor.tensor_qparams`
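A usage sketch of the tensor-qparams overload, assuming it is reachable via `torch.fake_quantize_per_tensor_affine`:
```
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, device=device)
# scale/zero_point passed as tensors stay on-device; no host round trip
scale = torch.tensor(0.1, dtype=torch.float, device=device)
zero_point = torch.tensor(0, dtype=torch.int32, device=device)
y = torch.fake_quantize_per_tensor_affine(x, scale, zero_point, 0, 255)
```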
ghstack-source-id: 133370216
Test Plan:
buck test mode/dev-nosan caffe2/test/:quantization -- test_backward_per_tensor_cachemask
buck test mode/dev-nosan caffe2/test/:quantization -- test_forward_per_tensor_cachemask
Reviewed By: raghuramank100
Differential Revision: D29552727
fbshipit-source-id: cbb9af40fc575ad27a29c646b760d5ee52cc923d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60883
As per this [comment](https://github.com/pytorch/pytorch/pull/59964#discussion_r659064270), I created a `reset_min_max_vals()` function inside the observers which will be called during input-weight equalization. This is so that we will not expose the implementation of the observers in the equalization code.
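A sketch of the new hook, assuming MinMaxObserver-style min_val/max_val buffers:
```
# inside MinMaxObserver (sketch)
def reset_min_max_vals(self):
    # reset observed ranges so equalization can re-observe from scratch,
    # without the equalization code reaching into observer internals
    self.min_val = torch.tensor(float("inf"))
    self.max_val = torch.tensor(float("-inf"))
```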
Test Plan:
`python test/test_quantization.py TestEqualizeFx`
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D29491848
fbshipit-source-id: 00e91959ceb3b4f3688175a1a7ba11823e929b2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386
During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`
For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.
Test Plan: Imported from OSS
Reviewed By: vkuzo
Differential Revision: D29271905
fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/59953
The following modifications were made to the equalization
observers due to design changes:
- [InputEqualizationObserver] Replaced `calculate_qparams()` with `calculate_scaled_minmax()`, since we need to return the scaled min/max values to update the following input quantization observer (see the sketch after this list)
- [WeightEqualizationObserver] We no longer need a row observer, since this is taken care of by the following weight quantization observer
- [WeightEqualizationObserver] Following the previous comment, we no longer need to calculate the scaled qparam values. Instead, we use the equalization scale to later scale the weights, and the qparams are taken care of by the weight quantization observer.
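A loose sketch of what `calculate_scaled_minmax()` returns; the reshaping/reduction details in the real implementation may differ:
```
# inside InputEqualizationObserver (sketch)
def calculate_scaled_minmax(self):
    # scale the observed input range by the equalization scale, so the
    # following input quantization observer sees the equalized range
    min_scaled = torch.min(self.min_val * self.equalization_scale)
    max_scaled = torch.max(self.max_val * self.equalization_scale)
    return min_scaled, max_scaled
```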
Test Plan:
`python test/test_quantization.py
TestEqualizeFx.test_input_weight_eq_observer`
Imported from OSS
Reviewed By: supriyar
Differential Revision: D29135332
fbshipit-source-id: be7e468273c8b62fc183b1e1ec50f6bd6d8cf831
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57068
When training with the histogram observer enabled, we hit this runtime error:
```
torch/quantization/observer.py", line 942, in forward
self.bins)
self.histogram.resize_(combined_histogram.shape)
~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
self.histogram.copy_(combined_histogram)
self.min_val.resize_(combined_min.shape)
RuntimeError: cannot resize variables that require grad
```
Since the histogram observer only collects histogram statistics, it should not need gradients, so we detach the buffers via the `detach_()` method before resizing.
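A sketch of the fix applied to the snippet above:
```
# drop autograd tracking in place; the resize is then legal
self.histogram.detach_()
self.histogram.resize_(combined_histogram.shape)
self.histogram.copy_(combined_histogram)
```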
Test Plan:
- arc lint
- Train with histogram observer turned on, training finished successfully
f264139727
Reviewed By: supriyar
Differential Revision: D27147212
fbshipit-source-id: abed5b9c4570ffc6bb60e58e64791cfce66856cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57067
Auto-format the code.
Test Plan: lint
Reviewed By: jerryzh168
Differential Revision: D27147213
fbshipit-source-id: 008871d276c8891b2411549e17617e5c27d16ee3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49800
Ensures that having a Tensor with 0 elements does not crash observers.
Note: it's illegal to pass Tensors with 0 elements to reductions such
as min and max, so we gate this out before the logic hits min/max.
This should not be hit often in practice, but it's coming up
during debugging of some RCNN models with test inputs.
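The gate amounts to an early return before any reduction runs, roughly:
```
# inside the observer's forward (sketch)
def forward(self, x_orig):
    if x_orig.numel() == 0:
        # min/max reductions reject empty tensors, so pass through
        return x_orig
    # ... existing min/max update logic ...
    return x_orig
```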
Test Plan:
```
python test/test_quantization.py TestObserver.test_zero_numel
```
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D25693230
fbshipit-source-id: d737559697c98bd923356edacba895835060bb38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48069
Also renamed float_qparam_dynamic_qconfig to float_qparam_weight_only_qconfig.
It's not used in user code yet, so we only need to update the tests.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D25010175
fbshipit-source-id: caa3eaa5358a8bc5c808bf5f64e6ebff3e0b61e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47514
Previously, the scale and zero_point were returned on the CPU even if the input tensor was on the GPU.
This is because `copy_()` doesn't respect the device when copying over the tensor.
Also fixed a bug where we were always setting the device to 'cuda' (irrespective of the device id) in the `calculate_qparams` function.
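A sketch of the corrected behavior, assuming the stats buffers carry the authoritative device:
```
# inside _calculate_qparams (sketch)
# derive the full device, including its index (e.g. cuda:1), from the
# observer's buffers instead of hardcoding 'cuda'
device = self.min_val.device
scale = torch.ones(self.min_val.shape, dtype=torch.float32, device=device)
zero_point = torch.zeros(self.min_val.shape, dtype=torch.int64, device=device)
```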
Test Plan:
python test/test_quantization.py TestObserver.test_observer_qparams_respects_device_affinity
Imported from OSS
Reviewed By: vkuzo
Differential Revision: D24800495
fbshipit-source-id: d7a76c59569842ed69029d0eb4fa9df63f87e28c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45752
Use the torch.quint4x2 dtype (added in the previous PR) to create 4-bit packed tensors.
These packed tensors can be directly consumed by the operator.
Serialization of the packed tensors is supported via a torchbind custom class.
Module support will follow in a later PR.
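A usage sketch, assuming per-channel quantization accepts the new dtype:
```
import torch

weight = torch.randn(10, 12)
scales = torch.ones(10)
zero_points = torch.zeros(10, dtype=torch.int64)
# packs two 4-bit values per byte along each row
qweight = torch.quantize_per_channel(
    weight, scales, zero_points, axis=0, dtype=torch.quint4x2)
```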
Test Plan:
python test/test_quantization.py TestEmbeddingBagOps
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D24120996
fbshipit-source-id: 2639353b3343ebc69e058b5ba237d3fc56728e1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45343
The current default dynamic quant observer is not correct, since for dynamic quantization we don't accumulate min/max and we don't need to calculate qparams.
Test Plan: Imported from OSS
Reviewed By: supriyar
Differential Revision: D23933995
fbshipit-source-id: 3ff497c9f5f74c687e8e343ab9948d05ccbba09b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44846
The save function traverses the model state_dict to pick out the observer stats; the load function traverses the module hierarchy to load the state dict into module attributes, depending on the observer type.
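A round-trip sketch, assuming the helpers are exported as get_observer_state_dict/load_observer_state_dict:
```
import torch
import torch.quantization as tq

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
model.qconfig = tq.default_qconfig
tq.prepare(model, inplace=True)
model(torch.randn(2, 4))                      # populate observer stats

obs_dict = tq.get_observer_state_dict(model)  # save: observer entries only

fresh = torch.nn.Sequential(torch.nn.Linear(4, 4))
fresh.qconfig = tq.default_qconfig
tq.prepare(fresh, inplace=True)
tq.load_observer_state_dict(fresh, obs_dict)  # load: restore by observer type
```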
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_save_observer_state_dict
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23746821
fbshipit-source-id: 05c571b62949a2833602d736a81924d77e7ade55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44956
Makes the buffers for HistogramObserver have the same shapes in the uninitialized and initialized states.
This is useful because the detectron2 checkpointer assumes
that these states will stay the same, so it removes the
need for manual hacks around the shapes changing.
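A sketch of the change, giving the buffers their final shapes at construction time:
```
# inside HistogramObserver.__init__ (sketch; self.bins set earlier)
self.register_buffer("histogram", torch.zeros(self.bins))
self.register_buffer("min_val", torch.tensor(float("inf")))
self.register_buffer("max_val", torch.tensor(float("-inf")))
```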
Test Plan:
```
python test/test_quantization.py TestObserver.test_histogram_observer_consistent_buffer_shape
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23785382
fbshipit-source-id: 1a83fd4f39b244b00747c368d5d305a07d877c92
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749
Ensure an FX module is scriptable after calling prepare_qat on it
Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23718380
fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44537
Originally, the `min_val`, `max_val`, `min_vals`, `max_vals`
attributes of observers were Tensors but not buffers. They had custom
state_dict save/load code to ensure their state was saved.
At some point, these attributes became buffers, and the custom
save/load code remained. This introduced a subtle bug:
* create model A, move it to a device (cpu/cuda) and save its state_dict
* create model B, load its state dict.
* `min_val|min_vals|max_val|max_vals` would always be loaded to model A's device, even if the rest of model B was on a different device
* the above is inconsistent with how save/load on different devices is expected to work (see https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-across-devices)
In practice, the case people would sometimes hit is:
* model A is on CPU, state dict is saved
* model B is created and moved to GPU, state_dict from model A is loaded
* assertions throw when operations are attempted across different devices
This PR fixes the behavior by removing the custom save/load where
possible and letting the default `nn.Module` save/load code handle
device assignment. We special-case `PerChannelMinMaxObserver` and its children to allow loading buffers of a different size, which is normal.
There are some followups to also enable this for HistogramObserver
and FakeQuantize, which can be done in separate PRs due to higher
complexity.
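A sketch of the per-channel special case; the real override also handles version metadata, elided here, and the buffer names follow the per-channel observers of this era:
```
# inside PerChannelMinMaxObserver (sketch)
def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
    # channel counts legitimately vary across checkpoints, so resize the
    # local buffers to the incoming shapes before the default copy runs
    for name in ("min_vals", "max_vals"):
        key = prefix + name
        if key in state_dict:
            getattr(self, name).resize_(state_dict[key].shape)
    super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```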
Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D23644493
fbshipit-source-id: 0dbb6aa309ad569a91a663b9ee7e44644080032e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43789
Since it's a single element, resizing shouldn't be necessary; in some cases we may not be able to resize the buffers at all.
Test Plan: unit tests
Reviewed By: supriyar
Differential Revision: D23393108
fbshipit-source-id: 46cd7f73ed42a05093662213978a01ee726433eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43151
Using `torch.all` instead of `torch.sum` plus a length check.
It's unclear whether the perf increase (~5% for small inputs) is real, but it should be a net benefit, especially for inputs with more channels.
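Roughly the before/after of the check:
```
import torch

min_val = torch.tensor([0.0, -1.0])
max_val = torch.tensor([1.0, 2.0])
# before: reduce, then compare against the length
ok_old = torch.sum(min_val <= max_val) == len(min_val)
# after: a single fused reduction
ok_new = torch.all(min_val <= max_val)
```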
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170426
fbshipit-source-id: ee5c25eb93cee1430661128ac9458a9c525df8e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43150
The current logic was expensive because it created tensors on CUDA.
Switching to clamp, since it works without needing to create tensors.
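Roughly the before/after; the exact 'before' shape is an assumption:
```
import torch

qmin, qmax = 0, 255
zero_point = torch.tensor([300.0])
# before (sketch): allocates scalar tensors just to compare against
zero_point = torch.min(torch.max(zero_point, torch.tensor(float(qmin))),
                       torch.tensor(float(qmax)))
# after: clamp takes Python scalars directly, so nothing is allocated
zero_point = torch.clamp(zero_point, qmin, qmax)
```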
Test Plan:
benchmarks
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170427
fbshipit-source-id: 6fe3a728e737aca9f6c2c4d518c6376738577e21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43149
This value doesn't change, so make it a buffer and pay the cost of creating the tensor only once.
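A sketch of the change in the observer constructor:
```
# inside the observer __init__ (sketch): build the eps tensor once here
# rather than on every calculate_qparams call
self.register_buffer("eps", torch.tensor([torch.finfo(torch.float32).eps]))
```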
Test Plan: Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D23170428
fbshipit-source-id: 6b963951a573efcc5b5a57649c814590b448dd72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42602
In this diff, clearer semantics and naming are introduced by splitting the original `init_dynamic_qrange` into two separate `Optional[int]` parameters, `qmin` and `qmax`, to avoid confusing these parameters with dynamic quantization.
The `qmin` and `qmax` parameters allow users to specify their own custom quantization range, enabling use cases such as lower-bit quantization.
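A usage sketch under the new naming; treat the exact keyword spelling as an assumption (later PyTorch releases spell these quant_min/quant_max):
```
import torch
from torch.quantization import MinMaxObserver

# a 4-bit range: 2**4 == 16 levels
obs = MinMaxObserver(dtype=torch.quint8, qmin=0, qmax=15)
obs(torch.randn(4, 4))
scale, zero_point = obs.calculate_qparams()
```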
Test Plan:
To assert the correctness and compatibility of the changes with existing observers, on a devvm, execute the following command to run the unit tests:
`buck test //caffe2/test:quantization -- observer`
Reviewed By: vkuzo, raghuramank100
Differential Revision: D22948334
fbshipit-source-id: 275bc8c9b5db4ba76fc2e79ed938376ea4f5a37c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42348
Use the dtype info in PlaceholderObserver to decide which ops to insert in the graph.
In the next PR we can delete NoopObserver.
Test Plan:
python test/test_quantization.py
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22859457
fbshipit-source-id: a5c618f22315534ebd9a2df77b14a0aece196989
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/42221
Adds a new observer that emits a warning if the range of the tensor is beyond the fp16 range. This will later be used in graph mode quantization to insert fp16 cast ops into the graph.
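The check amounts to comparing against the fp16 limits, roughly:
```
import warnings
import torch

def check_fp16_range(x):
    fp16_max = torch.finfo(torch.float16).max  # 65504.0
    if x.abs().max() > fp16_max:
        warnings.warn("tensor range exceeds fp16; values would saturate on cast")

check_fp16_range(torch.tensor([1e5]))  # warns
```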
Test Plan:
python test/test_quantization.py TestObserver.test_fp16_observer
Imported from OSS
Reviewed By: jerryzh168
Differential Revision: D22849222
fbshipit-source-id: a301281ce38ba4d4e7a009308400d34a08c113d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41612
This change adds preliminary support for quantizing the EmbeddingBag operators. We currently support 4-bit and 8-bit quantization+packing of the weights.
To quantize these operators, specify the operator name in the `custom_op_name` field of the NoopObserver. Based on the op name (4bit or 8bit) we call the corresponding quantization functions.
Refer to the test plan for how to invoke the qconfig for the embedding_bag ops; a sketch also follows below.
Future versions of this will support 4-bit and 2-bit qtensors with native support for observing and quantizing them.
NB - This version assumes that the weights in the EmbeddingBag module reside on the same device.
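A qconfig sketch per the description above; the exact custom_op_name strings are assumptions:
```
import torch
from torch.quantization import QConfigDynamic, NoopObserver

# route EmbeddingBag weights through the 4-bit quantize+pack path
int4_qconfig = QConfigDynamic(
    activation=NoopObserver.with_args(dtype=torch.float,
                                      custom_op_name="embedding_bag_4bit"),
    weight=NoopObserver.with_args(custom_op_name="embedding_bag_4bit"),
)
```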
Test Plan:
python test/test_quantization.py TestQuantizeDynamicJitOps.test_embedding_bag
Imported from OSS
Reviewed By: vkuzo, jerryzh168
Differential Revision: D22609342
fbshipit-source-id: 23e33f44a451c26719e6e283e87fbf09b584c0e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41815
**All are minor changes to enable better simulations.**
The constructors of MinMaxObserver, MovingAverageMinMaxObserver, PerChannelMinMaxObserver, and MovingAveragePerChannelMinMaxObserver are augmented so they can utilize the dynamic quantization range support in the _ObserverBase class.
In addition, minor adjustments are made to the enable_static_observation function, which allows the observer to update parameters without fake quantizing the output (for constructing baselines).
Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
```
buck test //caffe2/test:quantization -- observer
```
Reviewed By: z-a-f
Differential Revision: D22649128
fbshipit-source-id: 32393b706f9b69579dc2f644fb4859924d1f3773
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41113
In this diff, the `ObserverBase` class is augmented with two additional optional arguments, qmin and qmax. Correspondingly, the calculation of qmin and qmax and of the related quantization parameters is modified to accommodate this additional flexibility when the number of bits for quantization is lower than 8 (the default).
Additional logic in the base class's `_calculate_qparams` function has also been modified to support a dynamic quantization range.
Test Plan:
To ensure this modification is still backward compatible with past usages, numerics are verified by running the quantization unit test suite, which contains various observer tests. The following command executes the test suite, which also verifies the observer numerics:
`buck test //caffe2/test:quantization -- observer`
This modified observer script can be tested within the experiments for lower bit fake quantization. Please see the following diffs for reference.
- Single Fake Quantizer: D22337447
- Single Conv Layer: D22338532
Reviewed By: z-a-f
Differential Revision: D22427134
fbshipit-source-id: f405e633289322078b0f4a417f54b684adff2549