55 Commits

3bf922a6ce Apply UFMT to low traffic torch modules (#106249)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106249
Approved by: https://github.com/Skylion007
2023-07-29 23:37:30 +00:00
e2aa28a2d0 [quant][fx][improvement] Renamed default_affine_fixed_qparams_observer and default_symmetric_fixed_qparams_observer (#76637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76637

The previous names `default_affine_fixed_qparams_observer`
and `default_symmetric_fixed_qparams_observer` were uninformative: users had to read
the definition in order to understand what these observers are. The new
naming convention reveals information about the range of the observers.

The analogous changes were also made for
`default_symmetric_fixed_qparams_fake_quant` and
`default_affine_fixed_qparams_fake_quant`

Test Plan:
```
python test/test_quantization.py
```

Differential Revision: D36054169

Reviewed By: vkuzo

Pulled By: dzdang

fbshipit-source-id: 215f7786a4b7abda7327f17cc61735697ec5cca9
(cherry picked from commit 21a4e6eda4467c8adca7fd534a506a14e975f9cf)
2022-05-04 02:39:20 +00:00
6101cbcedb torch.ao migration: fake_quantize.py, phase 1 (#64814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/64814

1. move the file
```
hg mv caffe2/torch/quantization/fake_quantize.py caffe2/torch/ao/quantization/
```

2. create a new file in the old location and copy the imports (see the shim sketch below)
3. fix all callsites inside `torch`
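
Step 2 typically leaves a thin backward-compatibility shim at the old path. A minimal sketch of such a shim, assuming a handful of illustrative re-exports (the real file forwards the module's full public surface):

```python
# torch/quantization/fake_quantize.py -- compatibility shim at the old
# location so existing imports keep working after the move.
# flake8: noqa: F401
from torch.ao.quantization.fake_quantize import (
    FakeQuantize,
    FakeQuantizeBase,
    FixedQParamsFakeQuantize,
    default_fake_quant,
    disable_fake_quant,
    enable_fake_quant,
    disable_observer,
    enable_observer,
)
```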

Test Plan:
```
buck test mode/dev //caffe2/test:quantization
```

Reviewed By: z-a-f

Differential Revision: D30866792

fbshipit-source-id: 7a221cb46c0ab01f1c5de9be061f09ecc83ce23e
2021-09-13 15:22:28 -07:00
d5a7579597 [quant] Make version 1 the default for get_default_qat_qconfig (#63043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/63043

In version 1 we use the fused module/operator during QAT. Making this the default for all QAT runs going forward.

Older models saved after prepare_qat_fx can still load their state_dict into a model prepared using version 1.
The state_dict will still have the same attribute for the observer/fake_quant modules.

There may be some numerical differences between the old observer code in observer.py and the new fused module that was
re-written in C++/CUDA to perform observe + fake_quantize.

This PR also updates the test to check for the new module instead of the default FakeQuantize module.
Note: there are also some changes to make the operator work for multi-dim per-channel quantization + updated the test for that.
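
For illustration, a minimal usage sketch of the versioned qconfig API described above (the `version` keyword is the mechanism whose default this commit flips):

```python
from torch.quantization import get_default_qat_qconfig

# Version 1 (now the default) uses the fused observer + fake-quant module.
qconfig = get_default_qat_qconfig("fbgemm")
# The pre-existing behavior remains reachable via the version argument.
qconfig_v0 = get_default_qat_qconfig("fbgemm", version=0)
```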

Test Plan:
python test/test_quantization.py TestSerialization.test_default_qat_qconfig

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30232222

fbshipit-source-id: f3553a1926ab7c663bbeed6d574e30a7e90dfb5b
2021-08-11 22:06:44 -07:00
aa89d5f7f6 [quant] Update get_default_qat_qconfig to return the fused observer+fake_quant module (#62702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62702

Expose the qconfig to the user to speed up training by leveraging the fused module.
The module currently supports per-tensor/per-channel moving avg observer and fake-quantize.

For details on perf benefits, refer to https://github.com/pytorch/pytorch/pull/61691
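
A sketch of building such a qconfig directly on the fused module via the usual `QConfig`/`with_args` factories; the specific ranges and observers below are illustrative defaults, not prescribed by this commit:

```python
import torch
from torch.quantization import (
    QConfig,
    FusedMovingAvgObsFakeQuantize,
    MovingAverageMinMaxObserver,
    MovingAveragePerChannelMinMaxObserver,
)

qat_qconfig = QConfig(
    # Per-tensor moving-average observe + fake-quant for activations.
    activation=FusedMovingAvgObsFakeQuantize.with_args(
        observer=MovingAverageMinMaxObserver, quant_min=0, quant_max=255),
    # Per-channel symmetric variant for weights.
    weight=FusedMovingAvgObsFakeQuantize.with_args(
        observer=MovingAveragePerChannelMinMaxObserver,
        quant_min=-128, quant_max=127,
        dtype=torch.qint8, qscheme=torch.per_channel_symmetric),
)
```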

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30093719

fbshipit-source-id: b78deb7810f5b597474b9b9a0395d361d04eb46a
2021-08-10 09:28:49 -07:00
08d1a12d69 [quant] add reduce_range option to FusedMovingAvgFakeQuantize module (#62863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62863

To make this consistent with other observers, add a reduce_range option that can be used to update quant_min/quant_max.
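
A hedged sketch of the option in use (the flag is forwarded to the underlying observer; the values shown are illustrative):

```python
from torch.quantization import (
    FusedMovingAvgObsFakeQuantize,
    MovingAverageMinMaxObserver,
)

# reduce_range shrinks the effective range (e.g. 8 -> 7 bits) so that
# backends needing headroom do not overflow, mirroring other observers.
fused_fq = FusedMovingAvgObsFakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0, quant_max=255,
    reduce_range=True,
)
```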

Test Plan:
python test/test_quantization.py test_fused_mod_reduce_range

Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D30146602

fbshipit-source-id: a2015f095766f9c884611e9ab6942528bc9bc972
2021-08-10 09:27:01 -07:00
aa5e3ad705 [quant] Support PerChannel quantization in FusedMovingAvgObsFakeQuantize (#62346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/62346

Update the operator code to resize the min/max tensors if per-channel quant is selected. We need to do this because, by default, the observer creates empty tensors for the min/max and scale/zero_point values when per-channel quantization is enabled.

Test Plan:
python test/test_quantization.py test_fused_mod_per_channel

Imported from OSS

Reviewed By: HDCharles

Differential Revision: D30003835

fbshipit-source-id: b5ec80261cb50ee543f21191a887e979dcde4667
2021-08-01 21:45:11 -07:00
b8386f5d72 [quant] Create FusedMovingAvgObsFakeQuantize for QAT (#61691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61691

Create a new module for QAT that performs a fused MovingAvgMinMaxObserver and FakeQuantize operation.
The module currently only supports per-tensor quantization (affine/symmetric); a follow-up PR will add per-channel support.
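
Roughly, one forward pass of the fused module computes the following; this is an eager-mode sketch of the fused kernel's semantics (names and the qparam math are illustrative, and zero-point clamping is omitted):

```python
import torch

def fused_obs_fake_quant_step(x, min_val, max_val,
                              averaging_constant=0.01,
                              quant_min=0, quant_max=255):
    # min_val/max_val are 0-dim float tensors holding the running range.
    # Observe: exponential-moving-average update of the extrema.
    min_val = min_val + averaging_constant * (x.min() - min_val)
    max_val = max_val + averaging_constant * (x.max() - max_val)
    # Derive affine qparams from the running range.
    scale = (max_val - min_val) / float(quant_max - quant_min)
    zero_point = int((quant_min - torch.round(min_val / scale)).item())
    # Fake-quantize with the freshly observed qparams.
    y = torch.fake_quantize_per_tensor_affine(
        x, scale.item(), zero_point, quant_min, quant_max)
    return y, min_val, max_val
```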

Results on running QAT with MobileNetV2 (Obs enabled/fake_quant enabled)
Original FQ module:
```
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "242.80261993408203"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "505.7964324951172"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "235.80145835876465"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "543.8144207000732"}
```

Fused FakeQuant module (~50% improvement in latency):
```
PyTorchObserver {"type": "_", "metric": "qnnpack_fp_latency_ms", "unit": "ms", "value": "232.1624755859375"}
PyTorchObserver {"type": "_", "metric": "qnnpack_qat0_latency_ms", "unit": "ms", "value": "263.8866901397705"}
PyTorchObserver {"type": "_", "metric": "fbgemm_fp_latency_ms", "unit": "ms", "value": "236.9832992553711"}
PyTorchObserver {"type": "_", "metric": "fbgemm_qat0_latency_ms", "unit": "ms", "value": "292.1590805053711"}
```

Individual module benchmark result (>5x improvement in latency)
===> Baseline FakeQuantize module
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
              aten::fake_quantize_per_tensor_affine         0.77%       1.210ms         4.92%       7.730ms     154.596us     718.528us         0.45%       9.543ms     190.862us            50
    aten::fake_quantize_per_tensor_affine_cachemask         2.41%       3.792ms         4.15%       6.520ms     130.402us       8.825ms         5.58%       8.825ms     176.492us            50
                                     aten::_aminmax         3.25%       5.105ms         4.43%       6.955ms     139.102us       8.193ms         5.18%       8.193ms     163.868us            50
                                   aten::zeros_like         1.87%       2.939ms         6.95%      10.922ms     109.218us       5.992ms         3.79%      10.844ms     108.442us           100
                                        aten::zeros         0.97%       1.527ms         3.11%       4.885ms      97.702us       2.383ms         1.51%       4.800ms      96.010us            50
                                         aten::rsub         1.34%       2.106ms         2.94%       4.614ms      92.277us       2.063ms         1.30%       4.559ms      91.173us            50
                                        aten::clamp         2.79%       4.381ms         5.42%       8.519ms      85.190us       5.385ms         3.41%       8.438ms      84.381us           100
                                           aten::eq        11.70%      18.384ms        21.31%      33.479ms      83.280us      22.465ms        14.21%      33.310ms      82.861us           402
                                         aten::ones         1.05%       1.656ms         2.57%       4.038ms      80.751us       2.494ms         1.58%       3.951ms      79.028us            50
                                           aten::le         2.52%       3.955ms         4.84%       7.607ms      76.071us       4.998ms         3.16%       7.702ms      77.016us           100
                                          aten::min         0.69%       1.087ms         2.32%       3.641ms      72.827us       1.017ms         0.64%       3.603ms      72.055us            50
                                          aten::max         1.40%       2.195ms         4.62%       7.260ms      72.597us       2.008ms         1.27%       7.140ms      71.404us           100
                                   aten::is_nonzero         2.68%       4.207ms        11.35%      17.829ms      71.033us       4.062ms         2.57%      17.225ms      68.625us           251
                                       aten::detach         1.17%       1.831ms         3.65%       5.736ms      57.360us       1.680ms         1.06%       5.634ms      56.340us           100
                                          aten::mul         3.36%       5.278ms         3.36%       5.278ms      53.862us       5.215ms         3.30%       5.215ms      53.216us            98
                                          aten::div         3.42%       5.376ms         3.42%       5.376ms      53.759us       5.320ms         3.36%       5.320ms      53.196us           100
                                          aten::sub         6.79%      10.672ms         6.79%      10.672ms      53.901us      10.504ms         6.64%      10.504ms      53.050us           198
                                         aten::item         4.06%       6.380ms        12.02%      18.883ms      53.798us       6.127ms         3.87%      18.322ms      52.198us           351
                                          aten::add         3.28%       5.147ms         3.28%       5.147ms      52.518us       5.113ms         3.23%       5.113ms      52.171us            98
                                      aten::minimum         1.63%       2.555ms         1.63%       2.555ms      51.092us       2.585ms         1.64%       2.585ms      51.708us            50
                                      aten::maximum         3.22%       5.065ms         3.22%       5.065ms      50.646us       5.133ms         3.25%       5.133ms      51.329us           100
                                        aten::round         1.61%       2.529ms         1.61%       2.529ms      50.578us       2.528ms         1.60%       2.528ms      50.552us            50
                                        aten::zero_         1.99%       3.125ms         4.72%       7.422ms      49.481us       2.835ms         1.79%       7.269ms      48.462us           150
                                        aten::copy_         6.62%      10.394ms         6.62%      10.394ms      41.576us      10.252ms         6.48%      10.252ms      41.010us           250
                                             detach         2.49%       3.905ms         2.49%       3.905ms      39.049us       3.954ms         2.50%       3.954ms      39.539us           100
                                       aten::select         2.01%       3.154ms         2.47%       3.876ms      38.759us       3.866ms         2.44%       3.866ms      38.658us           100
                          aten::_local_scalar_dense         7.96%      12.503ms         7.96%      12.503ms      35.621us      12.195ms         7.71%      12.195ms      34.743us           351
                                           aten::to         2.31%       3.625ms         4.16%       6.530ms      32.650us       4.320ms         2.73%       6.270ms      31.348us           200
                                        aten::fill_         3.70%       5.808ms         3.70%       5.808ms      29.039us       5.892ms         3.73%       5.892ms      29.459us           200
                                   aten::as_strided         0.79%       1.244ms         0.79%       1.244ms       6.221us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         3.55%       5.579ms         3.55%       5.579ms      11.137us       0.000us         0.00%       0.000us       0.000us           501
                                      aten::resize_         2.36%       3.712ms         2.36%       3.712ms      12.332us       0.000us         0.00%       0.000us       0.000us           301
                                   aten::empty_like         1.45%       2.284ms         3.68%       5.776ms      28.878us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         2.80%       4.398ms         2.80%       4.398ms      17.592us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 157.108ms
Self CUDA time total: 158.122ms
```

===> FusedFakeQuant
```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                   fb::fused_fake_quant        23.42%       6.408ms       100.00%      27.361ms     547.215us       7.887ms        27.20%      28.996ms     579.925us            50
                  aten::fake_quantize_per_tensor_affine         4.25%       1.162ms        27.65%       7.565ms     151.298us     686.176us         2.37%      10.217ms     204.336us            50
aten::_fake_quantize_per_tensor_affine_cachemask_ten...        14.11%       3.860ms        23.40%       6.403ms     128.068us       9.531ms        32.87%       9.531ms     190.612us            50
                                         aten::_aminmax        20.57%       5.628ms        27.47%       7.515ms     150.305us       8.218ms        28.34%       8.218ms     164.367us            50
                                             aten::item         3.65%     999.522us        10.27%       2.810ms      56.202us     931.904us         3.21%       2.674ms      53.481us            50
                              aten::_local_scalar_dense         6.62%       1.811ms         6.62%       1.811ms      36.212us       1.742ms         6.01%       1.742ms      34.843us            50
                                            aten::empty        10.85%       2.969ms        10.85%       2.969ms      14.843us       0.000us         0.00%       0.000us       0.000us           200
                                       aten::as_strided         1.92%     524.365us         1.92%     524.365us       5.244us       0.000us         0.00%       0.000us       0.000us           100
                                       aten::empty_like         6.48%       1.774ms        14.62%       4.000ms      26.670us       0.000us         0.00%       0.000us       0.000us           150
                                    aten::empty_strided         8.14%       2.226ms         8.14%       2.226ms      14.842us       0.000us         0.00%       0.000us       0.000us           150
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 27.361ms
Self CUDA time total: 28.996ms
```

Test Plan:
python test/test_quantization.py TestFusedObsFakeQuantModule

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29706889

fbshipit-source-id: ae3f9fb1fc559920459bf6e8663e8299bf7d21e1
2021-07-21 10:13:04 -07:00
7a15576a65 [quant] update FakeQuant modules to use tensor qparams (#61318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/61318

Remove the `float()` and `int()` calls in the forward function so that we can directly use the tensor qparams in the fake_quantize operator.

Calling `float()/int()` internally calls `item()`, which can trigger a GPU -> CPU copy if the original tensors reside on GPU.
Local benchmark: P427668213
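
The change boils down to the following pattern; a sketch assuming the tensor-qparams overload of the operator (values are illustrative):

```python
import torch

x = torch.randn(16)
scale = torch.tensor([0.1])
zero_point = torch.tensor([0], dtype=torch.int32)

# Before: float()/int() call item() under the hood, which forces a
# device -> host sync when the qparams live on GPU.
y_old = torch.fake_quantize_per_tensor_affine(
    x, float(scale), int(zero_point), 0, 255)

# After: pass the tensors directly; no item()/sync on the hot path.
y_new = torch.fake_quantize_per_tensor_affine(
    x, scale, zero_point, 0, 255)
```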

Before this change
```
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                               Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                     aten::_aminmax         2.57%       1.507ms         3.10%       1.819ms      36.371us       2.872ms         4.81%       2.872ms      57.446us            50
              aten::fake_quantize_per_tensor_affine         1.04%     610.915us         3.60%       2.114ms      42.276us     472.896us         0.79%       2.698ms      53.962us            50
    aten::fake_quantize_per_tensor_affine_cachemask         1.69%     993.626us         2.56%       1.503ms      30.058us       2.225ms         3.73%       2.225ms      44.504us            50
                                   aten::is_nonzero         3.85%       2.258ms        19.68%      11.540ms      46.161us       2.168ms         3.63%      11.084ms      44.336us           250
                                   aten::zeros_like         1.82%       1.064ms         6.65%       3.901ms      39.007us       1.531ms         2.57%       3.905ms      39.045us           100
                                           aten::eq        13.80%       8.093ms        25.90%      15.189ms      37.972us       9.580ms        16.05%      15.566ms      38.914us           400
                                         aten::item         5.67%       3.323ms        21.50%      12.607ms      36.019us       3.233ms         5.42%      12.167ms      34.762us           350
                                        aten::zeros         0.94%     549.208us         2.93%       1.717ms      34.343us     688.928us         1.15%       1.695ms      33.894us            50
                                           aten::le         2.52%       1.478ms         4.50%       2.641ms      26.411us       1.753ms         2.94%       2.845ms      28.448us           100
                                         aten::rsub         1.04%     608.715us         2.44%       1.433ms      28.667us     532.000us         0.89%       1.418ms      28.353us            50
                                          aten::max         1.54%     905.401us         4.62%       2.711ms      27.106us     847.488us         1.42%       2.697ms      26.969us           100
                                         aten::ones         0.92%     542.159us         2.16%       1.266ms      25.324us     661.856us         1.11%       1.301ms      26.017us            50
                                          aten::min         0.82%     479.167us         2.15%       1.258ms      25.160us     407.808us         0.68%       1.276ms      25.530us            50
                          aten::_local_scalar_dense        15.83%       9.284ms        15.83%       9.284ms      26.526us       8.934ms        14.97%       8.934ms      25.524us           350
                                        aten::clamp         2.35%       1.378ms         4.21%       2.467ms      24.669us       1.546ms         2.59%       2.461ms      24.612us           100
                                        aten::zero_         2.53%       1.482ms         5.65%       3.316ms      22.108us       1.326ms         2.22%       3.380ms      22.531us           150
                                      aten::maximum         3.08%       1.805ms         3.08%       1.805ms      18.052us       1.849ms         3.10%       1.849ms      18.494us           100
                                      aten::minimum         1.33%     778.854us         1.33%     778.854us      15.577us     868.672us         1.46%     868.672us      17.373us            50
                                        aten::round         1.36%     799.910us         1.36%     799.910us      15.998us     809.568us         1.36%     809.568us      16.191us            50
                                        aten::copy_         6.61%       3.878ms         6.61%       3.878ms      15.513us       4.036ms         6.76%       4.036ms      16.143us           250
                                          aten::div         2.53%       1.483ms         2.53%       1.483ms      14.833us       1.535ms         2.57%       1.535ms      15.353us           100
                                          aten::mul         2.44%       1.431ms         2.44%       1.431ms      14.314us       1.478ms         2.48%       1.478ms      14.782us           100
                                       aten::detach         1.46%     855.670us         2.41%       1.411ms      14.110us     832.448us         1.39%       1.395ms      13.949us           100
                                          aten::add         2.22%       1.301ms         2.22%       1.301ms      13.008us       1.383ms         2.32%       1.383ms      13.828us           100
                                        aten::fill_         4.18%       2.452ms         4.18%       2.452ms      12.262us       2.693ms         4.51%       2.693ms      13.463us           200
                                          aten::sub         5.06%       2.967ms         5.06%       2.967ms      14.837us       2.675ms         4.48%       2.675ms      13.374us           200
                                           aten::to         2.10%       1.230ms         3.65%       2.140ms      10.701us       1.310ms         2.20%       2.062ms      10.310us           200
                                       aten::select         1.28%     749.144us         1.49%     874.227us       8.742us     863.232us         1.45%     863.232us       8.632us           100
                                             detach         0.95%     555.326us         0.95%     555.326us       5.553us     562.496us         0.94%     562.496us       5.625us           100
                                   aten::as_strided         0.40%     232.289us         0.40%     232.289us       1.161us       0.000us         0.00%       0.000us       0.000us           200
                                        aten::empty         2.93%       1.720ms         2.93%       1.720ms       3.439us       0.000us         0.00%       0.000us       0.000us           500
                                      aten::resize_         1.04%     611.313us         1.04%     611.313us       2.038us       0.000us         0.00%       0.000us       0.000us           300
                                   aten::empty_like         0.75%     438.585us         1.77%       1.036ms       5.180us       0.000us         0.00%       0.000us       0.000us           200
                                aten::empty_strided         1.36%     799.442us         1.36%     799.442us       3.198us       0.000us         0.00%       0.000us       0.000us           250
---------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 58.645ms
Self CUDA time total: 59.674ms
```

After this change
```

-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                  aten::fake_quantize_per_tensor_affine         0.98%     505.210us         4.38%       2.259ms      45.187us     419.424us         0.78%       3.218ms      64.367us            50
                                         aten::_aminmax         2.78%       1.434ms         3.42%       1.766ms      35.321us       2.825ms         5.27%       2.825ms      56.505us            50
aten::fake_quantize_per_tensor_affine_cachemask_tens...         2.38%       1.229ms         3.40%       1.754ms      35.083us       2.799ms         5.22%       2.799ms      55.979us            50
                                             aten::rsub         0.94%     485.040us         5.02%       2.590ms      51.793us     458.976us         0.86%       2.587ms      51.747us            50
                                       aten::is_nonzero         3.78%       1.952ms        23.64%      12.196ms      48.786us       2.055ms         3.83%      11.986ms      47.944us           250
                                             aten::item         6.92%       3.572ms        19.86%      10.244ms      40.977us       3.670ms         6.85%       9.931ms      39.724us           250
                                       aten::zeros_like         1.65%     848.874us         6.64%       3.426ms      34.260us       1.397ms         2.61%       3.572ms      35.717us           100
                                            aten::zeros         0.85%     436.691us         3.00%       1.549ms      30.984us     551.936us         1.03%       1.576ms      31.516us            50
                                               aten::eq        10.60%       5.467ms        20.26%      10.452ms      26.130us       7.018ms        13.09%      10.832ms      27.079us           400
                                               aten::le         2.58%       1.332ms         4.67%       2.407ms      24.074us       1.580ms         2.95%       2.614ms      26.144us           100
                              aten::_local_scalar_dense        12.93%       6.673ms        12.93%       6.673ms      26.691us       6.261ms        11.68%       6.261ms      25.046us           250
                                            aten::clamp         2.43%       1.253ms         4.37%       2.256ms      22.560us       1.431ms         2.67%       2.273ms      22.725us           100
                                             aten::ones         0.89%     460.133us         2.18%       1.123ms      22.467us     570.496us         1.06%       1.128ms      22.551us            50
                                              aten::min         0.74%     383.132us         2.06%       1.065ms      21.296us     377.536us         0.70%       1.091ms      21.824us            50
                                            aten::zero_         2.36%       1.219ms         5.87%       3.029ms      20.194us       1.261ms         2.35%       3.199ms      21.327us           150
                                              aten::max         1.51%     779.081us         4.06%       2.096ms      20.960us     791.680us         1.48%       2.130ms      21.295us           100
                                              aten::sub         7.97%       4.111ms         7.97%       4.111ms      20.556us       3.847ms         7.18%       3.847ms      19.234us           200
                                              aten::div         2.94%       1.516ms         2.94%       1.516ms      15.158us       1.580ms         2.95%       1.580ms      15.798us           100
                                            aten::round         1.45%     750.445us         1.45%     750.445us      15.009us     756.064us         1.41%     756.064us      15.121us            50
                                            aten::copy_         6.88%       3.548ms         6.88%       3.548ms      14.190us       3.701ms         6.90%       3.701ms      14.803us           250
                                          aten::minimum         1.32%     681.654us         1.32%     681.654us      13.633us     713.664us         1.33%     713.664us      14.273us            50
                                          aten::maximum         2.55%       1.317ms         2.55%       1.317ms      13.169us       1.338ms         2.50%       1.338ms      13.378us           100
                                              aten::mul         2.63%       1.358ms         2.63%       1.358ms      13.581us       1.328ms         2.48%       1.328ms      13.283us           100
                                           aten::detach         1.34%     688.820us         2.35%       1.211ms      12.110us     772.800us         1.44%       1.278ms      12.779us           100
                                            aten::fill_         4.53%       2.338ms         4.53%       2.338ms      11.692us       2.495ms         4.65%       2.495ms      12.473us           200
                                              aten::add         2.32%       1.197ms         2.32%       1.197ms      11.968us       1.240ms         2.31%       1.240ms      12.405us           100
                                               aten::to         2.07%       1.069ms         3.66%       1.889ms       9.443us       1.224ms         2.28%       1.975ms       9.874us           200
                                           aten::select         1.44%     743.042us         1.64%     848.207us       8.482us     641.600us         1.20%     641.600us       6.416us           100
                                                 detach         1.01%     522.155us         1.01%     522.155us       5.222us     505.088us         0.94%     505.088us       5.051us           100
                                       aten::as_strided         0.44%     227.884us         0.44%     227.884us       1.139us       0.000us         0.00%       0.000us       0.000us           200
                                            aten::empty         3.20%       1.652ms         3.20%       1.652ms       3.304us       0.000us         0.00%       0.000us       0.000us           500
                                          aten::resize_         1.25%     646.711us         1.25%     646.711us       2.156us       0.000us         0.00%       0.000us       0.000us           300
                                       aten::empty_like         0.79%     407.768us         2.07%       1.067ms       5.334us       0.000us         0.00%       0.000us       0.000us           200
                                    aten::empty_strided         1.52%     785.788us         1.52%     785.788us       3.143us       0.000us         0.00%       0.000us       0.000us           250
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 51.590ms
Self CUDA time total: 53.609ms
```
ghstack-source-id: 133370215

Test Plan: buck test mode/dev-nosan caffe2/test/:quantization

Reviewed By: raghuramank100

Differential Revision: D29566512

fbshipit-source-id: 1aefca51f99949da7334bcfe504848275c9f952c
2021-07-10 19:43:02 -07:00
4887c6e401 [quant] avoid resize calls in observer/fake_quant (#60386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/60386

During QAT we sometimes encounter errors with scripted models:
`RuntimeError: cannot resize variables that require grad`

For per-tensor cases we don't need to resize some buffers, so this PR removes the extra resize ops where applicable.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D29271905

fbshipit-source-id: 01a484a9559a3a4180490f9476d0cd3044ba0d1b
2021-06-22 17:41:43 -07:00
05c8cd748d memory efficient per-channel fq: use it everywhere, delete old version (#51265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51265

This PR is the cleanup after #51159. At a high level, we make the new
definition of per-channel fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper to keep the user-facing API the same.

In detail:

1. point `fake_quantize_per_channel_affine`'s implementation to be `fake_quantize_per_channel_affine_cachemask` (a Python sketch of this wrapper follows below)
2. delete the `fake_quantize_per_channel_affine` backward; autograd will automatically use the cachemask backward
3. delete all the `fake_quantize_per_channel_affine` kernels, since they are no longer used by anything
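
In Python pseudocode, the wrapper in step 1 looks roughly like this (the real forwarding lives in the native C++ dispatch; this sketch is illustrative):

```python
import torch

def fake_quantize_per_channel_affine(x, scale, zero_point, axis,
                                     quant_min, quant_max):
    # Delegate to the cachemask variant; the mask output is consumed
    # only by autograd's cachemask backward.
    out, _mask = torch.fake_quantize_per_channel_affine_cachemask(
        x, scale, zero_point, axis, quant_min, quant_max)
    return out
```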

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26120957

fbshipit-source-id: 264426435fabd925decf6d1f0aa79275977ea29b
2021-01-28 19:42:25 -08:00
267e243064 fake_quant: more memory efficient per-channel backward (#51255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51255

This is the same as #50561, but for per-channel fake_quant.

TODO before land write up better

Memory and performance impact (MobileNetV2): TODO

Performance impact (microbenchmarks): https://gist.github.com/vkuzo/fbe1968d2bbb79b3f6dd776309fbcffc
* forward pass on cpu: 512ms -> 750ms (+46%)
* forward pass on cuda: 99ms -> 128ms (+30%)
* note: the overall performance impact on training jobs should be minimal, because this path is used for weights, and the cost of fake quant is dominated by fake-quantizing the activations
* note: we can optimize the perf in a future PR by reading once and writing twice

Test Plan:
```
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_forward_per_channel_cachemask_cuda
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cpu
python test/test_quantization.py TestFakeQuantize.test_backward_per_channel_cachemask_cuda
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26117721

fbshipit-source-id: 798b59316dff8188a1d0948e69adf9e5509e414c
2021-01-28 19:39:35 -08:00
0335222a4a memory efficient fq: use it everywhere, delete the old version (#51159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51159

This PR is the cleanup after #50561. At a high level, we make the new
definition of fake_quant the definition used by autograd, but keep the old
function around as a thin wrapper to keep the user-facing API the same.

In detail:
1. point `fake_quantize_per_tensor_affine`'s implementation to be `fake_quantize_per_tensor_affine_cachemask`
2. delete the `fake_quantize_per_tensor_affine` backward; autograd will automatically use the cachemask backward
3. delete all the `fake_quantize_per_tensor_affine` kernels, since they are no longer used by anything

Test Plan:
```
python test/test_quantization.py TestFakeQuantize
```

performance testing was done in the previous PR.

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D26090869

fbshipit-source-id: fda042881f77a993a9d15dafabea7cfaf9dc7c9c
2021-01-27 19:39:05 -08:00
983b8e6b62 fake_quant: add a more memory efficient version (#50561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50561

Not for review yet, a bunch of TODOs need finalizing.

tl;dr: add an alternative implementation of `fake_quantize` which saves
a mask during the forward pass and uses it to calculate the backward.

There are two benefits:

1. the backward function no longer needs the input Tensor, and it can be
gc'ed earlier by autograd.  On MobileNetV2, this reduces QAT overhead
by ~15% (TODO: link, and absolute numbers).  We add an additional mask Tensor
to pass around, but its size is 4x smaller than the input tensor. A
future optimization would be to pack the mask bitwise and unpack in the
backward.

2. the computation of `qval` can be done only once in the forward and
reused in the backward. No perf change observed; TODO: verify with better
metrics. (A sketch of the mask approach follows below.)
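
A minimal pure-Python autograd sketch of the cachemask idea (the shipped kernels are C++/CUDA; the class and its qparam handling here are illustrative):

```python
import torch

class FakeQuantCachemask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, scale, zero_point, quant_min, quant_max):
        q = torch.round(x / scale + zero_point)
        # Save only a bool mask (1 byte/element) rather than the float
        # input (4 bytes/element), so x can be freed sooner by autograd.
        mask = (q >= quant_min) & (q <= quant_max)
        ctx.save_for_backward(mask)
        return (torch.clamp(q, quant_min, quant_max) - zero_point) * scale

    @staticmethod
    def backward(ctx, grad_out):
        (mask,) = ctx.saved_tensors
        # Straight-through estimator: gradient flows only where the
        # value fell inside the quantization range.
        return grad_out * mask, None, None, None, None
```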

TODO: describe in more detail

Test Plan:
OSS / torchvision / MobileNetV2
```
python references/classification/train_quantization.py
  --print-freq 1
  --data-path /data/local/packages/ai-group.imagenet-256-smallest-side/prod/
  --output-dir ~/nfs/pytorch_vision_tests/
  --backend qnnpack
  --epochs 5
TODO paste results here
```

TODO more

Imported from OSS

Reviewed By: ngimel

Differential Revision: D25918519

fbshipit-source-id: ec544ca063f984de0f765bf833f205c99d6c18b6
2021-01-27 19:36:04 -08:00
f8eefbdf7a fake_quant: fix device affinity and buffer resizing for state_dict (#50868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50868

Ensures that `FakeQuantize` respects device affinity when loading from
state_dict, and knows how to resize scale and zero_point values
(which is necessary for FQ classes wrapping per channel observers).

This is same as https://github.com/pytorch/pytorch/pull/44537, but for
`FakeQuantize`.

Test Plan:
```
python test/test_quantization.py TestObserver.test_state_dict_respects_device_affinity
```

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D25991570

fbshipit-source-id: 1193a6cd350bddabd625aafa0682e2e101223bb1
2021-01-25 13:50:28 -08:00
14edc726d9 Clean up some type annotations in caffe2/torch/quantization (#49942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49942

Upgrades type annotations from Python2 to Python3

Test Plan: Sandcastle tests

Reviewed By: vkuzo

Differential Revision: D25717551

fbshipit-source-id: 1b63dc485ecf6641641b05f7ce095ae1d2d87346
2020-12-29 15:43:50 -08:00
72918e475e [quant] FakeQuantize inherit from FakeQuantizeBase (#48072)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48072

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D25011074

fbshipit-source-id: 260f4d39299bc148b65c21e67b571dfa1d0fe2ad
2020-11-18 19:14:20 -08:00
5977d1d864 FixedQParamsFakeQuantize: adjust default quant_min and quant_max (#47423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47423

Since the dtype of this fake_quant is `quint8`, the output range should be
from 0 to 255.  Fixing.  This should address the numerical inaccuracies with
sigmoid and hardsigmoid with `FixedQParamsFakeQuantize` attached compared
to their quantized counterparts.
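
A quick worked example of why the full quint8 range matters here, using the conventional fixed qparams for a sigmoid output (values illustrative):

```python
import torch

scale, zero_point = 1.0 / 256.0, 0   # fixed qparams for sigmoid output
x = torch.sigmoid(torch.randn(8))    # values in (0, 1)

def fq(x, quant_min, quant_max):
    q = torch.clamp(torch.round(x / scale + zero_point), quant_min, quant_max)
    return (q - zero_point) * scale

bad = fq(x, 0, 127)   # old bounds: everything above ~0.5 clips to 0.496
good = fq(x, 0, 255)  # full quint8 range covers (0, 1) as intended
```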

In a future PR, might be safer to also make the activation functions
using `FixedQParamsFakeQuantize` to explicitly specify their expected
output range and zero_point.  Leaving that for later, as this bugfix
should be landed urgently.

Test Plan:
Manual script which gives low SQNR before this PR and high SQNR after
this PR: https://gist.github.com/vkuzo/9906bae29223da72b10d6b6aafadba42

https://github.com/pytorch/pytorch/pull/47376, which can be landed after
this, adds a proper test.

Imported from OSS

Reviewed By: ayush29feb, jerryzh168

Differential Revision: D24751497

fbshipit-source-id: 4c32e22a30116caaceeedb4cd47146d066054a89
2020-11-05 09:06:55 -08:00
6b50ccc41c [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738) (#46871)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46871

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24547180

fbshipit-source-id: d2eb9aa74c6e5436204376b1a2ebcc6188d3562f
2020-10-26 23:52:07 -07:00
25db74bf5e Revert D24486972: [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat
Test Plan: revert-hammer

Differential Revision:
D24486972 (e927b62e73)

Original commit changeset: c9f139bfdd54

fbshipit-source-id: 2a75f5ec93d55a62b40d1cdd49adcf65436058f7
2020-10-26 12:47:05 -07:00
e927b62e73 [quant][graphmode][fx] Support sigmoid/hardsigmoid/tanh in qat (#46738)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/46738

Test Plan: Imported from OSS

Reviewed By: raghuramank100

Differential Revision: D24486972

fbshipit-source-id: c9f139bfdd54973da1a93a45e32937595dbe67fc
2020-10-26 12:04:42 -07:00
13decddae2 [reland][quant] Add FixedQParamsFakeQuantize module (#45538) (#46657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46657

This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24451406

fbshipit-source-id: 26cc140c00f12bdec9a8f9dc880f4c425f4d4074
2020-10-21 16:47:11 -07:00
2181449068 Revert D24004795: [quant] Add FixedQParamsFakeQuantize module
Test Plan: revert-hammer

Differential Revision:
D24004795 (253918ec55)

Original commit changeset: fc4797f80842

fbshipit-source-id: 663169e90a2f58e5a89e4d382291ae41c24d0fee
2020-10-20 19:40:21 -07:00
253918ec55 [quant] Add FixedQParamsFakeQuantize module (#45538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45538

This is used to simulate the fake quantize operation for ops with fixed quantization parameters,
e.g. hardsigmoid.

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D24004795

fbshipit-source-id: fc4797f80842daacd3b3584c5b72035774634edd
2020-10-20 17:43:25 -07:00
24187a0b42 Enable type check for torch.quantization.fake_quantize (#45701)
Summary:
Addresses part of https://github.com/pytorch/pytorch/issues/42969.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45701

Reviewed By: walterddr

Differential Revision: D24066672

Pulled By: samestep

fbshipit-source-id: 53bb5e7b4703738d3de86fa89fb0980f1d6251f3
2020-10-02 09:27:34 -07:00
1fde54d531 [quant][qat] Ensure fake_quant and observer can be disabled on scriptmodule (#44773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44773

The model is created and prepared using the fx APIs and then scripted for training.
In order to test QAT on the scripted model we need to be able to disable/enable the fake_quant
and observer modules on it (see the sketch below).
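
A sketch of the enable/disable hooks in eager mode; this commit extends the same controls to the scripted model (module shapes and backend below are illustrative):

```python
import torch.nn as nn
from torch.quantization import (
    get_default_qat_qconfig, prepare_qat,
    disable_fake_quant, enable_fake_quant, disable_observer,
)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).train()
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)

model.apply(disable_observer)    # freeze qparams: stop updating min/max
model.apply(disable_fake_quant)  # run without quantization simulation
model.apply(enable_fake_quant)   # turn QAT simulation back on
```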

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23741354

fbshipit-source-id: 3fee7aa9b049d9901313b977710f4dc1c4501532
2020-09-17 10:21:52 -07:00
20ac736200 Remove py2 compatible future imports (#44735)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44735

Reviewed By: mruberry

Differential Revision: D23731306

Pulled By: ezyang

fbshipit-source-id: 0ba009a99e475ddbe22981be8ac636f8a1c8b02f
2020-09-16 12:55:57 -07:00
3f512b0de2 [quant][qat] Ensure observers and fq modules are scriptable (#44749)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44749

Ensure the fx module is scriptable after calling prepare_qat on it

Test Plan:
python test/test_quantization.py TestQuantizeFx.test_qat_and_script

Imported from OSS

Reviewed By: jerryzh168

Differential Revision: D23718380

fbshipit-source-id: abf63ffb21e707f7def8f6c88246877f5aded58c
2020-09-16 09:30:07 -07:00
85752b989d [quant][doc] Print more info for fake quantize module (#43031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43031

fixes: https://github.com/pytorch/pytorch/issues/43023

Test Plan: Imported from OSS

Reviewed By: vkuzo

Differential Revision: D23116200

fbshipit-source-id: faa90ce8711da0785d635aacd0362c45717cfacc
2020-08-13 20:27:36 -07:00
94dfc76e3f graph mode qat: make fake_quantize scriptable (#39750)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39750

Add a test to make the default QAT qconfig scriptable, and fix
all the errors.

Test Plan:
```
python test/test_quantization.py TestQATScript.fake_quant_scriptable
```

Imported from OSS

Differential Revision: D21975879

fbshipit-source-id: 8c48ad9f24b2c941d2267cb53eb70ebecd103744
2020-06-10 21:34:18 -07:00
8292742ba0 fake_quant: move observer and fake_quant flags into buffers (#38368)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38368

There is a need for some customers to enable/disable these flags
in the middle of QAT.  To make it work properly with DDP,
we need to implement them using buffers so that they are replicated
properly to all the nodes.

This should solve issue https://github.com/pytorch/pytorch/issues/38081
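
The mechanism, sketched on a bare module (a minimal illustration of the approach, not the actual FakeQuantize code):

```python
import torch

class FakeQuantFlags(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers (rather than plain Python attributes) so DDP broadcasts
        # the flags to every replica along with the rest of the state.
        self.register_buffer("fake_quant_enabled",
                             torch.tensor([1], dtype=torch.uint8))
        self.register_buffer("observer_enabled",
                             torch.tensor([1], dtype=torch.uint8))
```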

Test Plan:
CI

Imported from OSS

Differential Revision: D21537607

fbshipit-source-id: 8c9da022beb7aaa44c658268f02f99dd5aee93fd
2020-05-18 09:30:07 -07:00
b57c8b720e [wip] Make quantization modules work with DataParallel (#37032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032

DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.

This diff updates some quantized modules (TBD: all quantized modules? determine a good cut
point) to do their parameter updates in-place. This will enable static
quant and QAT to work correctly with DataParallel.

TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.

Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40

TODO before land: add integration testing

Imported from OSS

Differential Revision: D21206454

fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
2020-05-05 13:06:43 -07:00
45e4b614d1 Per channel quantization performance improvement (#33772)
Summary:
Benchmark:
NVIDIA GTX 1650 + AMD Ryzen Threadripper 3970X
```python
import torch
print(torch.__version__)

# Warm up the CUDA context and allocator before timing.
for i in range(1000):
    torch.randn(1024 * 128, device='cuda')

# Note: %timeit is an IPython magic, so this script runs under IPython.
def cuda(e):
    a = torch.randn(2 ** e, 32, device='cuda')
    s = torch.randn(32, device='cuda')
    z = torch.randn(32, device='cuda')
    torch.cuda.synchronize()
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999); torch.cuda.synchronize()

def cpu(e):
    a = torch.randn(2 ** e, 32, device='cpu')
    s = torch.randn(32, device='cpu')
    z = torch.randn(32, device='cpu')
    %timeit torch.fake_quantize_per_channel_affine(a, s, z, 1, -999, 999);

for i in range(10, 24):
    cuda(i)
print()
for i in range(10, 32):
    cpu(i)
```
Before
```
1.5.0a0+9bc922d
849 µs ± 44.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
817 µs ± 30.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
814 µs ± 2.93 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.11 ms ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.19 ms ± 4.19 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.6 ms ± 5.58 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.44 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.14 ms ± 2.55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.41 ms ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13.9 ms ± 2.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
26.9 ms ± 254 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
52.6 ms ± 260 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
104 ms ± 176 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
207 ms ± 1.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

249 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
420 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
766 µs ± 391 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.45 ms ± 574 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.84 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5.69 ms ± 83 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.29 ms ± 2.58 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.32 ms ± 13.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17.4 ms ± 38.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
47.5 ms ± 264 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
187 ms ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
379 ms ± 5.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
652 ms ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.22 s ± 4.58 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.34 s ± 8.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
4.56 s ± 7.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
8.97 s ± 33.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.8 s ± 32.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
35.2 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```
After
```
1.5.0a0+a7ec8cc
92.5 µs ± 2.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
97.7 µs ± 469 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
109 µs ± 4.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
119 µs ± 6.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
146 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
211 µs ± 2.45 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
624 µs ± 14.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.17 ms ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.25 ms ± 48.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.43 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.51 ms ± 44.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16.9 ms ± 30.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
33.7 ms ± 7.64 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

201 µs ± 234 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
285 µs ± 465 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 214 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
287 µs ± 761 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
347 µs ± 399 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
675 µs ± 213 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.34 ms ± 643 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.82 ms ± 34.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.7 ms ± 88.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.3 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
39.4 ms ± 242 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
78.8 ms ± 2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
153 ms ± 786 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
285 ms ± 911 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
541 ms ± 1.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.03 s ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.97 s ± 8.59 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.81 s ± 10.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Fixes https://github.com/pytorch/pytorch/issues/33647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33772

Differential Revision: D20112531

Pulled By: ngimel

fbshipit-source-id: f90e3ef1b5be8276851637f3e1251cb8f1af411f
2020-02-26 10:19:25 -08:00
996c0adb53 [quant] Register fake_quant and observer attributes as buffers (#33626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626

For DDP we require the attributes to be registered as buffers. By doing this the value is broadcast from one device to the rest.

Test Plan:
Tested on actual model on GPU

Imported from OSS

Differential Revision: D20038839

fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
2020-02-24 14:16:03 -08:00
8c1268aad3 Use default scale/zero_point in fake_quantize module instead of None (#32318)
Summary:
Distributed data parallel cannot broadcast None, so when we prepare the model for QAT and then try to save it, it will error out.
fixes: https://github.com/pytorch/pytorch/issues/32082
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318

Differential Revision: D19434801

Pulled By: jerryzh168

fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
2020-01-17 11:04:08 -08:00
eccf42fd15 Bug fix: Handle missing keys in observer state dict during load (#30357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357

Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
ghstack-source-id: 94468814

Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.

Differential Revision: D18668517

fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
2019-11-26 06:53:45 -08:00
661a6c8ef2 Add get_qparams and revert the changes to calculate_qparams (#30262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30262

`get_qparams` returns all the parameters needed to call the quantize function.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18645047

fbshipit-source-id: e57c11a66dac2d589778d412a996796ad5b6f86a
2019-11-26 06:53:26 -08:00
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that and also adds corresponding support in graph mode.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
a5ac7f6387 Changing observer name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27779

Test Plan: Imported from OSS

Differential Revision: D17886605

Pulled By: z-a-f

fbshipit-source-id: 68c50b482e65015336ff27171fd730da493525b6
2019-10-17 11:36:03 -07:00
a96b003b39 docstring only formatting changes: quantize.py, fake_quantize.py, observer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
2019-10-08 09:21:03 -07:00
ac0f18437f MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than MinMax observers, which track extremal values across batches (update rule sketched below).
ghstack-source-id: 91369018
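
The update rule, sketched (the exponential-moving-average form; the averaging constant shown is illustrative):

```python
import torch

averaging_constant = 0.01  # weight given to the current batch

def update_range(min_val, max_val, x):
    # Nudge the running extrema toward this batch's min/max instead of
    # taking a global min/max across all batches.
    min_val = min_val + averaging_constant * (x.min() - min_val)
    max_val = max_val + averaging_constant * (x.max() - max_val)
    return min_val, max_val
```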

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-04 16:28:59 -07:00
9e3ba35500 Add control for observers in Fake-quantize module (#27113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27113

Fix a bug in the fake quant control of observer and fake-quantize operations.
Add a test to ensure the features work as expected.
ghstack-source-id: 91071181

Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control

Differential Revision: D17678875

fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
2019-09-30 18:23:26 -07:00
7dc7075795 Per channel fake quant (#26623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26623

Adds per-channel fake quant CPU and CUDA operators,
per-channel support in the fake quant module, and
tests for per-channel fake quant and serializability of fake quant modules.

ghstack-source-id: 91008299

Test Plan:
buck test mode/dev caffe2/test:fake_quant  --
 Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324848875929
      ✓ caffe2/test:fake_quant - test_backward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.242 1/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.204 2/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerTensor) 0.174 3/10 (passed)
      ✓ caffe2/test:fake_quant - test_numerical_consistency_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.279 4/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.241 5/10 (passed)
      ✓ caffe2/test:fake_quant - test_forward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.353 6/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerTensor) 0.354 7/10 (passed)
      ✓ caffe2/test:fake_quant - test_backward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.334 8/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerChannel) 0.168 9/10 (passed)
      ✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerChannel) 0.429 10/10 (passed)
      ✓ caffe2/test:fake_quant - main 0.000 (passed)

Differential Revision: D17439406

fbshipit-source-id: 64bfff5e4f40bc2ab8af2b432c7bc33805418077
2019-09-30 00:21:25 -07:00
2ccbdb79c8 Per-channel baseline (#26516)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26516

ghstack-source-id: 90982010

Test Plan:
Integrate per-channel support into conv and linear modules.
The following tests pass:
buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details

buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details

Differential Revision: D17342622

fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
2019-09-28 14:05:06 -07:00
8fa9900c28 control of observer/fake-quant operations (#26520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26520

Hooks to enable control of observer and fake quant that can be used by model.apply() to control fake quant during QAT
ghstack-source-id: 90897063

Test Plan: buck test caffe2/test:quantization --  --print-passing-details

Differential Revision: D17491155

fbshipit-source-id: 80ff0d7a1ac35c96e054b4f0165a73c56c2f53cc
2019-09-27 11:01:34 -07:00
b0a2f6f2f5 Serialization and range reduction support for Fake Quant/Observer (#26519)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26519

ghstack-source-id: 90895631

Test Plan:
buck test caffe2/test:quantization -- 'test_histogram_observer \(test_quantization\.ObserverTest\)' --print-passing-details
and
buck test caffe2/test:fake_quant -- 'test_fq_serializable \(test_fake_quant\.TestFakeQuantizePerTensorAffine\)' --print-passing-details

Differential Revision: D17217408

fbshipit-source-id: 0da7efdcdae0c065dd035c5dd2b6a78231545ece
2019-09-27 10:09:39 -07:00
9a5e2e80b8 Fake quantization enhancements for QAT/PTQ support - fix tests (#26876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26876

Add the ability to turn fake quantization and observers on and off independently.
ghstack-source-id: 90892132

Test Plan: buck test caffe2/test:quantized -- 'test_conv_bn_relu \(test_qat\.IntrinsicQATModuleTest\)' --print-passing-details

Differential Revision: D17592961

fbshipit-source-id: 24c60c94ed7c6c9fa55c634a8545731614e4f52f
2019-09-27 08:59:29 -07:00
be93d30e37 Revert D17458232: Fake quantization enhancements for QAT/PTQ support
Test Plan: revert-hammer

Differential Revision:
D17458232

Original commit changeset: f44380c60f1a

fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2
2019-09-25 04:56:30 -07:00
e2c3d7e52c Fake quantization enhancements for QAT/PTQ support (#26420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420

Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant.
ghstack-source-id: 90704254

Test Plan:
buck test caffe2/test:fake_quant --  --print-passing-details
buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17458232

fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62
2019-09-25 02:02:00 -07:00
a79b3685db Simplify observers declaration with functools.partial (#26492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26492

The previous definition of observers was quite clumsy, with things like `default_observer()()`. This PR strips away a lot of cruft and allows passing class names directly. To override default arguments, either `functools.partial` can be used or the convenience wrapper `MyObserver.with_args(x=1)` is provided.

Also renamed `QConfig_dynamic` to `QConfigDynamic`, since the old name violates the naming convention.
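
For example (stock observer class; the overridden arguments are illustrative):

```python
import functools
import torch
from torch.quantization import MinMaxObserver

# Both spellings produce a zero-argument factory that a qconfig can hold.
obs_factory = MinMaxObserver.with_args(dtype=torch.qint8, reduce_range=True)
obs_factory2 = functools.partial(MinMaxObserver, dtype=torch.qint8)

observer = obs_factory()  # instantiated later, when preparing the model
```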

Test Plan: Imported from OSS

Differential Revision: D17521265

Pulled By: dzhulgakov

fbshipit-source-id: ba9df19b368641acf4093c43df9990796284fd9e
2019-09-23 10:15:59 -07:00