Summary: This follows https://github.com/pytorch/pytorch/pull/78452,
which replaced the qconfig_dict with QConfigMapping. This PR
additionally replaces get_default_*qconfig_dict with
get_default_*qconfig_mapping. For backward compatibility, we
deprecate the old functions instead of removing them.
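The deprecation pattern described above can be sketched in plain Python. This is an illustrative sketch only; the function names mimic the PR's naming but the bodies and the returned mapping are simplified stand-ins, not the actual torch.ao.quantization implementation.

```python
import warnings

def get_default_qconfig_mapping(backend="fbgemm"):
    # New-style API: returns a mapping object (simplified to a dict here).
    return {"": f"default_qconfig_for_{backend}"}

def get_default_qconfig_dict(backend="fbgemm"):
    # Old-style API: kept for backward compatibility, but it warns and
    # forwards to the new *_mapping function instead of being removed.
    warnings.warn(
        "get_default_qconfig_dict is deprecated; use "
        "get_default_qconfig_mapping instead.",
        DeprecationWarning,
    )
    return get_default_qconfig_mapping(backend)
```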
Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
Reviewers: jerryzh168, vkuzo
Subscribers: jerryzh168, vkuzo, supriyar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79618
Approved by: https://github.com/jerryzh168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79621
Move profiler to use compat bridge methods on libkineto::CpuTraceBuffer. Note that there is a specific order in which these changes must land:
First, the change adding the methods to Kineto must land.
Second, the change updating the pinned version of Kineto in PyTorch must land.
Third, this change must land AND BE COMMITTED TO FBCODE.
Fourth, the change making Kineto use unique_ptr must land.

And finally, the pinned commit in pytorch/third_party must be updated again.
Only after all of these can the profiler start to rely on kineto using a unique_ptr under the hood.
Differential Revision: [D36679293](https://our.internmc.facebook.com/intern/diff/D36679293/)
Approved by: https://github.com/aaronenyeshi
- Remove wrappers in `__init__` around utils and instead expose those functions directly. Move the docstrings from `__init__` to corresponding functions in utils
- Annotate `torch.onnx.export` types
- Improve docstrings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78231
Approved by: https://github.com/BowenBao
Fixes #78510
This PR adds support for using fractions with `random_split`. This should be completely backwards-compatible, as the fractional-style splitting is only applied when the sum of the input lengths is lower than 1.0.
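The fraction-to-length conversion can be sketched as follows. This is a hedged illustration of the idea (fractions are scaled to integer counts and the remainder is distributed round-robin), not the exact `torch.utils.data.random_split` implementation; the helper name is hypothetical.

```python
import math

def fractions_to_lengths(fractions, total):
    # Scale each fraction to an integer count; the small epsilon guards
    # against floating-point error in products like 10 * 0.7.
    lengths = [math.floor(total * frac + 1e-9) for frac in fractions]
    # Hand out any leftover items one at a time, round-robin.
    remainder = total - sum(lengths)
    for i in range(remainder):
        lengths[i % len(lengths)] += 1
    return lengths
```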
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78877
Approved by: https://github.com/ejguan
Summary: There was an off-by-one bug in `at::from_blob_quantized_per_tensor_affine`. For an input with sizes {N, C, H, W}, strides were previously calculated as {N*C*H, C*H, H, 1}; they are now correctly calculated as {C*H*W, H*W, W, 1}. The updated unit test catches this problem.
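For reference, contiguous (row-major) strides are computed as the running product of all trailing dimension sizes; the bug above effectively shifted that product by one dimension. A minimal sketch:

```python
def contiguous_strides(sizes):
    # Each stride is the product of all later dimensions, with the
    # innermost dimension having stride 1. For sizes {N, C, H, W} this
    # yields {C*H*W, H*W, W, 1}, the corrected behavior described above.
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides
```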
Test Plan:
```
buck test mode/dev-nosan //caffe2:quantized_test
```
before fix:
```
✓ ListingSuccess: caffe2:quantized_test : 9 tests discovered (15.632)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantDequantAPIs (0.004)
✗ Fail: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerTensor (0.002)
Test output:
> caffe2/aten/src/ATen/test/quantized_test.cpp:247
Expected equality of these values:
qtensor[h][w].item<float>()
Which is: -0.5
(custom_data[i] - zero_point) * scale
Which is: 0
stdout: Note: Google Test filter = TestQTensor.FromBlobQuantizedPerTensor
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from TestQTensor
[ RUN ] TestQTensor.FromBlobQuantizedPerTensor
caffe2/aten/src/ATen/test/quantized_test.cpp:247: Failure
Expected equality of these values:
qtensor[h][w].item<float>()
Which is: -0.5
(custom_data[i] - zero_point) * scale
Which is: 0
[ FAILED ] TestQTensor.FromBlobQuantizedPerTensor (2 ms)
[----------] 1 test from TestQTensor (2 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (2 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] TestQTensor.FromBlobQuantizedPerTensor
1 FAILED TEST
```
after fix:
```
✓ ListingSuccess: caffe2:quantized_test : 9 tests discovered (16.051)
✓ Pass: caffe2:quantized_test - TestQTensor.RoundingMode (0.002)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantizePerChannel4dChannelsLast (0.217)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantDequantAPIs (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.EmptyPerchannelQuantized (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.EmptyQuantized (0.002)
✓ Pass: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerChannel (0.004)
✓ Pass: caffe2:quantized_test - TestQTensor.QuantizePerChannel4d (0.005)
✓ Pass: caffe2:quantized_test - TestQTensor.FromBlobQuantizedPerTensor (0.003)
✓ Pass: caffe2:quantized_test - TestQTensor.Item (0.002)
Summary
Pass: 9
ListingSuccess: 1
```
Differential Revision: D37061355
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79314
Approved by: https://github.com/jerryzh168
Fixing the forward AD for `sgn` in the next PR of this stack uncovered a
number of issues with the derivatives of `l1_loss`. Upon inspection,
`l1_loss` was just implemented as a composite function, but it was not
differentiable. This PR makes it a fully differentiable function.
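The composite definition referred to above is just the mean absolute error; sketched in plain Python on lists (an illustration of the math, not the torch kernel):

```python
def l1_loss(input, target):
    # Mean absolute error: average of |input_i - target_i|.
    return sum(abs(a - b) for a, b in zip(input, target)) / len(input)
```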
As a side note, `l1_loss_out` was incorrect in a number of ways. What's more, it is not exposed to the public, as `F.l1_loss` does not accept an `out=` parameter, so it is not even tested. I wonder how useful it is to have `out=` variants for loss functions if we don't expose them at all. Even more, I wonder how useful it is to have `_out` variants for loss functions at all, given that their typical use case is to return just a real number. cc jbschlosser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78257
Approved by: https://github.com/jbschlosser
When `TrainingMode.PRESERVE` is set for export, the exporter used to change the model's training mode based on some internal logic. Now we respect the option and do not touch the model's training state.
- Previously, `_set_training_mode`'s behavior did not match what the global variable expects. This PR removes the deprecated `_set_training_mode` and makes the type correct.
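The new behavior can be sketched as follows. This is an illustrative model of the logic (the class and function names here are hypothetical stand-ins, not the actual exporter code): under PRESERVE the model's training flag is left untouched, while TRAINING/EVAL set it explicitly.

```python
from enum import Enum

class TrainingMode(Enum):
    EVAL = 0
    PRESERVE = 1
    TRAINING = 2

class Model:
    def __init__(self, training=False):
        self.training = training

def apply_export_mode(model, mode):
    if mode is TrainingMode.PRESERVE:
        # Respect the caller's training state: do not touch the model.
        return model
    model.training = mode is TrainingMode.TRAINING
    return model
```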
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78583
Approved by: https://github.com/BowenBao
Summary:
Previously, filling a quantized tensor only worked for NCHW tensors.
This PR enables support for NHWC tensors. Test cases were added for
per-tensor and per-channel quantized NHWC tensors.
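For context, filling a quantized tensor requires quantizing the fill value with the tensor's affine parameters. A minimal sketch of per-tensor affine quantization of a scalar (an illustration of the standard formula, not the torch kernel):

```python
def quantize_value(x, scale, zero_point, qmin=0, qmax=255):
    # Standard affine quantization: q = clamp(round(x / scale) + zp).
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))
```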
Test Plan:
```
python test/test_quantization.py -k test_qtensor_fill_per_channel_nhwc
python test/test_quantization.py -k test_qtensor_fill_per_tensor_nhwc
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79025
Approved by: https://github.com/jerryzh168
Summary: Implement a decoder-only layer, which doesn't currently exist in torch. Implement incremental and forced decoding with a new multiheadattention and decoder forward pass. It is rather similar to transformer_encoder_layer_fwd. This is not a public-facing API, although it may become public facing eventually.
Stacked on top of https://github.com/pytorch/pytorch/pull/79437.
Test Plan: See D36140513 numerical tests.
Reviewed By: mikekgfb
Differential Revision: D36987004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79438
Approved by: https://github.com/zrphercule
Summary: Refactor to reduce the amount of copied code for the decoder by finding chunks common to the encoder and decoder. The QKV in-projection is a reasonable unit to factor out.
Test Plan:
buck run mode/opt -c fbcode.platform=platform010 -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=a100 //pytext/fb/tools:benchmark_transformers -- transformer --batch-size 64 --avg-sequence-length 235 --max-sequence-length 256 --iters 100 --module native
Benchmark and numerical tests work fine.
Reviewed By: mikekgfb
Differential Revision: D36138504
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79437
Approved by: https://github.com/jbschlosser
Relates to #76700
**Overview**: Made regex a local variable in the `isGreen` method and improved debugging message for missing relevant workflows. If there are required workflows missing, the message will now print which workflows were not found.
**Test Plan**: Updated relevant test case in `test_print_latest_commits.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79619
Approved by: https://github.com/janeyx99
Summary: The test throws a "JIT alias analysis not supported" error. Disabling it for now.
Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: mikeiovine
Differential Revision: D37056032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79574
Approved by: https://github.com/mikeiovine
Summary: This adds the PyTorch operators that are currently missing in non-ads models from the c2->pt migration: aten::index_put, aten::item, aten::tensor_split
Test Plan: buck run mode/opt caffe2/benchmarks/static_runtime:static_runtime_cpptest
Differential Revision: D36984961
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79065
Approved by: https://github.com/davidberard98
Summary:
Previously, the backend code "silently" ignored reduce_range=true when using the qnnpack backend (which does not require a reduction in range). We evaluated two options: 1) respect the reduction in range to conform with other backends (e.g., fbgemm) even though qnnpack supports the full range, and output a warning letting the user know that reduce_range should be set to false for the qnnpack backend; 2) throw a warning and let the user know that the reduce_range=true setting is being ignored.
Option 1 would halve the range, which could have negative implications for accuracy and would be bc-breaking. Option 2 is also not ideal because it ignores any user setting of reduce_range=true when using the qnnpack backend with dynamic and linear quantized ops. We decided to go with option 2, as it is not bc-breaking.
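Option 2 can be sketched as follows. This is a hedged illustration of the chosen behavior (warn and ignore the setting rather than halving the range); the function name is hypothetical, not the actual qnnpack backend code.

```python
import warnings

def resolve_reduce_range(backend, reduce_range):
    # For qnnpack, reduce_range=True is ignored with a warning, since the
    # backend supports the full quantized range; other backends keep the
    # user's setting as-is.
    if backend == "qnnpack" and reduce_range:
        warnings.warn(
            "reduce_range=True is ignored by the qnnpack backend, "
            "which supports the full quantized range."
        )
        return False
    return reduce_range
```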
Fixes #68278
Test plan:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79273
Approved by: https://github.com/jerryzh168, https://github.com/vkuzo