Commit Graph

33437 Commits

8ab22a080b Build pytorch_android using Gradle wrapper. (#51067)
Summary:
[Here](https://docs.gradle.org/current/userguide/gradle_wrapper.html), the Gradle documentation gives the following guidance:
`The recommended way to execute any Gradle build is with the help of the Gradle Wrapper`

I took a little time to set up the Gradle wrapper (version, etc.) for the `pytorch_android` build.

I think using the Gradle wrapper will make the `pytorch_android` build more seamless.

Gradle wrapper version: 4.10.3

250c71121b/.circleci/scripts/build_android_gradle.sh (L13)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51067

Reviewed By: izdeby

Differential Revision: D26315718

Pulled By: IvanKobzarev

fbshipit-source-id: f8077d7b28dc0b03ee48bcdac2f5e47d9c1f04d9
2021-02-09 03:09:08 -08:00
034a007ad8 Remind about AutoNonVariableTypeMode in error message. (#51655)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51655

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26228508

Pulled By: ailzhang

fbshipit-source-id: f5f48fde3611c84cc6473b77824ebf9dffbb4453
2021-02-08 19:22:38 -08:00
2303c244fc skip a second call to shouldUseRecordFunction for BackendSelect ops (#50891)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50891

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D25999514

Pulled By: bdhirsh

fbshipit-source-id: 8a6c17ab502fe463cf3fb38a1e555c64bc5556f0
2021-02-08 18:32:40 -08:00
7b9ca54ecf Reset checkpoint_valid flag when error happens during function execution (#51746)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37874, https://github.com/pytorch/pytorch/issues/51743

Uses RAII to manage the flag so that it gets reset properly on exception.
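
As a rough Python analogue of that RAII idea (illustrative only; the actual fix is a C++ guard):

```python
import contextlib

# Hypothetical sketch: the flag is restored in `finally`, so it resets
# even when the checkpointed function raises.
@contextlib.contextmanager
def checkpoint_valid_guard(state):
    prev = state["checkpoint_valid"]
    state["checkpoint_valid"] = False
    try:
        yield
    finally:
        state["checkpoint_valid"] = prev
```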

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51746

Reviewed By: izdeby

Differential Revision: D26319619

Pulled By: soulitzer

fbshipit-source-id: ea1235438ba516f99195c83fa23d5880f9977c93
2021-02-08 17:48:25 -08:00
dac730af11 Warn if mypy version doesn't match CI (#51799)
Summary:
This PR adds a local [`mypy` plugin](https://mypy.readthedocs.io/en/stable/extending_mypy.html#extending-mypy-using-plugins) that warns if you accidentally run `mypy` using a version that doesn't match [the version we install for CI](6045663f39/.circleci/docker/common/install_conda.sh (L117)), since this trips people up sometimes when `mypy` gives errors in some versions (see https://github.com/pytorch/pytorch/issues/51513) but not others.
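
A minimal sketch of what such a version-checking plugin can look like (the pinned version mirrors the test plan below; the structure is an assumption, not the exact file):

```python
import sys
from mypy.plugin import Plugin
from mypy.version import __version__ as mypy_version

EXPECTED_MYPY_VERSION = "0.770"  # assumed pin; CI installs one specific version

def plugin(version: str):
    # mypy calls this module-level entry point when loading a plugin.
    if mypy_version != EXPECTED_MYPY_VERSION:
        print(
            f"You are using mypy version {mypy_version}, which is not supported\n"
            f"in the PyTorch repo. Please switch to mypy version {EXPECTED_MYPY_VERSION}.",
            file=sys.stderr,
        )
    return Plugin  # no behavior change beyond the warning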

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51799

Test Plan:
To check that this doesn't break our `mypy` test(s) when you have the correct version installed:
```
python test/test_type_hints.py
```
To check that this does indeed warn when you have an incorrect `mypy` version installed, switch to a different version (e.g. 0.782), and run the above command or either of these:
```
mypy
mypy --config-file=mypy-strict.ini
```
You should get the following message on stderr:
```
You are using mypy version 0.782, which is not supported
in the PyTorch repo. Please switch to mypy version 0.770.

For example, if you installed mypy via pip, run this:

    pip install mypy==0.770

Or if you installed mypy via conda, run this:

    conda install -c conda-forge mypy=0.770
```

Reviewed By: janeyx99

Differential Revision: D26282010

Pulled By: samestep

fbshipit-source-id: 7b423020d0529700dea8972b27afa2d7068e1b12
2021-02-08 15:43:18 -08:00
21ef248fb8 [reland] Report test time regressions (#50171)
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.

**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.

- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`

Notes on the approach:

- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.
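
As a toy sketch of the estimation rule in the notes above (data shapes and names are illustrative, not the actual implementation):

```python
import statistics

def estimate_test_time(test_name, history, base_status):
    # Pool times only from historical commits where the test exists and
    # its status matches the base commit.
    times = [
        commit[test_name]["seconds"]
        for commit in history
        if test_name in commit and commit[test_name]["status"] == base_status
    ]
    if not times:
        return None, None
    mean = statistics.mean(times)
    stdev = statistics.stdev(times) if len(times) > 1 else 0.0
    return mean, stdev
```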

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171

Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```

To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test

To test locally, use the following steps.

First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `test/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:

- https://pastebin.com/Jj1svhAn

Reviewed By: malfet, izdeby

Differential Revision: D26317769

Pulled By: samestep

fbshipit-source-id: 1ba06cec0fafac77f9e7341d57079543052d73db
2021-02-08 15:35:21 -08:00
9e4f3b89c4 [Gradient Compression] Add register_comm_hook API to DDP communication hooks documentation page (#51846)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51846

The `register_comm_hook` method is defined in the DistributedDataParallel module, but it is not covered in `distributed.rst`. Since it's closely related to DDP communication hooks, add the docstrings to `ddp_comm_hooks.rst` instead of a reference.
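
For context, a minimal usage sketch of the API being documented (the hook body is illustrative, and the GradBucket accessor names have varied across releases):

```python
import torch.distributed as dist

def allreduce_hook(state, bucket):
    # Average the bucket's gradients across workers; `get_tensors` is an
    # assumption for this era of the GradBucket API.
    tensor = bucket.get_tensors()[0]
    fut = dist.all_reduce(tensor, async_op=True).get_future()
    return fut.then(lambda f: [f.value()[0] / dist.get_world_size()])

# ddp_model = torch.nn.parallel.DistributedDataParallel(model)
# ddp_model.register_comm_hook(state=None, hook=allreduce_hook)
```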

Screenshot:

{F370425625}
ghstack-source-id: 121278173

Test Plan:
view locally

python_doc_test:
https://app.circleci.com/pipelines/github/pytorch/pytorch/271234/workflows/dc0b443d-8a62-4334-9b42-800c33a68553/jobs/10770636

Reviewed By: rohan-varma

Differential Revision: D26298191

fbshipit-source-id: 32e0685fd3c935cf9a2d129e6c520a94aa3e3817
2021-02-08 15:12:39 -08:00
1e70b4bb73 Add GH Actions CI to build nightly Docker and push to GitHub Container Registry (#51755)
Summary:
Currently the PyTorch repository provides a Dockerfile for building Docker images with nightly builds, but it doesn't have CI to actually build them.
This PR adds a GitHub Actions workflow to create PyTorch nightly-build Docker images and publish them to the GitHub Container Registry.
Also, it adds the "--always" option to the `git describe --tags` command that generates the Docker image tag.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51755

Test Plan: Manually trigger the workflow build in the GitHub Actions web UI.

Reviewed By: seemethere

Differential Revision: D26320180

Pulled By: xuzhao9

fbshipit-source-id: e00b472df14f5913cab9b06a41e837014e87f1c7
2021-02-08 14:59:30 -08:00
58eb23378f Clean up usage of torch._six partially (#49785)
Summary:
See https://github.com/pytorch/pytorch/issues/42919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49785

Reviewed By: mruberry

Differential Revision: D25963833

Pulled By: bugra

fbshipit-source-id: 11c90d6b8d3f206c9d0a4d8621b773beb10c6ba2
2021-02-08 13:58:34 -08:00
97e35858ec [Resubmit] Add compare_set operation and test to TCPStore (#51815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51815

This is a resubmission of #51593, which was already approved.
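
A minimal usage sketch of the new operation (host/port are illustrative, and the argument order reflects my reading of the PR):

```python
import torch.distributed as dist

# A single-node server store; compare_set swaps in the new value only
# when the currently stored value matches the expected one.
store = dist.TCPStore("127.0.0.1", 29500, 1, True)
store.set("key", "first")
store.compare_set("key", "first", "second")  # succeeds: value becomes "second"
store.compare_set("key", "first", "third")   # no-op: current value is "second"
```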

Test Plan: Imported from OSS

Reviewed By: izdeby

Differential Revision: D26316875

Pulled By: H-Huang

fbshipit-source-id: d81cb131ef6b9e2ebaee32bb505dfc11235bc29d
2021-02-08 13:44:31 -08:00
7363da7c57 onnx export of per channel fake quantize functions (#42835)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39502

This PR adds support for exporting **fake_quantize_per_channel_affine** to a pair of QuantizeLinear and DequantizeLinear. Per tensor support was added by PR https://github.com/pytorch/pytorch/pull/39738.

The `axis` attribute of QuantizeLinear and DequantizeLinear, which is required for per-channel support, was added in opset 13 by https://github.com/onnx/onnx/pull/2772.

[update 1/20/2021]: opset 13 is now supported on master, so the added function is properly tested. The code has also been rebased onto the new master.

The function was also tested offline with the following code:
```python
import torch
from torch import quantization

from torchvision import models
qat_resnet18 = models.resnet18(pretrained=True).eval().cuda()

qat_resnet18.qconfig = quantization.QConfig(
    activation=quantization.default_fake_quant, weight=quantization.default_per_channel_weight_fake_quant)
quantization.prepare_qat(qat_resnet18, inplace=True)
qat_resnet18.apply(quantization.enable_observer)
qat_resnet18.apply(quantization.enable_fake_quant)

dummy_input = torch.randn(16, 3, 224, 224).cuda()
_ = qat_resnet18(dummy_input)
for module in qat_resnet18.modules():
    if isinstance(module, quantization.FakeQuantize):
        module.calculate_qparams()
qat_resnet18.apply(quantization.disable_observer)

qat_resnet18.cuda()

input_names = [ "actual_input_1" ]
output_names = [ "output1" ]

torch.onnx.export(qat_resnet18, dummy_input, "quant_model.onnx", verbose=True, opset_version=13)
```
It can generate the desired graph.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/42835

Reviewed By: houseroad

Differential Revision: D26293823

Pulled By: SplitInfinity

fbshipit-source-id: 300498a2e24b7731b12fa2fbdea4e73dde80e7ea
2021-02-08 13:09:50 -08:00
159c48b19b Fix triplet margin loss and reciprocal docs (#51650)
Summary:
Reciprocal: the note should be placed after the formula

Triplet-margin-loss (before):
![image](https://user-images.githubusercontent.com/13428986/106784863-cb3eb780-661a-11eb-8372-07b51e4cb2d4.png)
After:
![image](https://user-images.githubusercontent.com/13428986/106784948-e5789580-661a-11eb-890c-6185aab96e54.png)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51650

Reviewed By: izdeby

Differential Revision: D26314151

Pulled By: soulitzer

fbshipit-source-id: d7574e64e96a41a515231ba7e1008de8b2f292aa
2021-02-08 12:15:11 -08:00
d90911adf9 fix AdaptiveAveragePooling crash problem for unsupported input (#51443)
Summary:
For unsupported input we should not run the check inside a parallel region; this PR first does the dtype check and then does the parallel_for.
Fixes https://github.com/pytorch/pytorch/issues/51352.
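
A small repro sketch of the failure mode (an illustrative unsupported dtype; before the fix this could crash inside the parallel region, afterwards it raises a regular error):

```python
import torch
import torch.nn.functional as F

x = torch.randint(0, 10, (1, 3, 8, 8), dtype=torch.int64)
try:
    F.adaptive_avg_pool2d(x, (2, 2))  # dtype not supported by this op
except RuntimeError as e:
    print("raised cleanly:", e)
```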

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51443

Reviewed By: izdeby

Differential Revision: D26305584

Pulled By: ngimel

fbshipit-source-id: 6faa3148af5bdcd7246771c0ecb4db2b31ac82c6
2021-02-08 11:43:25 -08:00
b9acfcddeb Support mypy ignore annotation with particular rule specified (#51675)
Summary:
Previously, TorchScript allowed an ignore-all type-check suppression rule that looks like
```
code code code  # type: ignore
```

But a more common use case is
```
code code code  # type: ignore[specific-rule]
```
This PR allows the more common use case.
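
A minimal illustration of code that now scripts (the rule name in brackets is illustrative):

```python
import torch

# Previously the bracketed, rule-specific suppression below made
# torch.jit.script fail to parse the function.
@torch.jit.script
def fn(x: torch.Tensor) -> torch.Tensor:
    y = x.size(0)  # type: ignore[assignment]
    return x * y
```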

Fixes https://github.com/pytorch/pytorch/issues/48643

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51675

Reviewed By: ansley

Differential Revision: D26304870

Pulled By: gmagogsfm

fbshipit-source-id: 0ac9ee34f0219c86e428318a69484d5aa3ec433f
2021-02-08 11:21:47 -08:00
41bab9a4b6 Plumbing dispatch keys through the dispatcher (#49354)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49354

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D25614042

Pulled By: bdhirsh

fbshipit-source-id: 269a75e9a3ac518aa63bff2cafbd47ed2c4ff780
2021-02-08 11:09:51 -08:00
6fa5e96f2e remove unnecessary BoxedKernelWrapper specialization now that ops are all c10-full (#50963)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50963

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D26026665

Pulled By: bdhirsh

fbshipit-source-id: ef6e515f7dae5052538789e5b75dc551b4ce3b11
2021-02-08 11:06:51 -08:00
d9e6750759 fix multi_output_kernel (#51827)
Summary:
With zasdfgbnm's help and his small TensorIterator kernel repro https://github.com/zasdfgbnm/tensoriterator, we've found a workaround for what looks like a compiler bug in multi_output_kernel that manifests itself with CUDA 10.2 and CUDA 11 when there is a non-trivial OffsetCalculator.
It looks like those nvcc versions cannot handle inheritance in device structs, so instead of inheriting `multi_outputs_unroll` from `unroll` we make it independent.
cc vkuzo, haichuan-fb: I verified that reverting https://github.com/pytorch/pytorch/issues/49315 to bring back multi_output_kernel makes the `test_learnable_backward_per_channel_cuda` test pass, but I didn't do it in this PR; can you take it up as a follow-up?

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51827

Reviewed By: izdeby

Differential Revision: D26305559

Pulled By: ngimel

fbshipit-source-id: 1168e7c894d237a954abfd1998eaad54f0ce40a7
2021-02-08 10:42:50 -08:00
21dccbca62 Revert D26232345: [pytorch][PR] Report test time regressions
Test Plan: revert-hammer

Differential Revision:
D26232345 (7467f90b13)

Original commit changeset: b687b1737519

fbshipit-source-id: 10a031c5500b083f7c82f2ae2743b671c5a07bff
2021-02-08 10:15:07 -08:00
1aaddd83a5 don't set the same C++ and C standards twice (#51832)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51832

Reviewed By: izdeby

Differential Revision: D26312660

Pulled By: ezyang

fbshipit-source-id: 7d646cd106397e70bca0050d0aa30eb62b085cee
2021-02-08 08:53:26 -08:00
649e683255 Fix torch.nonzero type annotation (#51635)
Summary:
The overloads are a little tricky here. It's important that the overloads are such that it's unambiguous what
`torch.nonzero(x)` will resolve to - so just specify defaults for one of the overloads. Also, `out` is left out of the second overload
because a non-None value for `out` is not valid in combination with `as_tuple=True`.
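
A stub-style sketch of the overload structure this describes (illustrative, not the exact `.pyi` contents):

```python
from typing import Literal, Optional, Tuple, overload
from torch import Tensor

@overload
def nonzero(input: Tensor, *, as_tuple: Literal[False] = False,
            out: Optional[Tensor] = None) -> Tensor: ...

@overload
def nonzero(input: Tensor, *, as_tuple: Literal[True]) -> Tuple[Tensor, ...]: ...
```

Only the first overload has defaults, so plain `torch.nonzero(x)` resolves unambiguously, and `out` appears only in the overload where it is valid.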

Closes gh-51434

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51635

Reviewed By: zhangguanheng66

Differential Revision: D26279203

Pulled By: walterddr

fbshipit-source-id: 8459c04fc9fbf7fc5f31b3f631aaac2f98b17ea6
2021-02-08 08:45:44 -08:00
0dd1d60d54 [JIT] Remove Dropout during Frozen Optimization (#51589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51589

Dropout operators are only needed in training. Remove them for frozen models.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D26214259

fbshipit-source-id: 3ab05869e1e1f6c57498ba62bf40944f7c2189aa
2021-02-08 08:38:08 -08:00
9cbefad83f concatenate LICENSE files when building a wheel (#51634)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/50695

I checked locally that the concatenated license file appears at `torch-<version>.dist-info/LICENSE` in the wheel.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51634

Reviewed By: zhangguanheng66

Differential Revision: D26225550

Pulled By: walterddr

fbshipit-source-id: 830c59fb7aea0eb50b99e295edddad9edab6ba3a
2021-02-08 08:28:46 -08:00
b97a040f71 ENH: toggle TORCH_WARN_ONCE to TORCH_WARN for tests (#48560)
Summary:
Toward fixing https://github.com/pytorch/pytorch/issues/47624

~Step 1: add `TORCH_WARN_MAYBE` which can either warn once or every time in c++, and add a c++ function to toggle the value.
Step 2 will be to expose this to python for tests. Should I continue in this PR or should we take a different approach: add the python level exposure without changing any c++ code and then over a series of PRs change each call site to use the new macro and change the tests to make sure it is being checked?~

Step 1: add a python and c++ toggle to convert TORCH_WARN_ONCE into TORCH_WARN so the warnings can be caught in tests
Step 2: add a python-level decorator to use this toggle in tests
Step 3: (in future PRs): use the decorator to catch the warnings instead of `maybeWarnsRegex`
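
A sketch of what the step-2 decorator could look like (the toggle bindings named here are assumptions, not confirmed API):

```python
import functools
import torch

def set_warn_always(fn):
    # Hypothetical decorator: flip the assumed toggle so TORCH_WARN_ONCE
    # behaves like TORCH_WARN for the duration of the test, then restore it.
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        prev = torch._C._get_warnAlways()  # assumed binding from step 1
        torch._C._set_warnAlways(True)
        try:
            return fn(*args, **kwargs)
        finally:
            torch._C._set_warnAlways(prev)
    return wrapper
```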

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48560

Reviewed By: ngimel

Differential Revision: D26171175

Pulled By: mruberry

fbshipit-source-id: d83c18f131d282474a24c50f70a6eee82687158f
2021-02-08 08:21:19 -08:00
d454a84bab derivatives.yaml cleanup + restore codegen code forgotten in refactor (#51721)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51721

Reviewed By: zhangguanheng66

Differential Revision: D26285908

Pulled By: albanD

fbshipit-source-id: 3130736be9146eaee3a8e80be59a66eb2180d536
2021-02-08 08:03:40 -08:00
7467f90b13 Report test time regressions (#50171)
Summary:
This is a followup to https://github.com/pytorch/pytorch/issues/49190. Vaguely speaking, the goals are to make it easy to identify test time regressions introduced by PRs. Eventually the hope is to use this information to edit Dr CI comments, but this particular PR just does the analysis and prints it to stdout, so a followup PR would be needed to edit the actual comments on GitHub.

**Important:** for uninteresting reasons, this PR moves the `print_test_stats.py` file.

- *Before:* `test/print_test_stats.py`
- *After:* `torch/testing/_internal/print_test_stats.py`

Notes on the approach:

- Just getting the mean and stdev for the total job time of the last _N_ commits isn't sufficient, because e.g. if `master` was broken 5 commits ago, then a lot of those job times will be much shorter, breaking the statistics.
- We use the commit history to make better estimates for the mean and stdev of individual test (and suite) times, but only when the test in that historical commit is present and its status matches that of the base commit.
- We list all the tests that were removed or added, or whose status changed (e.g. skipped to not skipped, or vice versa), along with time (estimate) info for that test case and its containing suite.
- We don't list tests whose time changed a lot if their status didn't change, because there's a lot of noise and it's unclear how to do that well without too many false positives.
- We show a human-readable commit graph that indicates exactly how many commits are in the pool of commits that could be causing regressions (e.g. if a PR has multiple commits in it, or if the base commit on `master` doesn't have a report in S3).
- We don't show an overall estimate of whether the PR increased or decreased the total test job time, because it's noisy and it's a bit tricky to aggregate stdevs up from individual tests to the whole job level. This might change in a followup PR.
- Instead, we simply show a summary at the bottom which says how many tests were removed/added/modified (where "modified" means that the status changed), and our best estimates of the mean times (and stdevs) of those changes.
- Importantly, the summary at the bottom is only for the test cases that were already shown in the more verbose diff report, and does not include any information about tests whose status didn't change but whose running time got much longer.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50171

Test Plan:
To run the unit tests:
```
$ python test/test_testing.py
$ python test/print_test_stats.py
```

To verify that this works, check the [CircleCI logs](https://app.circleci.com/pipelines/github/pytorch/pytorch/258628/workflows/9cfadc34-e042-485e-b3b3-dc251f160307) for a test job run on this PR; for example:
- pytorch_linux_bionic_py3_6_clang9_test

To test locally, use the following steps.

First run an arbitrary test suite (you need to have some XML reports so that `test/print_test_stats.py` runs, but we'll be ignoring them here via the `--use-json` CLI option):
```
$ DATA_DIR=/tmp
$ ARBITRARY_TEST=testing
$ python test/test_$ARBITRARY_TEST.py --save-xml=$DATA_DIR/test/test_$ARBITRARY_TEST
```
Now choose a commit and a test job (it has to be on `master` since we're going to grab the test time data from S3, and [we only upload test times to S3 on the `master`, `nightly`, and `release` branches](https://github.com/pytorch/pytorch/pull/49645)):
```
$ export CIRCLE_SHA1=c39fb9771d89632c5c3a163d3c00af3bef1bd489
$ export CIRCLE_JOB=pytorch_linux_bionic_py3_6_clang9_test
```
Download the `*.json.bz2` file(s) for that commit/job pair:
```
$ aws s3 cp s3://ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/ $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB --recursive
```
And feed everything into `test/print_test_stats.py`:
```
$ bzip2 -kdc $DATA_DIR/ossci-metrics/test_time/$CIRCLE_SHA1/$CIRCLE_JOB/*Z.json.bz2 | torch/testing/_internal/print_test_stats.py --compare-with-s3 --use-json=/dev/stdin $DATA_DIR/test/test_$ARBITRARY_TEST
```
The first part of the output should be the same as before this PR; here is the new part, at the end of the output:

- https://pastebin.com/Jj1svhAn

Reviewed By: walterddr

Differential Revision: D26232345

Pulled By: samestep

fbshipit-source-id: b687b1737519d2eed68fbd591a667e4e029de509
2021-02-08 07:54:34 -08:00
c89f15ec6d Reland nightlies 11.2 (#51874)
Summary:
Cherry-picked commits from https://github.com/pytorch/pytorch/issues/51611.

Relanding after https://github.com/pytorch/pytorch/issues/51864, which should fix the failing CUDA tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51874

Reviewed By: malfet

Differential Revision: D26313173

Pulled By: janeyx99

fbshipit-source-id: 02250abb526cc7400bc2d9bbb146e8210ccd4b40
2021-02-08 07:41:45 -08:00
79832f3d77 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D26309565

fbshipit-source-id: b20d37ea90304052cef9b4dc359a5bd726d7fda7
2021-02-08 04:17:41 -08:00
bce4c82f0d [C2] Add TypeAndShape Inference logic for ReduceMean (#51828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51828

As described in the title.

Test Plan: Unit-tests.

Differential Revision: D26293844

fbshipit-source-id: 2eb2a694c439b794ad7c134409e2b8926aabc91f
2021-02-08 00:57:47 -08:00
fcf8b71234 Disable unaligned-access test from TestVectorizedMemoryAccess.CopyKernel (#51864)
Summary:
The test began to fail after the driver update.

See https://github.com/pytorch/pytorch/issues/51863

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51864

Reviewed By: bertmaher

Differential Revision: D26304018

Pulled By: malfet

fbshipit-source-id: bb7ade2f28d8cf8f847159d4ce92391f0794c258
2021-02-07 16:20:31 -08:00
0c313564af Backward through sparse_coo_tensor (#50361)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/49683

This PR fixes the backward-through-sparse_coo_tensor bug by implementing a `sparse_mask_helper` function for n-dimensional sparse tensors on CPU and CUDA, which is used to reimplement the `sparse_constructor_values_backward` function.

A `sparse_mask` function was implemented before for the backward of sparse-sparse matmul. However, the algorithm here is a little different because it must be applicable not only to matrices but to n-dimensional tensors. Thankfully it was not too hard to extend, and now both share the same code base.

Note that no new tests are required because the backward for sparse-sparse matmul now uses the new `sparse_mask_helper`.
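
A minimal sketch of the user-visible behavior this enables (per the linked issue):

```python
import torch

indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([3.0, 4.0], requires_grad=True)
s = torch.sparse_coo_tensor(indices, values, (2, 2))

# Gradients now flow back through the sparse constructor to `values`.
torch.sparse.sum(s).backward()
print(values.grad)
```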

ngimel, mruberry - kindly review this.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/50361

Reviewed By: zhangguanheng66

Differential Revision: D26270483

Pulled By: ngimel

fbshipit-source-id: ee4bda49ff86e769342674b64d3c4bc34eae38ef
2021-02-06 23:15:54 -08:00
4b3c99ce4a [Resubmission] Add a documentation page for DDP communication hooks (#51773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51773

Resubmission of #51715.

Minor changes:
1) Removed "Note [Guidance to Tune ``matrix_approximation_rank`` And ``start_powerSGD_iter``]" in powerSGD_hook.py.

2) Removed the duplicate description of `torch.nn.parallel.DistributedDataParallel.register_comm_hook` in ddp_comm_hooks.rst, because it is already covered by distributed.rst.

Also updated the doc based on the comments from PowerSGD paper author Thijs Vogels.

It seems that `python_doc_test` was flaky. The previous error message was not informative:
https://app.circleci.com/pipelines/github/pytorch/pytorch/270682/workflows/8d186a3c-d682-46bf-b617-ad4eef5991e2/jobs/10739143, and all the warnings did also appear on the master branch.

Rebasing to a new master branch seems to get this fixed:
https://app.circleci.com/pipelines/github/pytorch/pytorch/270696/workflows/1a3adbea-6443-4876-b87b-e17d90d41428/jobs/10740021/steps

Screenshot:

{F369899792}
ghstack-source-id: 121199613

Test Plan: View locally

Reviewed By: mingzhe09088

Differential Revision: D26272687

fbshipit-source-id: 6677db496a68171798940a80343f4d9a508e15db
2021-02-06 21:22:04 -08:00
4968227058 add shape inference for Int8GenQuantParamsMinMax
Summary: As titled.

Test Plan: successful test flow with A* setup: f245569242

Reviewed By: anurag16

Differential Revision: D25966283

fbshipit-source-id: ef9945d5039933df44c2c3c26ca149f47538ff31
2021-02-06 17:50:43 -08:00
6c0bf28da6 [wip] doc_fix (#51825)
Summary:
Tries to fix doc_test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51825

Reviewed By: bertmaher

Differential Revision: D26295583

Pulled By: ngimel

fbshipit-source-id: 13f6e7f1675d810adfd4abd2d579e2812fe54c80
2021-02-06 11:36:36 -08:00
6488b2bc3a Revert D26282829: [pytorch][PR] Adding support for CUDA 11.2 in our nightly build matrix
Test Plan: revert-hammer

Differential Revision:
D26282829 (fb07aca7b0)

Original commit changeset: b15380e5c44a

fbshipit-source-id: 18f86e766ed9ec58da32167584bb5e4e2c87a639
2021-02-06 11:22:23 -08:00
fa70168804 Add metacompile of Ternary if (#51789)
Summary:
Fixes issue: https://github.com/pytorch/pytorch/issues/49728
========
Ternary if operation fails in TorchScript when the condition variable is annotated as Final.
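
A small repro sketch of the pattern that now compiles (illustrative module, not the actual test):

```python
import torch
from typing import Final

class M(torch.nn.Module):
    use_add: Final[bool]

    def __init__(self, use_add: bool):
        super().__init__()
        self.use_add = use_add

    def forward(self, x):
        # The condition is Final, so TorchScript can statically pick a branch.
        return x + 1 if self.use_add else x - 1

m = torch.jit.script(M(True))
```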

Tests:
=======
pytest -k test_ternary_static_if test/test_jit.py

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51789

Reviewed By: gmagogsfm

Differential Revision: D26278969

Pulled By: nikithamalgifb

fbshipit-source-id: 27d1383290211503188428fb2e8b7749f59ba16e
2021-02-06 10:14:30 -08:00
8a9090219e [pyper] register aten::index_out (#51742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51742

Register aten::index_out with StaticRuntime

Test Plan:
```
MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 numactl -m 0 -C 3 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --c2_weights=/data/users/ansha/tmp/adfinder/models/c2_local_ro_weight_data.pb --c2_inputs=/data/users/ansha/tmp/adfinder/models/c2_local_ro_input_data.pb --pred_net=/data/users/ansha/tmp/adfinder/models/c2_local_ro_net.pb --c2_sigrid_transforms_opt=1 --c2_apply_nomnigraph_passes=1 --c2_use_memonger=1 --scripted_model=/data/users/ansha/tmp/adfinder/models2/210494966_0.predictor.disagg.local_ro.pt --pt_inputs=/data/users/ansha/tmp/adfinder/models/local_wrapped_input_data.pt --pt_enable_static_runtime=1 --pt_cleanup_activations=true --pt_enable_out_variant=true --compare_results=0 --iters=30000 --warmup_iters=10000 --num_threads=1 --do_profile=1 --benchmark_c2_predictor=0
```

Before total ms/iter: 0.699626
Before: 0.0277974 ms.    4.03198%. aten::index (5 nodes)

After total ms/iter: 0.696739
After: 0.0254255 ms.    3.67315%. aten::index (5 nodes)

Reviewed By: hlu1

Differential Revision: D26261215

fbshipit-source-id: b59ebd5ccd33478a9fbc4629a0075fec597a05cb
2021-02-06 04:26:15 -08:00
9a964ce89b Enables backend preprocessing to take place outside of the backend interface (#51757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51757

Enables backend preprocessing to take place outside of the backend interface.

What's new:
* A new definition for backend preprocessing (i.e. BackendPreprocessFunction).
* Registration of the backend's PyTorchBackendInterface interface implementation is augmented to take the BackendPreprocessFunction.
* A new registry is created to handle the BackendPreprocessFunction functions, using the backend's name as key.
* When a BackendPreprocessFunction is used, the PyTorchBackendInterface's "preprocess" method is not added to the LoweredModule. Instead, the BackendPreprocessFunction is called and its output is used to set the LoweredModule's __processed_module.

Why?:
These changes are needed to avoid forcing backend preprocessing to be part of the LoweredModule, and in the future be able to eliminate "preprocess" from the PyTorchBackendInterface.
This is important for Mobile use cases where "preprocess" can take the bulk of the compilation process, and thus contain code dependencies that we do not want to bring (or cannot bring) to the Mobile binary.

What didn't change:
* Everything is backwards compatible:
  * The existing "preprocess" method in PyTorchBackendInterface is still there.
  * When backend registration is done without the BackendPreprocessFunction, as before, things work the same way: "preprocess" is added to LoweredModule, and invoked through the module's instance of the backend interface.

Longer term, the plan is to refactor existing users to move to the new backend registration.
ghstack-source-id: 121190883

Test Plan:
Updated existing tests (test_backend.py) to use the new registration mechanism.
Verified test ran and passed (in my OSS build).

Reviewed By: iseeyuan

Differential Revision: D26261042

fbshipit-source-id: 0dc378acd5f2ab60fcdc01f7373616d1db961e61
2021-02-06 01:07:17 -08:00
215d9daceb Refactor internal methods into debugging utilities (#51737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51737

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26288613

Pulled By: ansley

fbshipit-source-id: 4504b1af5be7a200c1a6a376d432d7224eb8a796
2021-02-05 21:42:18 -08:00
19753af6ea [QNNPACK Sparsity] Add aarch64 kernel of 8x1 sparsity (#51120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51120

Adds asm kernels for aarch64 using 8x1 sparsity. Also removes the aarch32 8x4 prepacked kernels and the 8x4 inline-packed SSE2 kernels.

Test Plan:
q8gemm-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D26077766

fbshipit-source-id: 29793d30a47b8f4084daf8950d925dc804d3dc59
2021-02-05 18:47:38 -08:00
6b2811f288 [QNNPACK, Sparsity] Add 8x1 block sparse kernels for aarch32. (#51119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51119

Adds an asm kernel for the 8x1 block sparse kernel. Since the ukernel still
produces 4x8 blocks, similar to the 1x4 sparsity pattern, we can use the
same prepacking kernel for activations. It does get a tiny bit hacky but
allows us to reuse the kernel.

Test Plan:
q8gemm-sparse-test
fully-connected-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D26077765

fbshipit-source-id: cc087b0ff717a613906d442ea73680e785e0ecc2
2021-02-05 18:47:33 -08:00
c034e0750c [QNNPACK, Sparsity] Code refactoring to allow for more generic block (#51118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51118

sparsity.

Modify BCSR to pack a generic block sparsity pattern.
Modify the rest of the code to accommodate the change.
This is in preparation for supporting 8x1 sparsity.

Test Plan:
q8gemm-sparse-test

Imported from OSS

Reviewed By: AshkanAliabadi

Differential Revision: D26077767

fbshipit-source-id: 7179975b07a1cb76ef26896701d782fb04638743
2021-02-05 18:44:25 -08:00
bc1b1e8253 fixing mkldnn_linear & backward with silent error (#51713)
Summary:
mkldnn_linear & mkldnn_linear_backward_input give wrong results when the weight is non-contiguous.

Issue exposed in PR https://github.com/pytorch/pytorch/issues/51613

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51713

Reviewed By: zhangguanheng66

Differential Revision: D26282319

Pulled By: ngimel

fbshipit-source-id: 96516e10c9dc72c30dac278fce09b746aa5f51b2
2021-02-05 18:36:30 -08:00
9112f4eded [FX][docs] Indent forward (#51802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51802

lol

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D26284311

Pulled By: jamesr66a

fbshipit-source-id: 0d303d8c99131abb8d97e0acd0ac2d810e1e950c
v1.8.0-rc1
2021-02-05 18:01:27 -08:00
8c48af822e pytorch docs: add fake_quantize functions documentation (#51748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51748

Adding docs for `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`
functions.

Note: not documenting `fake_quantize_per_tensor_affine_cachemask` and
`fake_quantize_per_channel_affine_cachemask` since they are implementation details
of `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`,
and do not need to be exposed to the user at the moment.
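
A quick usage sketch of the per-tensor variant being documented (parameter values illustrative):

```python
import torch

x = torch.randn(4)
# args: input, scale, zero_point, quant_min, quant_max; quantizes to the
# int8 grid and dequantizes back to float in one differentiable op.
y = torch.fake_quantize_per_tensor_affine(x, 0.1, 0, -128, 127)
print(y)
```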

Test Plan: Build the docs locally on Mac OS, it looks good

Reviewed By: supriyar

Differential Revision: D26270514

Pulled By: vkuzo

fbshipit-source-id: 8e3c9815a12a3427572cb4d34a779e9f5e4facdd
2021-02-05 17:53:02 -08:00
ececbcfff2 [Conda][Kineto] Define weak acc_get_device_type if kineto is used (#51818)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51818

Reviewed By: ilia-cher

Differential Revision: D26291188

Pulled By: malfet

fbshipit-source-id: 68797e02fe4dd54d8030e67aaf28046a4fae0770
2021-02-05 17:46:30 -08:00
fb07aca7b0 Adding support for CUDA 11.2 in our nightly build matrix (#51611)
Summary:
Replacing 11.0 with 11.2 in our nightlies.

(I am slightly uncertain why the manywheel Linux tests worked before we added the GPU driver for 11.2.)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51611

Reviewed By: malfet, seemethere, zhangguanheng66

Differential Revision: D26282829

Pulled By: janeyx99

fbshipit-source-id: b15380e5c44a957e6a85e4f5fb9691ab9c6103a5
2021-02-05 15:40:31 -08:00
5c3a054b12 Add FLOPS support to the new profiler API. (#51734)
Summary:
The new profiler API was added in PR #48280. This PR adds FLOPS support to the new profiler API.
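
A usage sketch with the new API (the `with_flops` flag is my best understanding of the knob this adds):

```python
import torch
from torch.profiler import profile

with profile(with_flops=True) as prof:
    torch.mm(torch.randn(256, 256), torch.randn(256, 256))

# FLOPS shows up as a column in the aggregated table.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```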

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51734

Test Plan:
```
python test/test_profiler.py -k test_flops
```

Reviewed By: xuzhao9

Differential Revision: D26261851

Pulled By: ilia-cher

fbshipit-source-id: dbeba4c197e6f51a9a8e640e8bb60ec38df87f73
2021-02-05 15:03:35 -08:00
430329e875 Revert D26009829: Optimize relu on cpu using clamp_min
Test Plan: revert-hammer

Differential Revision:
D26009829 (2054cd56c5)

Original commit changeset: 7bb1583ffb3e

fbshipit-source-id: 3e945b438fb8d83f721e400ae69be8848cab9720
2021-02-05 14:48:06 -08:00
50c9c08203 Enable GPU/RE tags for caffe2/caffe2/python/TARGETS
Summary: Moving caffe2_core_gpu_python contbuild to use GPU/RE

Test Plan: CI

Reviewed By: malfet

Differential Revision: D26261826

fbshipit-source-id: a6f8c7bd8368c1cb69499ea0ea7d5add0956a7ad
2021-02-05 13:52:48 -08:00
2054cd56c5 Optimize relu on cpu using clamp_min (#50924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50924

`clamp_min` seems slightly faster than `threshold` (on AVX2 CPUs)
because it compiles down to vmaxps rather than vcmpps+vblendv.

I see the biggest perf difference (about 20% faster) with float
tensors at 32k-64k elements. Bigger tensors are more memory-bound,
although it looks like it might still be a tiny win (2%).
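
A rough way to reproduce the comparison described above with `torch.utils.benchmark` (sizes from the summary):

```python
import torch
import torch.utils.benchmark as benchmark

x = torch.randn(65536)  # ~64k floats: where the summary saw the largest gap
t_relu = benchmark.Timer("torch.relu(x)", globals={"torch": torch, "x": x})
t_clamp = benchmark.Timer("torch.clamp_min(x, 0)", globals={"torch": torch, "x": x})
print(t_relu.timeit(10000))
print(t_clamp.timeit(10000))
```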

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D26009829

Pulled By: bertmaher

fbshipit-source-id: 7bb1583ffb3ee242e347f59be82e0712c7631f7e
2021-02-05 13:03:40 -08:00