Compare commits

...

1494 Commits

Author SHA1 Message Date
7f73f1d591 add python 3.8 workaround 2020-01-14 09:05:04 -08:00
ac15471de4 clarify when to use as_tuple in torch.nonzero (#32051)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31798

Differential Revision: D19272332

Pulled By: zou3519

fbshipit-source-id: 954d086a7b9f1a719e0dac303a4253bf7ec8e9f4
2020-01-14 11:07:33 -05:00
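A minimal sketch (illustrative, not code from the PR) of the distinction the clarified docs draw; as_tuple=True returns one index tensor per dimension, ready for advanced indexing:

```python
import torch

x = torch.tensor([[0, 1], [2, 0]])

# Default: a single 2-D tensor with one (row, col) pair per nonzero element.
print(torch.nonzero(x))                  # tensor([[0, 1], [1, 0]])

# as_tuple=True: one 1-D index tensor per dimension, usable for indexing.
rows, cols = torch.nonzero(x, as_tuple=True)
print(x[rows, cols])                     # tensor([1, 2])
```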
49364eb426 Fix typographical error in torch.triu docstring (#32067) (#32122)
Summary:
below --> above

Fixes https://github.com/pytorch/pytorch/issues/32032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32067

Differential Revision: D19355788

Pulled By: zou3519

fbshipit-source-id: dc7a2538a78cd11e72d47ad923ef50599a5a87e2
2020-01-14 10:02:37 -05:00
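For reference, a small example of the behavior the corrected docstring describes: torch.triu zeroes elements below (not above) the main diagonal.

```python
import torch

a = torch.ones(3, 3)
print(torch.triu(a))
# tensor([[1., 1., 1.],
#         [0., 1., 1.],
#         [0., 0., 1.]])
```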
bcf2d65446 disable two more tests 2020-01-13 21:57:12 -08:00
f7a33f1eef disable a few more tests because of OSX failures similar to #30604 2020-01-13 13:21:49 -08:00
bd584d52df Disable test_backward_per_tensor in test_fake_quant (#30594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30594

This test case started breaking; disable it to keep the build clean.
ghstack-source-id: 94736837

Test Plan: Unittest disabling change

Differential Revision: D18758635

fbshipit-source-id: 05df1158ff0ccd75e401f352da529fb663b1cae0
2020-01-13 13:15:20 -08:00
c697af4667 Temporarily disable test_numerical_consistency_per_tensor (#30600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30600

test_numerical_consistency_per_tensor in test_fake_quant is failing on Windows.
ghstack-source-id: 94742124

Test Plan: CircleCI tests

Differential Revision: D18760287

fbshipit-source-id: 7f59355eab74e811bb370ad2836ed2f1def1f621
2020-01-13 13:15:14 -08:00
0f3f4ec64c Kill hypothesis deadline testing (#30890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890

We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway, so let's just kill deadline testing.

Test Plan: Imported from OSS

Differential Revision: D18857597

Pulled By: jamesr66a

fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
2020-01-13 13:12:14 -08:00
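A minimal sketch (assuming the standard hypothesis API; not code from this PR) of the knob being killed here; deadline=None disables the per-example time limit that was making tests flaky:

```python
from hypothesis import given, settings, strategies as st

# With deadline=None, a slow-but-correct example no longer fails the test.
@settings(deadline=None)
@given(st.integers())
def test_int_str_roundtrip(x):
    assert int(str(x)) == x
```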
509df600bb Revert "Remove javasphinx extension (#31955)" (#32059)
This reverts commit 8ada95e95092f93780bd56bad568e2491880e9fd.
2020-01-10 14:31:35 -05:00
187101a88e [v1.4.0] Minimal changes in interpolate to support Keypointrcnn (#32010)
* Fix interpolate

* add keypointrcnn test

* update ort version for test

* pin tv version

* Update test.sh

* Get rid of onnxruntime test changes.

* [v1.4.0] Added torchvision tests as part of ORT tests (#31835)

Summary:
Added torchvision tests as part of ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31835

Reviewed By: hl475

Differential Revision: D19278607

Pulled By: houseroad

fbshipit-source-id: 18a6a85ce3019bcc9aee9517af1378964b585afd

* Remove faster_rcnn and mask_rcnn tests.

Co-authored-by: Lara Haidar <haidar.lara@gmail.com>
Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2020-01-10 12:04:29 -05:00
e011d4a16e Restore CUDA half linspace+logspace and add coverage tests (#31959)
This PR restores the implementation of CUDA half linspace+logspace.

I added tests for the following:
- linspace+logspace have the same support for integral types on CPU/CUDA
- Precision tests for CUDA half, float, and double.

The precision for CUDA half seems bad, but I checked the numbers against
previous versions of pytorch. The output of CUDA Half linspace+logspace
are exactly the same when compared with 1.2.0.

Equivalent-ish PR on master:
https://github.com/pytorch/pytorch/pull/31962
2020-01-09 10:42:36 -05:00
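A rough sketch of the restored functionality (requires a CUDA device; the precision comparison is illustrative, not taken from the PR's tests):

```python
import torch

if torch.cuda.is_available():
    a = torch.linspace(0, 1, steps=5, dtype=torch.half, device="cuda")
    b = torch.logspace(0, 2, steps=3, dtype=torch.half, device="cuda")
    # Gauge the (coarse) half precision against float64 on CPU.
    ref = torch.linspace(0, 1, steps=5, dtype=torch.double)
    print((a.double().cpu() - ref).abs().max())
```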
8ada95e950 Remove javasphinx extension (#31955)
See PR [31581](https://github.com/pytorch/pytorch/pull/31581) for more details.
2020-01-08 14:09:19 -08:00
21c2481dfe Fix nvcc math functions for MSVC 2019 (#31704) (#31816)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31108.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31704

Differential Revision: D19256110

Pulled By: mingbowan

fbshipit-source-id: a4aba2830aba002497f70a75ef995e5e7de08393
(cherry picked from commit 7a3ed36309f48cb833f1690991c7b0f59da6ce11)
2020-01-08 16:30:07 -05:00
398e8ba182 Include two caffe2 ops in v1.4.0 (#31716)
* move AliasWithNameOp to caffe2/operators

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31281

Reviewed By: houseroad

Differential Revision: D19053453

fbshipit-source-id: 350bfd5c001db9c17916dcae7ade8f56db1e9841

* move BatchPermutationOp to caffe2/operators

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31350

Reviewed By: houseroad

Differential Revision: D19053527

fbshipit-source-id: 50d11f137d0f5c07e8ad899a3a84d56a042bbc32

Co-authored-by: wat3rBro <wangyanghan6@gmail.com>
2020-01-08 13:28:13 -05:00
074b30cdcb Restructure docs organization and naming and add Javadoc (#31581)
* Restructure docs organization and naming and add Javadoc

- Rename “Other Languages” → “Language Bindings”
- Move the Community section to the bottom
- Move "Language Bindings" above "Python API"
- Add Javadoc url in index.rst

* Delete no longer needed java rst files. Remove javasphinx extension.
2020-01-08 10:22:35 -08:00
319bd5d431 Disable flaky TestMomentumSGD.test_fp16momentum_sgd (#31369) (#31637)
Summary:
Related to https://github.com/pytorch/pytorch/issues/31368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31369

Co-authored-by: Vitaly Fedyunin <vitalyf@fb.com>
2019-12-26 13:20:37 -08:00
5a20bbd377 [v1.4.0] Support optional float parameters (float?, optional<double>) (#31530)
This is going to be used by upsample (which currently uses magic values to represent optionals).

For now, we just introduce a fake function for testing (torch._test_optional_float(x)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517
2019-12-26 10:50:33 -08:00
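A hypothetical sketch of what a `float?` parameter looks like from TorchScript (the PR itself only adds the internal plumbing plus the `torch._test_optional_float` test hook):

```python
import torch
from typing import Optional

@torch.jit.script
def scale(x: torch.Tensor, factor: Optional[float] = None) -> torch.Tensor:
    # None replaces the "magic value" sentinel that upsample currently uses.
    if factor is None:
        return x
    return x * factor
```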
fa59a9e190 Dump operator names of a script module, v1.4.0 pick request (#30747)
* Dump operator names of a script module

Summary:

Introduce the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use case is mobile custom builds, which can link only the operators in the returned list to reduce binary size.

Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))

The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']
2019-12-26 10:49:49 -08:00
143868c3df cherry pick 30320 (#31573) 2019-12-23 22:49:26 -08:00
964929fcc2 hacky way to fix android-ndk build (#31529)
* hacky way to fix android build

* should run!!!

* again!!
2019-12-20 18:01:32 -08:00
cd20ecb472 no xla build/test for v1.4.0 (#31518) 2019-12-20 10:43:36 -08:00
19d4fd4910 Specify ordering on singular values and eigenvalues output from torch.svd/symeig respectively (#30389) (#30575)
Summary:

Changelog:
- Adds a note to docstrings of the both functions specifying the ordering

Fixes https://github.com/pytorch/pytorch/issues/30301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30389

Differential Revision: D18707608

Pulled By: zou3519

fbshipit-source-id: b0f73631578f39a24fae9af4997c6491de8be9a8
2019-12-19 16:10:07 -08:00
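A small illustration of the ordering the new note documents (torch.svd returns singular values in descending order; torch.symeig returns eigenvalues in ascending order):

```python
import torch

a = torch.randn(4, 4)
u, s, v = torch.svd(a)
assert torch.equal(s, s.sort(descending=True).values)

sym = a + a.t()
e, _ = torch.symeig(sym, eigenvectors=True)
assert torch.equal(e, e.sort().values)
```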
a7d187baa4 [v1.4.0] Fix reading __cuda_array_interface__ inferred strides, add test. (#31450)
This is a simpler fix than https://github.com/pytorch/pytorch/pull/24947, which both fixed the bug and updated the protocol version.
This also adds a test (which the previous PR did not).

So the plan is that master (1.5) will have the new protocol version (and a test), 1.4 will have the old protocol version and the test.
2019-12-19 16:09:37 -08:00
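A hedged sketch of the protocol detail in question (requires a CUDA device): per the `__cuda_array_interface__` spec, `strides` may be `None` for contiguous data, in which case consumers must infer strides from `shape`; that inference path is what this change fixes and tests.

```python
import torch

if torch.cuda.is_available():
    t = torch.arange(6, device="cuda").reshape(2, 3)
    iface = t.__cuda_array_interface__
    # "strides" may be None here (contiguous); a consumer must then derive
    # C-contiguous strides from "shape" and the item size.
    print(iface["shape"], iface["strides"])
```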
0541546ac5 Fix unflatten when dim is a negative integer (#31208) (#31432)
Summary:
Changelog:
- Wrap dim to be a positive integer when dim is negative
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31208

Test Plan:
- Updated tests in test_namedtensor.py

Fixes https://github.com/pytorch/pytorch/issues/31184

Differential Revision: D19036569

Pulled By: zou3519

fbshipit-source-id: 86e01e20988dee7c4b6c73232f66282d687f9a2c
2019-12-19 16:09:28 -08:00
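A hedged example using the named-tensor API exercised by test_namedtensor.py; with this fix a negative dim is wrapped to the corresponding positive index:

```python
import torch

x = torch.randn(2, 6, names=("N", "D"))
# dim=-1 now behaves the same as dim="D" (or dim=1).
y = x.unflatten(-1, (("C", 2), ("E", 3)))
print(y.names, y.shape)  # ('N', 'C', 'E') torch.Size([2, 2, 3])
```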
369ab73efd Fix copy kernel speed regression introduced in #29631 (#31279) (#31322)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31271

This fixes copy kernel speed regression introduced in https://github.com/pytorch/pytorch/issues/29631.

The previous implementation forces the compiler to instantiate `static_cast_with_inter_type` because it is passed as an argument to a function. This behavior makes it impossible for compilers to perform optimizations like automatic vectorization, and the function call itself is expensive compared to a single cast instruction.

To check the change, run
```
readelf -Ws /home/xgao/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so | grep static_cast_with_inter_type
```

On nightly build, we have output
```
168217: 0000000001852bf0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsdE5applyEd
168816: 0000000001852d30    33 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEaE5applyEa
168843: 00000000018531f0     7 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIblE5applyEl
168930: 0000000001852c20     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIslE5applyEl
168935: 00000000018528d0   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_4HalfEE5applyES1_
169023: 0000000001852f30    17 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEhE5applyEh
169713: 00000000018525c0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIahE5applyEh
170033: 0000000001852c10     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsiE5applyEi
170105: 0000000001852bd0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIshE5applyEh
170980: 0000000001852fc0    27 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdES1_IfEE5applyES3_
171398: 0000000001852810    13 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdbE5applyEb
171574: 00000000018532e0    35 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbNS_8BFloat16EE5applyES1_
171734: 0000000001852b20     6 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlSt7complexIdEE5applyES2_
172422: 0000000001853350    54 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EaE5applyEa
172704: 00000000018533c0    38 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EfE5applyEf
172976: 0000000001852890    10 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIflE5applyEl
173038: 0000000001852f80     9 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEfE5applyEf
173329: 00000000018531c0    20 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbfE5applyEf
173779: 00000000018524d0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIhiE5applyEi
174032: 0000000001852960    14 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_8BFloat16EE5applyES1_
174334: 0000000001852d60    29 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEdE5applyEd
174470: 0000000001852c60   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsNS_4HalfEE5applyES1_
174770: 0000000001852bc0    15 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlNS_8BFloat16EE5applyES1_
176408: 0000000001853980   144 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_4HalfEbE5applyEb
176475: 0000000001852790   128 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdNS_4HalfEE5applyES1_
....
```

And after this PR, we get empty output
```
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31279

Differential Revision: D19075587

Pulled By: ngimel

fbshipit-source-id: c20088241f39fa40c1d055f0a46eb5b9ece52e71
2019-12-19 16:09:11 -08:00
9f558e1ee6 turn off profiling graph exec (#30750) 2019-12-19 16:08:59 -08:00
f0ddfff200 Fix exception message in Java Tensor (#30776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30205

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D18653568

Pulled By: dreiss

fbshipit-source-id: a5fcb809eba641a7fbd0e99e835eceeb248e680c
2019-12-19 16:08:49 -08:00
2de184b5a9 Update persons_of_interest.rst (#30648) 2019-12-19 16:08:39 -08:00
e0eeddfc78 torch.where changes made on 1.3.1 but not on master (#30729)
* Make zeros argument of torch.where same dtype as other argument (#30661)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30661

Cherry-picked from https://github.com/pytorch/pytorch/pull/29080

Test Plan: Imported from OSS

Differential Revision: D18781870

Pulled By: nairbv

fbshipit-source-id: 9de85aa91bf7e0856f35c7c6238a8923315ed27f

Co-authored-by: ifedan

* Added check for torch.where on CPU that both arguments have same dtype (#30662)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30662

Cherry picked from: https://github.com/pytorch/pytorch/pull/29081

Test Plan: Imported from OSS

Differential Revision: D18782295

Pulled By: nairbv

fbshipit-source-id: 897ab25ddf8819ca34f5e86c5d3f41debb56cb04

Co-authored-by: ifedan
2019-12-19 16:01:51 -08:00
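A short sketch of the behavior these cherry-picks pin down (illustrative, not code from the PRs): on CPU, both value arguments to torch.where must share a dtype, so build the zeros from the other argument:

```python
import torch

x = torch.randn(4, dtype=torch.float64)
# zeros_like keeps the dtype aligned with x, satisfying the new check;
# mixing dtypes (e.g. float64 x with float32 zeros) now raises on CPU
# instead of silently producing a surprising result.
y = torch.where(x > 0, x, torch.zeros_like(x))
```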
7727b57d08 [v1.4.0 cherrypick] Fix BC for quantized linear (#30629) 2019-12-19 16:01:26 -08:00
9e7dc37f90 Updates to Quantization documentation (#30372)
* added entries to quantization.rst per issue #27938

* more minor tweaks to quantization.rst to reflect the quantization support list (#27938)

* added discussion about setting backend engine to QNNPACK to quantization.rst (#29735)

* added docstrings to the fused functions in torch/nn/intrinsic/modules/fused.py (#26899)

* fixed the docstring for  torch.nn.intrinsic.quantized.ConvReLU3d  (#27451)

* fixed the formatting on fuse_modules() (#26305)

* fixed rendering issue on QConfig (#30283)

* resolved feedback on PR #30288. Thanks Raghu
2019-12-19 16:01:09 -08:00
227017059f Fix BC test for v1.4.0 (#31442)
* Fix BC test for v1.4.0

* Print out all the broken ops

* White list the broken ones
2019-12-19 14:16:24 -08:00
aeeccc1486 Disable the rebase logic to make the CI pass (#31399) 2019-12-18 12:21:13 -08:00
0b91246cbd [v1.4.0] Fix coverage and hypothesis conflict (#31429)
Summary:
Temporarily enforcing versions for all envs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31320

Differential Revision: D19122781

Pulled By: VitalyFedyunin

fbshipit-source-id: fe6473b177367371387d4b3b873131e7ecfbc0f8
2019-12-18 12:16:05 -08:00
0856d6f53c use earlier docker image to make sure generated binary size is small (#31142)
* use earlier docker image to make sure generated binary size is small

* fix hypothesis version
2019-12-17 15:03:29 -08:00
336e0d2874 our setup requires actions/checkout@v1 to work correctly (#31371)
* checkout correct branch for linting

* try #2

* try #3

* try #4
2019-12-17 10:56:50 -08:00
3b36f2068d Revert "Merge branch 'v1.4.0' of https://github.com/pytorch/pytorch into lahaidar/cherry_pick_28324"
This reverts commit 6207945564b317f4300264e80d125b9a7225b81e, reversing
changes made to 27a2ecb0a5da9507a2b0a0315a7dfeab4b9f85f9.
2019-12-13 16:20:28 -08:00
6207945564 Merge branch 'v1.4.0' of https://github.com/pytorch/pytorch into lahaidar/cherry_pick_28324 2019-12-13 15:48:08 -08:00
aecae514ab Merge branch 'cherry_pick_28324' of https://github.com/houseroad/pytorch into lahaidar/cherry_pick_28324 2019-12-13 15:45:32 -08:00
27a2ecb0a5 Revert "[v1.4.0 cherrypick] ONNX Interpolate Add Scales Params (#31170)" (#31272)
This reverts commit e36fd7b0bae7b350886bf090f7ce222a0c6218df.
2019-12-13 15:14:42 -08:00
e36fd7b0ba [v1.4.0 cherrypick] ONNX Interpolate Add Scales Params (#31170)
The original PR is #28324

We hope to cover torchvision models in the PyTorch ONNX exporter with release 1.4. This PR is part of that effort.
2019-12-13 15:08:35 -08:00
799cb646a6 update expect files 2019-12-13 11:36:06 -08:00
f60c63155a ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for : https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-13 11:36:06 -08:00
954d9ea466 fix test ci by pinning hypothesis and correcting the import (#31201)
* fix test ci by pinning hypothesis and correcting the import, from https://github.com/pytorch/pytorch/pull/31137

* also update for windows build
2019-12-13 11:30:57 -08:00
71185fb2a0 update expect files 2019-12-12 10:54:17 -08:00
a06f26560c Make Conv{1,2,3}dOptions and ConvTranspose{1,2,3}dOptions different classes (#31005)
Summary:
Currently, both `Conv{1,2,3}dOptions` and `ConvTranspose{1,2,3}dOptions` are aliases of the `ConvOptions<{1,2,3}>` class, which causes confusion because the `ConvOptions` class has parameters such as `transposed` that shouldn't be exposed to the end user. (This has caused issues such as https://github.com/pytorch/pytorch/issues/30931.) This PR makes the following improvements:
1. Rename the original `torch::nn::ConvOptions<N>` class to `torch::nn::detail::ConvNdOptions<N>` class, to signify that it's an implementation detail and should not be used publicly.
2. Create new classes `torch::nn::ConvOptions<N>` and `torch::nn::ConvTransposeOptions<N>`, which have parameters that exactly match the constructor of `torch.nn.Conv{1,2,3}d` and `torch.nn.ConvTranspose{1,2,3}d` in Python API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31005

Differential Revision: D18898048

Pulled By: yf225

fbshipit-source-id: 7663d646304c8cb004ca7f4aa4e70d3612c7bc75
2019-12-12 11:46:33 -05:00
e4cec279c6 ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for : https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-11 22:05:47 -08:00
b8b50aa909 Fix missing virtual destructor (#30927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927

Classes that are used polymorphically (e.g. have virtual methods) must have a virtual destructor; otherwise deleting them through a base pointer is undefined behavior
ghstack-source-id: 95144736

Test Plan: waitforsandcastle

Differential Revision: D18870351

fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
2019-12-09 12:47:01 -08:00
db686de13f [1.4.0] Enable len(dataloader) for iterable dataset (#30828)
* enable len(dl) for iterable dataset

* warn if len was called
2019-12-06 18:25:14 -05:00
288e463693 Fix a clang 7 compiler bug for c++14 mode (#30891)
This is already fixed in master as part of bc2e6d10fa.

Before this fix, compiling PyTorch with `-std=c++14` failed on clang 7 due to a compiler bug in the optimizer. With this fix, it works and people can compile PyTorch (or PyTorch extensions) with `-std=c++14`.
2019-12-06 14:11:12 -05:00
73783d1048 Update persons_of_interest.rst 2019-12-05 21:27:01 -08:00
8891d4eeb1 fix AvgPool2d for 2^31-1 sized inputs, and get test_cuda_kernel_loop_overflow_large to working state (#30793) 2019-12-04 23:13:17 -05:00
2085a6f329 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.

ghstack-source-id: 94673884
ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-12-04 19:23:58 -08:00
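A hedged sketch of the revised lifecycle (run one such process per worker, assuming the usual init_method/environment setup): `wait_all_workers` no longer destroys the agent; teardown is now explicit via `shutdown`.

```python
import torch.distributed.rpc as rpc

rpc.init_rpc("worker0", rank=0, world_size=2)
# ... issue rpc.rpc_sync(...) / rpc.remote(...) calls ...
rpc.shutdown()  # waits for all workers, then tears down the local agent
```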
3eda9e7da2 By default ignore RRef leaks during shutdown (#30217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217

Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure that they have freed all references to RRefs in application
code, which can make for a bad debugging experience in large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext ignores leaking RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local states. Hence, it should be OK to just ignore
those leaks and destroy OwnerRRefs. If an application would like to
enforce no leaks, it can set torch.distributed.rpc.api._ignore_rref_leak
to False.

Test Plan: Imported from OSS

Differential Revision: D18632546

Pulled By: mrshenli

fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
2019-12-04 13:33:31 -05:00
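Per the commit message, strict leak checking remains available through a private flag; a one-line sketch:

```python
import torch.distributed.rpc as rpc

# Opt back into erroring on leaked RRefs at shutdown (private knob).
rpc.api._ignore_rref_leak = False
```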
fb8aa0e98c Remove namespace F = torch::nn::functional from torch/nn/modules/batchnorm.h (#30684)
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.

Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684

Differential Revision: D18795717

Pulled By: yf225

fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
2019-12-04 11:35:19 -05:00
c79b79dadd add default arg for init_method (#30208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208

Adds default arg for init_method so users don't have to pass this in,
and moves it to `RpcBackendOptions` struct. Removes `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18630074

fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
2019-12-03 15:26:51 -08:00
21acca4528 Exclude undefined tensors in the result of Module::parameters() / named_paramters() / buffers() / named_buffers() (#30626)
Summary:
PR https://github.com/pytorch/pytorch/pull/30523 attempted to fix https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462, but the fix wasn't complete. This PR makes the following improvements:
1. Fixes https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462 properly by excluding undefined tensors in the result of `Module::parameters()` / `named_parameters()` / `buffers()` / `named_buffers()`, which mirrors the Python API behavior.
2. Audits all use sites of `Module::parameters_` / `buffers_` and change them to `Module::named_parameters(/*recurse=*/false)` / `named_buffers(/*recurse=*/false)` when appropriate, so that use sites of module parameters / buffers never need to worry about undefined tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30626

Differential Revision: D18777507

Pulled By: yf225

fbshipit-source-id: 55b64b69779e1186342efd3c44857f416334ed6b
2019-12-03 15:57:32 -05:00
f710757557 Skip undefined tensors when moving torch::nn module to a different device (#30523)
Summary:
This fixes high-pri issues such as https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30523

Differential Revision: D18732904

Pulled By: yf225

fbshipit-source-id: fe5a7a43838000f5803bd9c01ecfba0c3f02df5d
2019-12-03 15:57:32 -05:00
a5272cb643 Error instead of assertion failure for div by sparse (#30260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30260

fixes: https://github.com/pytorch/pytorch/issues/30044

Without this PR,

```
>>> torch.tensor(1.) / torch.tensor(1.).to_sparse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: r.is_sparse() INTERNAL ASSERT FAILED at /Users/distiller/project/conda/conda-bld/pytorch_1570710797334/work/aten/src/ATen/native/sparse/SparseTensorMath.cpp:168, please report a bug to PyTorch.
```

Test Plan:
Ran the same code with this change:

```
In [1]: import torch
In [2]: torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-7177f54f30bb> in <module>
----> 1 torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse()

RuntimeError: Unsupported tensor layout
```

Differential Revision: D18657387

Pulled By: nairbv

fbshipit-source-id: cd23570d46f5b26fd84049e5e63b61b19835603d
2019-11-22 11:31:26 -08:00
638f4c1fb3 Update Cocoapods to 1.4.0 (#30326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30326

Note that this PR won't trigger the cocoapods build. We'll push the binary and release the cocoapods after the branch cut.

Test Plan: Imported from OSS

Differential Revision: D18660308

Pulled By: xta0

fbshipit-source-id: 95dd97b7b67e70ecee3a65d8bbc125791872b7ca
2019-11-22 11:31:21 -08:00
97fae401f0 Use LinearPackedParams everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30198

Test Plan: Imported from OSS

Differential Revision: D18628003

Pulled By: jamesr66a

fbshipit-source-id: 76ff0248fd859e805a15cde555d26dd2138636fa
2019-11-22 11:31:17 -08:00
1cc321deed Memoize parseIR calls in graph mode quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30188

Test Plan: Imported from OSS

Differential Revision: D18625743

Pulled By: jamesr66a

fbshipit-source-id: 88f9da8e79324ba91e3550a8fc1a05e85bb83a86
2019-11-22 11:31:13 -08:00
65f465050b Dont use SubgraphRewriter in FoldQuantizeCallIntoBuffer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30264

Test Plan: Imported from OSS

Differential Revision: D18645531

Pulled By: jamesr66a

fbshipit-source-id: 44fc0f0a3c8cabe62924baae0d556e43bbf637ec
2019-11-22 11:31:08 -08:00
a9f3f48f88 Revert D5578006: Add local shutdown to process group agent
Test Plan: revert-hammer

Differential Revision:
D5578006

Original commit changeset: 6258879fb44c

fbshipit-source-id: 11b893b3a280a8383eeb20a0548626811616dca1
2019-11-22 11:31:04 -08:00
fa242246ee add unit tests to iOS CI jobs (#30133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30133

### Summary

Recently we've found that the master branch was constantly broken due to some unwanted change being landed on mobile. The problem is that our CI was not able to detect the runtime errors.

### Previous work

- Add a unit test target to the iOS TestApp (#29962)
- Update Fastlane to run tests (#29963)

### What's been changed in CI

1. XCode version has been updated to 11.2.1
2. For iOS simulator build, we'll run some unit tests( currently only one) after the build test.

Test Plan: Imported from OSS

Differential Revision: D18641413

Pulled By: xta0

fbshipit-source-id: 12942206f1dee045b2addba3ae618760e992752c
2019-11-22 10:52:11 -08:00
7903fb118f Move qkv_same, kv_same into branch (#30142)
Summary:
Perf improvements to multi_head_attention_forward

- qkv_same and kv_same were not used outside of that branch. Further, kv_same was calculated even though it is not used when qkv_same is true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30142

Differential Revision: D18610938

Pulled By: cpuhrsch

fbshipit-source-id: 19b7456f20aef90032b0f42d7da8c8a2d5563ee3
2019-11-22 10:40:02 -08:00
5d7b2089e8 Draft version: Make AliasAnalysisKind optional in Op Registration API (#30187)
Summary:
Don't look too deeply into the diff's implementation. The reason to send out this diff is to help sync on the design first. Once we agree on the design, I will update the implementation accordingly.

**Here is the basic design for achieving this functionality:**

**Q1: Do we need to tell apart case between the following:**
case 1:  registry 1: PURE -> registry 2: CONSERVATIVE
case 2:  registry 1: PURE -> registry 2: <not set>

A: Yes, although right now both cases have the same value (due to defaulting to CONSERVATIVE) in operators_ and operatorLookupTable_.
Case 1 should be rejected, while case 2 should be a legal case where registry 1 ends up as PURE.

**How to tell apart both cases:**

Right now, AliasAnalysisKind::CONSERVATIVE is by default (code pointer: https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/core/dispatch/OperatorOptions.h?lines=22%2C52)

Current approach: introduce a boolean flag in OperatorOptions called isDefault, defaulting to true. When setAliasAnalysis(AliasAnalysisKind) is called manually, it is set to false.
Then, when findSchema() runs in Dispatcher.cpp, we check the response option's isDefault value.
If isDefault is true, then, provided some sanity checks pass, we can update the option info in both operators_ and operatorLookupTable_.

Other approaches:
1. Introduce a new AliasAnalysisKind, perhaps called NOT_SPECIFIED. (I am not doing it this way since it would require updating other callsites related to AliasAnalysisKind::CONSERVATIVE.) Also, we would need additional logic to align NOT_SPECIFIED with CONSERVATIVE.

**What data to be updated:**
corresponding entry in std::list<OperatorDef> operators_ and LeftRight<ska::flat_hash_map<OperatorName, OperatorHandle>> operatorLookupTable_

(More things to be discussed here.)

**Do we need to trigger listeners if an entry get updated:**
I think no.
callOnOperatorRegistered(op) currently seems to use only OperatorHandle.schema, from its sole callsite in register_c10_ops.cpp
(code pointers: https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/core/dispatch/Dispatcher.cpp?commit=b4cefeaa98dca5b1ec5f7a0bca6028e368960244&lines=87-90
and https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/register_c10_ops.cpp?lines=178&link_ref=biggrep)

However, things could become much more complicated if future extensions use options, e.g. if some listeners want to use option values to register operators.

**Future reading list + remaining questions:**
1. How options get consumed on the other side.
2. Usages for fields in OperatorEntry besides schema/options/kernels
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30187

Test Plan:
[xintchen@devvm6308.prn2 ~/fbsource/fbcode] buck test mode/dev //caffe2:ATen-core-test

All tests passed

Differential Revision: D18530964

Pulled By: charliechen0401

fbshipit-source-id: 60c0560a63a36e54f09f397667bb7122b61d6a8e
2019-11-22 10:20:41 -08:00
c478a92b93 Add local shutdown to process group agent (#30020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.

ghstack-source-id: 94415336

Test Plan: Unit tests pass.

Differential Revision: D5578006

fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
2019-11-22 10:03:00 -08:00
559b3b5a7a Use unboxed registration for most of operators used in lite interpreter. (#30239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30239

Use unboxed registration per smessmer's request. For some ops with an optional argument or a tensor list, where unboxed registration is not supported, we still use boxed registration.

Test Plan: Imported from OSS

Differential Revision: D18653846

Pulled By: iseeyuan

fbshipit-source-id: c22ce8111dfff0ba63316a9bcfe2b712b2d31fc1
2019-11-22 10:00:30 -08:00
f41422121e default construct rpc agent options based on the backend type (#30201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30201

Provide a default constructor so that users don't have to construct
RPC agent options. Also rename this to RpcBackendOptions as suggested.
ghstack-source-id: 94411768

Test Plan: Unit tests pass.

Differential Revision: D18628698

fbshipit-source-id: 81fb45f124ad1006e628f6045162308093c9d446
2019-11-22 08:18:06 -08:00
3455231e9c Expose configuration of Numa directories to setup.py (#30104)
Summary:
https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30104

Differential Revision: D18656882

Pulled By: ezyang

fbshipit-source-id: f932a98674033f1a3184dc1c22faa6f8c2b50134
2019-11-22 07:07:39 -08:00
faacbfa8bf Migrate index_add cpu from TH to ATen (#28421)
Summary:
Migrate index_add cpu from TH to ATen.

I couldn't find replacements for get1d and set1d, so I do the pointer arithmetic in place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28421

Test Plan: existing tests

Differential Revision: D18060971

Pulled By: ggoossen

fbshipit-source-id: 413719990cdb2fe578964cde14e93577e48a4342
2019-11-22 06:25:13 -08:00
183aa1534f Add --no_python flag (#29144)
Summary:
Allows you to use a bash script wrapper in-between launch and your
training script. e.g.
```
python -m torch.distributed.launch --nproc_per_node=8 --no_python --use_env \
    bash -c 'exec numactl --cpunodebind=$(( LOCAL_RANK / 4 )) "$@"' -- \
    python train.py ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29144

Differential Revision: D18345647

Pulled By: pietern

fbshipit-source-id: f05849c38c82de782988d07d300e00cf9f37253a
2019-11-22 06:05:41 -08:00
29887f813a Remove unused forward declaration (#30154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30154

This doesn't seem to be used in thread_pool.cpp.
ghstack-source-id: 94264158

Test Plan: Let's see if this compiles.

Differential Revision: D18614141

fbshipit-source-id: c6ff3db56b55fcee7d8123d909ee275690163ece
2019-11-22 05:24:53 -08:00
a074080d57 Mark c10d::~NCCLUtils as noexcept (#29118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29118

It's never a good idea to throw from a destructor and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.

To fix this, we `abort` instead of throw from the `NCCLComm` destructor.

Closes #28288.
ghstack-source-id: 93182910

Test Plan: ProcessGroupNCCLErrorsTest runs successfully.

Reviewed By: pritamdamania87

Differential Revision: D18298271

fbshipit-source-id: ccac37753fef64fb63cb304433f4f97dc5621379
2019-11-22 04:06:12 -08:00
95b451d386 fixing test_tensorboard for py2 (#30298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30298

This diff fixes test_tensorboard for python2:
- proto serialization is different in py2 vs py3 (e.g. for bytes) -> a simple string comparison would fail for test_pytorch_graph. Modified the test to compare graphs field by field

Reviewed By: J0Nreynolds

Differential Revision: D18654691

fbshipit-source-id: fdbca32e9a7fc2ea70a040bb825eab8a48d0dfe4
2019-11-22 01:02:07 -08:00
f5ef3a6fb6 disable JIT optimizer in Android wrapper for mobile custom build (#30285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30285

PR #30144 introduced custom build script to tailor build to specific
models. It requires a list of all potentially used ops at build time.

Some JIT optimization passes can transform the IR by replacing
operators, e.g. decompose pass can replace aten::addmm with aten::mm if
coefficients are 1s.

Disabling optimization pass can ensure that the list of ops we dump from
the model is the list of ops that are needed.

Test Plan: - rerun the test on PR #30144 to verify the raw list without aten::mm works.

Differential Revision: D18652777

Pulled By: ljk53

fbshipit-source-id: 084751cb9a9ee16d8df7e743e9e5782ffd8bc4e3
2019-11-22 00:25:04 -08:00
1690feba9f add mobile build CI with host toolchain (#30292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30292

We already have CI jobs to build Android/iOS libraries, but there are
two issues:
- It's not easy for people who are not regularly working on mobile to debug
these CI errors, as they need to set up an Android/iOS build environment;
- It's hard to run cross-compiled mobile libraries, as that requires an
emulator. It has happened a couple of times recently that a mobile build
compiled fine but failed to load and run a model.

To address these problems, create this new CI job to build the mobile
library with the Linux host toolchain so that we can build & test without
involving an Android/iOS environment or simulator. Will add tests on top of it next.

Test Plan: - check the new CI job

Differential Revision: D18654074

Pulled By: ljk53

fbshipit-source-id: eb1baee97a7b52c44979dbf1719c3357e08f895e
2019-11-22 00:02:27 -08:00
48b943960e Add bfloat16 support in linear algebra on ROCm (#27719)
Summary:
This adds backend support (i.e., bgemm) for gemm-style matrix multiplications with data and output in bf16 to PyTorch on ROCm.

Enables operators that depend on bgemm.

With this change, bf16 matrices on ROCm can be multiplied on the GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27719

Differential Revision: D18653514

Pulled By: bddppq

fbshipit-source-id: 805db923579bec6fc8fd1c51eeb5b1ef85a96758
2019-11-21 23:54:03 -08:00
23650671a8 add_hparams() NoneType error (#30286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30286

add_hparams() in torch.utils.tensorboard.writer produced the following error:

```
python3.7/site-packages/torch/utils/tensorboard/writer.py", line 294, in add_hparams
    with SummaryWriter(log_dir=os.path.join(self.file_writer.get_logdir(), str(time.time()))) as w_hp:
AttributeError: 'NoneType' object has no attribute 'get_logdir'
```

Other methods such as add_scalar() and add_histogram() use self._get_file_writer() instead of self.file_writer directly.

Test Plan:
```
writer = summary_writer()
writer.add_hparams({"a": 0, "b": 0}, {"hparam/test_accuracy": 0.5})
writer.flush()
writer.close()
```

Reviewed By: J0Nreynolds, sanekmelnikov

Differential Revision: D18650610

fbshipit-source-id: 1039dd2067d37913a8a131c8b372491a63154899
2019-11-21 23:25:26 -08:00
5e19460ced cache tensor scalar_type in OperandInfo (#30065)
Summary:
Caches result of `scalar_type` call in TensorIterator and TensorOptions, because the call is expensive.
This removes 120 - 150 ns of overhead (from 1.25 us to 1.12 us for out-of-place case, from 0.86 us to 0.73 us for inplace case)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30065

Test Plan: Covered by existing tests

Differential Revision: D18576236

Pulled By: ngimel

fbshipit-source-id: 17f63851a911fc572c2146f8a520b7f0dadfd14a
2019-11-21 23:25:22 -08:00
73c9e6e6b6 Rename function parameters to avoid [-Werror,-Wshadow] (#30276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30276

### Summary

When building PyTorch for iOS in BUCK, the compiler complains about the ivar shadowing

```
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h:184:144: error: declaration shadows a field of 'c10::Dispatcher' [-Werror,-Wshadow]
inline Return Dispatcher::doCallUnboxed(const DispatchTable& dispatchTable, const LeftRight<ska::flat_hash_map<TensorTypeId, KernelFunction>>& backendFallbackKernels_, Args... args) const {
                                                                                                                                               ^
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h:134:63: note: previous declaration is here
  LeftRight<ska::flat_hash_map<TensorTypeId, KernelFunction>> backendFallbackKernels_;
```
This happens because the internal iOS compiler enforces `[-Werror, -Wshadow]` on every source file when compiling. Say in `benchmark.mm` we import `<torch/script.h>`; that leads all the way to `Dispatcher.h`

```
 In file included from Apps/Internal/PyTorchPlayground/PyTorchPlayground/Application/Benchmark/Benchmark.mm:6:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/ATen.h:5:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/Context.h:4:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/Tensor.h:12:
In file included from buck-out/cells/fbsource/gen/xplat/caffe2/TensorMethods.h/TensorMethods.h:10:
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h
```
It'd be better to have a separate name for function parameters.

cc shoumikhin

Test Plan: Imported from OSS

Differential Revision: D18649116

Pulled By: xta0

fbshipit-source-id: 19f50b7a23c11dedcafc2ac2d85b45ae4999be2f
2019-11-21 21:59:41 -08:00
a822a1d2a8 Avoid overwriting output type in onnx graph (#25906)
Summary:
When creating the onnx graph, we overwrite the output type with the output type of the PT graph.
In some special cases, when using scripting, the PT graph does not have type information. We want to avoid overwriting the output type in these cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25906

Reviewed By: hl475

Differential Revision: D18645903

Pulled By: houseroad

fbshipit-source-id: 56acc43e0c15c74ac8ebd689e04f7371054e362e
2019-11-21 21:30:12 -08:00
30874b31a9 Enable JNI build on Mac host (#30207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30207

This should work now that we're not using gold-specific linker flags.

Test Plan: CI

Differential Revision: D18653521

Pulled By: dreiss

fbshipit-source-id: 31c3cdbefc37b87bfb4140ffbac781131fe72ab3
2019-11-21 20:10:10 -08:00
e5fc86130a Remove unnecessary linker flags from JNI host build (#30206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30206

- --whole-archive isn't needed because we link libtorch as a dynamic
  dependency, rather than static.
- --gc-sections isn't necessary because most (all?) of the code in our
  JNI library is used (and we're not staticly linking libtorch).
  Removing this one is useful because it's not supported by lld.

Test Plan:
Built on Linux.  Library size was unchanged.
Upcoming diff enables Mac JNI build.

Differential Revision: D18653500

Pulled By: dreiss

fbshipit-source-id: 49ce46fb86a775186f803ada50445b4b2acb54a8
2019-11-21 20:10:06 -08:00
4609c626c5 Enable test_call_method_on_rref in rpc_test (#30261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30261

With #29827, the flakiness should disappear for test_call_method_on_rref

Test Plan: Imported from OSS

Differential Revision: D18645036

Pulled By: mrshenli

fbshipit-source-id: 44d759062fc78b1a797266096dbb4ddd104f07eb
2019-11-21 19:38:19 -08:00
aa1e99e983 Fix two links in RPC API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30259

Test Plan: Imported from OSS

Differential Revision: D18644749

Pulled By: mrshenli

fbshipit-source-id: ff515d2588cd59e0d87f020a01885156a6644450
2019-11-21 19:32:22 -08:00
168570b0da move module_save.cpp to non-mobile build section in cmake (#30221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30221

PR #29881 moved Module::save() methods to a separate source file
and removed C10_MOBILE gating logic. Seems it should stay with
export_module.cpp (which is in "NOT INTERN_BUILD_MOBILE" section).
Otherwise it causes link error with build_mobile.sh.

Test:
- build locally
- check CI

Test Plan: Imported from OSS

Differential Revision: D18649234

Pulled By: ljk53

fbshipit-source-id: b6c90a532d191c41ce10c1047a869d8f73854c4d
2019-11-21 18:56:34 -08:00
0c04763d59 Changes to get inlined graph and proper names after JIT updates (#30244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30244

This makes several small changes to the tensorboard graph parsing methods to address the recent changes to the PyTorch JIT trace/graph.
- Inline graph to get information for all nodes
- Assign and propagate scope names to GetAttr nodes
- Prune all useless GetAttr nodes (any with a ClassType output type - tensors and primitives are kept)
- Create output nodes so output tensor shape can be examined

Reviewed By: sanekmelnikov

Differential Revision: D18556323

fbshipit-source-id: b73a809bacfa554c3fe9c4ae3563525f57539874
2019-11-21 16:59:28 -08:00
983728489a Add ONNX Tests for Torchvision Models (#30121)
Summary:
Adding tests for exporting Torchvision models to ONNX and testing them against ORT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30121

Reviewed By: hl475

Differential Revision: D18619563

Pulled By: houseroad

fbshipit-source-id: 4f78f6876337b941d62efbf5c753c52f6c877d3c
2019-11-21 16:53:59 -08:00
fea963d3ae Fix BackendType repr in doc (#30243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30243

Before this commit, rpc docs shows init_rpc as the following:

```
torch.distributed.rpc.init_rpc(
   name,
   backend=<BackendType.PROCESS_GROUP: BackendValue(
     construct_rpc_agent_options_handler=<function _process_group_construct_rpc_agent_options_handler>,
     init_backend_handler=<function _process_group_init_backend_handler>)>,
   init_method=None,
   rank=-1,
   world_size=None,
   rpc_agent_options=None
)
```

It unnecessarily leaks implementation details. This commit adds a
__repr__ function to BackendType Enum class to address this problem.

closes #29905

Test Plan: Imported from OSS

Differential Revision: D18641559

Pulled By: mrshenli

fbshipit-source-id: 19bf8a2d21c8207f026d097d8e3f077578d53106
2019-11-21 16:22:43 -08:00
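A minimal, self-contained sketch (not the actual class) of the fix: give the Enum a __repr__ that hides the BackendValue payload.

```python
from enum import Enum

class BackendType(Enum):
    PROCESS_GROUP = object()  # stands in for the real BackendValue(...)

    def __repr__(self):
        return f"BackendType.{self.name}"

print(repr(BackendType.PROCESS_GROUP))  # BackendType.PROCESS_GROUP
```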
063e22b7c2 Fix RRef design doc warning (#30240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30240

Get rid of the following warning when build docs:

```
/Users/shenli/Project/pytorch/docs/source/notes/rref.rst:184: WARNING: Error in "code" directive:
maximum 1 argument(s) allowed, 6 supplied.

.. code::
  import torch
  import torch.distributed.rpc as rpc

  # on worker A
  rref = rpc.remote('B', torch.add, args=(torch.ones(2), 1))
  # say the rref has RRefId 100 and ForkId 1
  rref.to_here()
```

Test Plan: Imported from OSS

Differential Revision: D18640016

Pulled By: mrshenli

fbshipit-source-id: d527827f01183411d4b4c73e0a976bdd7fccbf49
2019-11-21 16:22:39 -08:00
e0325011e4 Add link to RRef protocol in RPC doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30218

Test Plan: Imported from OSS

Differential Revision: D18638881

Pulled By: mrshenli

fbshipit-source-id: ca6fae6f8cea8cdcc33d275dd71a347fbb5dd45c
2019-11-21 16:22:35 -08:00
f2f285c240 Add arguments to benchmark to run pytext models. Output results in ms. (#30273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30273

Pytext models expect input of the form `1xlength` and another input specifying the length.
Add the `pytext_len` argument to specify this.
ghstack-source-id: 94383501

Test Plan: ./speed_benchmark_torch --model model.pt --input_dims "1,4" --input_type int64 --warmup 10 --iter 10 --report_pep=true --pytext_len=4

Reviewed By: iseeyuan

Differential Revision: D18646028

fbshipit-source-id: 7d5fe0c36da6e5f7b0261619ce4784a46b70f3d8
2019-11-21 16:03:00 -08:00
b2b1601b30 Docker image build on CircleCI (#29932)
Summary:
the source are copied from https://github.com/pytorch/pytorch-ci-dockerfiles, added  .circleci/docker/build_docker.sh to start building job with circleci specific variables.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29932

Differential Revision: D18645740

Pulled By: mingbowan

fbshipit-source-id: 15fdec85ce59f72daa418ac59792535fed1d136b
2019-11-21 15:31:51 -08:00
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
cbe0a996f0 Change dimType for shapeInfo (#30183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30183

Resubmit for D18579363 with fix

Test Plan: see D18579363

Reviewed By: ipiszy

Differential Revision: D18623090

fbshipit-source-id: 23c9176a22d9a5547e6b298f0d51717399d10751
2019-11-21 14:35:19 -08:00
188d0a9add Skips flaky UtilsNMSTest.GPUEqualsCPURotatedCorrectnessTest (#30053)
Summary:
See https://github.com/pytorch/pytorch/issues/26811.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30053

Differential Revision: D18597070

Pulled By: mruberry

fbshipit-source-id: a3ab8abda8e019fb9978ad8d41ef44451129868c
2019-11-21 13:44:44 -08:00
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these GIL functions, I don't think it makes sense for PyTorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
0fdbb762d1 Warn user when resizing out Tensor after arange() (#29195)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28347

gchanan , I am generating a warning as follows:
```
(torch_new) prasun@prasun-xps:~/dev/explore-array-computing$ python arange_test.py
Trying 45...
  Before arange shape is torch.Size([1, 45])
  After arange shape is torch.Size([1, 45])
Trying 46...
  Before arange shape is torch.Size([1, 46])
  After arange shape is torch.Size([1, 46])
Trying 47...
  Before arange shape is torch.Size([1, 47])
  After arange shape is torch.Size([1, 47])
Trying 48...
  Before arange shape is torch.Size([1, 48])
  After arange shape is torch.Size([1, 48])
Trying 49...
  Before arange shape is torch.Size([1, 49])
../aten/src/ATen/native/RangeFactories.cpp:163: UserWarning: Size of out Tensor does not match the result Tensor. The output Tensor will be resized!
  After arange shape is torch.Size([50])
Traceback (most recent call last):
  File "arange_test.py", line 10, in <module>
    assert len(line.shape) == 2
AssertionError
```

Is this alright ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29195

Differential Revision: D18638184

Pulled By: ezyang

fbshipit-source-id: a93e4ce615b5a315570f9951021ef74fc1d895a6
2019-11-21 13:06:14 -08:00
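The shortest reproduction of the new warning, assuming a wrongly-shaped out tensor as in the linked issue:

```python
import torch

out = torch.empty(1, 49)
torch.arange(50, out=out)  # UserWarning: ... output Tensor will be resized!
print(out.shape)           # torch.Size([50])
```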
1bba0eb35b Add clone_instance for Module (#30168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30168

The previous implementation of `clone` in `script::Module` copies both the module instance and the
class type. After we enabled type sharing in https://github.com/pytorch/pytorch/pull/26666, we also
need a function that clones only the instance and shares the underlying class type.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18631324

fbshipit-source-id: dbadcf19695faee0f755f45093b24618c047b9d1
2019-11-21 13:00:34 -08:00
2c1c6de122 Represent the original python name the same way in traced and scripted modules.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29912

Test Plan: Imported from OSS

Differential Revision: D18533135

Pulled By: ZolotukhinM

fbshipit-source-id: 080dbafa5dcd8c1fb12fec0c956e52fceec430e7
2019-11-21 11:55:40 -08:00
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
d934cf484b call find_package(OpenMP) only when USE_OPENMP=ON (#30223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30223

I ran into a find_package(OpenMP) failure in some Linux environments when
USE_OPENMP=OFF. This workaround unblocks the build; it's not clear how hard
it would be to find and fix the root cause of the find_package() failure.

Test:
- works in my case;
- will check CI;

Test Plan: Imported from OSS

Differential Revision: D18640309

Pulled By: ljk53

fbshipit-source-id: b5b30623f5da4edbe59574a8b35286b74c3225d3
2019-11-21 10:35:15 -08:00
7d3afc4186 enable the per channel dynamic quantization (#30122)
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop with per-tensor quantization, we expect per-channel quantization to help improve accuracy.
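As a hedged illustration (not the fbgemm kernel itself; sizes and the symmetric scale formula are assumptions), the sketch below shows why one scale per output row can lose less information than a single per-tensor scale when rows differ widely in magnitude.
```
import torch

# Rows of w span four orders of magnitude (illustrative).
w = torch.randn(4, 8) * torch.tensor([0.01, 0.1, 1.0, 10.0]).unsqueeze(1)

scale_t = w.abs().max() / 127                            # per-tensor: one scale
q_t = (w / scale_t).round().clamp(-128, 127) * scale_t

scale_c = w.abs().max(dim=1, keepdim=True).values / 127  # per-channel: one scale per row
q_c = (w / scale_c).round().clamp(-128, 127) * scale_c

# The per-channel reconstruction error is smaller.
print((w - q_t).abs().mean().item(), (w - q_c).abs().mean().item())
```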
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122

Differential Revision: D18630541

Pulled By: lly-zero-one

fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
2019-11-21 10:12:05 -08:00
3ba1456aee Fix clip_grad_norm_ / clip_grad_value_ to take input by value instead of by non-const ref (#30216)
Summary:
The original design of `torch::nn::utils::clip_grad_norm_` / `clip_grad_value_` takes input by non-const reference, which prevents users from passing rvalue reference as input into the functions. This PR changes the functions to take input by value, which matches the Python version's semantics, and also adheres to the C++ API convention that if a function modifies its input in-place, it should take that input by value.
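For context, a minimal sketch of the Python-side semantics these C++ functions mirror: gradients are clipped in place and the pre-clip total norm is returned.
```
import torch
from torch.nn.utils import clip_grad_norm_

p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10) * 100           # an oversized gradient
total_norm = clip_grad_norm_([p], max_norm=1.0)
print(total_norm, p.grad.norm())         # grad is modified in place; norm <= 1.0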
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30216

Differential Revision: D18632543

Pulled By: yf225

fbshipit-source-id: 97a09d6467f982fe9c8120f483a9c07fcf13699e
2019-11-21 10:07:00 -08:00
6e4c23b02f Add RPC internal helper that overrides the default pickler. (#30185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30185

To enable share_memory over RPC, add an internal helper that overrides the default RPC pickler.
Replace D18598974
ghstack-source-id: 94299660

Test Plan:
`python test/test_rpc_spawn RpcTestWithSpawn.test_use_rpc_pickler`

`buck test mode/dev-nosan //caffe2/test:rpc_spawn -- test_use_rpc_pickler`

Reviewed By: mrshenli

Differential Revision: D18621372

fbshipit-source-id: c680ef711b2c42524c47a5266e911fa8e0cd45ae
2019-11-21 10:01:02 -08:00
e3334723b2 fix a crash due in nested bailouts (#30097)
Summary:
A prim::BailOut also needs to capture max trip counts: for some graphs they aren't constants, and they are used in continuation graphs to figure out the remaining number of iterations to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30097

Differential Revision: D18624446

Pulled By: Krovatkin

fbshipit-source-id: 085d25981c6669f65848996cd2d50066cc252048
2019-11-21 09:53:12 -08:00
9e81616343 Merge Tensor and Variable types. (#28287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28287

This PR eliminates the static distinction between
Tensor and Variable.  Every Variable is a Tensor, no need to static_cast
or call the Variable constructor.

To do this, I need Tensor to have API parity with Variable. I have already
moved most of the methods I don't want in Tensor off Variable.
These implementations are all placed in Tensor.cpp.

One API difference is that all Variable methods now have const, so we no longer
have faux const-correctness (see https://github.com/zdevito/ATen/issues/27 for
back story)

This diff is BC breaking in a few ways:
- Because torch::autograd::Variable is now just an alias of at::Tensor, ADL for
  `torch::autograd` functions no longer works, you have to explicitly qualify
  them with `torch::autograd` (examples: `torch/nn/parallel/data_parallel.h`)
- Because Variable and Tensor are now the same type, code which assumes that
  they are different types (e.g., for the purposes of templating, or enable_if checks)
  will not work until you delete the (now) redundant overload/specialization.
  (examples: `torch/nn/modules/container/any.h`, `torch/csrc/utils/pybind.h`)

Some other notes:
- I'm not sure what was going on with the old template implementation of `extract_vars`,
  but I couldn't get the SFINAE version to work. Replacing it with an overloading-based version
  made it work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571426

Pulled By: ezyang

fbshipit-source-id: 2ea8151e5f1d8512cdebf1345399642e68b707b8
2019-11-21 09:26:39 -08:00
a78e7eadbd Fix typo in extending doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30159

Differential Revision: D18619060

Pulled By: albanD

fbshipit-source-id: 1109c8da6242dffd6315b0c9de0f8ca34df0b276
2019-11-21 08:12:32 -08:00
5d80f30f70 add missing space to mask index error msg
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30196

Differential Revision: D18632801

Pulled By: ezyang

fbshipit-source-id: 73f0ba169813cf65f9815307129743ef6fcebcb3
2019-11-21 07:46:08 -08:00
e05e90c62e TensorTypeId-based non-RAII setter/getter for LocalTensorTypeSet (#30113)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/29592, https://github.com/pytorch/pytorch/pull/29592#issuecomment-553043596.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30113

Differential Revision: D18620080

Pulled By: ezyang

fbshipit-source-id: 0b10a703e68aca6a991d500fb478bd320006d31b
2019-11-21 07:13:03 -08:00
f7b12a9858 fix aten::grad to return optional list (#29577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29577

`torch.autograd.grad` can return None if one of the inputs is not in the
autograd graph or does not require grad; this change makes it return a
list of optional tensors instead of a list of tensors.

This might unfortunately be a BC issue, but I think it's rare both
internally and externally (only training uses it, and most training
uses backward instead of autograd.grad), so whitelist it.
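A minimal sketch of the eager-mode behavior motivating the schema change: `grad()` can return None for an input that doesn't participate in the graph.
```
import torch

a = torch.randn(2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
out = (a * 2).sum()                      # b is not used
ga, gb = torch.autograd.grad(out, [a, b], allow_unused=True)
print(ga, gb)                            # gb is None
```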

Test Plan: Imported from OSS

Differential Revision: D18491642

fbshipit-source-id: d32b2b3446cf9e8b9a98f6d203a21a75643d8991
2019-11-20 22:19:10 -08:00
38ca3552d9 Unit Test for the Legacy Dynamic Quantized Linear operator (#23139)
Summary: Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_legacy \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details

  [jianyuhuang@devvm29567.prn1.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_dynamic_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
  Parsing buck files: finished in 1.8 sec
  Building: finished in 3.4 sec (100%) 6772/6772 jobs, 2 updated
    Total time: 5.2 sec
  Trace available for this run at /tmp/testpilot.20190714-220130.2698168.log
  TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
  Testpilot build revision 4f180136f799ab45ec2bf5d7644cb14955d4dd7a fbpkg
  6c6253f255644ca3b8ce1bc5955b0f25 at Mon Jul  8 14:13:38 2019 by twsvcscm from /
   usr/local/fbprojects/packages/testinfra.testpilot/651/t.par
  Discovering tests
  Running 1 tests
  Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
        ✓ caffe2/test:quantized - test_dynamic_qlinear (test_quantized.TestQuantizedLinear) 0.023 1/1
  (passed)
  Test output:
  > test_dynamic_qlinear (test_quantized.TestQuantizedLinear) ... ok
  >
  > ----------------------------------------------------------------------
  > Ran 1 test in 0.024s
  >
  > OK
  Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
  Summary (total time 9.03s):
    PASS: 1
    FAIL: 0
    SKIP: 0
    FATAL: 0
    TIMEOUT: 0
    OMIT: 0

Differential Revision: D16404027

fbshipit-source-id: 4c85dd255637fd8b1eb4830e0464f48c22706f41
2019-11-20 20:59:35 -08:00
1eb9f49cc6 Fix test_jit under pytest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30212

Test Plan: Imported from OSS

Differential Revision: D18632004

Pulled By: jamesr66a

fbshipit-source-id: d5cfd351890140c604535744598d0f6ad8989450
2019-11-20 20:44:28 -08:00
b154a8cfc7 Integrating the int64_t GEMM in FBGEMM into PyTorch Linear op (#30143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30143

We would like to integrate the int64 GEMM in FBGEMM into PyTorch. This brings ~4x speedup for the Linear op with LongTensor.

Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals

import time
import torch

torch.set_num_threads(1)

print("M, N, K, GOPS/sec")

for M in range(128, 1025, 128):
    N = M
    K = M

    x = torch.LongTensor(M, K)
    w = torch.LongTensor(K, N)

    NITER = 20

    # Test torch.nn.functional.linear
    s = time.time()
    for _ in range(NITER):
        torch.nn.functional.linear(x, w)
        # Z = x @ w
    elapsed_per_iter_linear = (time.time() - s) / NITER

    print(
        "{}, {}, {}, {:0.2f}".format(M, N, K, 2.0 * M * N * K / elapsed_per_iter_linear / 1e9)
    )
```

Before this PR:
```
M, N, K, GOPS/sec
128, 128, 128, 2.31
256, 256, 256, 2.49
384, 384, 384, 2.54
512, 512, 512, 2.57
640, 640, 640, 2.46
768, 768, 768, 2.59
896, 896, 896, 2.59
1024, 1024, 1024, 2.61
```

After this PR:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/int64_gemm]# python torch_linear.py
M, N, K, GOPS/sec
128, 128, 128, 5.35
256, 256, 256, 8.34
384, 384, 384, 9.03
512, 512, 512, 9.22
640, 640, 640, 9.55
768, 768, 768, 9.73
896, 896, 896, 9.82
1024, 1024, 1024, 9.63
```
ghstack-source-id: 94308012

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D18610019

fbshipit-source-id: f830660927b2666db34427d9de51db011f80f766
2019-11-20 20:22:50 -08:00
cc16819028 Add abort API in gloo ProcessGroup Send/Recv Work (#29928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29928

Original author: Shihao Xu
- Add abort to `c10d::ProcessGroup::Work`.
- Change the return type of `c10d::ProcessGroup::Work::wait()` to boolean to indicate if the work is aborted after waiting.
- Add unit test for the correctness of abort.
ghstack-source-id: 94305515
ghstack-source-id: 94305515

Differential Revision: D5685727

fbshipit-source-id: 6e682bb563c2393a5c303c877331140417d3f607
2019-11-20 20:18:54 -08:00
0a77c090d5 C++ parity, convert_parameters (#29267)
Summary:
yf225 https://github.com/pytorch/pytorch/issues/25883
update parameters_to_vector and vector_to_parameters.
Please check!
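For reference, a minimal sketch of the Python APIs this C++ parity work mirrors:
```
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

model = torch.nn.Linear(3, 2)
vec = parameters_to_vector(model.parameters())  # flatten all params into one 1-D tensor
vector_to_parameters(torch.zeros_like(vec), model.parameters())  # write values back in place
print(all(bool((p == 0).all()) for p in model.parameters()))     # True
```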
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29267

Differential Revision: D18628571

Pulled By: yf225

fbshipit-source-id: 03783e6b0f8183dd97ae48f3da4acb1d07083555
2019-11-20 19:59:11 -08:00
bbb3c415c9 ONNX Hardtanh Opset 11 Support (#30169)
Summary:
Add support for hardtanh, which was blacklisted in opset 11.
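A hedged usage sketch (the output file name is illustrative): exporting a model containing hardtanh at opset 11, which this change enables.
```
import torch

m = torch.nn.Hardtanh(min_val=-1.0, max_val=1.0)
torch.onnx.export(m, torch.randn(1, 4), "hardtanh.onnx", opset_version=11)
```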
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30169

Reviewed By: hl475

Differential Revision: D18619552

Pulled By: houseroad

fbshipit-source-id: 0c1bfb0a53d1dd2327c5db7afd03a90482abb9fe
2019-11-20 18:59:00 -08:00
fd74a19aa4 apply clang format -i (#30180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30180

Just applying `clang-format -i`, to avoid mixing it with other changes.

Test Plan: Imported from OSS

Differential Revision: D18627473

Pulled By: IvanKobzarev

fbshipit-source-id: ed341e356fea31b8515de29d5ea2ede07e8b66a2
2019-11-20 16:46:43 -08:00
1aa80471b8 minor fix to filter (#30200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30200

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --ai_pep_format True --operators None --iterations -1 --warmup_iterations -1 --wipe_cache --forward_only False --device cpu --tag_filter all --use_jit False --operator_range b-z
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.29026457108557224"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.2813781425356865"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.28009670320898294"}
...

Reviewed By: hl475

Differential Revision: D18627512

fbshipit-source-id: 23f622b96168f90a8d8648bfd9ff9a5116baafdf
2019-11-20 16:36:04 -08:00
f1a0a27da1 col max hist observer
Summary:
Add InputColumnMaxHistogramNetObserver and InputColumnMaxHistogramObserver to dnnlowp observers.

Sample output histogram at /mnt/public/amyyang/test/col_max_test.log (generated for ctr_web_feed)
```
columns:
        "op_index",
        "input_idx",
        "blob_name",
        "col_idx",
        "min",
        "max",
        "nbins"
```

Test Plan: Tested with ctr_web_feed

Reviewed By: csummersea

Differential Revision: D18194229

fbshipit-source-id: 1402fcdc174a1f52744c850f5e2cc3bdc73c3a45
2019-11-20 16:29:53 -08:00
449828378d Serialize ClassType as its qualname
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30058

Test Plan: Imported from OSS

Differential Revision: D18584269

Pulled By: jamesr66a

fbshipit-source-id: 5f1d0142bd7cd94eecbd2ed9250a0de47639040b
2019-11-20 16:17:26 -08:00
2803261a23 Update API doc for wait_all_workers after rename
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30179

Test Plan: Imported from OSS

Differential Revision: D18623092

Pulled By: mrshenli

fbshipit-source-id: 1bbffc7476f256c156783274f7ef51342820edcd
2019-11-20 16:12:30 -08:00
de05114618 polish examples in docstrings and update docs to reflect correct use of (#30052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052

Some of the examples provided in `rpc/api.py` were not updated along
with the code changes; this PR updates them. It also removes the
`dist.ProcessGroup` information, since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004

Test Plan: Unit tests pass

Differential Revision: D18582596

fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
2019-11-20 15:30:38 -08:00
bebed492cf Make RRefContext singleton leaky, deal with module destruct order race. (#30172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30172

RRefContext is a conventional singleton, used by rref.cpp. At module teardown
time, it's not defined whether rref_context.cpp or rref.cpp will be destroyed first.

We were observing a SIGSEGV because RRefContext is destroyed before a dangling
~UserRRef() call is able to execute. Particularly, the underlying
ctx.agent()->getWorkerInfo(ownerId_) call failed.

This change just avoids the SIGSEGV by forcing an intentional leak, though we still
need to deal with why there's a dangling UserRRef at module destruction time.
ghstack-source-id: 94287441

Test Plan:
Existing test suite;
       test_elastic_averaging in the context of D18511430, where the segfault reproed reliably.

Differential Revision: D18620786

fbshipit-source-id: 17b6ccc0eb1724b579a68615e4afb8e9672b0662
2019-11-20 15:12:51 -08:00
211e39fd1c add docs for profiling PyTorch with py-spy (#30166)
Summary:
This adds developer documentation for profiling PyTorch using py-spy. In my work on `__torch_function__` I found its ability to profile native code and dump flame graphs extremely useful. I'm not aware of another Python sampling profiler with similar functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30166

Differential Revision: D18625133

Pulled By: ezyang

fbshipit-source-id: cf1b851564a07c9f12fcf1338ac4527f4a3c61c0
2019-11-20 15:09:40 -08:00
36aaa299f8 shut up clang-tidy on ir.h/cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30118

Test Plan: Imported from OSS

Differential Revision: D18620239

fbshipit-source-id: 5734d9d1f38a9b38ac4a1fc121fb246b783fa262
2019-11-20 13:19:25 -08:00
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create script to produce libtorch that only contains ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Checkout demo app that uses locally built library instead of
downloading from jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
ae6af8d55f Enable multinomial for torch.half (#29266)
Summary:
Changelog:
- Re-enable multinomial sampling when the probability tensor has `dtype == torch.half`.

It seems to have been missed in https://github.com/pytorch/pytorch/issues/28481.

Fixes https://github.com/pytorch/pytorch/issues/29211
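A minimal sketch of the re-enabled path (requires a CUDA device):
```
import torch

if torch.cuda.is_available():
    probs = torch.tensor([0.1, 0.2, 0.7], dtype=torch.half, device="cuda")
    print(torch.multinomial(probs, num_samples=5, replacement=True))
```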
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29266

Differential Revision: D18619105

Pulled By: ezyang

fbshipit-source-id: 1f87e5183e75de5c5e0ffde862fc72d040b32864
2019-11-20 13:06:46 -08:00
51259e5024 Updating submodules
Summary:
GitHub commits:

7cc9d9257b
93d91859c8
ab0a6495f6
3cd75736a7
fb3e6aac5d
4ac5fd6ed9
cf783ae678
6aaaa4754f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 60b6d716aa073eda5fc6dbcbd0daeee536c25314
2019-11-20 13:02:49 -08:00
c4e7f1b232 Revert D18579363: Change dimType for shapeInfo
Test Plan: revert-hammer

Differential Revision:
D18579363

Original commit changeset: 72d5a2a8a20a

fbshipit-source-id: 282c195a160892641728d0fbcc2e704a4b5b2d05
2019-11-20 12:59:02 -08:00
c2b7b2cbf8 Make observed values actually flow through observers (#30140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30140

This seems more semantically correct to me, and makes it so we don't have to iterate over Uses of observed values.

Test Plan: Imported from OSS

Differential Revision: D18610676

Pulled By: jamesr66a

fbshipit-source-id: f835266f148bd8198b05cd9df95276e1112dd250
2019-11-20 12:48:16 -08:00
2d534abb39 Modernize graph mode IR API calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30130

Test Plan: Imported from OSS

Differential Revision: D18608004

Pulled By: jamesr66a

fbshipit-source-id: 42e946ec96b1d26a364abe0a7eb71aa0aecc52ed
2019-11-20 12:48:12 -08:00
73cf4d468f Design doc for Remote Reference (#30066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30066

This commit adds design reasoning and walks through four scenarios
for RRef.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D18595094

Pulled By: mrshenli

fbshipit-source-id: 134102901ce515a44a2e7cd013b62143a6158120
2019-11-20 12:42:28 -08:00
5cbdbddc12 Add test for F::max_unpool3d, and update parity table
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30171

Differential Revision: D18620503

Pulled By: yf225

fbshipit-source-id: 52adf9a6c0238b5cdb2e11e03807fb7dd73880bf
2019-11-20 12:42:24 -08:00
f304bd5062 rename join_rpc to wait_all_workers in public api (#30050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30050

Renames this API to wait_all_workers as discussed.
ghstack-source-id: 94273005

Test Plan: Unit tests pass

Differential Revision: D18581466

fbshipit-source-id: 4ff5d5fb2d528f17252d5b5f30c3047d2efb92bf
2019-11-20 12:38:35 -08:00
a460c856dd Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18618971

Pulled By: yf225

fbshipit-source-id: 2af62c1a0ace2cd0c36c2f1071639bf131d8fe61
2019-11-20 12:23:50 -08:00
9cb8fb61c2 update operator_range discription in op bench (#30170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30170

as title

Test Plan:
```
buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range ef
...
ValueError: The correct format for operator_range is <start>-<end>, or <point>, <start>-<end>

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range a-b
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 60.551

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 67.716
...

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range b,d-f
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 296.004
...

Reviewed By: hl475

Differential Revision: D18619975

fbshipit-source-id: 08f27ee2aeda47be431385f4b20ef7fbeb797516
2019-11-20 12:07:14 -08:00
ff7afede92 Stop showing .api as an API path component in RPC docs (#30160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30160

The path torch.distributed.rpc.api is an implementation detail, which
should not be used by applications to import RPC APIs. Instead, all
RPC APIs are exposed directly as torch.distributed.rpc.*. This
commit makes the API doc consistent with the above expectation.
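Concretely, applications should import from the public package:
```
# Preferred: public package path.
from torch.distributed.rpc import rpc_sync, rpc_async, remote

# Avoid: .api is an implementation detail.
# from torch.distributed.rpc.api import rpc_sync
```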

Test Plan: Imported from OSS

Differential Revision: D18616359

Pulled By: mrshenli

fbshipit-source-id: 8207f7d36c24cf55af737c03a27fd1896c231641
2019-11-20 12:04:10 -08:00
0762bbfc9a Eliminate tensor copies from compute_common_type_ in TensorIterator. (#30018)
Summary:
This requires refactoring at::native::result_type to operate as a
state machine, processing the input types one at a time.  There may
be other places in the code base that could benefit from adopting
this approach as well.
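A conceptual Python sketch of the state-machine idea (the real code is C++ inside TensorIterator): fold `result_type` over the input dtypes one at a time, with no tensor copies.
```
import torch

def folded_result_type(tensors):
    # Carry the promotion state as a zero-element tensor of the running dtype.
    state = torch.empty(0, dtype=tensors[0].dtype)
    for t in tensors[1:]:
        state = torch.empty(0, dtype=torch.result_type(state, t))
    return state.dtype

xs = [torch.zeros(1, dtype=torch.int32),
      torch.zeros(1, dtype=torch.float32),
      torch.zeros(1, dtype=torch.float64)]
print(folded_result_type(xs))  # torch.float64
```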
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30018

Differential Revision: D18606427

Pulled By: resistor

fbshipit-source-id: f6b779326bdb746508690cf7ca6de777adc66244
2019-11-20 11:51:28 -08:00
ff94ddda08 Change dimType for shapeInfo (#30047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30047

Previously, we used a single DimType to represent the dimension type of a tensor. Now, change it to vector<DimType> to record a dim type for every dimension of the tensor.

Reviewed By: yinghai, ipiszy

Differential Revision: D18579363

fbshipit-source-id: 72d5a2a8a20a7653e73e64c8eb97f7eed953ea93
2019-11-20 11:43:35 -08:00
7201a2e854 remove consistency check from setup (#30043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30043

This is already checked in the GH Actions linter, so this check is
redundant. And putting it in `setup` has the effect of blocking direct
changes to config.yml when I want to experiment, which is a bit
bothersome.

Test Plan: Imported from OSS

Differential Revision: D18611674

Pulled By: suo

fbshipit-source-id: f81670ae9f264408a3ea72c1ba5fcea208681311
2019-11-20 11:14:47 -08:00
67b77afcdf Fast histogram observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29790

Test Plan:
```
import torch
import time
import numpy as np
from torch.quantization.observer import HistogramObserver

X = torch.randn(1, 1, 224, 224)

obs = HistogramObserver(2048)
acc_time = 0
for i in range(100):
    X = torch.randn(10, 1, 320, 320)
    start = time.time()
    obs(X)
    # obs.forward_new(X)
    acc_time = acc_time + time.time() - start
print(acc_time)
```

Imported from OSS

Differential Revision: D18508562

fbshipit-source-id: 456e82360ce1b3f9d8b6e1832d23f1339655011a
2019-11-20 11:14:41 -08:00
f03db0cd19 Add torch::nn::functional to C++/Python parity tracker (#29819)
Summary:
This PR adds all `torch::nn::functional` functions and updated their parity status in the C++/Python parity tracker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29819

Differential Revision: D18617762

Pulled By: yf225

fbshipit-source-id: 75a4d770e2da28b626f785cab243465dbc51efd1
2019-11-20 11:14:36 -08:00
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that, and also adds the corresponding support in graph mode.
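A hedged sketch of the eager-mode observer this touches; exactly how the axis is surfaced by `calculate_qparams` may vary by version, but it is what graph mode now receives.
```
import torch
from torch.quantization.observer import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver(ch_axis=0)   # one (scale, zero_point) per channel
obs(torch.randn(4, 8))
print(obs.calculate_qparams(), obs.ch_axis)
```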

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
64817a43d2 Test for per channel graph mode quantization (#29493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29493

att

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580907

fbshipit-source-id: 05218e012c0322bb88714670d5dbe9332252f2ee
2019-11-20 11:06:44 -08:00
fbcb88e8b3 Split module.cpp and export.cpp to support saving on mobile (#29881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29881

Breaking these into separate files allows us to have three different builds:
- Mobile inference-only.
- Mobile with module saving.
- Server with module saving and other export functions like ONNX.

And this can be accomplished just by selecting which cpp files to compile,
without setting any preprocessor flags.

Test Plan: CI.  Local mobile+saving build.

Reviewed By: smessmer

Differential Revision: D18509296

fbshipit-source-id: 9438273bac4624df5c7f035b2bacb901cce43053
2019-11-20 10:47:21 -08:00
72bc7bf37b Revert D18612158: Fix naming for kl_div and binary_cross_entropy functional options
Test Plan: revert-hammer

Differential Revision:
D18612158

Original commit changeset: 8c403fa1c2a0

fbshipit-source-id: f22d7c4664119d4e7397fc017bacecf3e318af11
2019-11-20 10:26:31 -08:00
d11dfd1a84 only run embeddingbag op on cpu (#30163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30163

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 02:32.5 min (100%) 7358/7358 jobs, 1 updated
  Total time: 02:33.5 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 5604/5604 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue_cpu
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True, device: cpu
Forward Execution Time (us) : 62.608
...

Reviewed By: hl475

Differential Revision: D18617540

fbshipit-source-id: 062dd73c455db8b67749078603745651b55254b2
2019-11-20 10:02:39 -08:00
e84fcc1fd1 Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18612158

Pulled By: yf225

fbshipit-source-id: 8c403fa1c2a0a65734a3ec2387cc0937c46cab24
2019-11-20 09:44:21 -08:00
b0309d1b5b More documentation on caffe2_interface_library (#29903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29903

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18616888

Pulled By: ezyang

fbshipit-source-id: 360760a688dcc8ba117cd79d89db2afb2c35ab27
2019-11-20 08:58:01 -08:00
36a47d71e1 Enabled bfloat16 for cuda (#27259)
Summary:
Enabled basic support for bfloat16 on CUDA.
Tested via unit tests.
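A minimal sketch of the newly enabled dtype (requires a CUDA device; op coverage is basic):
```
import torch

if torch.cuda.is_available():
    x = torch.randn(4, device="cuda").to(torch.bfloat16)
    print((x + x).dtype)  # torch.bfloat16
```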
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27259

Differential Revision: D17728661

Pulled By: izdeby

fbshipit-source-id: 99efb6bc4aec029fe6bbc8a68963dca9c9dc5810
2019-11-20 08:49:56 -08:00
551e387fff Disable flaky test test_graph_for_py_nested_remote_call
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30132

Test Plan: Imported from OSS

Differential Revision: D18609560

Pulled By: mrshenli

fbshipit-source-id: 00fbfc8753e002808f49cf9f09ce0c0966a74485
2019-11-20 07:44:00 -08:00
13283e0cbb Change order of recalculating numel and restriding (#30025)
Summary:
Fix the order of recalculating numel and restriding: recalculating numel should always happen first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30025

Differential Revision: D18576446

Pulled By: VitalyFedyunin

fbshipit-source-id: fe9e18ec2bbb7b43d634e150f8979b8d6b7c5196
2019-11-20 07:36:14 -08:00
c2c835dd95 Port sigmoid backward to Aten(CPU+CUDA) (#29185)
Summary:
VitalyFedyunin, this PR ports sigmoid backward to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
if torch.cuda.is_available():
    device = "cuda"

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(1000):
        output = input.sigmoid().sum()
        output.backward()

#get running time
for n in [100, 10000]:
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(10000):
        output = input.sigmoid().sum()
        t1 = _time()
        output.backward()
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d), backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8280, GPU: Tesla P40

**Perfromance**:
Before:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 4.21 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 2.30 (ms).
```
After:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.05 (ms).
input size(128, 10000), backwad avg time is 0.48 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.04 (ms).
input size(128, 10000), backwad avg time is 0.86 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29185

Differential Revision: D18587352

Pulled By: VitalyFedyunin

fbshipit-source-id: 8167ca261960399f795d35a83fa8c4be365bc4da
2019-11-20 07:31:42 -08:00
c0104a1c89 Fix typo in comment in cpp_extension (#30028)
Summary:
From https://github.com/pytorch/pytorch/issues/26614
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30028

Differential Revision: D18597666

Pulled By: albanD

fbshipit-source-id: 93bf0e4ee34a63df4b544d44f630a9c0fc95fd83
2019-11-20 07:16:48 -08:00
f8e7f3fca4 C++ API parity: BCEWithLogitsLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28783

Test Plan: Imported from OSS

Differential Revision: D18202435

Pulled By: pbelevich

fbshipit-source-id: 011b028bbb2a091e98d3548616b99d7b4569c239
2019-11-20 06:46:38 -08:00
93db2b86d1 Fix type sharing on loaded ScriptModules (#29826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29826

After save/load, we lose concrete type information. So if you tried to
script something that contained a loaded ScriptModule as a submodule,
the following sequence happened:
1. During ConcreteType inference, the loaded submodule got a new
inferred type.
2. But it already has a type! So there was a type mismatch.

To fix this, we should generate a ConcreteType directly from the loaded
submodule type (similar to what we do for interfaces). This makes sense
too--the ConcreteModuleType should be empty, since all the "sugaredness"
was stripped out during the save/load process.
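A minimal sketch of the failing scenario (module names and the file path are illustrative):
```
import torch

class Inner(torch.nn.Module):
    def forward(self, x):
        return x + 1

torch.jit.save(torch.jit.script(Inner()), "inner.pt")

class Outer(torch.nn.Module):
    def __init__(self):
        super(Outer, self).__init__()
        self.inner = torch.jit.load("inner.pt")  # a loaded ScriptModule submodule
    def forward(self, x):
        return self.inner(x)

scripted = torch.jit.script(Outer())  # previously hit the inferred-type mismatch
```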

Test Plan: Imported from OSS

Differential Revision: D18575009

Pulled By: suo

fbshipit-source-id: 4d329b7e9b7e7624f459e50092e35ab0ab813791
2019-11-20 01:13:09 -08:00
558a777615 Re-unify module and interface in ConcreteModuleType (#29825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29825

We made `ModuleInfo` a union initially to represent the idea that a
submodule could either be a regular module or a module interface.

This PR represents module interfaces as a ConcreteModuleType with no
info (e.g.  no "sugaredness"), and with the interface type as the
underlying `jitType_`. This has the effect of reducing the special
casing around adding/maintaining module info.

Test Plan: Imported from OSS

Differential Revision: D18575011

Pulled By: suo

fbshipit-source-id: 53e297b39aa1a03bcdadd795ff225aa68fec9d70
2019-11-20 01:13:06 -08:00
63e66fd267 Split ConcreteModuleType into two types (#29824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29824

We have two distinct phases/uses for ConcreteModuleType:
1. We are building it up and using it to check whether we can
reuse JIT types. (RawConcreteModuleType)
2. We are using it to satisfy ModuleValue::attr queries.
(ConcreteModuleType)

These types share an underlying `ConcreteModuleTypeData` which
actually stores the relevant info.

Previously they were the same type because I was lazy, but it's been the
source of a bug. So split them to formalize the differing invariants for
the two phases.

Test Plan: Imported from OSS

Differential Revision: D18575010

Pulled By: suo

fbshipit-source-id: 3e4ebcd36e78b947150d8f0dbb74ecccad23e7c4
2019-11-20 01:13:02 -08:00
7495c25440 Updating submodules
Summary:
GitHub commits:

b21fd47972
950060c67b
d5cfc73665
195d10ad15
22c4b39574
0306e01233
fc0ad8b966
6f87219b24
9c674a1271
69ac8aeb62
672beabd4c

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 96ba9389d7c7faf53c0c5775a425dbea17da217a
2019-11-19 23:21:05 -08:00
c06f9023e5 Polish rpc docstring. (#30069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30069

1) Fix rpc docstrings
2) Fix some links
ghstack-source-id: 94250890

Test Plan: waitforbuildbot

Differential Revision: D18588231

fbshipit-source-id: 33846ace1afa94d25f34b0370437abf6d9408f06
2019-11-19 23:10:14 -08:00
def2985e90 add flag to strip C10 error message (#30111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30111

Add a flag to strip C10 error messages. To ensure there's no size regression, add the same flag to the existing caffe2 and pytorch builds.

Test Plan: size bot check

Reviewed By: dreiss

Differential Revision: D18577969

fbshipit-source-id: 84ac57b11ec5c29e831d619260024a0a4a6fdcd0
2019-11-19 22:53:59 -08:00
88ef402cb5 Add distributed optimizer section to distributed autograd design doc. (#30068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30068

ghstack-source-id: 94228719

Test Plan: waitforbuildbot

Differential Revision: D18556536

fbshipit-source-id: decd6927bfdd1ee3c81fef7430aa7095d7f38d33
2019-11-19 22:43:03 -08:00
b410d864c9 make python remote exception to rethrow when using remote reference to itself (#29930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29930
Right now, rethrowing of Python remote-call exceptions is coupled with deserialization.
For an owner ref, setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself and call ownerRef.to_here(), the Python remote-call exception will not be rethrown.

This diff moves remote exception rethrowing out of deserialization, so the exception can be handled for ownerRef.localValue() or ownerRef.to_here().

close #29924
ghstack-source-id: 94210894

Test Plan: unit tests

Differential Revision: D18541916

fbshipit-source-id: 7cda93f623d52c740b3c1b1fa9a442f866984340
2019-11-19 21:33:21 -08:00
1b26e3ff6d fbjni gradle obey ABI_FILTERS parameter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30135

Test Plan: Imported from OSS

Differential Revision: D18610031

Pulled By: IvanKobzarev

fbshipit-source-id: 7dd8240b71e9f6d77f723243991cd1b5c9984df6
2019-11-19 20:09:48 -08:00
cc81769e10 C++ API parity: isfinite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30083

Test Plan: Imported from OSS

Differential Revision: D18594723

Pulled By: pbelevich

fbshipit-source-id: 5970e0aa6ef8994e9c4a741784fd053383aaceb7
2019-11-19 20:00:05 -08:00
b2291d4600 Make PerChannelMinMaxObserver scriptable using torch.jit.ignore (#29416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29416

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18580906

fbshipit-source-id: 5370300b89e26c2b4662b17e51284e8708cb5843
2019-11-19 19:12:55 -08:00
80e3f17301 Resubmit "Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents" (#30093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30093

https://github.com/pytorch/pytorch/pull/28226 introduced a `worker_to_id` arg to the `init_rpc` function for other `RpcAgent`s, but it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in https://github.com/pytorch/pytorch/issues/29031.

To accommodate the differences between `RpcAgent`s, add a `RpcAgentOptions` base class, which allows leveraging inheritance to add extra fields.
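A hedged sketch of the resulting API shape (argument values are illustrative): agent-specific settings travel in an options object instead of loose `init_rpc` kwargs.
```
from torch.distributed import rpc

rpc.init_rpc(
    "worker0",
    rank=0,
    world_size=2,
    rpc_backend_options=rpc.ProcessGroupRpcBackendOptions(num_send_recv_threads=8),
)
```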
ghstack-source-id: 94197295

Test Plan:
### OSS RPC + RRef tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_rpc_fork_test -- test_sync_rpc
```

### Prototype RRef tests

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```

```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc_thrift_rpc_agent
```

### Dist autograd

```
buck test mode/dev-nosan caffe2/test:dist_autograd_fork
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_dist_autograd_fork_test
```

Differential Revision: D18595578

fbshipit-source-id: 616fca3b844c171ed5277bbc6a2b1693bc3a8065
2019-11-19 18:52:30 -08:00
15bc41a8aa Overwrite __setstate__ func in MultiheadAttention (#29001)
Summary:
Overwrite the `__setstate__` function in nn.MultiheadAttention and add the `self._qkv_same_embed_dim` attribute to the `dict`. Current users should not be affected by the change.

The changes have been tested by loading a MultiheadAttention model trained with PyTorch 1.1. If users have an old MultiheadAttention model, please use `torch.load` to load the old model for inference under v1.4.0 and above.

```
import torch
model = torch.load('old_v1.1.0_MultiheadAttention.pt') # model works for torch 1.4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29001

Differential Revision: D18257671

Pulled By: zhangguanheng66

fbshipit-source-id: fa41b85f6d53034dc9f445af60f2ad9636e9abf7
2019-11-19 18:32:44 -08:00
07e14c7cd0 DistributedOptimizer: wait for all workers to finish _LocalOptimizer constructor (#30062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30062

This allows catching exceptions during optimizer creation.
ghstack-source-id: 94232436

Test Plan: new unit test.

Differential Revision: D18586108

fbshipit-source-id: 71cfdf337fe803dbea8787b4c68e5a52b70a1f68
2019-11-19 18:30:00 -08:00
2367e71f55 Disable ProfilingGraphExecutorImpl for mobile (#30067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30067

### Summary

The mobile build has been broken since last week due to a runtime error caused by a missing operator in JIT:

```shell
libc++abi.dylib: terminating with uncaught exception of type torch::jit::script::ErrorReport:
Unknown builtin op: aten::_adaptive_avg_pool2d_backward.
Could not find any similar ops to aten::_adaptive_avg_pool2d_backward. This op may not exist or may not be currently supported in TorchScript.
:
at <string>:9:28
                grad_self = grad.expand(self.size()) / (self_size[-1] * self_size[-2])
            else:
                grad_self = torch._adaptive_avg_pool2d_backward(grad, self)
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

            return grad_self
```
### How this happens

Since we've disabled autograd for the open-source mobile build, the `backward` ops won't get registered by JIT.

When `forward` runs, a `GraphExecutor` is created according to the value of `executor_mode`. In the mobile case, this was set to true, which gives us the `ProfilingGraphExecutorImpl` object. It seems this executor eventually tries to emit IR for autograd schemas, which causes the error.

### Fix

There are two ways to fix it.

1. Add a macro to disable `profiling_mode` as well as `executor_mode` on mobile. Like what `FBCODE_CAFFE2` does [here](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/profiling_graph_executor_impl.cpp#L22).
2. Disable the two modes at runtime, by calling `torch::jit::getExecutorMode() = false;` before calling forward.

(IMO, the second fix is sort of a workaround, as it doesn't make sense from a user's perspective (why do I need to do this?). But the upside is that we don't have to introduce yet another macro.)

Feel free to drop comments, if there is a better way to fix it.

### How this was not detected by our mobile CI

We're working on adding runtime tests to our mobile build to prevent issues like this.

### Test Plan

- The error above disappears
- Don't break CI

cc AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18605998

Pulled By: xta0

fbshipit-source-id: 11fa85c2b44d54bc28a9c45731af0f5d17d5804c
2019-11-19 18:04:57 -08:00
2c8dce915c Show full call stack in TorchScript exception even when calls were inlined.
Summary:
This uses the newly added InlinedCallStack to print the original call stack
even if the real call stack is shallower because of inlining.
This change also makes TorchScript stack traces look like Python ones.

Example:
```
@torch.jit.script
def baz(c, b):
    return c + b

@torch.jit.script
def foo(c, b):
    return baz(c, b)

@torch.jit.script
def bar(c, b):
    return foo(c, b)

bar(torch.rand(10), torch.rand(9))
```

Output before:
```
Traceback (most recent call last):
  File "fail.py", line 25, in <module>
    bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter, with the following stack trace:
at fail.py:15:11
@torch.jit.script
def baz(c, b):
    return c + b
           ~~~~~ <--- HERE
```

Output after:
```
Traceback (most recent call last):
  File "fail.py", line 41, in <module>
    bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter.
Traceback (most recent call last):
  File "fail.py", line 33
@torch.jit.script
def bar(c, b):
    return foo(c, b)
           ~~~ <--- HERE
  File "fail.py", line 29, in foo
@torch.jit.script
def foo(c, b):
    return baz(c, b)
           ~~~ <--- HERE
  File "fail.py", line 25, in baz
@torch.jit.script
def baz(c, b):
    return c + b
           ~~~~~ <--- HERE
```

Output of non-scripted python code:
```
Traceback (most recent call last):
  File "fail.py", line 36, in <module>
    bar(torch.rand(10), torch.rand(9))
  File "fail.py", line 21, in bar
    return foo(c, b)
  File "fail.py", line 18, in foo
    return baz(c, b)
  File "fail.py", line 15, in baz
    return c + b
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
```

Differential Revision: D18532812

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: e7e5ba5e4a8f1c7086406271d0f1685d9db8541a
2019-11-19 17:58:55 -08:00
a9d1465c82 Add logging to inliner. (#27922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27922

gh-metadata: pytorch pytorch 27922 gh/ZolotukhinM/140/head

Differential Revision: D17914135

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: d75bdf1efbfdc877f10017b16046bdbdc97e2dd6
2019-11-19 17:58:51 -08:00
59eb682ce3 Add InlinedCallStack class. (#27921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27921

InlinedCallStack serves a similar purpose to Scope, but instead of storing
the string names of the functions it stores pointers to the Function objects
themselves. Currently, scopes are used in tracing and callstacks are
used in scripting; hopefully I will be able to merge them in the future.

gh-metadata: pytorch pytorch 27921 gh/ZolotukhinM/139/head

Differential Revision: D17914132

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: b1daa6700199ee1a97a7f49a6fced9ac0dc13051
2019-11-19 17:58:46 -08:00
12263cfa98 Make inlineCallTo to take Function instead of Graph as the callee argument. (#27920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27920

gh-metadata: pytorch pytorch 27920 gh/ZolotukhinM/138/head

Differential Revision: D17914133

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 6aec2a71ed5718fecab81a107e37b26088b94c65
2019-11-19 17:58:42 -08:00
0eb8c3dbfb Add a variant of insertGraph that fills values map. (#27919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27919

gh-metadata: pytorch pytorch 27919 gh/ZolotukhinM/137/head

Differential Revision: D17914134

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ecc85c97b497eaf82e25e9c6b4477f6b1103bf69
2019-11-19 17:58:37 -08:00
e951f7cf58 Add Python3 ROCm CentOS docker image (#30119)
Summary:
959b068874
d2ee605730
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30119

Differential Revision: D18604645

Pulled By: bddppq

fbshipit-source-id: d9375e44dad9570ef8fc3d1bbd557795543f8bb2
2019-11-19 17:54:05 -08:00
bb1d9b238d torch::nn::FractionalMaxPool{2,3}d module and functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29933

Test Plan: Imported from OSS

Differential Revision: D18548174

Pulled By: yf225

fbshipit-source-id: 070776db6e8b7ad94d9b7cbd82b3d6966f061a46
2019-11-19 17:24:07 -08:00
ec52d911bd InstanceNorm{1,2,3}d (#28790)
Summary:
Hi yf225,

I have a few doubts related to the implementation:
1) What tests do I have to write?
2) What does _load_state_from_dict do?
3) Do I need to override the reset() function? I cannot see its utility.
4) InstanceNormOptions could be replaced with BatchNormOptions, but I find that
`track_running_status` is not defined; instead, `stateful` is defined.
InstanceNorm{1,2,3}d https://github.com/pytorch/pytorch/issues/25883
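For reference, a minimal usage sketch of the Python modules these C++ modules mirror:
```
import torch

m = torch.nn.InstanceNorm2d(3, affine=True, track_running_stats=True)
y = m(torch.randn(2, 3, 8, 8))
print(y.shape)  # torch.Size([2, 3, 8, 8])
```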
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28790

Differential Revision: D18588666

Pulled By: yf225

fbshipit-source-id: bb9b81f01f62c3fc8765fa0ba0716768087ee155
2019-11-19 16:57:01 -08:00
8e3486de81 No debug symbols in release android buidls (#30123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30123

In Groovy, the string `'false'` resolves to boolean `true`.

That's why, even with the following in `gradle.properties`:
```
nativeLibsDoNotStrip=false
```
the branch `if (nativeLibsDoNotStrip)` was always taken.

Test Plan: Imported from OSS

Differential Revision: D18606907

Pulled By: IvanKobzarev

fbshipit-source-id: c10140e775624294c732e78ae3c41e05c7c9ad92
2019-11-19 16:44:56 -08:00
5fa941d4e2 update fastlane to use Scanfile (#29963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29963

### Summary

To run unit tests via Fastlane, simply run `fastlane scan`. Under the hood, it uses `xcodebuild` to run the unit tests. The Scanfile serves as a config file for Fastlane where you can specify parameters you want to pass to `xcodebuild`. More about Scan - https://docs.fastlane.tools/actions/scan/

### Test Plan

- `fastlane scan` is able to run on CI machines.

Test Plan: Imported from OSS

Differential Revision: D18606098

Pulled By: xta0

fbshipit-source-id: b4727d964fa56076b2ff383b40d1b13607721394
2019-11-19 16:32:26 -08:00
99c59d73a7 Remove input_channels / output_channels / with_bias from ConvOptions (#29838)
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838

Differential Revision: D18597943

Pulled By: yf225

fbshipit-source-id: 59101437f032f042574998eb90eaf0be09352364
2019-11-19 16:28:54 -08:00
868cb05a30 Resubmit "Add RpcAgentTestFixture to extract duplicate code" (#30092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30092

There is duplicate code across components that rely on RpcAgent. Extract it into a reusable test fixture class.
ghstack-source-id: 94196891

Test Plan:
### RPC + RRef

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck test mode/dev-nosan //caffe2/test:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift
```

### Dist Autograd

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn_thrift
```

### Dist Optimizer

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn_thrift
```

Differential Revision: D18595408

fbshipit-source-id: 8360759c63e838fb19d4eb1aeacca0bf8eb4b55f
2019-11-19 16:24:51 -08:00
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
e46babb637 explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30007

Test Plan: Imported from OSS

Differential Revision: D18575982

Pulled By: VitalyFedyunin

fbshipit-source-id: 83be0857fe1080216cd09547a2b3d34455a0cce4
2019-11-19 16:19:24 -08:00
04018ba865 explicitly provide memory format when calling *_like operators (Redo of 81bf7364)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30006

Test Plan: Imported from OSS

Differential Revision: D18575984

Pulled By: VitalyFedyunin

fbshipit-source-id: b72ea0404f0363001c94f39567c0aeae71cb1f67
2019-11-19 16:19:20 -08:00
66913fe5c1 explicitly provide memory format when calling *_like operators (Redo of cc1c01)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30005

Test Plan: Imported from OSS

Differential Revision: D18575976

Pulled By: VitalyFedyunin

fbshipit-source-id: 94cc213f42f9bd50eaa096872f38c4563e5c9ba1
2019-11-19 16:19:16 -08:00
dc9e7b73e1 explicitly provide memory format when calling *_like operators (Redo of e3e06549)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30004

Test Plan: Imported from OSS

Differential Revision: D18575977

Pulled By: VitalyFedyunin

fbshipit-source-id: 344e9a11c93c7e4a822f424c94fa2255592d118e
2019-11-19 16:19:11 -08:00
66cb93c762 explicitly provide memory format when calling *_like operators (Redo of 4b4aa)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30003

Test Plan: Imported from OSS

Differential Revision: D18575975

Pulled By: VitalyFedyunin

fbshipit-source-id: ce767d116bd821c8e16a7fc7d1be3fca957dcada
2019-11-19 16:19:07 -08:00
295feb4e9a explicitly provide memory format when calling *_like operators (Redo of ce438f6967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30002

Test Plan: Imported from OSS

Differential Revision: D18575983

Pulled By: VitalyFedyunin

fbshipit-source-id: f018c04c2799a42196077e9868f799cbb047ac6d
2019-11-19 16:19:03 -08:00
20b73e1805 explicitly provide memory format when calling *_like operators (Redo of 631b22d)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30001

Test Plan: Imported from OSS

Differential Revision: D18575979

Pulled By: VitalyFedyunin

fbshipit-source-id: d6fe8a6e1b45673f85a0dd49bd6becfadc5091b4
2019-11-19 16:18:58 -08:00
c15a4a0971 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30000

Test Plan: Imported from OSS

Differential Revision: D18575980

Pulled By: VitalyFedyunin

fbshipit-source-id: b0e804fe84ada0617852025fa502c0fb93849cb9
2019-11-19 16:18:54 -08:00
2b1466e665 allow operator_range to take multiple ranges (#30124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30124

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operator_range a,b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 71.683

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cuda
# Input: M: 1, N: 256, K: 3136, device: cuda
Forward Execution Time (us) : 118.840

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N8192_K1_cuda
# Input: M: 1, N: 8192, K: 1, device: cuda
Forward Execution Time (us) : 134.274

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cuda
# Input: M: 128, N: 128, K: 1, dim: 1, device: cuda
Forward Execution Time (us) : 109.172
...
```

Reviewed By: hl475

Differential Revision: D18605640

fbshipit-source-id: 4ae9b91a50c4cdf1b161b6c5c58f365ba514050c
2019-11-19 16:15:46 -08:00
05a7aaa742 Pass Tensor instead of Tensor& to torch::nn functionals that can change input in place (#30112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30112

Currently, we have torch::nn functionals that take `input` as `Tensor&` in order to be able to change `input`'s value in place. We likely shouldn't do this, because it prevents the following use case:
```cpp
F::elu(torch::tensor(1), F::ELUFuncOptions().inplace(true))
```
The solution is to change the type of `input` to `Tensor`, so that we can pass an rvalue into the functional.

Test Plan: Imported from OSS

Differential Revision: D18601580

Pulled By: yf225

fbshipit-source-id: 639a86eb62f6c986b0f20bf7e201983e83126e73
2019-11-19 16:11:39 -08:00
a75b669b0f C++ API: torch::nn::ConvTranspose{1,2,3}d (#29721)
Summary:
Add torch::nn::ConvTranspose{1,2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29721

Differential Revision: D18588943

Pulled By: yf225

fbshipit-source-id: d4dbb091389367e70459399d5cda3778325c2120
2019-11-19 16:04:12 -08:00
c2e576e74b Per channel quantization support in insert_prepack_unpack (#29701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29701

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_prepack_unpack'

Imported from OSS

Differential Revision: D18580908

fbshipit-source-id: 2d1ce9b6279586198cb53a7fd2a35325fa20bf20
2019-11-19 15:53:04 -08:00
63c957cd94 Use std::shared_ptr for DistAutogradContext. (#29770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29770

We were passing around const and non-const references for
DistAutogradContext from DistAutogradContainer. This wasn't safe since the
context could be deleted from the container and a thread might still be using
the reference. This usually would happen when a backward pass fails on the node
driving the backward pass (resulting in delete context messages being sent to
all nodes) but other nodes are still executing code related to that autograd
context.

This was also the reason why `test_backward_autograd_engine_error` was flaky.

Using a std::shared_ptr everywhere ensures we're safe and never crash.

Closes #28928
Closes #26922
ghstack-source-id: 94201446

Differential Revision: D18494814

fbshipit-source-id: 0c925fdbd5755f6d876dad56885e2cbaf41fc5f0
2019-11-19 15:50:42 -08:00
79b797ccac Build time warning on windows for fbgemm (#29062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29062

Build time warning
ghstack-source-id: 94202405

Test Plan: None

Reviewed By: jianyuh

Differential Revision: D18279505

fbshipit-source-id: 873cdeb848d34849d6babc435b1a42171f0609a3
2019-11-19 14:30:20 -08:00
5aa50c7f3c Enable test_nested_rref in rpc_test.py (#30100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30100

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597002

Pulled By: mrshenli

fbshipit-source-id: 64aa6a59248e5d1b7e1ad1aebffb6a25248388d2
2019-11-19 13:28:05 -08:00
a243e0872e Enable test_nested_remote in rpc_test.py (#30099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30099

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597003

Pulled By: mrshenli

fbshipit-source-id: ebfb1f6f3f961d98351e06ce4b951793a9b95398
2019-11-19 13:28:01 -08:00
8912e6caf5 Enable test_nested_rpc in rpc_test.py (#30098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30098

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597001

Pulled By: mrshenli

fbshipit-source-id: 68256289085fac1a9ca76d5b4882e97e2f81d1f4
2019-11-19 13:27:57 -08:00
a689e3a0c4 Support per channel quantization in insert_quant_dequant and fold_prepack (#29492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29492

Previously graph mode quantization only works for per tensor quantization,
this PR added support for per channel quantization as well, changes include
- insert per channel quantization calls(insert_quant_dequant)
- add support of folding for prepacked per channel quantized weight (fold_prepack)

Test Plan:
Testing is not possible until we can script PerChannelObserver, which comes in https://github.com/pytorch/pytorch/pull/29416; we'll add tests in a separate PR after that.

Imported from OSS

Differential Revision: D18580444

fbshipit-source-id: 347c07f201648ec49f070523642a9170278f8aa4
2019-11-19 12:25:28 -08:00
0ab03d3283 only run embeddingbag benchmark on cpu (#30106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30106

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
```

Reviewed By: hl475

Differential Revision: D18598198

fbshipit-source-id: 9b7d103410f1183fdf6776047ea2ef8dba4b7831
2019-11-19 12:07:34 -08:00
4b0a6d299c test reporting (#29658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29658

This PR makes our test scripts output artifacts that CircleCI can
understand. This has a few benefits:
1. We can actually see failed tests and their output in the job screen
(instead of having to scroll through logs)
2. We can use the CircleCI test metadata API to track failed tests
programmatically.

it looks like this (old ui):
https://circleci.com/gh/pytorch/pytorch/3546584?pipelines-ui-opt-out
or this (new ui):
https://app.circleci.com/jobs/github/pytorch/pytorch/3546584/tests

Test Plan: Imported from OSS

Differential Revision: D18597261

Pulled By: suo

fbshipit-source-id: 07fc7d26bbb834e13cc4cc0e48178645ae6579f5
2019-11-19 11:15:31 -08:00
1dbc84ab6d Remove unnecessary conditional (#29901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29901

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18594828

Pulled By: ezyang

fbshipit-source-id: cf4ade2da9bf8769cfb3149713941aa9e5e0d197
2019-11-19 11:06:30 -08:00
57acc2ff3a add a unit test target to TestApp (#29962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29962

### Summary

Recently we've found that the master branch was constantly broken due to unwanted changes being landed on mobile. The problem is that our CI was not able to detect the runtime errors. Starting from this PR, we'll add some unit tests to the iOS simulator build, as follows:

1. Add a unit test target to Xcode (this PR)
2. Use Fastlane to run the tests on CI
3. Modify the CI scripts to trigger tests

### Test Plan

- Don't break the existing CI jobs unless they are flaky.

Test Plan: Imported from OSS

Differential Revision: D18582908

Pulled By: xta0

fbshipit-source-id: f960c47d3bbda79e754a0513e8711867fd3588d2
2019-11-19 11:03:45 -08:00
23991e89cc change operator_range to work with lower and upper in op bench (#30096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30096

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test  -- --iterations 1 --operator_range a-a
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.quint8_contigTrue
# Input: N: 2, dtype: torch.quint8, contig: True
Forward Execution Time (us) : 22.251

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint8_contigTrue
# Input: N: 2, dtype: torch.qint8, contig: True
Forward Execution Time (us) : 17.247

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 29.653
...
```

Reviewed By: hl475

Differential Revision: D18596447

fbshipit-source-id: eac8d9d90db244aa9799293c22bb0d30cf3edf58
2019-11-19 11:01:02 -08:00
dca123e76d Add zipfile serialization (#29232)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
 * **https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization**

This adds a serialization method that uses a zipfile (https://github.com/pytorch/pytorch/issues/26567). Right now it is guarded behind the flag `_use_new_zipfile_serialization`. In release mode its performance is about the same as, or slightly better than, the current serialization in some simple benchmarks on large and small tensors.
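
Opting in looks roughly like this (a minimal sketch; the flag name comes from this PR):

```python
import torch

state = {"weight": torch.randn(3, 3)}

# Explicitly opt in to the zipfile-based format while it is behind the flag.
torch.save(state, "checkpoint.pt", _use_new_zipfile_serialization=True)

# Loading is unchanged; the reader detects the on-disk format.
state = torch.load("checkpoint.pt")
```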

Follow ups:
* Flip the `_use_new_zipfile_serialization` flag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29232

Differential Revision: D18332036

Pulled By: driazati

fbshipit-source-id: 1bac0847c4d599612cba905f2cac8248783be2f4
2019-11-19 10:17:32 -08:00
2b02d154db Implement fast pass for CPU scalars /number literals (#29915)
Summary:
The main changes in this PR are:
- skip device dispatch for CPU scalars (number literals also fall into this category). In most cases scalars should be on CPU for best perf, but if users explicitly put them on another device, we will respect that setting and exit the fast path.
- directly manipulate the Tensor data_ptr when filling a scalar into a 1-element tensor.

Some perf benchmark numbers:
```
## Before
In [4]: def test(x):
   ...:     x = x + 2
   ...:     return x
   ...:

In [5]: with torch.no_grad():
   ...:     x = torch.ones(100)
   ...:     %timeit {test(x)}
   ...:
79.8 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

## After
In [2]: def test(x):
   ...:     x = x + 2
   ...:     return x
   ...:

In [3]: with torch.no_grad():
   ...:     x = torch.ones(100)
   ...:     %timeit {test(x)}
   ...:
60.5 µs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Before the patch `tensor_slow` took 15.74% of total time.
<img width="1186" alt="Screen Shot 2019-11-15 at 12 49 51 PM" src="https://user-images.githubusercontent.com/5248122/68976895-cc808c00-07ab-11ea-8f3c-7f15597d12cf.png">
After the patch `tensor_slow` takes 3.84% of total time.
<img width="1190" alt="Screen Shot 2019-11-15 at 1 13 03 PM" src="https://user-images.githubusercontent.com/5248122/68976925-e28e4c80-07ab-11ea-94c0-91172fc3bb53.png">

cc: roosephu who originally reported this issue to me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29915

Differential Revision: D18584251

Pulled By: ailzhang

fbshipit-source-id: 2353c8012450a81872e1e09717b3b181362be401
2019-11-19 10:14:38 -08:00
e88d096321 C++/Python API Parity: add AlphaDropout (#28424)
Summary:
- add `AlphaDropoutImpl` to `modules/dropout.h` and `modules/dropout.cpp`
 - add `functional/dropout.h` containing the `alpha_dropout` function
 - include `functional/dropout.h` in `nn/functional.h`
 - add functional and module tests
-  related issue https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28424

Differential Revision: D18589162

Pulled By: yf225

fbshipit-source-id: c85734e02431a6c052515e26b11ca30ad7303644
2019-11-19 10:05:51 -08:00
1597f22982 fix device check in op bench (#30091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30091

as title

Test Plan:
```
Before:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 91.190

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 27.062

After:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 28.154

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 15.959
...
```

Reviewed By: hl475

Differential Revision: D18595176

fbshipit-source-id: 048c5b7b2a5318c3687412e12e8d2d5f380a8139
2019-11-19 10:05:47 -08:00
37ca5a8a64 convert_sync_batchnorm should not convert _InstanceNorm instances (#29985)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29187

This introduces a new class `_NormBase` that `_InstanceNorm` and `_BatchNorm` inherit from separately. This means the `isinstance(module, _BatchNorm)` check won't falsely pass for `_InstanceNorm`.

The suggested fix of adding `and not isinstance(module, _InstanceNorm)` works as well, but requires introducing a cyclic dependency between `instancenorm.py` and `batchnorm.py`.
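
A minimal sketch of the resulting hierarchy (class bodies elided; only the isinstance behavior matters here):

```python
class _NormBase:                 # shared normalization state and logic
    pass

class _BatchNorm(_NormBase):
    pass

class _InstanceNorm(_NormBase):  # no longer a subclass of _BatchNorm
    pass

m = _InstanceNorm()
assert isinstance(m, _NormBase)
assert not isinstance(m, _BatchNorm)  # convert_sync_batchnorm now skips it
```
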
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29985

Differential Revision: D18588104

Pulled By: yf225

fbshipit-source-id: f599da3b902ad9c56836db4d429bfc462ed51338
2019-11-19 09:39:36 -08:00
45024e7a35 Support Exporting Bitshift to ONNX (#28210)
Summary:
Support exporting left/right bitshifts to ONNX for all opset versions.

ONNX has a bitshift operator in opset 11, but it only supports unsigned ints, so it can't be used in PyTorch (since uint8 is the only uint type).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28210

Reviewed By: hl475

Differential Revision: D18575512

Pulled By: houseroad

fbshipit-source-id: 74161db67f599996a0614981edcc171af6780d21
2019-11-19 09:25:50 -08:00
a3494bd56b CPU-Strided-Complex Fixes for real and imag ops (#29840)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

- [x]  Replaced std::real(a) with a.real() in kernel-level code.
- [x]  Fixed the Vec256_base implementation of complex ops so that it works correctly on non-AVX devices.
- [x]  Fix NumericUtils.h

cc: iotamudelta, ezyang, bddppq, zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29840

Differential Revision: D18531274

Pulled By: ezyang

fbshipit-source-id: 0fa842c68e4bd55134fe0271880e2d15fe692b7f
2019-11-19 09:21:44 -08:00
7d287688eb Revert D5689636: Add RpcAgentTestFixture to extract duplicate code
Test Plan: revert-hammer

Differential Revision:
D5689636

Original commit changeset: f35eea1359ad

fbshipit-source-id: 31928fce5e96b3beceefbc9a03f54769f10b7e1a
2019-11-19 08:14:44 -08:00
1dda8186ae Revert D18549919: Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents
Test Plan: revert-hammer

Differential Revision:
D18549919

Original commit changeset: b9f3f1a41d1f

fbshipit-source-id: 2d5e578d18c0725b59eb99a0e942fbf7fe3341ee
2019-11-19 08:14:40 -08:00
861ef05015 Remove rpc fork and dist autograd fork tests from PyTorch repo (#29827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29827

There are known issues with "fork tests + OMP" in PyTorch; rpc and dist autograd tests use OMP thread pools, which caused the rpc fork and dist autograd fork tests to be flaky. So we remove these fork tests from the PyTorch repo. The rpc spawn and dist autograd spawn tests are still running.

Test Plan: unit tests

Differential Revision: D18507384

fbshipit-source-id: 9e239f13850832b4b84724828537f73512f3fca9
2019-11-19 07:02:59 -08:00
83513506c3 poll for timed out futures in process group agent (#29601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29601

Follow up from https://github.com/pytorch/pytorch/pull/28392. Adds a background thread to `ProcessGroupAgent` that polls for timed-out RPCs at a pre-set interval and marks them as completed with a timeout exception. It also deletes the futures from the corresponding maps `futures_` and `futureTimeouts`. Unit tests are added to ensure that timed-out RPCs are appropriately cleaned up.

Also adds a `shutdown` variable to the process group agent to control the shutting down of this background thread, which can eventually be extended to control a clean shutdown of the process group agent. A rough sketch of the polling idea follows.
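
Here is that sketch, in Python (the names and data structures are illustrative, not the C++ implementation):

```python
import threading
import time
from concurrent.futures import Future

class TimeoutPoller:
    def __init__(self, interval_secs=1.0):
        self._futures = {}       # request id -> (Future, deadline); cf. futures_ / futureTimeouts
        self._lock = threading.Lock()
        self._interval = interval_secs
        self._shutdown = False   # analogue of the agent's shutdown variable
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def add(self, rid, fut: Future, timeout_secs: float):
        with self._lock:
            self._futures[rid] = (fut, time.monotonic() + timeout_secs)

    def _poll_loop(self):
        while not self._shutdown:
            now = time.monotonic()
            with self._lock:
                expired = [rid for rid, (_, dl) in self._futures.items() if dl <= now]
                for rid in expired:
                    fut, _ = self._futures.pop(rid)  # drop it from the map
                    fut.set_exception(TimeoutError("RPC timed out"))
            time.sleep(self._interval)
```
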
ghstack-source-id: 94175131

Test Plan: Added unit tests

Differential Revision: D18434215

fbshipit-source-id: c48abdb8759fe1447200ec66bb9d4b1c50ec4535
2019-11-19 06:42:04 -08:00
21dc1d4543 Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents (#29972)
Summary:
https://github.com/pytorch/pytorch/pull/28226 introduced the `worker_to_id` arg to the `init_rpc` function for other `RpcAgent`s, while it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in https://github.com/pytorch/pytorch/issues/29031.

To adapt to the differences between `RpcAgent`s, add a `RpcAgentOptions` base class, which allows leveraging inheritance to add extra fields.
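
The shape of the change, sketched in Python (the field names here are hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class RpcAgentOptions:                  # arguments every agent needs
    rpc_timeout_secs: float = 60.0

@dataclass
class ProcessGroupRpcAgentOptions(RpcAgentOptions):  # agent-specific extras
    num_send_recv_threads: int = 4

def init_rpc(name: str, rank: int, options: RpcAgentOptions):
    # Each backend reads only the options subtype it understands.
    ...
```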

closes https://github.com/pytorch/pytorch/issues/29031
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29972

Differential Revision: D18549919

Pulled By: xush6528

fbshipit-source-id: b9f3f1a41d1ff18498734081870820b055d56f5b
2019-11-19 01:00:08 -08:00
82b6300fea Disable openmp in static and dynamic histograms (#30072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30072

Fix the test failure with mode/opt-lto by disabling openmp in both static and dynamic histograms. We will just use a single thread in histogram processing, as that is the common use case.

Test Plan:
```
buck run mode/opt caffe2/caffe2/fb/fbgemm/numerical_debugger/workflows:int8_static_quantization_exporter -- --model-dir /mnt/public/summerdeng/ads/ --model-name downsized_ins_97293388_0.predictor --run --iter 10  --dataset-path /mnt/public/summerdeng/ads/ctr_instagram_story_int8/dataset/train/dataset_115764229_10 --hive-path="hive://ad_delivery/ig_ad_prefiltered_training_data_orc_injected/ds=2019-09-09/pipeline=ctr_instagram_story_click_only_model_opt_out_df" --collect-histogram --activation-histogram-file=/mnt/public/summerdeng/ads/ctr_instagram_story_int8/activation_histograms/dummy_debug_OOM.txt
```
```
buck test mode/opt-lto caffe2/caffe2/quantization/server:dynamic_histogram_test -- --run-disabled
```

Reviewed By: hx89

Differential Revision: D18554614

fbshipit-source-id: cfff51174154e753b7123b4ec502b88ffc508917
2019-11-19 00:32:46 -08:00
a9ad2e2f00 fix batch norm for empty inputs (#30035)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/29578
The shape check is moved up as much as possible, because backends by and large don't correctly handle empty inputs, so the check needs to be done before backend selection. That also automatically takes care of backward: forward for an empty input is automatically differentiable, so no backend-specific backward routines are ever called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30035

Test Plan: tests for empty inputs are added.

Differential Revision: D18584427

Pulled By: ngimel

fbshipit-source-id: a42918f50eb1f6995921aafa92879cd42dd5e9e1
2019-11-18 23:08:12 -08:00
c272758b43 Mobile module forward() pass input by value. (#30060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30060

Mobile forward() passed inputs by reference, which is different from JIT's script::module. To make it consistent, change it to pass by value.

Test Plan: Imported from OSS

Differential Revision: D18587786

Pulled By: iseeyuan

fbshipit-source-id: fa398124fd0a5168f708733ff88f0ba327726f43
2019-11-18 22:33:38 -08:00
267fd4a06c Fix for batch norm 2D with affine=False (#29458)
Summary:
This is a fix for batch norm 2D with affine=False.
Repro: https://github.com/pytorch/pytorch/issues/29271
The error is because the output of the unsqueeze op does not have scalar type information, so I moved the references to scalar type after the unsqueeze line.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29458

Reviewed By: hl475

Differential Revision: D18400975

Pulled By: houseroad

fbshipit-source-id: f5c5633857c584edcef3b9e9946861dcfccccd75
2019-11-18 21:52:11 -08:00
a4f60b64dc explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29391

Test Plan: Imported from OSS

Differential Revision: D18429726

Pulled By: VitalyFedyunin

fbshipit-source-id: 07dfff568ad776cf792122913530566d53be55fa
2019-11-18 21:47:52 -08:00
2dba553990 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29390

Test Plan: Imported from OSS

Differential Revision: D18429722

Pulled By: VitalyFedyunin

fbshipit-source-id: e5f40da1550b4316e9c4725adbdf557c832b7563
2019-11-18 21:47:47 -08:00
3045b2a366 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29389

Test Plan: Imported from OSS

Differential Revision: D18429731

Pulled By: VitalyFedyunin

fbshipit-source-id: 99ee8ae11fbaf05c91903d7df7622c90369ce7ce
2019-11-18 21:47:43 -08:00
735517fa87 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29388

Test Plan: Imported from OSS

Differential Revision: D18429725

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b7662874e229e6fb0d4bbcf32ec15fc824d6118
2019-11-18 21:47:39 -08:00
5b15f32697 rename benchmark_all_other_test (#30048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30048

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_other_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 142.032
...
```

Reviewed By: hl475

Differential Revision: D18580754

fbshipit-source-id: 125482d2987cbdb1d019ccedf56a9da5a7cebaba
2019-11-18 21:39:31 -08:00
97156f548d Add hash and equality operators for WorkerInfo (#29958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29958

DistributedOptimizer relies on hashing WorkerInfo in order to coalesce fan-out RPCs. This will likely be a very common use case (EASGD will do the same, for example).
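
Why hash and equality matter here, in a short illustrative sketch (the real operators are C++; the frozen dataclass is just a stand-in):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen dataclasses get __hash__ and __eq__ for free
class WorkerInfo:
    name: str
    id: int

# Coalescing fan-out RPCs: group parameters by destination worker.
rpcs = {}
for param, worker in [("p0", WorkerInfo("trainer0", 0)),
                      ("p1", WorkerInfo("trainer0", 0)),
                      ("p2", WorkerInfo("trainer1", 1))]:
    rpcs.setdefault(worker, []).append(param)

assert len(rpcs) == 2  # equal WorkerInfos collapse into a single entry each
```
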
ghstack-source-id: 94169198

Test Plan: unit test.

Differential Revision: D18548257

fbshipit-source-id: 7d67d4e1b9bc60403c372164982a75ae8c1d8389
2019-11-18 20:47:13 -08:00
8b9bac1fad add operator-range argument to the op bench (#30051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30051

This argument takes hyphen-delimited start and end chars to filter operators. If the first character of an operator is in the start-to-end range, it will be tested; otherwise it is skipped. A sketch of the rule follows.
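
A minimal Python sketch of that rule (the helper name is illustrative; it also accepts the comma-separated form added later by `allow operator_range to take multiple ranges`, e.g. `a,b-c`):

```python
def first_char_in_ranges(op_name, operator_range):
    """operator_range like 'b-c' or 'a,b-c'; None means no filtering."""
    if operator_range in (None, "None"):
        return True
    first = op_name[0].lower()
    for part in operator_range.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            if lo <= first <= hi:
                return True
        elif first == part:
            return True
    return False

assert first_char_in_ranges("ceil", "b-c")
assert not first_char_in_ranges("abs", "b-c")
assert first_char_in_ranges("add", "a,b-c")
```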

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ceil
# Mode: Eager
# Name: ceil_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 110.720

# Benchmarking PyTorch: ceil_
# Mode: Eager
# Name: ceil__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 51.128
...

buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range None
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 107.113

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 54.259
...
```

Reviewed By: hl475

Differential Revision: D18581910

fbshipit-source-id: b1a1a7ba76f4d6a61c8a1659f15e9c66097654d4
2019-11-18 20:34:43 -08:00
64706e0a74 change conv, batchnorm input shapes (#30041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30041

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 751635.354
```

Reviewed By: hl475

Differential Revision: D18579767

fbshipit-source-id: 53bfac704828a836412434a66000c17f6ac1c727
2019-11-18 20:34:28 -08:00
3250d5008f change the starting iters to reduce execution time (#30040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30040

The benchmark runs each test in a loop of 200 iters, then keeps doubling the number of iters until the timing is significant. For operators with very large input shapes, the initial 200 iters take more time than is really necessary. This diff changed that 200 to 100; a sketch of the loop follows.
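
The auto-ranging loop in question looks roughly like this (an illustrative Python sketch with a hypothetical significance threshold):

```python
import time

def time_op(op, start_iters=100, min_elapsed_secs=0.2):
    iters = start_iters                   # was 200 before this diff
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_elapsed_secs:   # timing is significant enough
            return elapsed / iters * 1e6  # microseconds per call
        iters *= 2                        # otherwise double and retry
```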

(Note: this ignores all push blocking failures!)

Test Plan:
```
Before
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 729634.577

After
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 718315.899
```

Reviewed By: hl475

Differential Revision: D18579588

fbshipit-source-id: ef52474cf77e7549bbab0a9ae7b1b0c04023d208
2019-11-18 20:34:16 -08:00
3bd0f476d4 Revert D18233037: C++ API parity: isfinite
Test Plan: revert-hammer

Differential Revision:
D18233037

Original commit changeset: c76b9467bbc1

fbshipit-source-id: 97d2cfa9de767a8c3a0ca919f9d768e959fa484e
2019-11-18 20:26:19 -08:00
63f4b607aa Ensure initializedContextIds_ map is cleaned up appropriately in DistEngine. (#29787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29787

The initializedContextIds_ map was never cleaned up in DistEngine and kept growing as we continued to run backward passes. To fix this, in this PR we ensure that the context id is removed from the map once we are done with the backward pass.
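
The fix boils down to erasing the entry when the pass finishes, whether it succeeds or fails; an illustrative Python sketch:

```python
initialized_context_ids = set()

def run_distributed_backward(context_id):
    pass  # stand-in for the actual distributed backward pass

def execute_backward_pass(context_id):
    initialized_context_ids.add(context_id)
    try:
        run_distributed_backward(context_id)
    finally:
        # Always remove the id, so the map no longer grows without bound.
        initialized_context_ids.discard(context_id)

execute_backward_pass(42)
assert not initialized_context_ids
```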

Closes #29083
ghstack-source-id: 94161770

Test Plan: waitforbuildbot

Differential Revision: D18498937

fbshipit-source-id: 8d31fc066f6994627766f2b6ca36efa1bef89840
2019-11-18 20:11:18 -08:00
26dabad5a4 Add LiteModule java class for lite interpreter. (#30061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30061
Create the INativePeer interface and move the NativePeer class out of Module.java. Create LiteModuleLoader and LiteNativePeer.java for the Lite Interpreter binding.
ghstack-source-id: 94169187

Reviewed By: dreiss

Differential Revision: D18511688

fbshipit-source-id: 1a69c94b28c8a02631f53079ca7ddcaa57eca38f
2019-11-18 19:53:20 -08:00
a1fc46d2b5 Updating submodules
Summary:
GitHub commits:

385acc503c
b35b183e45

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: c7eccd88c804f1afd1db8d52221665b87ab51837
2019-11-18 19:09:52 -08:00
8df5e10ee9 C++ API parity: isfinite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28918

Test Plan: Imported from OSS

Differential Revision: D18233037

Pulled By: pbelevich

fbshipit-source-id: c76b9467bbc1fbb2c9bf49855895c98438b36c12
2019-11-18 19:06:57 -08:00
5d69bc1eda Add docs for distributed optimizer. (#29971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29971

ghstack-source-id: 94132160

Test Plan: waitforbuildbot

Differential Revision: D18554631

fbshipit-source-id: c4485f7cff5159f423d0f35d1caf71074b62dc28
2019-11-18 18:51:26 -08:00
4f94aed8a3 Reformatting module class. (#29957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29957

Reformatting module class.
ghstack-source-id: 94058645

Test Plan: buck build xplat/caffe2/android:pytorch

Reviewed By: iseeyuan

Differential Revision: D18548185

fbshipit-source-id: 8c1f5cbf491d42915e091e6245b4f308eb162f93
2019-11-18 18:39:29 -08:00
ab93b3df60 Polish distributed autograd docs. (#29942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29942

1) Added links to the design.
2) Fixed function signatures.
3) Expanded examples
ghstack-source-id: 94162372

Test Plan: waitforbuildbot

Differential Revision: D18547103

fbshipit-source-id: 067ba166c107ed14085af8ee3306d3f8a9dcebe7
2019-11-18 18:13:08 -08:00
df6a1c0437 Remove rpc.sync_rpc from the public API. (#30033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30033

Removing this API for now since we don't have a concrete use-case for
this yet and as a result exposing this as a public API might result in users
depending on this API.

We can always add some variant of this API back if needed later.
ghstack-source-id: 94138302

Test Plan: waitforbuildbot

Differential Revision: D18578056

fbshipit-source-id: 078c62331725e03bd5702624afc16b1cdcdf26a4
2019-11-18 18:02:07 -08:00
905792af1f disabling persistent mode for cuDNN BN on NCHW (#30031)
Summary:
This is to help bisect the unstable convergence that https://github.com/pytorch/pytorch/issues/29997 targets. Compared to the other PR, this one is a smaller hammer (a few lines of code change) and would facilitate our future repro/fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30031

Differential Revision: D18577624

Pulled By: VitalyFedyunin

fbshipit-source-id: 92a76cf5db24b25105395f80086d90d8e51dcc4b
2019-11-18 17:28:27 -08:00
9c7e604c60 SyncBatchNorm Update on input dimension checks (#29626)
Summary:
update the requirements on input dimensions for `torch.nn.SyncBatchNorm`:
1. 2D inputs are now permissible, https://github.com/pytorch/pytorch/issues/20204;
2. at least two elements are required along the normalization plane (BatchNorm behavior);
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626

Differential Revision: D18492531

Pulled By: albanD

fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e
2019-11-18 16:09:51 -08:00
5b6dd52e3c Build Unit Test of SparseRAdam
Summary: We added a caffe2 Python wrapper and a unit test for the SparseRAdam C++ operator.

Test Plan:
The unit test is constructed following the design pattern of the [Wngrad optimizer](https://our.intern.facebook.com/intern/diff/D8655724/). The test passed smoothly.
buck test //caffe2/caffe2/python:optimizer_test -- TestSparseRAdam

Test result:
{F221144048}

Reviewed By: wx1988

Differential Revision: D18330650

fbshipit-source-id: e0f4724c2b616b665e2a0fe2e5c3430696cca7ee
2019-11-18 15:22:37 -08:00
64cdc648da fix submodule traversal in FoldPrepackedWeightIntoModule (#29925)
Summary:
similar to https://github.com/pytorch/pytorch/pull/29914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29925

Differential Revision: D18548029

Pulled By: jerryzh168

fbshipit-source-id: 7b36133454c5190be19380bf125203807ea0b129
2019-11-18 13:34:45 -08:00
b4f33c1c21 Updating submodules
Summary:
GitHub commits:

3acb25f216
b830bffa96
fc7064cb4e
aa3975852e
b2a3d8944d

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 34571d2a94a8fd93d744dc58a0ba7681f3fdc6b2
2019-11-18 13:08:32 -08:00
8dd67057f1 Add RpcAgentTestFixture to extract duplicate code (#29747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29747

There is duplicate code for components that rely on RpcAgent. Extract it into a reusable test fixture class.

Test Plan:
### RPC + RRef

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck test mode/dev-nosan //caffe2/test:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift
```

### Dist Autograd

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn_thrift
```

### Dist Optimizer

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn_thrift
```

Differential Revision: D5689636

fbshipit-source-id: f35eea1359addaaac9bd8d00d0a5df228a236511
2019-11-18 12:54:17 -08:00
6d6380fd4e Update CODEOWNERS for distributed and rpc modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29988

Test Plan: Imported from OSS

Differential Revision: D18576548

Pulled By: mrshenli

fbshipit-source-id: 1170b6970727c9698b6fdbf0c40fc317d17ea8ea
2019-11-18 12:45:52 -08:00
adfb8a4888 Fix bug in atomicAdd for int16_t (#29231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29231

Fixes: https://github.com/pytorch/pytorch/issues/29153

The bug is that atomicAdd doesn't correctly add values for some dtypes, due to incorrect casting; it was returning zeros.

Incorrect behavior before this PR:

```
In [23]: sparse=torch.sparse_coo_tensor(indices=torch.tensor([[0,0],[1,1]]), values=torch.tensor([5, 6], dtype=torch.int16), size=(2,2), device='cuda', dtype=torch.int16 )

In [24]: sparse
Out[24]:
tensor(indices=tensor([[0, 0],
                       [1, 1]]),
       values=tensor([5, 6]),
       device='cuda:0', size=(2, 2), nnz=2, dtype=torch.int16,
       layout=torch.sparse_coo)

In [25]: sparse.coalesce()
Out[25]:
tensor(indices=tensor([[0],
                       [1]]),
       values=tensor([11]),
       device='cuda:0', size=(2, 2), nnz=1, dtype=torch.int16,
       layout=torch.sparse_coo)

In [26]: sparse.to_dense()
Out[26]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)

In [27]: sparse.coalesce().to_dense()
Out[27]:
tensor([[ 0, 11],
        [ 0,  0]], device='cuda:0', dtype=torch.int16)

In [30]: torch.add(torch.zeros([2,2],dtype=torch.int16, device='cuda'), sparse)
Out[30]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)
```

Test Plan: Imported from OSS

Differential Revision: D18575666

Pulled By: nairbv

fbshipit-source-id: 9b193b386bf4a9615014aa890d2e9f4f694940ac
2019-11-18 12:42:02 -08:00
45e980a243 Skip broken test test_cuda_kernel_loop_overflow_large (#30021)
Summary:
The previous "expectedFailure" decoration has broken ROCm CI

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/7674//console

```
16:23:52 test_cuda_kernel_loop_overflow_large (__main__.TestCuda) ... unexpected success

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30021

Differential Revision: D18574931

Pulled By: bddppq

fbshipit-source-id: 7b5240f9f3a610adda633f8b0dd9137e40b12e2f
2019-11-18 12:38:37 -08:00
189b24ebe9 reorganize test binaries of op bench (#30023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30023

This diff doesn't change how users run the benchmarks. But under the hood, we group all the tests into three groups: unary tests, quantized tests, and the remaining ops (named "others" here).

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 17914.301
...
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 66525.855
...
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 290.555
...
```

Reviewed By: hl475

Differential Revision: D18574719

fbshipit-source-id: f7ff1d952031129adde51ebf002e4891bd484680
2019-11-18 12:21:26 -08:00
91c6d2e51c Add support for quantized operator conversion from PT to C2 via ONNX (#29694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29694

This PR adds the preliminary support required to run quantized PyTorch models on a C2 backend.
For quantized ops we use a custom domain name, 'caffe2', to register the ops if they are in the "quantized" namespace.
The change also adds a JIT pass to unpack the quantized weights and insert the unpacked values into the graph.
The actual tensor values are looked up from the params dict.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2.py TestQuantizedOps

Imported from OSS

Reviewed By: houseroad

Differential Revision: D18467130

fbshipit-source-id: 53ebd8c43935f7d7e74305dad6c231a2247df176
2019-11-18 12:12:40 -08:00
b45069b59f fix fc fp16 quantization (#29469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29469

The original approach was to save both fp16 and fp32 for all models, which increased the file size and memory usage.

This diff saves only the 'used' blobs into the predictor file.

Test Plan:
fc clone workflow :
f149878151

ctr mbl feed test with fc fp16 quantization:
f149996395

No fp32 in local file
{F221750392}

QRT after the fix:
https://fburl.com/qrt/cp8r8263

Reviewed By: wx1988

Differential Revision: D18382503

fbshipit-source-id: 231c41668f25b1d35ca8d4358ce9b12ba60a4f91
2019-11-18 11:26:49 -08:00
a3ee504c33 Integrate RAdam to SparseAdamOp
Summary:
T53944549 aims to integrate the [`RAdam`](https://arxiv.org/pdf/1908.03265.pdf) optimizer into `Adam`. In this diff, we first try to integrate `RAdam` into `SparseAdamOp` on the CPU platform.

Note that `adam_op.cc` and `adam_op_gpu.cu` may be implemented in other diffs.

The implementation of `RAdam` follows the algorithm below:
 {F220259279}

The algorithm of [`Adam`](https://arxiv.org/pdf/1412.6980.pdf) is attached:
{F220389971}

Test Plan: Run `buck build caffe2` successfully.

Reviewed By: wx1988

Differential Revision: D18239578

fbshipit-source-id: fdc028261ee20986cae1f30f1d26d8705587331a
2019-11-18 10:20:01 -08:00
82682b3e96 Revert D18531481: Remove input_channels / output_channels / with_bias from ConvOptions
Test Plan: revert-hammer

Differential Revision:
D18531481

Original commit changeset: e48d9e8cf110

fbshipit-source-id: a233425cc10278552674c48b6b577ef53fca0632
2019-11-18 09:10:54 -08:00
f6cadad174 Delete redefinitions of methods in Variable already present on Tensor. (#29667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29667

Some previous implementations are defined in native_functions.yaml. In this case, I don't define them explicitly in Tensor; instead they are placed in VariableTypeManual.cpp. Doing this would have deleted their documentation, so that documentation was moved to native_functions.yaml.

This also replaces `current_version` with just `_version`.

This is a carved out portion of #28287, rebased past Tensor-Variable
merge.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18504934

Pulled By: ezyang

fbshipit-source-id: be7adf45b637daffe2b0b1631eb31d967525fc31
2019-11-18 08:12:16 -08:00
1ab2f043ba Move most methods off Variable into torch::autograd::impl functions. (#29665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29665

Our intention is to merge the static distinction between Tensor and
Variable.  Ordinarily, this would entail merging the methods of Tensor
and Variable.  But there are a lot of "private"-ish methods on Variable
that we don't actually want to dump onto the Tensor class.  So, as prep
work, we move all of those methods off of Variable and into
the torch::autograd::impl namespace (impl as in: end users, please don't
use this).  This ends up being a fairly large patch because all of
the call sites have to play ball too.

While I was on the topic, I also moved any of the touched functions into
the C++ file, so that modifying them would not trigger a recompilation of
all of torch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496169

Pulled By: ezyang

fbshipit-source-id: afb203252620ec274be596b3e7b1d84d321bad3a
2019-11-18 08:12:12 -08:00
38340f59fd randint accept generator=None (#29748)
Summary:
This PR fixes the inconsistent behavior of `randint`'s `generator=` kwarg. It does not accept `None`, which is inconsistent with how other random functions behave:
```
In [12]: torch.randint(0, 4, size=(2,3), generator=torch.Generator())
Out[12]:
tensor([[2, 0, 1],
        [0, 1, 3]])

In [13]: torch.randint(0, 4, size=(2,3), generator=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-a6bc6525a1e1> in <module>
----> 1 torch.randint(0, 4, size=(2,3), generator=None)

TypeError: randint() received an invalid combination of arguments - got (int, int, generator=NoneType, size=tuple), but expected one of:
 * (int high, tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int high, tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int low, int high, tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int low, int high, tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
```

Other random functions work fine:
```
In [9]: torch.bernoulli(torch.ones(3))
Out[9]: tensor([1., 1., 1.])

In [10]: torch.bernoulli(torch.ones(3), generator=None)
Out[10]: tensor([1., 1., 1.])
```

This PR also documents the `generator=` kwarg, and fixes https://github.com/pytorch/pytorch/issues/29683 since it's a related easy fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29748

Differential Revision: D18529951

Pulled By: ezyang

fbshipit-source-id: e956cc989decc94e9483fd4a30f9255240d7c07e
2019-11-18 08:07:29 -08:00
94016b153a Fix typo in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29755

Differential Revision: D18529963

Pulled By: ezyang

fbshipit-source-id: 8d9100f00c46238fa3210944864b1d178717499f
2019-11-18 07:44:12 -08:00
a573f8f7d7 Disable broken test_cuda_kernel_loop_overflow_large test (#29904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29904

See https://github.com/pytorch/pytorch/issues/26838

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18539740

Pulled By: ezyang

fbshipit-source-id: c3dcaaa0d8eedcfa4173c2b6ec139090bdace4b4
2019-11-18 07:38:34 -08:00
7782f4bc50 Updating submodules
Summary:
GitHub commits:

ea4aa9fc07
54e6aa5568
da41ae5048
da70fce0d3
0bec77c2d2
09fd20898f
b47c7f5c77
5762809397
241c174631

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1739f00a0f1e4ffe4b5ebb9e6f5dce403a5adf8b
2019-11-18 07:09:35 -08:00
0e5200adfe Refactor target_compile_options into torch_compile_options (#29730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29730

Back in the day, Caffe2 had a good idea: instead of spattering
target_compile_options all over the codebase, define a helper
function which sets all the options for a target.  This is especially
helpful if I want to split libtorch.so into libtorch_cpu.so
and libtorch_cuda.so; I need a way to easily apply options
to multiple targets.  A shared helper function is just the ticket.

I moved every target_compile_options call in caffe2/CMakeLists.txt
that didn't seem target dependent (exclusions included OpenMP flags,
API-related macros, ONNX related macros and HIP flags) into
torch_compile_options.  I slavishly preserved the structure:
there's a nearly redundant WERROR if() in the output but I preserved
it.

There is one thing I don't like about this, which is that now
the compile options are off in a random directory that no one would
expect.  But c'est la vie...

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571166

Pulled By: ezyang

fbshipit-source-id: 21cd5f7663485077600782078fbb1787fab09035
2019-11-18 07:05:48 -08:00
1381301d46 Remove AT_LINK_STYLE entirely. (#29729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29729

It already errored when you built with CUDA/HIP support as no longer supported;
now I expunge it entirely.  Along the way, I delete useless INTERFACE
libraries (which aren't used anywhere else in the cmake.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571167

Pulled By: ezyang

fbshipit-source-id: f88c73a16fad3b61eaa7745a2d15514c68704bec
2019-11-18 07:05:43 -08:00
639133d6d1 rename init_model_parallel to init_rpc (#29762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762

Rename this API as discussed, since its use cases extend beyond model parallelism.
ghstack-source-id: 94020627

Test Plan: Unit tests pass

Differential Revision: D18491743

fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
2019-11-18 06:07:44 -08:00
5f510374e7 Add torch.memory_format support to the TorchScript
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28544

Test Plan: Imported from OSS

Differential Revision: D18093801

Pulled By: VitalyFedyunin

fbshipit-source-id: 2c82a1508da50a24825b44939434d86546cf1e19
2019-11-18 05:35:49 -08:00
cb43170dcb Add memory format support to the resize_ op. (#28292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28292

Allows simplifying patterns like the following (a Python sketch of the simplified form appears after the patterns):

1.
	output.resize_({sizeB, sizeC, osizeH, osizeW}).as_strided_({sizeB, sizeC, osizeH, osizeW}, {sizeC*osizeH*osizeW, 1, osizeW*sizeC, sizeC});

2.
	output.resize_({nbatch, nInputPlane, outputHeight, outputWidth});
	indices.resize_({nbatch, nInputPlane, outputHeight, outputWidth});
	output.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);
	indices.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);

3.
	gradInput.resize_as_(input);
  	gradInput.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);
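
From Python, the new keyword makes the equivalent a one-liner; a minimal sketch:

```python
import torch

out = torch.empty(0)
out.resize_((8, 3, 32, 32), memory_format=torch.channels_last)
assert out.is_contiguous(memory_format=torch.channels_last)
```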

Test Plan: Imported from OSS

Differential Revision: D18044978

Pulled By: VitalyFedyunin

fbshipit-source-id: bbf67c25f9cf88bc6e949089a3b247df50f86dc4
2019-11-18 05:35:44 -08:00
a7df36964c TensorIterator: preserve format for binary and ternary operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28291

Test Plan: Imported from OSS

Differential Revision: D18044977

Pulled By: VitalyFedyunin

fbshipit-source-id: 793bab47d8cfc1b0d6229f1b0688352ee94c3e48
2019-11-18 05:35:40 -08:00
b80c4f60fb Add channels last support to cuda.comm.scatter and gather
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28077

Test Plan: Imported from OSS

Differential Revision: D17980305

Pulled By: VitalyFedyunin

fbshipit-source-id: e4741194baac3d93f2d53724582dc4c38f82ee84
2019-11-18 05:35:35 -08:00
026a2a4ec4 Kill operator== of TensorOptions, as it was confusing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28076

Test Plan: Imported from OSS

Differential Revision: D17980306

Pulled By: VitalyFedyunin

fbshipit-source-id: 2f206d5069ce0bd828d4e96f2e98cf2baa1dfec7
2019-11-18 05:35:29 -08:00
9f3b347874 Add memory format support to resize_as_ operator (#27979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27979

Adds a memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules (see the sketch after the definitions below):
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels last format, the output tensor is going to have channels last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is one that stores its values in a contiguous block of memory.
A non-overlapping tensor is one in which elements occupy individual, non-repeating memory locations.
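
A minimal Python sketch of rule (1); passing the keyword explicitly is assumed here, since the default may not be 'preserve':

```python
import torch

x = torch.empty(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
out = torch.empty(0)
out.resize_as_(x, memory_format=torch.preserve_format)

# Rule (1): x is non-overlapping and dense, so out inherits its strides.
assert out.stride() == x.stride()
```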

Test Plan: Imported from OSS

Differential Revision: D17980311

Pulled By: VitalyFedyunin

fbshipit-source-id: 12d013521091fcc9c045833577f6dc78d7b1e68f
2019-11-18 05:35:23 -08:00
a3588b6ed9 Updating submodules
Summary:
GitHub commits:

62c3b48cf4

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 41d1346f2405bce84984b02e3a951bb0e30868b7
2019-11-18 05:35:17 -08:00
bb217eee98 Updating submodules
Summary:
GitHub commits:

4624a94bf7

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 128a7f6b1e3207bcea19925e1709b0ecc0c957ab
2019-11-17 23:13:14 -08:00
18bdf97dbb Factor Module into Object and Module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29500

Test Plan: Imported from OSS

Differential Revision: D18463064

Pulled By: jamesr66a

fbshipit-source-id: d37bef242a8626593d4b8754042152cfc0f0acb2
2019-11-17 22:58:50 -08:00
14946a8891 Updating submodules
Summary:
GitHub commits:

eeb38ffd62
f27f096824
d5c51096af
76432027c0
e6135854c5
83800eae9a
5a5b563db5

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1eab55cd73b143acedfad7bf6fcad44b8a2cc12e
2019-11-17 18:38:06 -08:00
6bf87dae90 Updating submodules
Summary:
GitHub commits:

ead3bceee0

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: cefa462dc00d8e9d43474689042cc5043c99644f
2019-11-17 18:38:01 -08:00
2b5213d94c Updating submodules
Summary:
GitHub commits:

f163b30ade

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 5350123e471a38585893e75adffbdedd05f72167
2019-11-17 02:20:28 -08:00
b011461c9f Add missing operators for pytext, v2 (#29970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29970

Add the operators and the JMP instruction used by the PyText model to the lite interpreter.

Test Plan: Imported from OSS

Differential Revision: D18555483

fbshipit-source-id: e5124d908762f78fb548505aecf33be8c8503275
2019-11-16 23:59:12 -08:00
6980cb2519 Add overload name to JIT prim operators, version 2 (#29960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960

Overload names are required for mobile operators that share a name but have different schemas. Since the overload name is not used in JIT, it's safe to add overload names to JIT operators.
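
For illustration, the overload name is the token after the dot in an operator schema; two operators may share a name as long as the (name, overload name) pair is unique. The aten add schemas show the pattern (quoted here as Python strings):

```python
schemas = [
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor",
    "aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor",
]
# The overload names "Tensor" and "Scalar" disambiguate the two on mobile.
```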

Test Plan: Imported from OSS

Differential Revision: D18555484

fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
2019-11-16 23:59:07 -08:00
689b4bea7b torch::nn::GLU and F::glu (#29922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29922

* #29920 [C++ API] torch::nn::GroupNorm and F::group_norm

Test Plan: Imported from OSS

Differential Revision: D18558818

Pulled By: yf225

fbshipit-source-id: ff80d634309fcb55f53db8dcf86eb9cf8161b37e
2019-11-16 21:03:38 -08:00
d5bf51b684 torch::nn::GroupNorm and F::group_norm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29920

Test Plan: Imported from OSS

Differential Revision: D18539314

Pulled By: yf225

fbshipit-source-id: dabbbaac31796fe7bfde02487737971bde699c1c
2019-11-16 19:22:11 -08:00
93c5d79953 Updating submodules
Summary:
GitHub commits:

56fc7ed20e

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 6d48ebb6f5b631a7b3e5bcd96fa2fa92ca6c1ba5
2019-11-16 13:23:45 -08:00
30d37e82db Revert D18521937: Enable full error message for mobile builds
Test Plan: revert-hammer

Differential Revision:
D18521937

Original commit changeset: 99673b60a03d

fbshipit-source-id: 1946982201e4a21015bc9cd8abaa64a68ff8774f
2019-11-16 12:20:27 -08:00
e1d13f4f8b C++ API parity: NLLLoss & CrossEntropyLoss (#29812)
Summary:
Hi yf225, I have added **NLLLoss and CrossEntropyLoss.**

Also, while using log_softmax in cross_entropy_loss, I am getting an error:
```
../caffe2/../torch/csrc/api/include/torch/nn/functional/loss.h:537:63: error: no matching function for call to ‘log_softmax(const at::Tensor&)’
     const Tensor& log_softmax_input = torch::log_softmax(input);

aten/src/ATen/Functions.h:5551:22: note: candidate: at::Tensor at::log_softmax(const at::Tensor&, int64_t, c10::optional<c10::ScalarType>)
 static inline Tensor log_softmax(const Tensor & self, int64_t dim, c10::optional<ScalarType> dtype) {
                      ^~~~~~~~~~~
aten/src/ATen/Functions.h:5551:22: note:   candidate expects 3 arguments, 1 provided
```

I think the other two parameters should be optional, as in the Python frontend (shown in the documentation at https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.log_softmax ). Otherwise, there were no errors in the build, and the tests have passed.
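
For reference, a minimal Python sketch of the frontend behavior being matched; passing `dim` explicitly is the unambiguous form that the C++ call above needs to spell out:

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 5)
out = F.log_softmax(x, dim=1)   # dim (and dtype) are optional in Python
print(out.exp().sum(dim=1))     # each row sums to 1 after exponentiation
```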
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29812

Differential Revision: D18548249

Pulled By: yf225

fbshipit-source-id: 2ab350abd2a6f498d4dba2345f51ad87471f3038
2019-11-16 10:49:09 -08:00
890a3f8b8d Remove input_channels / output_channels / with_bias from ConvOptions (#29838)
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838

Differential Revision: D18531481

Pulled By: yf225

fbshipit-source-id: e48d9e8cf110095f83d9ed18b9fec020ec725f3e
2019-11-16 10:46:50 -08:00
0995929971 Improve legacy QuantizedLinear functions to reduce overhead (#29773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29773

Improve legacy QuantizedLinear functions to reduce overhead.
Separate from the stack of D18381988.

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- "quant"

Reviewed By: lly-zero-one

Differential Revision: D18494988

fbshipit-source-id: 5627d7e8b0b7a750852eead9e28c5a9b3fa70559
2019-11-16 08:25:11 -08:00
66bd0ed940 Updating submodules
Summary:
GitHub commits:

207328497a
c272123098
cdcd46de4e
1c093d3fa7
e18b3c2e6e
746161a422

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 253c3a9d70da0cbaf34dc38414966ccccf40533c
2019-11-16 06:26:50 -08:00
649e7f057e fix comment index_size->output_size (#29831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29831

As title. Thanks Aleks Zi for finding this!

Test Plan: Just changing comments

Reviewed By: zlateski

Differential Revision: D18511259

fbshipit-source-id: 5f1ad9ba53db9b22622a556ec214ced361ec016a
2019-11-16 01:49:02 -08:00
58ee61176c SeqBlobReader Implementation (#29888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29888

Extract some common functions out of class LoadOp.

Reviewed By: yinghai, ipiszy

Differential Revision: D18456785

fbshipit-source-id: d0b8e86ad5709c35f1dc3821376000db1114dc95
2019-11-16 01:18:54 -08:00
455b5c1a7d minor updates to rpc docs (#29857)
Summary:
Small fixes to the rpc docs:
- mark as experimental and subject to change
- reference the distributed autograd design document in the PyTorch notes page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29857

Differential Revision: D18526252

Pulled By: rohan-varma

fbshipit-source-id: e09757fa60a9f8fe9c76a868a418a1cd1c300eae
2019-11-15 22:28:08 -08:00
4da509090e Disables TestNN.test_CTCLoss_1d_target (#29841)
Summary:
A variant of this test is flaky in CI. See https://github.com/pytorch/pytorch/issues/29380.

This disables the entire test until a fix is determined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29841

Differential Revision: D18531542

Pulled By: mruberry

fbshipit-source-id: 3b033e3a7d55418cf459e7664d856d6dd4c98aa5
2019-11-15 22:03:04 -08:00
eb29276623 Update distributed autograd design doc with appropriate links. (#29927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927

With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423

Test Plan: waitforbuildbot

Differential Revision: D18541878

fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520
2019-11-15 21:10:53 -08:00
4553d5e69b Fix submodule traversal in insertPackUnpack pass. (#29914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29914

Currently we're visiting all submodules every time we're visiting a
method of a module.

Test Plan: Imported from OSS

Differential Revision: D18534602

Pulled By: ZolotukhinM

fbshipit-source-id: 38c5b0ab0bdd27599fd0a6af0eaa3603c68a97a8
2019-11-15 20:43:43 -08:00
27afac2134 C++ API parity: Dropout, Dropout2d, Dropout3d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29761

Test Plan: Imported from OSS

Differential Revision: D18530820

Pulled By: pbelevich

fbshipit-source-id: 9d351561692f7de099d7c6aaf2ecb930b5c867e9
2019-11-15 20:32:06 -08:00
fbabf72829 Add ONNX support for Logdet (#29767)
Summary:
Exported as combination of ONNX::Log and ONNX::Det.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29767

Reviewed By: hl475

Differential Revision: D18499762

Pulled By: houseroad

fbshipit-source-id: e6f2298635a995f01b2913d8958b5e1ca9d04058
2019-11-15 20:27:43 -08:00
b730d04ed2 Fix deadlock issues in ThreadPool (#29885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885

### Summary

Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when a task running in the ThreadPool calls `getNumThread()`, which tries to acquire the same mutex, thus causing the deadlock. The fix is to simply remove the guard for `_numThreads`, as it's not likely to change after initialization.
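
A minimal Python sketch of the deadlock pattern (hypothetical names; the real code is C++ with a `std::mutex`):

```python
import threading

_mutex = threading.Lock()   # non-reentrant, like a plain std::mutex
_num_threads = 4

def get_num_threads():
    with _mutex:            # second acquire by the same thread blocks forever
        return _num_threads

def run_task(task):
    with _mutex:            # the pool holds the mutex while the task runs
        return task()

# run_task(get_num_threads) would deadlock; the fix drops the guard in
# get_num_threads, since _num_threads doesn't change after initialization.
```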

### Test Plan

1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark

cc shoumikhin AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18533505

Pulled By: xta0

fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
2019-11-15 19:27:52 -08:00
0a33c3f1a1 split module interface tests (#29917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29917

move test_module_interface to its own file, no code logic change

Test Plan: Imported from OSS

Differential Revision: D18543235

fbshipit-source-id: ab5e233061ba45cb0c05cafdd289b859036c207c
2019-11-15 19:09:36 -08:00
a5b4d78c6d Revert D18499600: Add overload name to JIT prim operators.
Test Plan: revert-hammer

Differential Revision:
D18499600

Original commit changeset: a1b49e64c908

fbshipit-source-id: 73e27b72f53799c0133850d2352ae8cd8a82d87c
2019-11-15 18:36:17 -08:00
2a442f5dca Revert D18499601: Add missing operators for PyText model.
Test Plan: revert-hammer

Differential Revision:
D18499601

Original commit changeset: 8a38d3d809ee

fbshipit-source-id: 4f28f291bd7020f1fc9fc313bc766b5dbf5b1b90
2019-11-15 18:36:11 -08:00
c543034531 add cuda sync when ops running on gpu (#29936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29936

This diff adds synchronization after op execution to ensure all the CUDA streams complete.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...
```

Reviewed By: hl475

Differential Revision: D18542732

fbshipit-source-id: b979d26a174f488e971074dc1e16b00e17179c80
2019-11-15 18:02:48 -08:00
f1860aea83 fix missing lock in profiling graph compilation (#29886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29886

Fixes https://github.com/pytorch/pytorch/issues/29764

Test Plan: Imported from OSS

Differential Revision: D18523903

Pulled By: zdevito

fbshipit-source-id: 4e2b04102ee9f6312e4a7b48536392454e6c1b79
2019-11-15 17:51:46 -08:00
5cad7d42ef Enable full error message for mobile builds (#29926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29926

add a macro to enable full error message for mobile

Test Plan: buck build -c project.ignore= //xplat/experimental/pytorch/predictor:predictorAndroid#android-armv7

Reviewed By: dreiss

Differential Revision: D18521937

fbshipit-source-id: 99673b60a03da249236dc916bab3dff88d24bc25
2019-11-15 17:48:47 -08:00
c300f086a4 Turn off scalar_check for diag.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29877

Test Plan: Imported from OSS

Differential Revision: D18521734

Pulled By: gchanan

fbshipit-source-id: 646cc0bca5082a808deca3f5d6646bc6bf180484
2019-11-15 17:17:13 -08:00
a6a31c6dc2 Turn off scalar_check for _th_max, _th_min.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29876

Test Plan: Imported from OSS

Differential Revision: D18521737

Pulled By: gchanan

fbshipit-source-id: aeae6959c778eb6d935bcdb8bcf664a7c2404090
2019-11-15 17:17:08 -08:00
6c7a0c68f9 Turn off scalar_check for lstsq (gels), and test scalars for eig.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29875

Test Plan: Imported from OSS

Differential Revision: D18521740

Pulled By: gchanan

fbshipit-source-id: 98133aadaaa2f2010462517a2704395dad95817b
2019-11-15 17:17:04 -08:00
79f0636718 Turn off scalar_check for sort. (#29874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29874

It's handled correctly by the op.

Test Plan: Imported from OSS

Differential Revision: D18521744

Pulled By: gchanan

fbshipit-source-id: 0577670bebaec98e6549ad270ff0ebd3ed908231
2019-11-15 17:17:00 -08:00
ee5201cd7c Fix memory leak in CUDA renorm, turn off scalar_check for renorm. (#29873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29873

Renorm requires at least 2 dimensions, so the scalar_check could never succeed.

Test Plan: Imported from OSS

Differential Revision: D18521733

Pulled By: gchanan

fbshipit-source-id: 9701c750a14ce67e1bd63dd0753bd8863da42c17
2019-11-15 17:16:55 -08:00
d87655f515 Turn off scalar_checks for cumsum, cumprod.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29872

Test Plan: Imported from OSS

Differential Revision: D18521739

Pulled By: gchanan

fbshipit-source-id: 72d642bcc462e5b1317876bcae8b31f83a98467d
2019-11-15 17:16:51 -08:00
fe575b44ee Turn off scalar_check for fmod. (#29871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29871

Generated code diff: https://gist.github.com/gchanan/cba4ac79afa00a48eaff0aabc60d17cc

Test Plan: Imported from OSS

Differential Revision: D18521736

Pulled By: gchanan

fbshipit-source-id: 364fc2aeba5315d0729a9f7f74c5e9ad64c30e45
2019-11-15 17:16:47 -08:00
98362977a0 Turn off scalar_check for remainder. (#29870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29870

Codegen diff: https://gist.github.com/gchanan/c7ceb5715e7cfa6266e948d598744131

Test Plan: Imported from OSS

Differential Revision: D18521738

Pulled By: gchanan

fbshipit-source-id: bee23d67e247d4e06fef41243f578247c4817300
2019-11-15 17:16:42 -08:00
61df98a083 Turn off scalar_checks for multinomial_alias_setup_, which requires 1d tensors. (#29869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29869

codegen changes: https://gist.github.com/gchanan/8e1b5184581fa37b27b6e856a75b470f

Test Plan: Imported from OSS

Differential Revision: D18521741

Pulled By: gchanan

fbshipit-source-id: a2674b55214b84032e7a821e8472d7df9e8a1dcb
2019-11-15 17:16:38 -08:00
92a512b583 Stop generating maybe_zero_dim calls for "scalar_check: false" with multiple outputs. (#29868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29868

Codegen changes: https://gist.github.com/gchanan/b0db8ec1310d7e10435c75b951e7de83

Test Plan: Imported from OSS

Differential Revision: D18521735

Pulled By: gchanan

fbshipit-source-id: bc4c437b001b754868435fb642ab60415600f0ff
2019-11-15 17:16:33 -08:00
6c39e5033c Add missing operators for PyText model.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29664

Test Plan: Imported from OSS

Differential Revision: D18499601

fbshipit-source-id: 8a38d3d809ee5ef5b73b5a5ce1db612aea680e75
2019-11-15 16:22:52 -08:00
ff4e782e79 Add overload name to JIT prim operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29656

Test Plan: Imported from OSS

Differential Revision: D18499600

fbshipit-source-id: a1b49e64c908d16d40a6ddb048182d7bbe80bcd6
2019-11-15 16:22:47 -08:00
3003c5f91b OPN ops TupleConstruct/Unpack and format. (#29635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29635

TupleConstruct/Unpack as OPN ops.

Test Plan: Imported from OSS

Differential Revision: D18499602

fbshipit-source-id: 389b21d3ea532ef6fa729d67ce34214d86700cd2
2019-11-15 16:22:42 -08:00
d22f61432d Update fbjni and enable PyTorch JNI build
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
  fbjni.  This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
  because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
  can find jni.h in Docker.

Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
  in a parallel build.

Differential Revision: D18536828

fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
2019-11-15 13:59:44 -08:00
3f5dc95b57 fix device check in op bench (#29918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29918

Some of the tests don't specify `device` in the input configs, so filtering by device won't work for them. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:qpool_test -- --iterations 1 --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QAdaptiveAvgPool2dBenchmark
# Mode: Eager
# Name: QAdaptiveAvgPool2dBenchmark_N4_C3_input_size(224,224)_output_size(112,112)_contigTrue_dtypetorch.qint32
# Input: N: 4, C: 3, input_size: (224, 224), output_size: (112, 112), contig: True, dtype: torch.qint32
Forward Execution Time (us) : 2891.172
```

Reviewed By: hl475

Differential Revision: D18535766

fbshipit-source-id: 09d89cf23b3caab6c0bc3b8a9ae55cc439b98e0f
2019-11-15 13:55:38 -08:00
acb8100810 Updating submodules
Summary:
GitHub commits:

4c18636f6b
efdb8c4731
b8881f9d9a

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7e3aeb7417c870ec2d8d46c3b83f1b7b5e9a98ec
2019-11-15 13:31:13 -08:00
7807d44934 Add TensorShapeAndType (#29848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29848

design doc: https://docs.google.com/document/d/15luH8R7a0WMiZzoKxu6cI0a1XDW4C0vyaW3-XQ_3G30/edit#heading=h.cyvbc4wtxkn7

Test Plan: buck build

Reviewed By: ipiszy

Differential Revision: D18513718

fbshipit-source-id: c3e3b30b58360b898528422ba9618b1dd3beb0a8
2019-11-15 13:06:06 -08:00
5ab6635de1 Stop binding _th_resize_as_, which isn't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29867

Test Plan: Imported from OSS

Differential Revision: D18521743

Pulled By: gchanan

fbshipit-source-id: 0c3f1bfabb29b2d20305657644edb2065a549bc3
2019-11-15 12:50:27 -08:00
8e61287d1b Skip outputting scalar_checks if they are false. (#29866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29866

This is a no-op anyway, so no reason to output.

Test Plan: Imported from OSS

Differential Revision: D18521742

Pulled By: gchanan

fbshipit-source-id: f695e453beeee609dbdf23d26f9b5eaf519e16b2
2019-11-15 12:50:22 -08:00
4442fa59c7 Avoid keeping old histograms in the histogram observer to fix the OOM issue (#29768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29768

The previous histogram observer saved all histograms for new data and merged them at the end. This could cause an OOM issue when collecting histograms over a large amount of data. In this diff, we assume the histogram observer runs in a single thread, and we remap the histogram after seeing new data.
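
A rough sketch of the single-running-histogram idea (hypothetical helper, not the observer's actual code): widen the range when a batch exceeds it, remap the old counts into the widened range, then add the batch's histogram.

```python
import torch

def update_hist(hist, lo, hi, batch, bins=256):
    # Keep one running histogram in O(bins) memory instead of one per batch.
    b_lo, b_hi = batch.min().item(), batch.max().item()
    new_lo, new_hi = min(lo, b_lo), max(hi, b_hi)
    if (new_lo, new_hi) != (lo, hi):
        # Remap old counts by dropping each old bin's count at its center
        # inside the widened range (coarse, but needs no extra histograms).
        centers = torch.linspace(lo, hi, bins)
        idx = ((centers - new_lo) / (new_hi - new_lo) * (bins - 1)).long()
        remapped = torch.zeros(bins)
        remapped.scatter_add_(0, idx.clamp(0, bins - 1), hist)
        hist, lo, hi = remapped, new_lo, new_hi
    return hist + torch.histc(batch.float(), bins=bins, min=lo, max=hi), lo, hi

hist, lo, hi = torch.zeros(256), 0.0, 1.0
for batch in [torch.rand(1000), torch.rand(1000) * 3 - 1]:
    hist, lo, hi = update_hist(hist, lo, hi, batch)
```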

Test Plan:
```
buck test mode/opt caffe2/caffe2/quantization/server:dynamic_histogram_test
```

```
buck run mode/opt caffe2/caffe2/fb/fbgemm/numerical_debugger/workflows:int8_static_quantization_exporter -- --model-dir /mnt/public/summerdeng/ads/ --model-name downsized_ins_97293388_0.predictor --run --iter 10  --dataset-path /mnt/public/summerdeng/ads/ctr_instagram_story_int8/dataset/train/dataset_115764229_10 --hive-path="hive://ad_delivery/ig_ad_prefiltered_training_data_orc_injected/ds=2019-09-09/pipeline=ctr_instagram_story_click_only_model_opt_out_df" --collect-histogram --activation-histogram-file=/mnt/public/summerdeng/ads/ctr_instagram_story_int8/activation_histograms/dummy_debug_OOM.txt
```

Reviewed By: jspark1105

Differential Revision: D18458764

fbshipit-source-id: c0e36fffe9bf021efd17d8494deef43727333da2
2019-11-15 12:30:44 -08:00
7889e1e3f9 Add torch.version.hip from cmake (#29815)
Summary:
This adds the HIP_VERSION cmake variable as hip_version.
This should help detect ROCm, e.g. in https://github.com/pytorch/pytorch/issues/22091.

To parallel CUDA, hip_version is a string.
An alternative variant might be to split by '.' and only take the first two parts.
The method suffers a bit from ROCm not being as monolithic as CUDA.
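
A sketch of the intended detection pattern (the exact version-string format is build-dependent; on non-ROCm builds the attribute is None):

```python
import torch

if getattr(torch.version, "hip", None) is not None:
    print("ROCm build, HIP version:", torch.version.hip)
else:
    print("not a ROCm build")
```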
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29815

Differential Revision: D18532267

Pulled By: bddppq

fbshipit-source-id: 1bde4ad0cfacc47bfd1c0945e130921d8575a5bf
2019-11-15 12:03:15 -08:00
69e343f2cc Expose is_signed for dtype (#29511)
Summary:
Changelog:
- Expose is_signed for torch.dtype by modifying torch/csrc/Dtype.cpp
- Allow half, bfloat16 and bool to also be "known" by the isSignedType function
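
For example, after this change the following holds (the last three dtypes are the newly covered ones):

```python
import torch

assert torch.int8.is_signed and not torch.uint8.is_signed
# Newly handled by isSignedType:
assert torch.half.is_signed
assert torch.bfloat16.is_signed
assert not torch.bool.is_signed
```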
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29511

Test Plan:
- Add tests in test/test_torch.py

Closes https://github.com/pytorch/pytorch/issues/29475

Differential Revision: D18439030

Pulled By: albanD

fbshipit-source-id: 4b1f9da70c1c8dfd0a5bc028b6936acd1c64af47
2019-11-15 11:16:45 -08:00
23fcc409d5 Revert "switch back to azure pipelines" (#29910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29910

This reverts commit 6de1016f9dbf624f93f8c8d45feb56f8c222b7a6.

Test Plan: Imported from OSS

Differential Revision: D18532474

Pulled By: suo

fbshipit-source-id: 852fdcf21bd4aa7ca94322d64e43aab5a822cabc
2019-11-15 11:00:14 -08:00
a9c719ba82 Set TORCH_CXX_FLAGS in minimal example (#29890)
Summary:
To avoid ABI issue

EDIT: After this PR, the example CMakeLists.txt will always use the `-D_GLIBCXX_USE_CXX11_ABI` value set in `share/cmake/Torch/TorchConfig.cmake`, regardless of the `-D_GLIBCXX_USE_CXX11_ABI` value passed to the `cmake` command by the user.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29890

Differential Revision: D18531391

Pulled By: yf225

fbshipit-source-id: 2db78ae7a33a4088b579e81c60b9a74861f1ccde
2019-11-15 09:57:15 -08:00
9ec1727ea6 Makes test_type_promotion generic (#29417)
Summary:
Test type promotion was already running on CUDA with its own (tiny) version of a generic test framework. This PR makes it use the actual generic test framework.

In addition, the tests previously set the default dtype (and did not reset it). A new decorator replaces the previous style and resets the default dtype after each test. This is still not thread-safe, but at least there's a comment to that effect now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29417

Differential Revision: D18514545

Pulled By: mruberry

fbshipit-source-id: 5aad43481ae71124cba99fb2e4a946894f591d68
2019-11-15 09:54:07 -08:00
0108f473ad Use c10::to_string in more places (#29839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29839

std::to_string isn't reliably available on Android.  Use c10::to_string
instead in some more files that we want to add to some Android builds.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D18509295

fbshipit-source-id: 678af1abbea05777310499634ab01afbe21134d8
2019-11-15 09:22:59 -08:00
60ad2a96f0 Update torchvision in CI (#29853)
Summary:
Update torchvision in CI to include 44a5bae933.

This PR is blocking https://github.com/pytorch/pytorch/pull/29838.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29853

Differential Revision: D18531096

Pulled By: yf225

fbshipit-source-id: 19ed7628d08854108a05e01696e09c9b03a3d5f4
2019-11-15 09:18:35 -08:00
5e53c1501a Update CircleCI config to use Docker images from "pytorch" account (#29835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29835

Using images from personal accounts restricts our ability to push
updates in a timely manner.

Test Plan: CI

Reviewed By: soumith

Differential Revision: D18524393

Pulled By: dreiss

fbshipit-source-id: f12dd3ce50c8362e152ed265e2d24bcb073dcfd4
2019-11-15 07:30:15 -08:00
510ef4b63a Add nn.quantized.Conv3d (#29813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29813

Add nn.quantized.Conv3d

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: jianyuh

Differential Revision: D18467749

fbshipit-source-id: 892f708179e9e836ad902851ac1838847009da15
2019-11-15 04:33:40 -08:00
e1a309a647 Always include autograd context id in rpc/remote requests (#29781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29781

Even though the request might not contain any requires_grad tensor,
the return value could. Therefore, we should always include the
autograd context id in the request.

closes #28819

Test Plan: Imported from OSS

Differential Revision: D18496709

Pulled By: mrshenli

fbshipit-source-id: 2f870c410291a1300952895b7488ea07e5574228
2019-11-14 23:02:11 -08:00
a34cc01dcc Implement backend level fallback for c10 (#28494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28494

Allow a backend-level fallback kernel that is called whenever an operator doesn't have a concrete kernel for the backend.
This is needed for the lazy backend.
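
A toy sketch of the dispatch logic this enables (illustrative names only, not the c10 API):

```python
kernels = {("aten::add", "CPU"): lambda x, y: x + y}
backend_fallbacks = {"Lazy": lambda op, *args: ("deferred", op, args)}

def dispatch(op, backend, *args):
    if (op, backend) in kernels:
        return kernels[(op, backend)](*args)
    if backend in backend_fallbacks:
        # One fallback kernel handles every op that lacks a concrete
        # kernel for this backend, which is what the lazy backend needs.
        return backend_fallbacks[backend](op, *args)
    raise RuntimeError(f"no kernel for {op} on {backend}")

print(dispatch("aten::add", "CPU", 1, 2))   # 3
print(dispatch("aten::add", "Lazy", 1, 2))  # ('deferred', 'aten::add', (1, 2))
```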
ghstack-source-id: 93872571

Test Plan: unit tests

Differential Revision: D18081495

fbshipit-source-id: 5f4964249cc226a39fd6e929a5be88a771c401a7
2019-11-14 21:35:49 -08:00
3fa5917530 Simplify c10 dispatcher (#28314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28314

Simplify the c10 dispatcher, making it easier to understand.

Also, this moves the dispatch decision from the DispatchTable class into the Dispatcher class.
This is required because DispatchTable only knows things about one operator but the dispatch decision will (in future diffs) also need to look at backend-level fallbacks, for example for lazy.
ghstack-source-id: 93872575

Test Plan: unit tests

Differential Revision: D18018736

fbshipit-source-id: 375729d5e307e0622906f8cc9a0b087b94aea2b1
2019-11-14 21:35:44 -08:00
6dc8d72f94 Change from int64_t to jlong for mac build (#29861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29861

Following https://github.com/pytorch/pytorch/issues/6570 to run ./run_host_tests.sh for the Mac build, we saw the error below:

```
error: cannot initialize a parameter of type 'const facebook::jni::JPrimitiveArray<_jlongArray *>::T *' (aka 'const long *') with an rvalue of type
      'std::__1::vector<long long, std::__1::allocator<long long> >::value_type *' (aka 'long long *')
    jTensorShape->setRegion(0, tensorShapeVec.size(), tensorShapeVec.data());
```
ghstack-source-id: 93961091

Test Plan: Run ./run_host_tests.sh and verify the build succeeds.

Reviewed By: dreiss

Differential Revision: D18519087

fbshipit-source-id: 869be12c82e6e0f64c878911dc12459defebf40b
2019-11-14 21:29:59 -08:00
893105b79e Add reset_parameters to torch::nn modules (#29832)
Summary:
This PR adds `reset_parameters` to the torch::nn modules whose Python counterparts also have `reset_parameters` defined, for better parity between the Python and C++ versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29832

Differential Revision: D18515939

Pulled By: yf225

fbshipit-source-id: 5aa23e5c7ce1026787c04ffeb6c7f167620dd491
2019-11-14 20:58:32 -08:00
831f25c53b add test/mobile/op_deps project for dependency analysis test (#29716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29716

Move the test project out of PR #29550 into this separate PR.

It serves these purposes:
- Defines the ".yaml" format to describe inter-op dependency.
- Can be used as a small testbed for us to quickly experiment, evaluate
  and test different dependency analysis techniques (llvm-pass, linker,
  etc).
- Covers various different c10 operator APIs and builds a runnable binary.

I created a 'mobile' folder under 'test/' because I feel we can create a
few other similar projects here to test mobile-specific yet platform-independent
stuff, e.g.:
- use the host toolchain + mobile build options to do continuous E2E testing;
- test the custom build workflow for mobile;

Test Plan:
- run build script and verify the binary is runnable:
```
scripts/build_mobile.sh
test/mobile/op_deps/build.sh
```

Differential Revision: D18474641

Pulled By: ljk53

fbshipit-source-id: 3fae9da5e0e3fe6cb17ada8783d5da2f144a6194
2019-11-14 20:41:38 -08:00
b508de6412 add static libraries to TorchConfig.cmake.in (#29837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29837

The current TorchConfig seems to only handle shared libraries. When
building static libraries, it doesn't provide the list of all needed
static libraries. This is especially a problem for mobile builds, as we
build static libraries first and then link them into a shared library /
binary to do "gc-sections". Today we have to manually import these
dependent libraries at each call site.

Test Plan:
- build_mobile.sh builds and runs;
- The baby test project in #29716 builds and runs;
- Will check CI for other platforms;

Differential Revision: D18513404

Pulled By: ljk53

fbshipit-source-id: c3dc2c01004c4c9c4574c71fd9a4253c9e19e1e9
2019-11-14 20:41:33 -08:00
9371b31818 set USE_STATIC_DISPATCH outside cmake (#29715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29715

Previously we hard-coded static dispatch to be enabled when building the mobile
library. Since we are exploring approaches to deprecate static dispatch,
we should make it optional. This PR moves the setting from cmake to the bash
build scripts, where it can be overridden.

Test Plan: - verified it's still using static dispatch when building with these scripts.

Differential Revision: D18474640

Pulled By: ljk53

fbshipit-source-id: 7591acc22009bfba36302e3b2a330b1428d8e3f1
2019-11-14 20:41:29 -08:00
60a33cac2b reduce input shapes of long tag in op bench (#29865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865

For some operators, the number of tests (forward + backward) could easily go above 100. Many of them could be redundant, so this diff tries to reduce the number of shapes.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...
```

Reviewed By: hl475

Differential Revision: D18520946

fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
2019-11-14 20:19:09 -08:00
90e3bbf3ab support all with tag_filter to run all shapes (#29864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864

This diff makes `all` a reserved keyword for tag_filter. When `all` is passed by the user, all the supported shapes will be run.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688

...
```

Reviewed By: hl475

Differential Revision: D18520249

fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
2019-11-14 20:19:05 -08:00
5da2bf945e add embeddingbag to benchmark_all_test (#29830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29830

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18506023

fbshipit-source-id: 15693894c0aa736ab3e818bc740099f0d629cb84
2019-11-14 20:13:57 -08:00
371da6acef move get_rpc_timeout to pybind (#29765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29765

Instead of wrapping this C++ function in Python, which causes
unnecessary overhead, we can move it to pybind and use the `DefaultRpcAgent`
to get the timeout.
ghstack-source-id: 93879236

Test Plan: unit tests pass

Differential Revision: D18493195

fbshipit-source-id: fd0f1f13ee15acb5ea1ae7c696925c9b54304f6d
2019-11-14 19:39:22 -08:00
7a6c3b36a1 Switch ScriptModuleOp to use a unique_ptr
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29856

Test Plan: waitforsadcastle

Reviewed By: dzhulgakov

Differential Revision: D18516553

fbshipit-source-id: d1e2d49ec613d07b21cd30bd777fbd300032cba1
2019-11-14 19:36:00 -08:00
902c1f9ef1 Check for mutable default parameters (#29833)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/21545

We were silently giving wrong semantics previously:

Python behavior:
```
def test(x=[]):
   x.append(1)
   return len(x)

print(test()) # 1
print(test()) # 2
```

By checking at the Python layer, we prevent any new models from serializing this behavior, without breaking existing serialized models.
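
A minimal sketch of the workaround the check pushes users toward: an `Optional` default plus an explicit `None` check.

```python
from typing import List, Optional

import torch

@torch.jit.script
def test(x: Optional[List[int]] = None) -> int:
    if x is None:
        x = []          # a fresh list on every call
    x.append(1)
    return len(x)

print(test())  # 1
print(test())  # 1, unlike the shared-state mutable default above
```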
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29833

Differential Revision: D18513168

Pulled By: eellison

fbshipit-source-id: 6fe73f28e1f9d39dedeaf67a04718089d14401a1
2019-11-14 18:28:48 -08:00
77bb41c965 Rename dist_autograd_context and dist_autograd_container. (#29696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29696

The paths distributed/autograd/context/dist_autograd_context.h and
distributed/autograd/context/dist_autograd_container.h were repetitive.

Therefore renaming these to distributed/autograd/context/context.h and
distributed/autograd/context/container.h
ghstack-source-id: 93850266

Test Plan: waitforbuildbot

Differential Revision: D18467624

fbshipit-source-id: bbf3905396f553006851af296c880c1bd106ec47
2019-11-14 14:49:34 -08:00
06ef4a757d Add docs for RPC, dist autograd, and RRef modules (#29276)
Summary:
Closes https://github.com/pytorch/pytorch/issues/28983. Documentation for `torch.distributed.rpc` and `torch.distributed.autograd` modules. Also fixes/tidies up some of the docstrings in rpc/autograd, and moves some functions to be private so they don't show up in the documentation.

Note: Much of the text to describe/explain the RPC/RRef layers are taken from the following RFCs: https://github.com/pytorch/pytorch/issues/23110, https://github.com/pytorch/pytorch/issues/26759
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29276

Differential Revision: D18478754

Pulled By: rohan-varma

fbshipit-source-id: e9a7089baf5275304e5408d319eb9bf98e53fff8
2019-11-14 14:32:03 -08:00
ce7058337c Remove two unused TH definitions of rsqrt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29688

Differential Revision: D18465222

Pulled By: VitalyFedyunin

fbshipit-source-id: a33880db389b82a8242c79723830e0a3afd3d498
2019-11-14 14:28:17 -08:00
bfedace5e3 Expose miniz to Python (#29228)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization
 * https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
 * **https://github.com/pytorch/pytorch/issues/29228 - Expose miniz to Python**

This adds the miniz wrapper to Python along with some functionality so that it can operate on both files and buffers. Python's `zipfile` module is pretty slow (see https://github.com/pytorch/pytorch/issues/26573), but miniz solves most of the perf issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29228

Differential Revision: D18330945

Pulled By: driazati

fbshipit-source-id: 455a19bcb23b871d56e4233edbf897134b2c2f1d
2019-11-14 13:37:31 -08:00
eef349a679 host build gradle publishing (#29749)
Summary:
To publish snapshots:
`gradle -p android pytorch_host:uploadArchives`
(for testing, the version was changed to 0.0.1-SNAPSHOT)
Result:
https://oss.sonatype.org/#nexus-search;quick~pytorch_java_only

https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/
jar:
https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/pytorch_java_only-0.0.1-20191113.211446-1.jar

sources:
https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/pytorch_java_only-0.0.1-20191113.211446-1-sources.jar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29749

Differential Revision: D18496644

Pulled By: IvanKobzarev

fbshipit-source-id: 136213c23b9ab1e3e22059ad9c8b53822c026b3b
2019-11-14 11:44:02 -08:00
65bb34d885 Remove TensorImpl::is_variable, deprecate Tensor::is_variable (#29653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653

I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.

This diff is NOT semantics preserving.  Here are the major differences:

- In a number of native operator implementations, we tested that arguments
  are not variable.  I replaced these with asserts that variable is
  excluded from dispatch.  I actually don't think these asserts are really
  necessary now (they should certainly be true, but it's hard to get
  it wrong), but I've kept them for old time's sake.  At least, they'll detect
  if you call these functions before you've processed variable (indicating
  a bug in your kernel.)

- There are a number of places where we do a per-tensor test for being a
  variable, for better error reporting when someone commits Tensor/Variable
  confusion.  Although these tests are substantively the same as the
  tests above, in these cases I decided to *delete* the test entirely.
  The reasoning is that in these cases, we didn't really care about
  dispatch (also, see above; I'm not too sure we really need the dispatch
  asserts), we cared about Tensor/Variable confusion.  Since Tensor/Variable
  confusion is impossible now, we don't need the tests.  One of the key
  factors which pushed me one way or another was whether or not a function
  was doing per-tensor validation; if I kept the assert in such functions,
  I'd repeatedly access the TLS.  Even if we want to bring back the asserts,
  they would have to go somewhere else.

  Another similar idiom is the number of places we do !x.defined() ||
  x.is_variable(); I treated this equivalently.

- nuclear_norm's computation of compute_uv is a bit weird, but I think
  it's OK to just delete the is_variable case (I *suspect* that it is
  always the case that self.is_variable(), but it doesn't really matter.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496168

Pulled By: ezyang

fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
2019-11-14 11:41:02 -08:00
8d23f7a3a8 Only print original SourceRange on highlight
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29708

Test Plan: Imported from OSS

Differential Revision: D18472089

Pulled By: jamesr66a

fbshipit-source-id: 89cbe8edf4e3c90d3795a1f3ea55cb234e2682e0
2019-11-14 11:38:02 -08:00
7f4d4254c3 Make sure we only run Profiling Graph Executor tests on windows (e.g. no simple, no legacy)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29782

Differential Revision: D18496848

Pulled By: Krovatkin

fbshipit-source-id: 9d5dbf0fc6a350138a0094f79eef2f9f25b308f5
2019-11-14 11:25:54 -08:00
90ac35b7bd Fix tracing of autograd functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29791

Test Plan: Imported from OSS

Differential Revision: D18499142

Pulled By: jamesr66a

fbshipit-source-id: 6c2864dfbfa0419c8c888d55e082a619d058b3ee
2019-11-14 11:18:07 -08:00
747233e3bd minor edit to fix benchmark_all_test cuda error (#29829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29829

This diff replaces the if-check on cuda with to(device...), which is a much cleaner interface.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 129.548

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 48.313
...
```

Reviewed By: bddppq

Differential Revision: D18507568

fbshipit-source-id: 32534e76b2e27d59a631a4d76a0d93700e975ea4
2019-11-14 11:13:36 -08:00
Jie
c5ac70a0ea AdaptiveAvgPooling nhwc cuda update (#29700)
Summary:
1. Add a clip on grid launch configs (tests added in test_nn.py)
2. Assert on the shared memory requirement, which gives a better hint when erroring out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29700

Differential Revision: D18482556

Pulled By: VitalyFedyunin

fbshipit-source-id: df3f653185d7b477b2241f2ef4779670e9a78899
2019-11-14 11:02:48 -08:00
ad95099f45 fix benchmark_all_test when running on gpu (#29818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818

When some of the tests run on CUDA, there is a runtime error because of a missing data transfer from CPU to CUDA. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...
```

Reviewed By: hl475

Differential Revision: D18506269

fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
2019-11-14 10:10:48 -08:00
b70d571233 add embeddingbag operator the the benchmark suite (#29784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29784

Add the embeddingbag operator to the benchmark suite with different numbers of embeddings, dims, and inputs.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:embeddingbag_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Forward Execution Time (us) : 624.838

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size64_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 64, offset: 0, sparse: True
Forward Execution Time (us) : 636.744

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True
Backward Execution Time (us) : 2325.291

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Backward Execution Time (us) : 2528.658
...
```

Reviewed By: bddppq

Differential Revision: D18496340

fbshipit-source-id: 157dcff2ea4ec13416fe161382fcefd47ce4cc01
2019-11-14 10:05:47 -08:00
e53b510773 add addmm op to the benchmark suite (#29783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783

Add the addmm operator, reusing the existing input shapes from the add operator.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369
```

Reviewed By: bddppq

Differential Revision: D18496476

fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
dfa9c9e227 Replace make with cmake --build . in the docs (#29798)
Summary:
Inspired by https://discuss.pytorch.org/t/issues-with-tutorial-installing-c-distributions-of-pytorch/33295/11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29798

Differential Revision: D18504951

Pulled By: ezyang

fbshipit-source-id: 8e80d8891ca85196f00611fe784b2f55659e52ab
2019-11-14 08:23:19 -08:00
01d76145fc Fix typo: Caffe2_MAIN_LIB to Caffe2_MAIN_LIBS (#29746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29746

I don't know if this actually broke anything because I just discovered
the typo while reading the cmake.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18504546

Pulled By: ezyang

fbshipit-source-id: 6cb5fb1e71721e5cf8fc2f7b5552dc7c514f065f
2019-11-14 07:55:09 -08:00
bf80664515 Add quantized conv3d function (#29686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29686

Add quantized conv3d function

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: hl475

Differential Revision: D18463090

fbshipit-source-id: f9c3d2920c3fc015bbb2b6a583a582c9f8397b08
2019-11-14 03:04:51 -08:00
2d7d53cd87 Updating submodules
Summary:
GitHub commits:

dc64b842b8
c31d13303f
2a71bbc69e
4bb251a6af
15b4a705e6

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 70fff211005d374d558de25eb4342b84b7bcba25
2019-11-14 01:48:43 -08:00
4a1fcc0b83 Allow rpc.remote to create RRef on self (#29634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29634

This implementation supports rpc.remote to self by doing the
following steps:

1. Create an owner RRef.
2. Add the owner RRef to owners_ in RRefContext, and keep it alive
   by using the RRefId as the ForkId.
3. Go through serde and insert the message into the caller's thread pool.
4. When the response message gets processed, remove itself from
   the RRef fork map.
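
A usage sketch, assuming `rpc.init_rpc("worker0", rank=0, world_size=1)` has already run on the calling worker:

```python
import torch
import torch.distributed.rpc as rpc

# From worker0 itself: create an RRef whose owner is the caller.
rref = rpc.remote("worker0", torch.add, args=(torch.ones(2), 1))
print(rref.to_here())  # tensor([2., 2.])
```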

Test Plan: Imported from OSS

Differential Revision: D18445812

Pulled By: mrshenli

fbshipit-source-id: e3b9aa98962c388acbc2ce294101a236d5cb2da6
2019-11-14 00:10:24 -08:00
9fd7db616a Disable Caffe2 RCCL tests (#29792)
Summary:
They are flaky on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29792

Differential Revision: D18500737

Pulled By: bddppq

fbshipit-source-id: 18a39b2d6117a7c3b48e1d6a635f24acb35fc497
2019-11-13 23:56:21 -08:00
ba74be0d3e Update CODEOWNERS for distributed rpc framework. (#29788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29788

ghstack-source-id: 93889545

Test Plan: waitforbuildbot

Differential Revision: D18498997

fbshipit-source-id: e1419f1a487f7fe4d5f6af9de66e930da067b70e
2019-11-13 23:42:09 -08:00
4a27d2be18 Enabling intra-op parallelism for fbgemm_linear_int8_weight_fp32_activation op (#29532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29532

As we are migrating from `torch.jit.quantized` to the `torch.quantization.quantize_dynamic` API, we still need to temporarily add intra-op parallelism support in the legacy `fbgemm_linear_int8_weight_fp32_activation` API, for the parallelization of RNN operators and to help with performance debugging of legacy serialized models that use the old API.

```
from __future__ import absolute_import, division, print_function, unicode_literals

import time

import torch

K, N = 1024, 1024

print("M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16")

for M in (2, 20, 200, 500, 1024,):
    print(M, sep=",", end=", ")
    for num_threads in (1, 2, 4, 8, 16):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)
        b = torch.rand(N)

        NITER = 20

        W_int8, col_offsets, W_scale, W_zp = torch.fbgemm_linear_quantize_weight(w)
        W_prepack = torch.fbgemm_pack_quantized_matrix(W_int8, W_int8.size(1), W_int8.size(0))

        s = time.time()
        for _ in range(NITER):
            Y_fp32 = torch.fbgemm_linear_int8_weight(x, w, W_prepack, col_offsets, W_scale, W_zp, b)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER

        print(
            "{:0.2f}".format(2.0 * M * N * K / elapsed_per_iter_dyn_quant / 1e9),
            end=", ",
        )
    print("\n", end="")
```

On SKL T1 server:

Before the Diff:
```
[root@rtptest33418.frc2 ~/jhuang_test]# ./torch_fbgemm_linear_int8_weight_fp32_activation.par
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
2, 41.01, 51.51, 51.63, 51.49, 52.10,
20, 80.94, 81.43, 82.35, 82.27, 82.24,
200, 87.94, 87.61, 88.53, 88.43, 88.52,
500, 88.76, 89.60, 89.80, 89.65, 89.76,
1024, 88.01, 89.58, 90.11, 90.39, 89.96,
```
After the Diff:
```
[root@rtptest33418.frc2 ~/jhuang_test]# ./torch_fbgemm_linear_int8_weight_fp32_activation.par
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
2, 45.08, 70.38, 72.22, 61.59, 44.15,
20, 83.09, 137.86, 205.58, 254.19, 201.08,
200, 87.86, 157.85, 287.24, 420.26, 476.16,
500, 88.57, 162.19, 296.52, 500.91, 530.25,
1024, 88.34, 147.47, 296.78, 534.45, 482.10,
```

ghstack-source-id: 93666880

Test Plan: CI

Differential Revision: D18421371

fbshipit-source-id: 22cc1031ec9ee914c1508ba2aa9ed0281dfcd076
2019-11-13 23:12:06 -08:00
f3b15727c5 fix op benchmark OOM issue (#29794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794

Before this diff, all tests of an operator were created at once before testing. Once an operator was benchmarked, the same process would move on to the next operator, and so on. The issue is that the number of tests for a single operator could be > 100, which can cause OOM issues. This diff avoids creating all of an operator's tests at once by using generators, which create/run the tests one by one.
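
A minimal sketch of the generator pattern (illustrative, not the benchmark framework's code):

```python
import torch

def generate_tests(shapes):
    # Yield one configured test at a time; nothing is materialized up
    # front, so each input becomes collectible after its iteration.
    for m, n in shapes:
        yield f"add_M{m}_N{n}", (torch.randn(m, n), torch.randn(m, n))

for name, (a, b) in generate_tests([(64, 64), (1024, 1024), (4096, 4096)]):
    _ = a + b  # stand-in for the timed benchmark body
    print(name, "done")
```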

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...
```

Reviewed By: hl475

Differential Revision: D18500103

fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
2019-11-13 22:22:58 -08:00
aa6e992ffb Subscribe for record function and if android do atrace (#28708)
Summary:
ghstack-source-id: 5edaf471557c25098ca0547229f2763760866887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28708

Some C++ formatting changes, as I ran `clang-format -i`.

Testing on devserver:
make assets (models):
```
pushd android/test_app/; python make_assets.py; popd
```
Build test_app apk:
```
TRACE_ENABLED=1 sh android/build_test_app.sh

find . -type f -name *apk
./android/test_app/app/build/outputs/apk/mobNet2Quant/debug/test_app-mobNet2Quant-debug.apk
./android/test_app/app/build/outputs/apk/resnet18/debug/test_app-resnet18-debug.apk
```

Install apk:
`adb install -r test_app-mobNet2Quant-debug.apk`
Run app on the device.
Systrace:
```
$ANDROID_HOME/platform-tools/systrace/systrace.py -t 10 -a org.pytorch.testapp.mobNet2Quant sched freq idle am wm gfx view binder_driver hal dalvik camera input res -o trace.html
```
trace.html contains sections like `jni::Module::forward`

![Screenshot 2019-11-12 18 36 30](https://user-images.githubusercontent.com/6638825/68728156-5d245580-057b-11ea-9e71-e47681894fe4.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28712

Differential Revision: D18495898

Pulled By: IvanKobzarev

fbshipit-source-id: 0bced4a442f9dd90525520972a2c1f5d51f57df3
2019-11-13 20:55:40 -08:00
a68c52494c Use F::*FuncOptions for embedding/embeddingbag functionals (#29673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29673

Following https://github.com/pytorch/pytorch/pull/29364 and https://github.com/pytorch/pytorch/pull/29404, this PR makes `F::EmbeddingFuncOptions` and `F::EmbeddingBagFuncOptions` separate classes from `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`, so that it's easier to enforce that arguments such as `num_embeddings` and `embedding_dim` are required for `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`.

Test Plan: Imported from OSS

Differential Revision: D18462540

Pulled By: yf225

fbshipit-source-id: f2abf431e48675b0a9d7f6f398cdb90ff9037c35
2019-11-13 18:47:22 -08:00
9ee6fa0145 Use NNPACK for strided convolutions. (#29595)
Summary:
Use NNPACK for strided convolutions.

ResNet50 on Pixel 3:
- Before: 552.956 ms
- After: 402.947 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29595

Reviewed By: houseroad

Differential Revision: D18457472

Pulled By: AshkanAliabadi

fbshipit-source-id: 51f22ce120c39f197cd564bcc71bbad2951edf85
2019-11-13 17:10:41 -08:00
ed788ec780 Linearizable Label: Class Weights, Allow Missing Label, and Average by Batch Size (#29707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29707

In D17885977, Linearizable label (a multi-class classification) was implemented in MTML.

In this diff, we add several items for Linearizable label:

- Assigning different weights to each class through ```model_def.tasks[i].class_weights```.

  - This option is a dictionary, the keys of which are indices of the classes and the values of which are weights for each class.

  - For example, if a linearizable-label task has 4 classes and its ```class_weights = {"0": 1, "1": 0.1, "2": 0.1, "3": 0.01}```, it means that in the loss function of this task, we assign weight 1 to its first class, weight 0.1 to its second and third class, and weight 0.01 to its forth class. The index/order of classes follows the logic of linearizable label.

  - Note that when you assign different weights to different classes, you need to correct the calibration by setting an appropriate ```model_def.tasks[i].calibration.linearizable_class_weight```. Basically, the class weights in calibration should be the reciprocals of the class weights in the loss function, so ```calibration.linearizable_class_weight = {"0": 1, "1": 10, "2": 10, "3": 100}``` for the example above (see the sketch after this list).

  - Example FBLearner job: f150763093

- We also support ```model_def.allow_missing_label_with_zero_weight``` for linearizable label, which ignores examples whose first label is missing by assigning them zero weights in the loss function.

  - We need to set ```allow_missing_label_with_zero_weight = true``` to enable it.

  - Example FBLearner job: f150763093

- Last but not least, we update caffe2 operator ```SoftmaxWithLoss``` to support loss averaged by batch size.

  - We need to set ```model_def.tasks[i].loss.softmaxLoss.average_by_batch_size = true``` to enable it.

  - Previously, the loss was averaged by the weight sum of the examples in the batch, which is still the default behavior now (when ```average_by_batch_size = null``` or ```average_by_batch_size = false```).

  - Without this new feature, the calibration will be incorrect when applying non-equal-weight training among different classes to a linearizable task.

  - Example FBLearner job with ```average_by_batch_size = true``` results in a correct calibration: f150763093

  - Example FBLearner job with ```average_by_batch_size = null``` results in an incorrect calibration: f150762990
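
A one-line sketch of the reciprocal relationship described above (hypothetical dict names):

```python
class_weights = {"0": 1.0, "1": 0.1, "2": 0.1, "3": 0.01}
# Calibration weights are the reciprocals of the loss-function class weights.
linearizable_class_weight = {k: 1.0 / w for k, w in class_weights.items()}
print(linearizable_class_weight)  # {'0': 1.0, '1': 10.0, '2': 10.0, '3': 100.0}
```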

Test Plan:
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_class_weights
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_zero_weight
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_average_by_batch_size

All tests passed.

full canary: https://fburl.com/fblearner/troznfgh

Reviewed By: chenshouyuan

Differential Revision: D18461163

fbshipit-source-id: aaf3df031406ae94f74e2e365b57e47409ef0bfe
2019-11-13 16:52:27 -08:00
b8dca04f73 Add error message if CUDA startup fails (#29670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670

This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.

Test Plan: Build without gpu code.  Run the binary.  Check that the new error message exists.

Reviewed By: yfeldblum

Differential Revision: D18453798

fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
2019-11-13 16:48:40 -08:00
5654eccfe2 Add pytorch_jni_lite for lite interpreter. (#29621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29621

Add pytorch_jni_lite for lite interpreter.
ghstack-source-id: 93867325

Test Plan:
buck build xplat/caffe2/android:pytorch-jni

buck build xplat/caffe2/android:pytorch

buck install -r fb4a

Reviewed By: dreiss

Differential Revision: D18438343

fbshipit-source-id: 7d4dee11d352cc9a67339c45d9d7f4a2ba285ebc
2019-11-13 16:16:29 -08:00
681b610f35 use new overload mechanism for rnns (#29614)
Summary:
Uses the new overload mechanism for RNNs, making Python and TorchScript go through the same path, with an API in line with the one specified
in https://docs.python.org/3/library/typing.html#typing.overload

This brings the TorchScriptable RNNs closer to the base implementation; unifying them should be done in a follow-up PR, but there are still a few limitations that make it difficult to do so.
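
A minimal sketch of the `typing.overload` pattern the summary refers to (illustrative only; the exact decorator TorchScript consumes may differ):

```python
from typing import Optional, Tuple, overload

import torch
from torch import Tensor

class MiniRNN(torch.nn.Module):
    # Stub signatures document the two calling conventions.
    @overload
    def forward(self, input: Tensor) -> Tuple[Tensor, Tensor]: ...

    @overload
    def forward(self, input: Tensor, hx: Tensor) -> Tuple[Tensor, Tensor]: ...

    # A single implementation dispatches on the optional hidden state.
    def forward(self, input: Tensor, hx: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]:
        if hx is None:
            hx = torch.zeros_like(input)
        return input + hx, hx
```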
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29614

Differential Revision: D18486982

Pulled By: eellison

fbshipit-source-id: aaaea66a4a7f12d2e46199ca254f9e8f7475500e
2019-11-13 15:44:25 -08:00
91bef3d189 Simplify copy kernel with static_cast_with_inter_type (#29631)
Summary:
After https://github.com/pytorch/pytorch/pull/29612 is merged, `static_cast_with_inter_type` can automatically convert complex types to their real values, so there is no need to do it inside the copy kernel.

This should wait until https://github.com/pytorch/pytorch/pull/29612 is merged, otherwise it won't pass CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29631

Differential Revision: D18485676

Pulled By: ezyang

fbshipit-source-id: 0bbfd551e3d3010f87eef0fce23a1f8a094b7d31
2019-11-13 15:36:22 -08:00
65f691f2c2 Add more tests for torch::arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29689

Test Plan: Imported from OSS

Differential Revision: D18465818

Pulled By: yf225

fbshipit-source-id: 0cf0aaa7febcf4318abdaae7d17a43ab3acde017
2019-11-13 15:17:16 -08:00
2bcac59a30 Use default dtype for torch::tensor(floating_point_values) and torch::tensor(empty braced-init-list) when dtype is not specified (#29632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29632

This PR is BC-breaking in the following way:

Previously, C++ `torch::tensor` with a floating-point literal with no suffix (e.g. `torch::tensor(1.1)`) or a (nested) braced-init-list of
floating-point literals with no suffix (e.g. `torch::tensor({{1.1, 2.2}})`) produces a tensor with dtype `at::kDouble`. After this PR, it produces a tensor with dtype `torch::get_default_dtype()`, matching Python `torch.tensor` behavior.

Test Plan: Imported from OSS

Differential Revision: D18465819

Pulled By: yf225

fbshipit-source-id: 6834fe50335c677bc3832f2a5e9cf8d1ede9f665
2019-11-13 15:17:11 -08:00
3fb9bbc99b refactor and move createException function (#29605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29605

Adds a wrapper around the existing createException function that
allows passing an error string instead of a regular C++ exception. This
allows us to create exceptions for errors that aren't necessarily C++
exceptions. This function is used by
https://github.com/pytorch/pytorch/pull/29601 and
https://github.com/pytorch/pytorch/pull/26336.
ghstack-source-id: 93819039

Test Plan: Unit tests pass

Differential Revision: D18439216

fbshipit-source-id: 70b6a2e4f107304e322cdd2630847ad0071bc0c1
2019-11-13 14:53:22 -08:00
78bd0069d3 enable back 2 tests for simple exec
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29661

Differential Revision: D18456143

Pulled By: Krovatkin

fbshipit-source-id: 9e4ae3ae681e3c9a81ada1e8b39da1e1342ce394
2019-11-13 14:22:19 -08:00
71aacf7b82 Gradle build offline dependencies #2 (#29738)
Summary:
The issue with the previous build was that, after Phabricator's lint error about double quotes, I changed:
`$GRADLE_PATH $GRADLE_PARAMS` -> `"$GRADLE_PATH" "$GRADLE_PARAMS"`
which ended in error:
```
Nov 13 17:16:38 + /opt/gradle/gradle-4.10.3/bin/gradle '-p android assembleRelease --debug --stacktrace --offline'
Nov 13 17:16:40 Starting a Gradle Daemon (subsequent builds will be faster)
Nov 13 17:16:41
Nov 13 17:16:41 FAILURE: Build failed with an exception.
Nov 13 17:16:41
Nov 13 17:16:41 * What went wrong:
Nov 13 17:16:41 The specified project directory '/var/lib/jenkins/workspace/ android assembleRelease --debug --stacktrace --offline' does not exist.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29738

Differential Revision: D18486605

Pulled By: IvanKobzarev

fbshipit-source-id: 2b06600feb9db35b49e097a6d44422f50e46bb20
2019-11-13 13:56:37 -08:00
2b05ae0704 Revert "Enable test_distributed for ROCm but only with nccl backend" (#29736)
Summary:
This reverts commit 7073ee209000a7781c0c863c4ef39bb3bfdb4932.

They are flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6830//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6824//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6802//console

cc jithunnair-amd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29736

Differential Revision: D18480543

Pulled By: bddppq

fbshipit-source-id: 9a1dd9aa5f5959dc6fbbfdab0df997514221217a
2019-11-13 13:53:05 -08:00
c800591030 Update ATen/native/README.md about broadcasting (#29742)
Summary:
Is this description still true? I have never seen any `s_` ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29742

Differential Revision: D18485707

Pulled By: ezyang

fbshipit-source-id: c5ce2587bb499561706c3c2817571ee11f7eb63c
2019-11-13 13:46:54 -08:00
b37c235d86 C++/Python API parity for Conv{1,2,3}d layers, and add F::conv{1,2,3}d functionals (#28917)
Summary:
This PR changes the implementation of C++ Conv{1,2,3}d layers to exactly match the Python version, and add F::conv{1,2,3}d functionals. For more thorough testing, I will rely on the parity test mechanism which uses values from `common_nn.py` to generate the inputs and options that we are interested in testing.

This PR is BC-breaking in the following way:

In `Conv{1,2,3}dOptions`:
- `with_bias` is renamed to `bias`.
- `input_channels` is renamed to `in_channels`.
- `output_channels` is renamed to `out_channels`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28917

Differential Revision: D18471526

Pulled By: yf225

fbshipit-source-id: 7a33f60654ad93cc2e043245e7ff9e0ef9da15b3
2019-11-13 12:53:31 -08:00
7f485121a6 Avoid MSVC _cvtsh_ss() workaround with clang-cl (#29726)
Summary:
We (me, fnabulsi, bmcdb) have a handful of fixes used locally to build and run with clang-cl. I am aware of https://github.com/pytorch/pytorch/issues/8784 but it has not been touched in almost a year.

It may be more practical to upstream the non-controversial fixes piecewise. For example, this one.

Here, the dummy version of `_cvtsh_ss` for MSVC is not required (and in fact causes conflicts) when using clang-cl, so it can be #ifdef'd out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29726

Differential Revision: D18478120

Pulled By: ezyang

fbshipit-source-id: cdcd94251e68347446f2ad1ac5a0e71089f7d0ab
2019-11-13 12:49:13 -08:00
ed215b1c03 named tensor support for torch.equal (#29322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29322

torch.equal checks if two tensors are equal in both size and values. For
named tensors, it also checks that the names are exactly equal. There is
an argument to be made for alternative semantics (check that the names
*match*), but for an API that is called "equal" I would expect it to
check equality on names as well.
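
A minimal sketch of the semantics described above, assuming the named-tensor API:

```python
import torch

a = torch.zeros(2, 3, names=('N', 'C'))
b = torch.zeros(2, 3, names=('N', 'C'))
c = torch.zeros(2, 3, names=('N', None))

torch.equal(a, b)  # True: same size, same values, same names
torch.equal(a, c)  # False: values match, but names differ
```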

Test Plan: - new tests

Differential Revision: D18453387

Pulled By: zou3519

fbshipit-source-id: d52bde4e3fdd7f331eef097a3b31d35c89c78049
2019-11-13 12:45:06 -08:00
5e64cfa663 Make TensorName::unifyFromRight in-place for efficiency (#29307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29307

In our name inference functions we currently create an extra TensorNames
every time we unify names. This isn't completely necessary.

To do this, I made the following changes:
- TensorName now has two states, initialized and uninitialized
- Renamed unifyFromRight to unifyFromRightInplace.

Test Plan: - `pytest test/test_namedtensor.py -v`

Differential Revision: D18453388

Pulled By: zou3519

fbshipit-source-id: 96c3c6fd9478d57e92e1cf770c864aeac6d29dd2
2019-11-13 12:45:01 -08:00
6de1016f9d switch back to azure pipelines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29740

Test Plan: Imported from OSS

Differential Revision: D18482697

Pulled By: suo

fbshipit-source-id: 72a454457a005f82683079b79a77343e20c34021
2019-11-13 11:50:38 -08:00
73a926fd5d Updating submodules
Summary:
GitHub commits:

756806e65b
9feea971d1
3eeb12badf
bb23bfe63c
4e8cee1305

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7b3adb4e20270aa7210e1a178ab26b0f47920861
2019-11-13 11:15:27 -08:00
f0dd7517f2 Add option to clean up allocated activations between c2 runs (#29619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29619

att

Reviewed By: houseroad

Differential Revision: D18415190

fbshipit-source-id: 739aaf436578fac635df10de42b35e2b4368df37
2019-11-13 10:30:10 -08:00
03d021ddb8 Allow unrelated histories when rebasing to master (#29699)
Summary:
Some PRs fail to merge.

For example, https://github.com/pytorch/pytorch/pull/29595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29699

Reviewed By: hl475

Differential Revision: D18473531

Pulled By: houseroad

fbshipit-source-id: e7a4eb1b4be9d9da6dc281575eeb4d7ae685b531
2019-11-13 09:50:43 -08:00
5635a72069 Revert D18451046: CPU-Strided-Complex Fixes for real and imag ops
Test Plan: revert-hammer

Differential Revision:
D18451046

Original commit changeset: b9dcd8e25e91

fbshipit-source-id: efd30957fc551fe8bf335d66b69e30af63b71752
2019-11-13 09:00:16 -08:00
6d54c5ddd2 Missing host device (#29547)
Summary:
Missing `__device__` and `__host__` annotations in the complex case. Make it less UB.

Note that this is still rather unsavory code: `std::real` is only `constexpr` from C++14 onwards ( https://en.cppreference.com/w/cpp/numeric/complex/real2 ), which is the requirement for `__device__`.

What I am trying to say is: this particular piece of code should not have passed review and should not have been merged, IMHO, as it tries to codify UB.

Also note that the benchmarks referenced in the source were CPU- and CUDA-only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29547

Differential Revision: D18428156

Pulled By: bddppq

fbshipit-source-id: 855ced903ef91bd7f82fcd3a2167ae59bdd30d8b
2019-11-13 08:32:08 -08:00
9b1ff8090d CPU-Strided-Complex Fixes for real and imag ops (#29607)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

- [x]  Replaced std::real(a) with a.real() in kernel-level code.
- [x]  Fixed Vec256_base implementation of complex ops so that it works correctly on Non-AVX devices.
- [ ]  Clean up CopyKernel after https://github.com/pytorch/pytorch/issues/29612 is approved.
zasdfgbnm is fixing this issue in https://github.com/pytorch/pytorch/issues/29612. This should be added first.

cc: iotamudelta, ezyang, bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29607

Differential Revision: D18451046

Pulled By: ezyang

fbshipit-source-id: b9dcd8e25e91cab13bd131b070d027b090cdedc9
2019-11-13 08:19:40 -08:00
0c91ebb694 Delete all trivial uses of make_variable. (#29213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29213

A trivial use of make_variable is one where requires_grad=False.  This
transformation is not technically semantics preserving, as make_variable
will create a shallow copy of the tensor in question; however, I
am guessing that we have the invariant that we don't actually make
use of this shallow copy in a nontrivial way.

There were some cases where the surrounding code expected a Variable proper
to be returned; I retained those sites.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353503

Pulled By: ezyang

fbshipit-source-id: 57fe34d82e009c0cc852266fb0b79d6d9c62bb03
2019-11-13 07:43:41 -08:00
89e187a2f5 Miscellaneous follow up for code review comments (#29204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29204

Code review comments from #28620

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353506

Pulled By: ezyang

fbshipit-source-id: 0432ce513eff257fd85cddff8bc3e41935127ed8
2019-11-13 07:43:36 -08:00
30092df15e Rename getNonVariableDeprecatedTypeProperties to getDeprecatedTypeProperties (#29203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29203

There is no more Variable/Tensor distinction, so fix the misleading name.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353505

Pulled By: ezyang

fbshipit-source-id: dadc394d533ab7746f70bc186c6645441a784518
2019-11-13 07:43:32 -08:00
7da9ac5afd Revert D18455666: Gradle build with offline dependencies
Test Plan: revert-hammer

Differential Revision:
D18455666

Original commit changeset: 8fb0b54fd94e

fbshipit-source-id: 559903b42cf7e5763099cf33f02940035c8505df
2019-11-13 07:24:13 -08:00
715e951e3c Revert D18458751: use new overload mechanism for rnns
Test Plan: revert-hammer

Differential Revision:
D18458751

Original commit changeset: 07c71838f21c

fbshipit-source-id: 86acb02f3e022e93ea6c1ef23fe39c80ad43978f
2019-11-13 07:21:31 -08:00
e870a9a870 More checks on MSVC (#29709)
Summary:
The flags `/sdl` and `/permissive-` are switched on automatically when using the VS GUI. Adding those checks will ensure that those annoying errors won't appear when users use the VS GUI to build their project.

More info:
https://docs.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=vs-2017
https://docs.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=vs-2017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29709

Differential Revision: D18473888

Pulled By: bddppq

fbshipit-source-id: 21156b0232a5dc3b566d14491d00bacb11493254
2019-11-13 00:15:40 -08:00
7b86199fc0 Switch XLA to only override abstract functions (#29636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29636

This is a followup of li-roy 's work https://github.com/pytorch/pytorch/pull/23282. (I messed up the rebase there :(

After https://github.com/pytorch/xla/issues/1225 is done we are good to move the integration to only override abstract functions.

This PR contains a TODO which I'll remove in next 2 followup PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29438

Reviewed By: ljk53

Differential Revision: D18445927

Pulled By: ailzhang

fbshipit-source-id: 52ea98626d6d6140241b5a4796a5c0d0c1b922ba
2019-11-13 00:09:37 -08:00
3a72662d01 Restructure comparison ops so as to better support XLA dispatch (#29591)
Summary:
Per ailzhang's suggestion in https://github.com/pytorch/pytorch/pull/28162#discussion_r344361926, this PR changes the implementation of binary comparison and logical ops
to follow that of the unary ops in UnaryOps.cpp. The reason is that the call should eventually go through
at::op_out (e.g., at::logical_xor_out).

The check for Boolean output tensor is also removed, because:

- This check should only apply to _out functions but not on other variants. However, other variants
  must go through the _out variant eventually.
- It does not have a clear motivation and seems unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29591

Differential Revision: D18460113

Pulled By: ailzhang

fbshipit-source-id: 58d501e59335186b3b8cc7d80ee9eed74efeeac8
2019-11-12 23:42:30 -08:00
09d359dfd9 Changed default args in quantization observers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29640

Test Plan: Imported from OSS

Differential Revision: D18447297

Pulled By: z-a-f

fbshipit-source-id: 7c86a5bb467a2fad8fe30c935d9c031c69868296
2019-11-12 23:32:05 -08:00
d2aa4c611f observer benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29508

Test Plan: Imported from OSS

Differential Revision: D18415171

Pulled By: z-a-f

fbshipit-source-id: 5ebedee8c17448e36853e0c1bf778bb128975678
2019-11-12 23:28:10 -08:00
d8732b3b43 Gradle build with offline dependencies (#29262)
Summary:
https://github.com/pytorch/pytorch/issues/29159

Introducing a GRADLE_OFFLINE environment variable that passes the '--offline' Gradle argument, which uses only the local Gradle cache without network access.

Since the cache has expiration logic, we 'touch' its files before every Gradle run to update their last-access time.

Deploying new docker images that include prefetching all Android dependencies into the Gradle cache; commit with the docker image update: df07dd5681

Reenable android gradle jobs on CI (revert of 54e6a7eede)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29262

Differential Revision: D18455666

Pulled By: IvanKobzarev

fbshipit-source-id: 8fb0b54fd94e13b3144af2e345c6b00b258dcc0f
2019-11-12 22:48:23 -08:00
20fb8a814c PackedSequence support for quantized LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29585

Test Plan: Imported from OSS

Differential Revision: D18436569

Pulled By: jamesr66a

fbshipit-source-id: 0f32c0fcc897894e30d8e7ff203392c1a961ce60
2019-11-12 20:13:38 -08:00
87363a8102 Revert D18466043: Pin Linux image and modules version to 4.4.0-166
Test Plan: revert-hammer

Differential Revision:
D18466043

Original commit changeset: d3c69c9ab3bf

fbshipit-source-id: 49365be7edf82923ade9c17b862f6e942c62b1ac
2019-11-12 19:08:44 -08:00
5a8ad66354 Do not show cuda stats in autograd profiler when use_cuda=False (#29666)
Summary:
Example
```python
import torch
x = torch.randn(1)
with torch.autograd.profiler.profile(use_cuda=False) as prof:
    x + x
print(prof.key_averages().table(sort_by='cpu_time_total'))
```

Before:
```
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add      100.00%          25.781ms         100.00%          25.781ms         25.781ms         NaN              0.000us          0.000us          1
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 25.781ms
CUDA time total: 0.000us
```

After:
```
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add      100.00%          25.037ms         100.00%          25.037ms         25.037ms         1
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 25.037ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29666

Differential Revision: D18458828

Pulled By: bddppq

fbshipit-source-id: d96ef4cec8b1e85b77c211292a3099048882734d
2019-11-12 17:53:20 -08:00
95cad57340 Turn on named tensors for all builds (#29603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29603

Previously, named tensors were off for the internal caffe2 xplat builds.
However, we have since excised the caffe2 xplat build's dependencies on
PyTorch. This makes it so that we can turn on named tensors for all
builds.

Test Plan: - Wait for CI

Differential Revision: D18439084

Pulled By: zou3519

fbshipit-source-id: f1cc405d0ce9ffe991eff1bbb80575ce87c02d4a
2019-11-12 17:03:26 -08:00
907a29de70 Pin Linux image and modules version to 4.4.0-166 (#29690)
Summary:
When installing the 4.4.0-168 version, the following error is thrown (e.g. in https://app.circleci.com/jobs/github/pytorch/pytorch/3577840):
```
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 linux-image-generic : Depends: linux-image-4.4.0-168-generic but it is not going to be installed or
                                linux-image-unsigned-4.4.0-168-generic but it is not installable
                       Depends: linux-modules-extra-4.4.0-168-generic but it is not installable
                       Recommends: thermald but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
```
The (temporary) solution is to pin the Linux image and modules version to 4.4.0-166.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29690

Differential Revision: D18466043

Pulled By: yf225

fbshipit-source-id: d3c69c9ab3bf505c6eb3a2edd138e9789b62b6d6
2019-11-12 17:00:29 -08:00
29e509ff1d Fix a missing comma in quantized benchmark
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29685

Test Plan: Imported from OSS

Differential Revision: D18463246

Pulled By: z-a-f

fbshipit-source-id: c21fd7892f3701afcc5faa8bc03f98b6f6550d0f
2019-11-12 16:50:46 -08:00
8875120b54 Make dropout condition on training.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29436

Reviewed By: bddppq

Differential Revision: D18438288

Pulled By: ailzhang

fbshipit-source-id: d9c6fe4bd734dc87b2154b0ccd80efcb61740ec9
2019-11-12 16:32:02 -08:00
422fbfb108 Fix some issues for lite interpreter internal build. (#29620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29620

Modify buck for lite interpreter to build successfully on internal integration.
ghstack-source-id: 93733618

Test Plan: buck build xplat/caffe2:torch_mobile_coreAndroid

Reviewed By: iseeyuan

Differential Revision: D18438105

fbshipit-source-id: d6f6615623a385383105763733607c3872c89c42
2019-11-12 16:16:42 -08:00
bd0394d473 Add op bitwise_xor to replace __xor__ and __ixor__ (#25665)
Summary:
We define `bitwise_xor` instead of
`__xor__` and `__ixor__`. The reasons are that (a) it is not idiomatic to call
functions starting and ending with double underscores, (b) the
kinds of arguments we can add are limited (e.g., no out variant), and (c) it is consistent with the naming of `bitwise_not` and numpy.
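
A minimal sketch of what the named op enables compared to the dunder, the `out=` variant in particular:

```python
import torch

a = torch.tensor([0b1010, 0b1100], dtype=torch.uint8)
b = torch.tensor([0b0110, 0b1010], dtype=torch.uint8)

out = torch.empty_like(a)
torch.bitwise_xor(a, b, out=out)  # out= is not expressible via __xor__
a.bitwise_xor_(b)                 # in-place variant replaces __ixor__
```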

Fix https://github.com/pytorch/pytorch/issues/24513,  Fix https://github.com/pytorch/pytorch/issues/24517, Fix https://github.com/pytorch/pytorch/issues/24660, Fix https://github.com/pytorch/pytorch/issues/24664
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25665

Differential Revision: D17577143

Pulled By: VitalyFedyunin

fbshipit-source-id: 042f6385f9305bd66d50a8ce82e28f40a23a7266
2019-11-12 16:14:04 -08:00
8e7b406773 use new overload mechanism for rnns (#29614)
Summary:
Uses the new overload mechanism for RNNs, making Python and TorchScript go through the same path, with an API in line with the one specified
in https://docs.python.org/3/library/typing.html#typing.overload

This brings the TorchScriptable RNNs closer to the base implementation; unifying them should be done in a follow-up PR, but there are still a few limitations that make it difficult to do so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29614

Differential Revision: D18458751

Pulled By: eellison

fbshipit-source-id: 07c71838f21cb5425e8d6dbd4a512f774c8c2970
2019-11-12 16:12:04 -08:00
433baf1b90 Change arg dtype from float to double in LPPool and nn/utils/clip_grad.h (#29584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29584

In Python, `float` dtype is always 64-bit (https://stackoverflow.com/a/8216110), and the C++ equivalent APIs should take `double` dtype to match the bit length.

Test Plan: Imported from OSS

Differential Revision: D18436616

Pulled By: yf225

fbshipit-source-id: ece510bba6f089ccada03af216f4805bbd03f5f2
2019-11-12 16:05:35 -08:00
65bfcde05e Use c10::variant-based enums for SmoothL1Loss module and functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29536

Test Plan: Imported from OSS

Differential Revision: D18432272

Pulled By: yf225

fbshipit-source-id: fa355145962e93025b7de98b99b0a4fc82e8c871
2019-11-12 16:05:31 -08:00
57eab22c6a Use c10::variant-based enums for F::grid_sample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29535

Test Plan: Imported from OSS

Differential Revision: D18432273

Pulled By: yf225

fbshipit-source-id: 11476f0431a9b544dfb62bc7a89bab84399f9b83
2019-11-12 16:05:26 -08:00
9f879ef532 Make all non-input arguments to functionals part of its options (#29404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29404

This PR makes all non-input arguments to functionals part of its options parameters, so that we won't break backward compatibility even if we add or reorder some of the non-input arguments to functionals in the future.

Test Plan: Imported from OSS

Differential Revision: D18378526

Pulled By: yf225

fbshipit-source-id: f5cf6bdfb844e75bf94fdee58c121e0955631b6e
2019-11-12 16:05:22 -08:00
c3b2c2e353 Design doc for distributed autograd. (#29175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29175

Updates our docs to include a design doc for distributed autograd.
Currently, this doc only covers the FAST mode algorithm. The Smart mode
algorithm section just refers to the original RFC.

There is a section for Distributed Optimizer that we can complete once we've
finalized the API for the same.
ghstack-source-id: 93701129

Test Plan: look at docs.

Differential Revision: D18318949

fbshipit-source-id: 670ea1b6bb84692f07facee26946bbc6ce8c650c
2019-11-12 15:04:23 -08:00
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
fd8f74e688 Remove observer module after insert_quant_dequant (#29622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29622

Remove the observer module in the quantized model

Test Plan: python test/test_jit.py 'TestJit.test_insert_quant_dequant'

Differential Revision: D18442888

Pulled By: jerryzh168

fbshipit-source-id: 22c777569af0e814661fe51f76341b39600fae0d
2019-11-12 14:48:40 -08:00
fbe90b65fa Cleanup special handling of Containers, allowing custom forwards (#28988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28988

Make ModuleList, Sequential, ModuleDict go through the same pathway as other modules, cleaning up a bunch of code and allowing them to define custom forwards and other methods.

EDIT: Previously, we would ignore an nn.Sequential attribute if it was not in `__constants__` ("did you forget to add it to Constants"). This PR scripts it even if it is not in `__constants__`. Is that what we want?

Test Plan: Imported from OSS

Differential Revision: D18402821

Pulled By: eellison

fbshipit-source-id: dd4f28fb0df0d1ba4ad1b3bc34ba141959a433f7
2019-11-12 14:10:38 -08:00
3175f5543a Make nn.Sequential iterable (#28987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28987

We have `__iter__` defined on nn.ModuleList. Chainer's `Sequential` defines `__iter__`. This will also be helpful in modules which extend `nn.Sequential` and define a custom forward, because they can use the `for x in self` syntax that is supported in both python & TorchScript.
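
A minimal sketch of the pattern this enables (module name is illustrative):

```python
import torch.nn as nn

class MySequential(nn.Sequential):
    # A custom forward can now use iteration, in Python and TorchScript alike.
    def forward(self, x):
        for module in self:
            x = module(x)
        return x

model = MySequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
```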

Test Plan: Imported from OSS

Differential Revision: D18402822

Pulled By: eellison

fbshipit-source-id: 1ece0f891a9d37f401e232320f58b056d5481856
2019-11-12 14:10:34 -08:00
eeb7199ccc updated name_inference doc for cumsum and cumprod (#29453)
Summary:
cumsum/cumprod perform their respective operations over a given dimension, but no dimension is reduced in the process; i.e., they are not reduction operations, and hence they simply keep the input names of the tensor the operation is performed on.
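
A minimal sketch of the contrast with a true reduction op, assuming the named-tensor API:

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
t.cumsum(1).names  # ('N', 'C'): names are kept, nothing is reduced
t.sum(1).names     # ('N',): sum removes the reduced dimension's name
```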
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29453

Differential Revision: D18455683

Pulled By: anjali411

fbshipit-source-id: 9e250d3077ff3d8f3405d20331f4b6ff05151a28
2019-11-12 13:43:47 -08:00
9bb0e2834d Fixing data type in quantized pool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29663

Test Plan: Imported from OSS

Differential Revision: D18456671

Pulled By: z-a-f

fbshipit-source-id: b36fc56e4f29937e458308f4c13f7a5e37665269
2019-11-12 13:22:53 -08:00
82913a266d Skip copy_same_type_transpose_ for quantized tensor (#29609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29609

We can enable this path later if there is a need.
Trying to fix: https://github.com/pytorch/pytorch/issues/29435

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18453723

fbshipit-source-id: 3dc774f6b7da5cdf33deb6676d8612d21ed4b5a9
2019-11-12 13:16:38 -08:00
3b43cfde80 Benchmarking per channel quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29627

Test Plan: Imported from OSS

Differential Revision: D18443929

Pulled By: z-a-f

fbshipit-source-id: a0345cc5e259b4ce98589252719b8885326d43a3
2019-11-12 11:33:42 -08:00
5db361bd32 Quantized interpolation benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29509

Test Plan: Imported from OSS

Differential Revision: D18415367

Pulled By: z-a-f

fbshipit-source-id: 84d0aaa81b131b49762edde6ade27e61acb99a42
2019-11-12 11:23:03 -08:00
9c9c361f67 Separate out pytorch_jni into pytorch_jni_jit and pytorch_jni_common. (#29617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29617

For the internal build we will use the mobile interpreter instead of the full JIT, so we need to separate the existing pytorch_jni.cpp into pytorch_jni_jit.cpp and pytorch_jni_common.cpp. pytorch_jni_common.cpp will be used both from pytorch_jni_jit.cpp (open source) and the future pytorch_jni_lite.cpp (internal).
ghstack-source-id: 93691214

Test Plan: buck build xplat/caffe2/android:pytorch

Reviewed By: dreiss

Differential Revision: D18387579

fbshipit-source-id: 26ab845c58a0959bc0fdf1a2b9a99f6ad6f2fc9c
2019-11-12 11:13:44 -08:00
f95e8ea1be Benchmarking quantized methods (#29625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625

This PR also adds a template for benchmarking methods that require no input.

Test Plan: Imported from OSS

Differential Revision: D18443485

Pulled By: z-a-f

fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
2019-11-12 11:08:55 -08:00
f111f1b1a7 Suppress implicit int-float conversion warning in ROCm build (#29604)
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
  return f < limit::lowest() || f > limit::max();
                                  ~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
  if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604

Differential Revision: D18440713

Pulled By: bddppq

fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
2019-11-12 10:44:28 -08:00
949d6ae184 Fix jit tracing namedtuple (#29477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29477

When passing a namedtuple as tracing input, __clone_inputs calls into `torch.autograd.function._nested_map`, and https://github.com/pytorch/pytorch/blob/593bb14/torch/autograd/function.py#L256 runs into an error (because namedtuple doesn't support this style of constructor).
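
A minimal sketch of the underlying Python behavior (names are hypothetical): constructing a namedtuple from a single iterable fails, which is the style of reconstruction the summary describes:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# Fails: namedtuple's constructor takes positional fields, not one iterable.
# Point(v * 2 for v in p)  # TypeError: __new__() missing 1 required positional argument: 'y'

# Works: unpack the iterable into positional arguments.
doubled = Point(*(v * 2 for v in p))
```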
ghstack-source-id: 93586773

Differential Revision: D18405504

fbshipit-source-id: 8d0135cff0bdaaabcf6e06fac63df0f75c0c50b9
2019-11-12 10:38:20 -08:00
450949c7fe Complex support on GPU for dynamic casting (#29612)
Summary:
Currently, the dynamic casting mechanism is implemented assuming no support for complex on GPU. This will no longer be true in the near future.

https://github.com/pytorch/pytorch/pull/29547 could clear some clang warnings, but complex support on GPU is still not complete:
- fetch is not supported
- casting between complex64 and complex128 is not supported
- complex scalar types are not tested

This PR is what should be done for type promotion in order to add support for the complex dtypes on GPU, as suggested in https://github.com/pytorch/pytorch/issues/755#issuecomment-552631381

Note that what is newly added in this PR is not tested due to the lack of basic support for complex dtypes (I cannot construct a complex tensor). But this PR shouldn't break any existing part of PyTorch.

For merging this PR, consider two options:
- We could merge this PR now so that dylanbespalko could conveniently work based on master; if there is something wrong here that code review missed, dylanbespalko would find it when adding complex integration.
- Or, we could leave this PR open and not merge it. But then dylanbespalko might need to manually apply this to his branch in order to support type promotion of complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29612

Differential Revision: D18451061

Pulled By: ezyang

fbshipit-source-id: 6d4817e87f0cc2e844dc28c0355a7e53220933a6
2019-11-12 09:57:16 -08:00
7073ee2090 Enable test_distributed for ROCm but only with nccl backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28814

Differential Revision: D18437300

Pulled By: ezyang

fbshipit-source-id: bf1ab68e0fde683e0082f6c9fe2fc20e2bc8fc06
2019-11-12 07:52:30 -08:00
3b452ca428 quantized topk benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29505

Test Plan: Imported from OSS

Differential Revision: D18414851

Pulled By: z-a-f

fbshipit-source-id: 23999ef95c2f087066c4da36b2bf35516ebc0421
2019-11-12 00:33:47 -08:00
a0d4d5062b Quantized unary ops benchmarking (mostly template)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29503

Test Plan: Imported from OSS

Differential Revision: D18414589

Pulled By: z-a-f

fbshipit-source-id: ab5af490359b3e0a51642a46aef86f7be720deff
2019-11-11 23:48:36 -08:00
e651494d47 Updating submodules
Summary:
GitHub commits:

e27d9b5733

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 78ff76d3a979182d0f943bba85461ce80aa4b790
2019-11-11 23:26:47 -08:00
2fb4059652 change drop_on_export warning category (#29610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29610

`DeprecationWarning` is intended for developers (and so is ignored in
certain circumstances). `FutureWarning` is the user-facing deprecation
warning. This fixes fbcode failures.
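
A minimal sketch of the difference in default visibility (plain Python behavior):

```python
import warnings

def old_api():
    # Hidden by default outside __main__; aimed at developers.
    warnings.warn("use new_api instead", DeprecationWarning, stacklevel=2)

def old_api_user_facing():
    # Shown by default; aimed at end users.
    warnings.warn("use new_api instead", FutureWarning, stacklevel=2)
```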

Test Plan: Imported from OSS

Differential Revision: D18446393

Pulled By: suo

fbshipit-source-id: ded11a007f0a62132a9839b733157a97cf9006e9
2019-11-11 23:24:27 -08:00
bbff06ee96 Convert conv_prepack to conv2d_prepack and conv_unpack to conv2d_unpack (#29529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529

Pull Request resolved: https://github.com/pytorch/glow/pull/3771

We would like to replace `conv_prepack` with `conv2d_prepack` and  `conv_unpack` with `conv2d_unpack`.

This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```

We should do this sooner rather than later, before the quantized conv2d ops gain more users; it is better engineering.

The replacement bash command is as the follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
ghstack-source-id: 93661879

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D18421079

fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
2019-11-11 21:54:10 -08:00
2acca09e1a Add Support for ONNX scripting Interpolate with missing shape (#29489)
Summary:
- Add support for the missing case where interpolate is exported with missing shape information in scripting
- Add warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29489

Reviewed By: hl475

Differential Revision: D18438872

Pulled By: houseroad

fbshipit-source-id: d01f833bec0cc4e881ddc18e7054d22f54e9886b
2019-11-11 21:20:14 -08:00
8db06732bf Updating submodules
Summary:
GitHub commits:

a5381c4d13
a19de78da5
b5a7d0259c
8c4e217115

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 26a91452c36caab109dad713fb04b71551f36a90
2019-11-11 19:12:55 -08:00
0c9e672727 Apply the latest master docker images(jni.h in every image) (#29588)
Summary:
Applying the latest docker images from master: (the latest PR of docker images: https://github.com/pytorch/pytorch-ci-dockerfiles/pull/2 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29588

Differential Revision: D18442848

Pulled By: IvanKobzarev

fbshipit-source-id: bcb9cca54632d1e83f1b922ebb267b1122c1f56e
2019-11-11 18:41:38 -08:00
8b53515b8a Add ONNX Export Support for torch.scalar_tensor (#28713)
Summary:
Support exporting torch.scalar_tensor() to ONNX.
This will allow making operations on dynamic scalars (like x.size(dim) where x is a tensor of dynamic shape) and exporting them to ONNX.

This is a dummy example of operations that could not be exported dynamically before this PR:

```
size_x = x.size(0)
size_y = y.size(0)
size_x_y_static = torch.tensor([size_x , size_y])  # size_x_y_static is traced as constant

size_x = torch.scalar_tensor(size_x).unsqueeze(0)
size_y = torch.scalar_tensor(size_y).unsqueeze(0)
size_x_y_dynamic = torch.cat((size_x , size_y))  # size_x_y_dynamic is dynamic and depends on x and y's size
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28713

Reviewed By: hl475

Differential Revision: D18438880

Pulled By: houseroad

fbshipit-source-id: c1651e480a41602c7c7452ffc4acba40a2b3827c
2019-11-11 18:27:49 -08:00
5249c43d93 Disable android gradle jobs (#29606)
Summary:
# disabled until fixing https://github.com/pytorch/pytorch/issues/29159
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29606

Differential Revision: D18443452

Pulled By: IvanKobzarev

fbshipit-source-id: 5a12d7b3d214203037e78552b6289752ac1b8192
2019-11-11 18:27:44 -08:00
fb07098e2b Creating a base benchmarking class for activations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29182

Test Plan: Imported from OSS

Differential Revision: D18319456

Pulled By: z-a-f

fbshipit-source-id: d2314bb30a584551b5f1c8610b36c4c10c27ac85
2019-11-11 18:24:44 -08:00
7df854bddd explicitly provide memory format when calling to clone() at prune.py (#29593)
Summary:
Currently clone() has a memory_format parameter whose default value is Contiguous.
In the future this default will change to a different memory format, Preserve.
To avoid any potential issues, specify memory_format explicitly.
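
A minimal sketch of the explicit call:

```python
import torch

x = torch.randn(8, 3, 32, 32)
# Explicit memory_format keeps behavior stable across the default change.
y = x.clone(memory_format=torch.contiguous_format)
```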
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29593

Differential Revision: D18439783

Pulled By: ifedan

fbshipit-source-id: e7ed6c19ee227990214d44c562c26a7250981324
2019-11-11 18:07:06 -08:00
bf61405ed6 explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29387

Test Plan: Imported from OSS

Differential Revision: D18429729

Pulled By: VitalyFedyunin

fbshipit-source-id: c71264ed5d64ed7e5d8ea907413b6b8e7b67769a
2019-11-11 17:57:34 -08:00
8df602400b explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29386

Test Plan: Imported from OSS

Differential Revision: D18429727

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e9d72d9168a81f7d7cc8f07d3be3a6480faec52
2019-11-11 17:57:30 -08:00
858b2010ae Updating submodules
Summary:
GitHub commits:

3f47103c72
72f73d40d8
5082d158b3
03ce7fb292
f0d0e0dc38
807685d4eb

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 30634d39f7f50212793d7abf3c0488c8822e17f5
2019-11-11 17:37:44 -08:00
1bb5209f7e Back out "Revert D18299298: [pytorch][PR] Migrate conv3d from TH to ATen (CPU)" (#29286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29286

Original commit changeset: 33057d5a91d1
ghstack-source-id: 93638554

Test Plan: sandcastle and ossci

Differential Revision: D18349945

fbshipit-source-id: 9d9ddb0c185248a2073ade1063bb69ffbfa48b46
2019-11-11 17:33:14 -08:00
ddeeb561c3 Revoking mutually exclusive requirement on channels last and contiguous tensor (#28466)
Summary:
The old implementation assumed `is_channels_last_contiguous_` to be mutually
exclusive to `is_contiguous_`, which is not true.
Properly set the flag by checking strides.

Original Pull Request resolved: https://github.com/pytorch/pytorch/pull/24113
Original GitHub Author: jjsjann123 <jiej@nvidia.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28466

Differential Revision: D16860715

Pulled By: VitalyFedyunin

fbshipit-source-id: facd19d3501b6566d77c46199567e0cd051a6b49
2019-11-11 17:29:39 -08:00
70f886ffa4 Revert D18253777: Remove observer module after insert_quant_dequant
Test Plan: revert-hammer

Differential Revision:
D18253777

Original commit changeset: 26081c4c3fd3

fbshipit-source-id: 88f330c34976030c9310e7982fa6ae74e093ebbf
2019-11-11 17:09:58 -08:00
af3468a1c7 change op bench input shape to reduce execution time (#29616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616

1. Reduce the predefined_min_time, which is the minimum time each test needs to run. Based on the test results, the average time across epochs is pretty stable before exiting, so we can safely reduce the predefined time here.
2. Change the input shapes of several ops

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715

# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777
```

Reviewed By: hl475

Differential Revision: D18438540

fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
2019-11-11 16:58:27 -08:00
7374dd0d52 remove SkipInputShape flag (#29615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615

Remove that flag as it's not needed any more.

Test Plan: na

Reviewed By: hl475

Differential Revision: D18440271

fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
2019-11-11 16:56:40 -08:00
fdcb203e8e Identify weights and bias by argument position in aten call (#29147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29147

Previously we used a vector of weight and bias values to record weight/bias, assuming we would
get them from GetAttr nodes, and then propagated these values through the function calls.

However, this doesn't work if we also apply transformations to these values; we would need
to mark every value produced from weight/bias as weight/bias, e.g.
```
%w = GetAttr[name="weight"](%conv)
%wt = aten::transpose(%w)
%r = aten::conv2d(..., %wt, ...)
```
we would mark both %w and %wt as weight. Supporting this is overly complicated.

Alternatively, we can identify weights by argument position, e.g.
for the call %r = aten::conv2d(..., %w, ...), we know that argument 1 is the weight and argument 2 is the bias.

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18362839

fbshipit-source-id: afbf07f48bab8d01c5be1c882561a0255730a6b9
2019-11-11 16:40:56 -08:00
587996ef04 Remove observer module after insert_quant_dequant (#28985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28985

Remove the observer module in the quantized model

Test Plan:
python test/test_jit.py 'TestJit.test_insert_quant_dequant'

Imported from OSS

Differential Revision: D18253777

fbshipit-source-id: 26081c4c3fd3dc049cafa8c0383219bc4c233589
2019-11-11 16:31:01 -08:00
81116fd7cd Updating submodules
Summary:
GitHub commits:

2bdb5a4a7c
dfd5219816
66f868b745
0c4130d051
c912150192
c17384fea4
e0b2156829
7aef78fb2e

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 82552466afa665f0e335d5dce385dfcae9247b0b
2019-11-11 16:18:18 -08:00
a9308f9d8b py2 fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29613

Test Plan: Imported from OSS

Differential Revision: D18440212

Pulled By: suo

fbshipit-source-id: 4e25a599ea2c649d0b6b4531da5df9b00e7f6380
2019-11-11 16:15:51 -08:00
b5a38fa98e update op bench readme (#29596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18437811

fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c
2019-11-11 15:33:29 -08:00
a09197561e correctly share types between traced modules (#29583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29583

The normal flow for type sharing assumes that we will construct the
`ConcreteModuleType`, then use `operator==` to decide whether or not to
reuse an existing JIT type. In this case, `jitType_` is not populated,
so it doesn't make sense to compare it.

However, there is one exception to this flow: for traced modules, we
pre-compute the JIT type and poke it into the `ConcreteModuleType`
manually. To handle this case, we should compare the `jitType_`s in
`operator==` like everything else.

Test Plan: Imported from OSS

Differential Revision: D18435949

Pulled By: suo

fbshipit-source-id: 44b7672a686015aaf02f6664c6aff00e165fde65
2019-11-11 15:01:35 -08:00
1a9e5dad81 Improve ConcreteModuleType::dump() (#29582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29582

Give it more info, fix a segfault

Test Plan: Imported from OSS

Differential Revision: D18435950

Pulled By: suo

fbshipit-source-id: 43c695ffe1f13f33df69c6e51caa531f8b993208
2019-11-11 15:01:31 -08:00
00c224f0f2 move quantized tests from benchmark_all_test to benchmark_all_quantized_test (#29590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29590

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1
Parsing buck files: finished in 1.0 sec
Creating action graph: finished in 43.0 sec
Building: finished in 16.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 01:00.0 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 45419.667
...

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test
Parsing buck files: finished in 1.0 sec
Building: finished in 6.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 7.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QReLU
# Mode: Eager
# Name: QReLU_dims(1,)_permute_dimsFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (1,), permute_dims: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 137.685
...
```

Reviewed By: hl475

Differential Revision: D18436727

fbshipit-source-id: 317ec0e4bd2a6e33c9a60830f01ed805ae412449
2019-11-11 14:59:29 -08:00
137eea5938 change module_name in chunk_test (#29589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29589

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:chunk_test  -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 148.345

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M512_N512_chunks2_cpu
# Input: M: 512, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 125.239
```

Reviewed By: hl475

Differential Revision: D18436532

fbshipit-source-id: e7100f4605471e27703b2e2e863b971a93229854
2019-11-11 14:59:24 -08:00
6104f4e37c reduce input shapes for matmul (#29587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29587

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:matmul_test  -- --iteration 1
```

Reviewed By: hl475

Differential Revision: D18436317

fbshipit-source-id: 564143edc3d4400bcfafa0da11b7479562661b0c
2019-11-11 14:59:20 -08:00
0e5299a441 fix list_ops and list_tests (#29586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29586

This diff is fixing the list_ops and list_tests issues caused by D18412342.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1 --list_tests
Parsing buck files: finished in 0.9 sec
Creating action graph: finished in 37.2 sec
Building: finished in 15.9 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 54.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of tests:
# add_M8_N2_K1_cpu
# add_M8_N2_K8_cpu
..

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1 --list_ops
Parsing buck files: finished in 1.0 sec
Building: finished in 5.3 sec (100%) 10053/10053 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of Operators to run:
# add
# batchnorm
# cat
# chunks
# Conv1d
# ConvTranspose1d
...
```

Reviewed By: hl475

Differential Revision: D18435994

fbshipit-source-id: 89ecfd55339b6e7687cdf8d90433d4767252e09f
2019-11-11 14:59:16 -08:00
85752df4a1 reduce conv_test input shapes (#29580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29580

The input shapes of the Conv benchmark generate too many tests, which could take 40+ GB of memory. This diff reduces the input shapes to fix that issue.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:conv_test  -- --iteration 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv3d
# Mode: Eager
# Name: Conv3d_in_c64_out_c64_kernel3_stride1_N8_D4_H16_W16_cpu
# Input: in_c: 64, out_c: 64, kernel: 3, stride: 1, N: 8, D: 4, H: 16, W: 16, device: cpu
Forward Execution Time (us) : 383376.096
```

Reviewed By: hl475

Differential Revision: D18434627

fbshipit-source-id: a91a239394b034ff7b42e1b09e2f744a8ad671e9
2019-11-11 14:59:11 -08:00
01ad2bc5da Improving BinaryOpsKernel.cu (#29428)
Summary:
- Building `BinaryOpsKernel.cu` takes extremely long. Split the original file into 3 pieces, and copy-paste code into these files.
- Remove some useless logic
- Change some wrong op names: `*_cpu` -> `*_cuda`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29428

Differential Revision: D18408858

Pulled By: VitalyFedyunin

fbshipit-source-id: 29323b0bc40a928ae698345ad1ffe46c5851b012
2019-11-11 14:45:26 -08:00
6bfa7c0471 FakeQuantize benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29507

Test Plan: Imported from OSS

Differential Revision: D18415084

Pulled By: z-a-f

fbshipit-source-id: f758e45d5178ee5f80157772ab701a69f074a78b
2019-11-11 14:41:58 -08:00
627f2823e0 remove _register_* bindings from python (#29499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29499

This changes how DataParallel and trace module creation works so that
we no longer need to mutate Module class after it has been created.

The only remaining usage of register_* functions are now inside C++
tests.

Test Plan: Imported from OSS

Differential Revision: D18413652

Pulled By: zdevito

fbshipit-source-id: f039e5400cd016632768be4547892f6a69645c20
2019-11-11 13:52:46 -08:00
4e4e29a511 Simplify ScriptModule bindings. (#29432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29432

This removes a lot of the private methods on torch._C.ScriptModule,
and instead implements functionality in terms of slot_dict_impl views
to implement _parameter, _buffers, and _modules in nn.Module.

A followup PR should also remove the _register_attribute,
_register_module, and _register_parameter methods, but this requires
more refactoring of the way tracing creates modules and replication
for data parallel works.

Test Plan: Imported from OSS

Differential Revision: D18387963

Pulled By: zdevito

fbshipit-source-id: f10d47afeb30c1e05d704ae5ac4166830933125c
2019-11-11 13:52:36 -08:00
5b702ab52b switching to a simple/full executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29230

Differential Revision: D18402229

Pulled By: Krovatkin

fbshipit-source-id: 62f4bc9bc89c0c7369359bba1359c22a2fa80f46
2019-11-11 13:41:35 -08:00
cedca377bd Re-enable TestNamedTensor.test_big_tensor_repr (#29407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29407

Fixes https://github.com/pytorch/pytorch/issues/27753.

The bug was that random tensors print subtly differently. This causes
the "names=" tag to appear in slightly different places; sometimes it is
on the same line as the data, sometimes it is on different lines.

For this test, we wanted to know the following:
- printing a big named tensor's repr doesn't crash
- a big named tensor's repr shows the names

This PR changes the test to check those two things.

Test Plan: - run existing tests

Differential Revision: D18428657

Pulled By: zou3519

fbshipit-source-id: 6bcf247ffba010520878a175e766a496028f87d9
2019-11-11 13:32:32 -08:00
b3b8f522e8 Disabling 'contig' in quantized arithmetic test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29576

Test Plan: Imported from OSS

Differential Revision: D18433052

Pulled By: z-a-f

fbshipit-source-id: 8082303faa368646ef6370b6cf348275526fd33b
2019-11-11 13:30:13 -08:00
5b43becfc5 per-tensor quantize/dequantize benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29506

Test Plan: Imported from OSS

Differential Revision: D18415017

Pulled By: z-a-f

fbshipit-source-id: 92a50706aafabdcaa79dd1f226f7f4ac63606c74
2019-11-11 13:19:46 -08:00
c49b324cbf Enable test_stress_light_rpc in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29473

Test Plan: Imported from OSS

Differential Revision: D18404820

Pulled By: mrshenli

fbshipit-source-id: de0f18db208d83794507c162483bb948056af533
2019-11-11 12:22:10 -08:00
bb90c18791 Enable test_py_rref_args_user_share in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29472

Test Plan: Imported from OSS

Differential Revision: D18404818

Pulled By: mrshenli

fbshipit-source-id: 1fcd19b178dc20540a210601cbb2c974be14a7cc
2019-11-11 12:22:05 -08:00
b885eff4be Enable test_multi_py_udf_remote in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29471

Test Plan: Imported from OSS

Differential Revision: D18404819

Pulled By: mrshenli

fbshipit-source-id: 8cf3e32d7980e34c48bfd8fb61cfd9a0acc9bd46
2019-11-11 12:22:01 -08:00
bc4457f5b6 Enable test_py_built_in in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29470

Test Plan: Imported from OSS

Differential Revision: D18404822

Pulled By: mrshenli

fbshipit-source-id: 01cb87dee39c3579a2e0961d67b627ca1dc87fc2
2019-11-11 12:21:56 -08:00
93b5c9d723 Allow to create local RRef with value (#28948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28948

Add the constructor RRef(value) in Python. This allows wrapping a local object with an RRef and passing or returning this RRef to users.
This enables returning, for example, a list of RRefs containing the parameters of a module to the user of the module.
ghstack-source-id: 93565010

Test Plan: unit test.

Differential Revision: D18241227

fbshipit-source-id: b9e9b958f40623348d62ee6fc9e7f0414b4215b7
2019-11-11 12:19:45 -08:00
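
A minimal sketch of the new constructor (hedged: this assumes `rpc.init_rpc` has already been called on the worker; the module and names are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

linear = torch.nn.Linear(4, 4)
# wrap each local parameter in an RRef so it can be passed over RPC
param_rrefs = [rpc.RRef(p) for p in linear.parameters()]
print(param_rrefs[0].local_value().shape)  # torch.Size([4, 4])
```
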
17b0ab4727 Add python API for get_gradients() method. (#28926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28926

The get_gradients method was a pybind only method without any
documentation for this method for users.

I've moved this method to our python distributed autograd API and ensured that
we have appropriate docs for this method.
ghstack-source-id: 93558845

Test Plan: waitforbuildbot

Differential Revision: D18234443

fbshipit-source-id: 317267d8c2416da75afd3f9d900a3cd74bb78dfb
2019-11-11 12:19:41 -08:00
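
A hedged sketch of the documented API (the distributed setup around it is elided and assumed to be initialized):

```python
import torch.distributed.autograd as dist_autograd

with dist_autograd.context() as context_id:
    # ... run the distributed forward pass and the distributed
    # backward pass for this context ...
    grads = dist_autograd.get_gradients(context_id)  # Dict[Tensor, Tensor]
```
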
9276cd449d qadaptive_avgpool2d benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29274

Test Plan: Imported from OSS

Differential Revision: D18343569

Pulled By: z-a-f

fbshipit-source-id: e5ab9c79965caa59a8e17069e70304c01be46104
2019-11-11 12:17:44 -08:00
b0cf43b2dd Simple distributed optimizer (#29304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29304

Implements a simple python distributed optimizer that takes rrefs to parameters that will be optimized.
It keeps optimizer instances remotely, and calling step on the distributed optimizer calls step on each of the remote optimizers in parallel.
ghstack-source-id: 93564364

Test Plan: unit tests.

Differential Revision: D18354586

fbshipit-source-id: 85d4c8bfec4aa38d2863cda704d024692511cff5
2019-11-11 12:02:24 -08:00
604fc9ec41 F::embedding, F::embedding_bag, moved Embedding and EmbeddingBag options to embedding.h in options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28669

Differential Revision: D18377609

Pulled By: anjali411

fbshipit-source-id: 6a2c547368849ebd1a2f8828cfbe7252152b26a2
2019-11-11 11:51:26 -08:00
65f3b98c35 explicitly provide memory format when calling to clone() at ProcessGroupGloo.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28688

Test Plan: Imported from OSS

Differential Revision: D18333382

Pulled By: ifedan

fbshipit-source-id: b698b647eaa1e318210f445c864d6333e7d46a15
2019-11-11 11:48:53 -08:00
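
For reference, the Python-level analogue of passing an explicit memory format to clone() (a sketch only; the commit itself changes the C++ call site):

```python
import torch

x = torch.randn(8, 3, 4, 4).to(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)    # keep x's strides
z = x.clone(memory_format=torch.contiguous_format)  # force NCHW layout
print(y.is_contiguous(memory_format=torch.channels_last))  # True
print(z.is_contiguous())                                   # True
```
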
310343e946 Properly shutdown RPC even in the case of clean_shutdown=False. (#29148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29148

We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent,
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D18306941

fbshipit-source-id: 2685db3924f7aa4516f3b28f58d6c127bcd55ba9
2019-11-11 11:30:48 -08:00
e01fc56ecb move type inference for arange into c++ (#27629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17662

I'm not sure if `arange` needs to be in python_arg_parser at all, given the schemas in native_functions.yaml. In any case, this at least fixes the dtype mismatch.

In follow-up PRs I will try to handle some of the other ops that do type inference at the python level, like randint.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27629

Differential Revision: D17885939

Pulled By: eellison

fbshipit-source-id: f97a8bc722b7ab77de1c42a992e49a4a3175ad60
2019-11-11 11:26:21 -08:00
9de0b63554 Updating submodules
Summary:
GitHub commits:

cce78c4f99
0fced3e95c
79505f3059

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: bef569172d04f781b068e86c5246cf55dbde0321
2019-11-11 11:22:15 -08:00
72eff0021e Declare Dimname's kWildcard as extern instead of static (#29384)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27627

Declaring the variable as `static` in a header meant that the global variable was initialized in every source file that includes it. This is particularly problematic when the header is included in AVX source files, as the initialization generates SIGILL on older hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29384

Differential Revision: D18380379

Pulled By: zou3519

fbshipit-source-id: 0dcd87db01c468a5c9ddb2c695528b85ed2e1504
2019-11-11 10:14:16 -08:00
344e7c26c4 Delete USE_CUDA macro use from data_parallel.h (#29483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29483

Somehow, these macros were not necessary!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18427851

Pulled By: ezyang

fbshipit-source-id: 86e1d75d98342461c9a5afa1c30c14346188f7cc
2019-11-11 09:21:12 -08:00
b141754b7f Give a better error message when people accidentally use unsupported devices (#29409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29409

Fixes #27875

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18396828

Pulled By: ezyang

fbshipit-source-id: 3f53cbbe620cd3445852273be90ff5744aa7a8cb
2019-11-11 08:10:53 -08:00
bb119d957e Move torch.cuda's atfork handler into C++ (#29101)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23401

We cannot rely on `multiprocessing.util.register_after_fork` since it is only
called for processes created by the `multiprocessing` module and not `os.fork()`.

Moving to `pthread_atfork` ensures the handler always gets called. However, I don't think it's safe to call Python functions inside the `atfork` handler, so the Python code has to be a bit more careful when checking `_initialized`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29101

Differential Revision: D18355451

Pulled By: ezyang

fbshipit-source-id: 4d4253a3669796212c099dad4e5bdfdb0df40469
2019-11-11 07:34:27 -08:00
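
A minimal sketch of the failure mode the atfork handler guards against (assumes a CUDA-capable machine; note `os.fork()` bypasses `multiprocessing`'s after-fork hooks entirely):

```python
import os
import torch

if torch.cuda.is_available():
    torch.cuda.init()   # initialize CUDA in the parent process
    pid = os.fork()     # not routed through multiprocessing
    if pid == 0:
        # the atfork handler marks CUDA as needing re-initialization
        # here, instead of silently reusing the parent's context
        print(torch.cuda.is_initialized())
        os._exit(0)
    os.waitpid(pid, 0)
```
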
be757957ba Support softmax with D == 0 (#29167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29167

As titled.

This fix is crucial as multi_channel splitting would create history that has no items (i.e., D == 0), which leads to flow failure.

Test Plan:
Unittest

flow test:

before fix: f148783160

after fix: f149082299

buck test mode/dev-nosan caffe2/caffe2/python/operator_test:softmax_ops_test

Reviewed By: xianjiec

Differential Revision: D18296081

fbshipit-source-id: e0bb2dc2c4e5b465e213f31e5c5ced3a7e1fd574
2019-11-11 00:46:10 -08:00
23483406aa Fix missing space in lr_scheduler warning msg
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29527

Differential Revision: D18422662

Pulled By: ngimel

fbshipit-source-id: 80191232ee0b639274ba3561e0d89ddcb40434e7
2019-11-10 22:51:35 -08:00
3e5af22650 Disable flaky RPC tests (#29485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29485

The flakiness is likely due to the problem with OMP and fork. We
should disable fork tests for good, but that would have negative
impact on internal test coverage. This commit disables the most
buggy nested tests for now, until we find a way to turn fork test
off.

Test Plan: Imported from OSS

Differential Revision: D18407529

Pulled By: mrshenli

fbshipit-source-id: dcbe49a9d104fcf1eaf83107d58904d49dc18aff
2019-11-10 21:33:27 -08:00
f6f428b675 Make smoke tests depend on s3 html update, to avoid race condition. (#29481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29481

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18422265

Pulled By: ezyang

fbshipit-source-id: b483cd5f688676444c83174a38c99cb1777a60b0
2019-11-10 19:08:50 -08:00
46c4ae5719 Fix BC CI (#29533)
Summary:
skip _nnpack_spatial_convolution for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29533

Reviewed By: hl475

Differential Revision: D18421636

Pulled By: houseroad

fbshipit-source-id: 74ceaa753cf2faa16db89ea028fe275497b673c1
2019-11-10 16:04:01 -08:00
466ab93ef5 Revert D18286473: Use NNPACK for strided convolutions.
Test Plan: revert-hammer

Differential Revision:
D18286473

Original commit changeset: accdfafa2c24

fbshipit-source-id: dc1347eb2738009c7f44699fc46b6cb80c54e2e3
2019-11-10 08:11:11 -08:00
2032482eb9 Use handle pool to manage cuparse handles (#29426)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29352

The newly added test fails consistently with illegal memory access without this PR, and now it succeeds consistently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29426

Differential Revision: D18407784

Pulled By: ngimel

fbshipit-source-id: 6cabb9a6674c25f7d7a3dc7b3bac99002018d8ee
2019-11-09 23:12:34 -08:00
5c9eae075f qavgpool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29268

Test Plan: Imported from OSS

Differential Revision: D18342589

Pulled By: z-a-f

fbshipit-source-id: cc6f0153a927672e0831200b58f5413c7db7bdb1
2019-11-09 22:30:24 -08:00
958d0cd4df Adding short tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29257

Test Plan: Imported from OSS

Differential Revision: D18340536

Pulled By: z-a-f

fbshipit-source-id: dce470fd0c7ef9c6f639de40f7e0713b335408d1
2019-11-09 21:33:41 -08:00
5ba9209755 Use NNPACK for strided convolutions. (#29084)
Summary:
Use NNPACK for strided convolutions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29084

Differential Revision: D18286473

Pulled By: AshkanAliabadi

fbshipit-source-id: accdfafa2c247f2750208a7af84c9e2c0374920b
2019-11-09 21:21:55 -08:00
cc6af45944 Fix writeable strings warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29512

Differential Revision: D18417715

Pulled By: mruberry

fbshipit-source-id: 7029f0a73bcdf0ce8594b90b6f5af8be4e8b5e02
2019-11-09 21:16:35 -08:00
86fee25d99 nll_loss (cpu): Simplify index checking: rely on exception propagation in parallel_for (#29454)
Summary:
Replace the custom thread-safe invalid index checking and instead rely on the internal exception propagation of parallel_for. Use the `TORCH_CHECK_INDEX` macro when checking indices.

Align index check in `nll_loss` implementation with `nll_loss2d`, see https://github.com/pytorch/pytorch/issues/28304.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29454

Differential Revision: D18418169

Pulled By: ezyang

fbshipit-source-id: 273da5230dd4b66a51bf02386718b31d2dd41e66
2019-11-09 20:23:30 -08:00
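
A small sketch of the user-visible behavior: an invalid target index surfaces as a regular exception, propagated out of the parallel region:

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(2, 3), dim=1)
target = torch.tensor([0, 5])  # 5 is out of range for 3 classes
try:
    F.nll_loss(log_probs, target)
except Exception as e:
    # raised inside parallel_for and rethrown on the calling thread
    print(type(e).__name__, e)
```
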
c7ed89cf65 Migrate nll_loss2d from TH to ATen (CPU) (#28304)
Summary:
Added check for indicies in Reduction::None case.

### Benchmark results

Note: Due to the size of the input tensors, the random number generation this time is responsible for a significant portion of the total time. It is better to look at the individual per-net timings (which do not include the input preparation).
Script used for benchmark.: [nnl_loss2d_benchmark.py](https://gist.github.com/andreaskoepf/5864aa91e243317cb282c1e7fe576e1b)

#### WITH PR applied
```
using reduction:  none
CPU forward 1000 took 7.916500908322632e-05
CPU forward 10000 took 0.0002642290201038122
CPU forward 100000 took 0.003828087996225804
CPU forward 1000000 took 0.037140720000024885
CPU forward 10000000 took 0.33387596398824826
CPU forward TOTAL time 7.218988707987592

using reduction:  mean
CPU forward 1000 took 9.165197843685746e-05
CPU forward 10000 took 0.0005258890159893781
CPU forward 100000 took 0.0050761590246111155
CPU forward 1000000 took 0.047345594997750595
CPU forward 10000000 took 0.4790863030066248
CPU forward TOTAL time 7.9106070210109465
CPU for- & backward 1000 took 0.0005489500181283802
CPU for- & backward 10000 took 0.0015284279943443835
CPU for- & backward 100000 took 0.015138130984269083
CPU for- & backward 1000000 took 0.15741890601930209
CPU for- & backward 10000000 took 1.6703072849777527
CPU for- & backward TOTAL time 9.555764263990568

using reduction:  sum
CPU forward 1000 took 8.789298590272665e-05
CPU forward 10000 took 0.000514078012201935
CPU forward 100000 took 0.005135576997417957
CPU forward 1000000 took 0.04715992201818153
CPU forward 10000000 took 0.4821214270195924
CPU forward TOTAL time 7.9119505700073205
CPU for- & backward 1000 took 0.00047759301378391683
CPU for- & backward 10000 took 0.0015945070190355182
CPU for- & backward 100000 took 0.018208994006272405
CPU for- & backward 1000000 took 0.15904426100314595
CPU for- & backward 10000000 took 1.5679037219961174
CPU for- & backward TOTAL time 9.495157692988869
```

#### WITHOUT PR (original TH impl)
```
using reduction:  none
CPU forward 1000 took 0.0003981560003012419
CPU forward 10000 took 0.0035912430030293763
CPU forward 100000 took 0.035353766987100244
CPU forward 1000000 took 0.3428319719969295
CPU forward 10000000 took 3.364342701010173
CPU forward TOTAL time 11.166179805004504

using reduction:  mean
CPU forward 1000 took 8.63690220285207e-05
CPU forward 10000 took 0.0004704220045823604
CPU forward 100000 took 0.0045734510058537126
CPU forward 1000000 took 0.046232511987909675
CPU forward 10000000 took 0.4191019559802953
CPU forward TOTAL time 7.846049971994944
CPU for- & backward 1000 took 0.0005974550149403512
CPU for- & backward 10000 took 0.0014057719963602722
CPU for- & backward 100000 took 0.013776941981632262
CPU for- & backward 1000000 took 0.13876214998890646
CPU for- & backward 10000000 took 1.3666698939923663
CPU for- & backward TOTAL time 9.10526105100871

using reduction:  sum
CPU forward 1000 took 7.598899537697434e-05
CPU forward 10000 took 0.00046885499614290893
CPU forward 100000 took 0.0044489419960882515
CPU forward 1000000 took 0.04495517900795676
CPU forward 10000000 took 0.418376043002354
CPU forward TOTAL time 7.789334400993539
CPU for- & backward 1000 took 0.0004464260127861053
CPU for- & backward 10000 took 0.0017732900159899145
CPU for- & backward 100000 took 0.01626713399309665
CPU for- & backward 1000000 took 0.11790941300569102
CPU for- & backward 10000000 took 1.4346664609911386
CPU for- & backward TOTAL time 9.294745502003934
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28304

Differential Revision: D18350157

Pulled By: ezyang

fbshipit-source-id: e9437debe51386a483f4265193c475cdc90b28e4
2019-11-09 18:31:20 -08:00
a47fe40729 qpool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29250

Test Plan: Imported from OSS

Differential Revision: D18339142

Pulled By: z-a-f

fbshipit-source-id: 1d2a3dda15ab300ffa63719158a4788b7fb17df5
2019-11-09 17:52:31 -08:00
aa658a2a68 Adding inplace quantized relu6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29245

Test Plan: Imported from OSS

Differential Revision: D18334541

Pulled By: z-a-f

fbshipit-source-id: 25b12cc88ee81434d96cf5c44c008c6f85da0673
2019-11-09 14:53:42 -08:00
4874120804 Added all binary arithmetic tests in QFunctional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29424

Test Plan: Imported from OSS

Differential Revision: D18385689

Pulled By: z-a-f

fbshipit-source-id: 5947e0edfcbe2b6eba984dc9da187e9fce5cd40f
2019-11-09 14:49:57 -08:00
687ea7460a quantized comparators benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29437

Test Plan: Imported from OSS

Differential Revision: D18389909

Pulled By: z-a-f

fbshipit-source-id: e007b50fc3905747f0e0a70ab438b790e63b023e
2019-11-09 14:23:41 -08:00
fb2eb01955 qadd benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29420

Test Plan: Imported from OSS

Differential Revision: D18383402

Pulled By: z-a-f

fbshipit-source-id: 8ea2f689b7df676ffb8adef0cbb058a7a2123938
2019-11-09 14:20:28 -08:00
f5074ccafe set the no_deadline for the adaptive_avg_pool_nhwc test (#29502)
Summary:
This test is reported to be flaky due to deadline expiration. This PR flags it as a no_deadline test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29502

Differential Revision: D18416632

Pulled By: lly-zero-one

fbshipit-source-id: 27cd7b28139f3f16ee0cf5802a0709385719d487
2019-11-09 09:30:46 -08:00
6c020673c9 Migrate acos from TH to ATen (CUDA) (#29323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29323

Benchmark (Debian Buster, gcc 7.4, Release build, P400, turbo off):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.acos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.acos(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3783099120009865
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.37258279799971206
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5627449999992677
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8581132070012245
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0164795860000595
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.644646360999104
```

After:

```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3873771430007764
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.38498222500038537
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5826049269999203
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8118497010000283
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0175845949997893
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.658536324999659
```

Close #24532

Test Plan: Imported from OSS

Differential Revision: D18406806

Pulled By: VitalyFedyunin

fbshipit-source-id: 2d012485f4747fae0ddbcf2e08b1d75ef5274a19
2019-11-09 09:11:53 -08:00
ebfe846ad2 Clean up many unused declaration/definitions in TH
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29046

Test Plan: Imported from OSS

Differential Revision: D18302767

Pulled By: VitalyFedyunin

fbshipit-source-id: 65f4df515426274b92f4405ed7aad44bd1c9141e
2019-11-09 09:11:49 -08:00
4606deb2be Migrate frac from TH to ATen (CUDA) (#28953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28953

Close #24566

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc
7.4):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.frac(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.frac(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3608182370007853
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3647012189976522
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.3889585220022127
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.622635444997286
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9595754649999435
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5590267750012572
```

After:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3675256470014574
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3703597319981782
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.372184894993552
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.60767333900003
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9645607889979146
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5542530329985311
```

Test Plan: Imported from OSS

Differential Revision: D18302768

Pulled By: VitalyFedyunin

fbshipit-source-id: 24198838dc903d455155f0819d0c7d58974aaecd
2019-11-09 09:11:45 -08:00
d00579da93 Revert D18399922: Switch XLA to only override abstract functions
Test Plan: revert-hammer

Differential Revision:
D18399922

Original commit changeset: b01761673f51

fbshipit-source-id: 2e19ca58f93dd05be3c3a2a125a154d8288db672
2019-11-09 00:08:38 -08:00
cb74ede59e Pass F::*FuncOptions instead of torch::nn::*Options to functionals, and make F::*FuncOptions a different class when necessary (#29364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29364

Currently, we use `torch::nn::*Options` both as module options and functional options. However, this makes it very hard to manage the parameters in `torch::nn::*Options`, because a module's constructor can take a different set of arguments than the module's equivalent functional (e.g. `torch.nn.BatchNorm1d` takes `num_features, eps=1e-5, momentum=0.1, affine=True,
track_running_stats=True`, while `F::batch_norm` takes `running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-5`).

This PR resolves the above problem by making `F::*FuncOptions` a different class from `torch::nn::*Options` when necessary (i.e. when a module's constructor takes a different set of arguments than the module's equivalent functional). In the rest of the cases where the module constructor takes the same set of arguments as the module's equivalent functional, `F::*FuncOptions` is an alias of `torch::nn::*Options`.

Also as part of this PR, we change all functional options to pass-by-value, to make the semantics consistent across all functionals.

Test Plan: Imported from OSS

Differential Revision: D18376977

Pulled By: yf225

fbshipit-source-id: 8d9c240d93bfd5af0165b6884fdc912476b1d06b
2019-11-08 22:38:21 -08:00
5c29160c7c Switch XLA to only override abstract functions (#29438)
Summary:
This is a followup of li-roy's work https://github.com/pytorch/pytorch/pull/23282. (I messed up the rebase there :( )

After https://github.com/pytorch/xla/issues/1225 is done, we are good to move the integration to only override abstract functions.

This PR contains a TODO which I'll remove in the next 2 follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29438

Reviewed By: ljk53

Differential Revision: D18399922

Pulled By: ailzhang

fbshipit-source-id: b01761673f519dfb240681180d3f18a4518273ca
2019-11-08 22:33:09 -08:00
f31d6c70fe reduce op bench binary size (#29496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496

This diff reduces the binary size of the op benchmark by avoiding creating all tests at once.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
```

Reviewed By: hl475

Differential Revision: D18412342

fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae
2019-11-08 22:15:12 -08:00
8e8a5e0664 Pruning Functionality (#24076)
Summary:
Provides implementation for feature request issue https://github.com/pytorch/pytorch/issues/20402.

Adds pruning functionalities (structured and unstructured, local and global, as well as pruning from user-provided mask).

Associated tutorial here: https://github.com/pytorch/tutorials/pull/605

cc: soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24076

Differential Revision: D18400431

Pulled By: mickypaganini

fbshipit-source-id: a97bd6ca61f8600ae411da9ff6533c232aae1a51
2019-11-08 19:38:00 -08:00
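
A minimal usage sketch of the new `torch.nn.utils.prune` module:

```python
import torch
import torch.nn.utils.prune as prune

m = torch.nn.Linear(4, 2)
# unstructured pruning: zero out the 50% of weights with smallest |w|
prune.l1_unstructured(m, name="weight", amount=0.5)
print(m.weight)       # masked weights, recomputed from weight_orig
print(m.weight_mask)  # the binary mask, registered as a buffer
prune.remove(m, "weight")  # make the pruning permanent
```
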
3657df3836 don't set DEBUG=1 in py3.6-gcc5.4 CI build (#29491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29491

Setting DEBUG=1 causes tests to run super slow. There are two reasons
why you might do it:
1. Testing `#NDEBUG` stuff. We don't really use this macro.
2. https://github.com/pytorch/pytorch/issues/4119. This is valid,
but I would prefer to allow internal contbuilds to test this, as the
infra is better there.

Test Plan: Imported from OSS

Differential Revision: D18411635

Pulled By: suo

fbshipit-source-id: 54e1d0f9cddaa448cd2dd11fe263d5001845bdd8
2019-11-08 16:53:12 -08:00
91e1f07967 Check for unrolled loop in break & continue (#29474)
Summary:
For the same reason we don't allow iteration over heterogeneous types (ModuleLists/tuples) whose length isn't statically known, we also can't break/continue within loops over them - we need to statically know all types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29474

Differential Revision: D18406097

Pulled By: eellison

fbshipit-source-id: 70ed3fc4947b6237cdd6703135a988a5c13ce786
2019-11-08 15:51:13 -08:00
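
A sketch of the compile-time restriction (illustrative module; the loop over a ModuleList is unrolled statically, see the comment below):

```python
import torch

class Stack(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mods = torch.nn.ModuleList([torch.nn.ReLU(), torch.nn.Tanh()])

    def forward(self, x):
        for m in self.mods:
            x = m(x)
            # a `break` or `continue` here would now raise a compilation
            # error, since this loop is unrolled at compile time
        return x

scripted = torch.jit.script(Stack())
print(scripted(torch.randn(3)))
```
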
4da3ac91b7 Add functional overloads for fold, linear, loss, normalization, padding (#29360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29360

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- fold
- linear
- loss
- normalization
- padding

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376975

Pulled By: yf225

fbshipit-source-id: 233cd940834dc9d0b5d4b89339ab7082ec042c3c
2019-11-08 15:10:49 -08:00
e80f7506c2 In torch::save(), make padding computation faster. (#29425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29425

This change saves roughly 5-6% in the TorchSaveSmallTensor benchmark
(torch::save() on a tensor with 64 random floats) by reusing the
padding string across records.
ghstack-source-id: 93517961

Test Plan:
Correctness: buck test mode/dev-nosan caffe2/test/...
   Benchmark buck build mode/opt experimental/jeremyl/c2/...
     buck-out/opt/gen/experimental/jeremy/c2/SerializationBench

Differential Revision: D18385731

fbshipit-source-id: 20bcbe1efd2fb7e3012dd68080542f2a74a7d4f2
2019-11-08 15:03:25 -08:00
675a4cb9fb Extracted quantize/dequantize out of linear.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29173

Test Plan: Imported from OSS

Differential Revision: D18318561

Pulled By: z-a-f

fbshipit-source-id: 89317bb5f56e31221ed9ed02bf727ce39f44ebf8
2019-11-08 14:35:15 -08:00
eae4a69069 Add quantized fbgemm headers to torch target (#29418)
Summary:
We didn't have ATen/native/quantized/cpu/*.h in the torch target before, and we would like these headers to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29418

Differential Revision: D18383534

Pulled By: zrphercule

fbshipit-source-id: 72c06ae2c10e8cc49e7256c9e9b89288263bbfde
2019-11-08 14:32:19 -08:00
c1140f20dc Rename PyTorch JNI library to pytorch_jni (#29412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29412

Originally, this was going to be Android-only, so the name wasn't too
important.  But now that we're planning to distribute it with libtorch,
we should give it a more distinctive name.

Test Plan:
Ran tests according to
https://github.com/pytorch/pytorch/issues/6570#issuecomment-548537834

Reviewed By: IvanKobzarev

Differential Revision: D18405207

fbshipit-source-id: 0e6651cb34fb576438f24b8a9369e10adf9fecf9
2019-11-08 14:29:13 -08:00
0cfa4965a2 Clean up pytorch_android_torchvision test (#29455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29455

- Don't need to load native library.
- Shape is now private.

Test Plan: Ran test.

Reviewed By: IvanKobzarev

Differential Revision: D18405213

fbshipit-source-id: e1d1abcf2122332317693ce391e840904b69e135
2019-11-08 14:29:10 -08:00
abf55eb3a8 Pickler: convert std::stringstream cases. (#29351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29351

When torch::save()ing a smallish tensor, we spend ~5% of the time
still in std::stringstream constructors.

This removes the last couple of cases. Benchmark shows ~5% improvement:
  TorchSaveSmallTensor Pre: 13.12us
  TorchSaveSmallTensor Post: 12.48us
ghstack-source-id: 93517928

Test Plan:
buck build mode/opt experimental/jeremyl/c2:
   buck-out/opt/gen/experimental/jeremyl/c2/SerializationBench  --bm_regex=TorchSaveSmallTensor

Differential Revision: D18365066

fbshipit-source-id: a3284bec004751cedae1cdadf27f969422faff8e
2019-11-08 14:26:40 -08:00
92b9de1428 Test application for profiling, CMake params for debug symbols (#28406)
Summary:
Reason:
To have a one-step build of a test Android application, based on the current code state, that is ready for profiling with simpleperf, systrace, etc., to measure performance inside the application.

## Parameters to control debug symbols stripping
Introducing the CMakeLists parameter `ANDROID_DEBUG_SYMBOLS`, which makes it possible not to strip symbols for pytorch (i.e. not add the linker flag `-s`); it is checked in `scripts/build_android.sh`.

On the Gradle side, stripping happens by default, and to prevent it we have to specify
```
android {
  packagingOptions {
       doNotStrip "**/*.so"
  }
}
```
which is now controlled by the new Gradle property `nativeLibsDoNotStrip`

## Test_App
`android/test_app` - Android app with one MainActivity that runs inference in a loop

`android/build_test_app.sh` - script that builds libtorch with debug symbols for the specified Android ABIs, adding `NDK_DEBUG=1` and `-PnativeLibsDoNotStrip=true` to keep all debug symbols for profiling.
Script assembles all debug flavors:
```
└─ $ find . -type f -name *apk
./test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk
./test_app/app/build/outputs/apk/resnet/debug/test_app-resnet-debug.apk
```

## Different build configurations

The module used for inference can be set in `android/test_app/app/build.gradle` via BuildConfig parameters:
```
    productFlavors {
        mobilenetQuant {
            dimension "model"
            applicationIdSuffix ".mobilenetQuant"
            buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_MOBILENET_QUANT'))
            addManifestPlaceholders([APP_NAME: "PyMobileNetQuant"])
            buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-mobilenet\"")
        }
        resnet {
            dimension "model"
            applicationIdSuffix ".resnet"
            buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_RESNET18'))
            addManifestPlaceholders([APP_NAME: "PyResnet"])
            buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-resnet\"")
        }
```

In that case we can set up several apps on the same device for comparison: `applicationIdSuffix` separates the packages ('org.pytorch.testapp.mobilenetQuant' and 'org.pytorch.testapp.resnet'), while a `manifestPlaceholder` and another BuildConfig parameter provide different application names and logcat tags:
```
─ $ adb shell pm list packages | grep pytorch
package:org.pytorch.testapp.mobilenetQuant
package:org.pytorch.testapp.resnet
```

In the future we can add other BuildConfig params, e.g. single/multi-threaded execution and other configurations for profiling.

At the moment there are 2 flavors - for resnet18 and for quantized MobileNet -
which can be installed on a connected device:

```
cd android
```
```
gradle test_app:installMobilenetQuantDebug
```
```
gradle test_app:installResnetDebug
```

## Testing:
```
cd android
sh build_test_app.sh
adb install -r test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk
```

```
cd $ANDROID_NDK
python simpleperf/run_simpleperf_on_device.py record --app org.pytorch.testapp.mobilenetQuant -g --duration 10 -o /data/local/tmp/perf.data
adb pull /data/local/tmp/perf.data
python simpleperf/report_html.py
```

Simpleperf report has all symbols:
![Screenshot 2019-10-22 11 06 21](https://user-images.githubusercontent.com/6638825/67315740-0bc50100-f4bc-11e9-8f9e-2499be13d63e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28406

Differential Revision: D18386622

Pulled By: IvanKobzarev

fbshipit-source-id: 3a751192bbc4bc3c6d7f126b0b55086b4d586e7a
2019-11-08 14:19:04 -08:00
52456b2eba add hasattr() (#29332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29332

Even though we're statically typed, this can be useful, e.g. as
shorthand when iterating through a module list.

Test Plan: Imported from OSS

Differential Revision: D18393097

Pulled By: suo

fbshipit-source-id: aa42e955f88d1b8a876d0727055eb596453b9839
2019-11-08 13:58:14 -08:00
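
A minimal sketch: `hasattr()` is resolved at compile time against the module's static type:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    def forward(self, x):
        if hasattr(self, "scale"):  # resolved at compile time
            return x * self.scale
        return x

print(torch.jit.script(M())(torch.ones(2)))  # tensor([2., 2.])
```
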
7a63728d5f kill pytorch_linux_xenial_cuda9_cudnn7_py2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29479

Test Plan: Imported from OSS

Differential Revision: D18406234

Pulled By: suo

fbshipit-source-id: fb142b61ba39d0478632b3a4f7e9d96fe6efede9
2019-11-08 13:55:30 -08:00
98bb1d1f03 remove non-onnx caffe2 builds (#29478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29478

caffe2 is still tested internally, but this removes the OSS configurations.

ONNX remains, however. I will look at migrating those jobs to the pytorch
docker images so we can kill the entire caffe2 part of the config.

Test Plan: Imported from OSS

Differential Revision: D18406233

Pulled By: suo

fbshipit-source-id: c3a7d1c58a2828f04778497faa1b5d13b67acbbb
2019-11-08 13:55:26 -08:00
991c2ac383 Disables flaky test_rand_quantization (#29463)
Summary:
See https://github.com/pytorch/pytorch/issues/28550.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29463

Differential Revision: D18405669

Pulled By: mruberry

fbshipit-source-id: 2984c3896a9260a06fbf052afb06e0cb8d28b53d
2019-11-08 13:51:22 -08:00
3ab44c48d1 Add functional overloads for pixelshuffle, pooling, upsampling, vision (#29359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29359

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- pixelshuffle
- pooling
- upsampling
- vision

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376978

Pulled By: yf225

fbshipit-source-id: 4ea8d359e7efde0d741eff79faad6b24b2a5d804
2019-11-08 13:48:47 -08:00
5b1a1a17ed remove FunctionType as an allowed constant (#29405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29405

We never actually used this (function attributes are a separate pathway
in ConcreteModuleType).

Test Plan: Imported from OSS

Differential Revision: D18378392

Pulled By: suo

fbshipit-source-id: b06c4b6d70f0b2534be78a215125cffd22ab44f0
2019-11-08 13:38:02 -08:00
a4b872b65e Inline graph before writing the bytecode file. (#29421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29421

Inline graph before writing the bytecode file, so that all the instructions are emitted from the top-level methods.

Test Plan: Imported from OSS

Differential Revision: D18404180

fbshipit-source-id: 4759474a8dba3813616ebce8253bea09941f6bbb
2019-11-08 13:23:32 -08:00
f362ae1f72 Updating submodules
Summary:
GitHub commits:

e112d61a25
15098906d8
f59ddd8ca2
aa5a68f285
61b1f9d489
acddad22ce
ac0829cd6b
8fee33907f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 9150a027e1ba74386cd5d1c1b0e43b1299b52023
2019-11-08 12:54:41 -08:00
2e5fc034fb Quantized concat benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29431

Test Plan: Imported from OSS

Differential Revision: D18387765

Pulled By: z-a-f

fbshipit-source-id: a14f69d1ceb0f63ce5eddfda8af342f672dfec69
2019-11-08 12:48:55 -08:00
3bc014ecf2 Implementation of cosine learning rate training policy (#29440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29440

as titled. same as diff: D18195868.
We fix the windows compiling issue by changing the marco, inspired from: D15511736

Test Plan:
buck test -v 2 caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test  -- test_composite_cosine_lr_policy
canary: https://fburl.com/fblearner/ky7wh3vg

Differential Revision: D18392276

fbshipit-source-id: 83c84c985cd23b1cc43efedfef176ff3c67acb6e
2019-11-08 12:27:59 -08:00
edcf659e42 Remove default values from functional overloads for activation, batchnorm, distance, embedding
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29456

Test Plan: Imported from OSS

Differential Revision: D18401483

Pulled By: yf225

fbshipit-source-id: 638ff72a60fb69e41bec6f468835654b208c2896
2019-11-08 12:24:51 -08:00
2cd4f86422 Support process_group_agent "sending to itself" (#29253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29253

Some operations can be simpler if a worker can send an rpc to itself.
The main reason for not doing this previously was that Gloo doesn't support
self-sending.

Hence, this changes the process_group_agent to skip the assert
check, and simply enqueue the rpc message in its receiving queue.
ghstack-source-id: 93518076

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18339715

fbshipit-source-id: 08ade40e81da378b003a550c898a726e99d50e34
2019-11-08 12:11:55 -08:00
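
A hedged sketch of the now-allowed pattern (assumes `rpc.init_rpc` has been called; previously this tripped the agent's assertion):

```python
import torch
import torch.distributed.rpc as rpc

# a worker may now address an RPC to itself
me = rpc.get_worker_info()
fut = rpc.rpc_async(me, torch.add, args=(torch.ones(2), torch.ones(2)))
print(fut.wait())  # tensor([2., 2.])
```
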
64a66e8320 fixed random gerenation export (#29354)
Summary:
Fixed random generator symbolics, and added rand_like.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29354

Reviewed By: hl475

Differential Revision: D18400995

Pulled By: houseroad

fbshipit-source-id: 4a891e91b6c87ebce57c35b2bfa11e32ab93a149
2019-11-08 11:43:30 -08:00
5e1983f90f Fix distributed autograd initialization. (#29069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29069

Distributed autograd was initialized after RPC, which could cause a
race in scenarios where one node had initialized distributed autograd
and called backward() while other nodes had not yet initialized
distributed autograd.

Moving this before `_init_rpc` fixes the problem since `_init_rpc` implicitly
has a sync between processes via the store.
ghstack-source-id: 93535922

Test Plan: waitforbuildbot

Differential Revision: D18280875

fbshipit-source-id: 739a1c22dec21df859738d074e6e497fa43257fd
2019-11-08 11:20:15 -08:00
36b73d5a1b Hipify contrib/nccl (#29385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29385

hipify contrib/gloo

Test Plan: OSS & sandcastle build

Reviewed By: bddppq

Differential Revision: D18373308

fbshipit-source-id: 39c232db36318af116c341f64d03642639575ecd
2019-11-08 10:39:17 -08:00
740c9da267 explicitly provide memory format when calling to clone() at SparseTensorUtils.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28697

Test Plan: Imported from OSS

Differential Revision: D18333347

Pulled By: ifedan

fbshipit-source-id: 5340e1829fed068976266089c55d91aa90afee22
2019-11-08 10:29:48 -08:00
c69c243d88 explicitly provide memory format when calling to clone() at spectral_norm.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28691

Test Plan: Imported from OSS

Differential Revision: D18333381

Pulled By: ifedan

fbshipit-source-id: 0f562fb6f5c728b93a20fbbe53135ae5ae25c875
2019-11-08 10:24:46 -08:00
587ec3f55f Decouple JIT and autograd codes (#28900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28900

Decouple the JIT and autograd code (and their dependencies). After this decoupling, the compressed torch mobile size is 548 KB total (compared to 2.98 MB with the full JIT).
ghstack-source-id: 93447313

Test Plan: buck build fbandroid/mode/dev_clang_libcxx //xplat/experimental/pytorch/mobile:lite_predictorAndroid#android-armv7 -c project.ignore= -c user.ndk_cxxflags=-g0 --show-output

Differential Revision: D18226237

fbshipit-source-id: a188329274b450f63eb6448f42adec28517e14fd
2019-11-08 10:16:18 -08:00
ab47465384 Remove SchemaRegistrationHandleRAII. (#29379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29379

ghstack-source-id: 93452912

Test Plan: buck build caffe2:aten_cpu

Differential Revision: D18365671

fbshipit-source-id: 0141930e50a4b519df866ce70b724d17601e29dd
2019-11-08 09:51:31 -08:00
f441bb1c20 check error status of CUDA launch after Magma kernels (#29003)
Summary:
as part of https://github.com/pytorch/hub/issues/62 I found that the stack-trace of a failed kernel launch was being recorded elsewhere, even with CUDA_LAUNCH_BLOCKING=1.

So, I started debugging, and found that magma launches don't do error checking.

I eventually found the issue to be that I didn't compile-in sm37 SASS into the magma binary and the failure was on `x.inverse()`, and that's somehow a problem for magma 2.5.1 (but not 2.5.0).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29003

Differential Revision: D18397358

Pulled By: soumith

fbshipit-source-id: 04baca68eac209d7af773daddd0193697d4ab0d9
2019-11-08 09:43:51 -08:00
4e21157e01 Revert "Revert D18171156: Merge Tensor and Variable." (#29299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29299

This reverts commit 9c43b16df9dad3dfb4da1efab68d8c88e6437e8f, but also
with the changes from D18348622.  Comments there:

thpp-compatibility is used by admarket/adreview/service:adreviewservice and
libtorch is too big for the service to deal with.

thpp-compatibility doesn't support autograd, so we hack around dispatching
variables by using AutoNonVariableTypeMode everywhere we call into ATen,
so we never attempt to call into Variable stubs.  If you get it wrong,
you'll get an error like:

```
what():  Could not run 'aten::empty' with arguments from the 'VariableTensorId' backend. 'aten::empty' is only available for these backends: [SparseCPUTensorId, CPUTensorId, MkldnnCPUTensorId]. (lookup_ at caffe2/aten/src/ATen/core/dispatch/DispatchTable.h:298)
```

Test Plan:
Imported from OSS

```
buck test //thpp-compatibility/...
buck build mode/opt-clang admarket/adreview/service:adreviewservice
```

adreviewservice canary: https://our.intern.facebook.com/intern/ads/canary/422290029716387895 (comparing against parent comment due to current breakage) ==> experiment store https://our.intern.facebook.com/intern/experiment_store/experiment/43990006/
adfinder canary: https://our.intern.facebook.com/intern/ads/canary/422268535840333934
adindexer canary: https://our.intern.facebook.com/intern/ads/canary/422268550559034675

adreview second canary:  https://our.intern.facebook.com/intern/ads/canary/422307863515591925

canary without thpp-compat fixups https://our.intern.facebook.com/intern/ads/canary/422308951649168772

Reviewed By: dreiss

Differential Revision: D18353504

Pulled By: ezyang

fbshipit-source-id: 65feaba39fa07bb66762810909aeb38868668a30
2019-11-08 09:11:20 -08:00
b24b967e00 Add functional overloads for activation, batchnorm, distance, embedding (#29358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29358

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- activation
- batchnorm
- distance
- embedding

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376976

Pulled By: yf225

fbshipit-source-id: 0b254dc6340b6d6ac08c9f95d2b1c02b791b2f38
2019-11-08 08:34:10 -08:00
63675b1969 Revert RRef.to_here()/local_value() return type (#29396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29396

The return types of RRef.to_here()/local_value() were recently
changed to Future, which triggers flakiness as the RRef could be
deleted before the future.wait() finishes. While we are still
discussing how we'd like to solve it, this commit reverts the
return type to stop the bleeding in tests.

closes #28885

Test Plan: Imported from OSS

Differential Revision: D18375571

Pulled By: mrshenli

fbshipit-source-id: 354dbf38b15ab804e44fc9968dd30888415c1fab
2019-11-08 08:31:18 -08:00
d75222f3f5 Dump operator names of a module and its submodules.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29374

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D18372073

Pulled By: ljk53

fbshipit-source-id: cf2df0d44ffe74dd24dc63f1f07f395e36b5393d
2019-11-08 08:22:05 -08:00
b7fc26a9ef Clean up the stale item in bc white list (#29439)
Summary:
keep the list clean
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29439

Reviewed By: hl475

Differential Revision: D18392445

Pulled By: houseroad

fbshipit-source-id: 2cfe66620e0e9275a0f9590e453c9be10c82124a
2019-11-08 07:06:48 -08:00
255b2340fc don't copy ignored/unused methods to ScriptModule (#29342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29342

This is not necessary, as we use `lazy_bind` to retrieve those methods
from the class anyway.

Test Plan: Imported from OSS

Differential Revision: D18383381

Pulled By: suo

fbshipit-source-id: e8b7c9e696087cc1e707ac38f7ae85f569f08371
2019-11-07 22:54:29 -08:00
5f03ad9698 Add note to docs of torch.unique (#29165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19151
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29165

Differential Revision: D18319890

Pulled By: soumith

fbshipit-source-id: 162afaecd5371446bec2a1769e0a8848ecffb002
2019-11-07 22:03:15 -08:00
baef925d5d Skips CUDA handle tests on Python2 (#29430)
Summary:
Per title. These tests aren't Python2 compatible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29430

Differential Revision: D18391211

Pulled By: mruberry

fbshipit-source-id: a3516796f6bd333de0415dd0ff0a2a161f963109
2019-11-07 21:33:20 -08:00
4bcf4796aa Make HistogramObserver scriptable with @torch.jit.ignore (#27950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27950

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18360139

fbshipit-source-id: 5459ae49c087886e4990de136198773a75b1c572
2019-11-07 18:02:44 -08:00
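
A minimal sketch of the flow this enables (module path as of this era; treat it as an assumption):

```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
obs(torch.randn(64))              # collect statistics
scripted = torch.jit.script(obs)  # scriptable via @torch.jit.ignore
print(obs.calculate_qparams())    # (scale, zero_point)
```
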
19d3a7ad02 fix negative string indexing (#22700)
Summary:
strings allow negative indexing in python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22700

Differential Revision: D18382382

Pulled By: eellison

fbshipit-source-id: 05c3fa0890be6234ee1467da0e65697f51236523
2019-11-07 17:28:16 -08:00
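
A minimal sketch of the fixed behavior in TorchScript:

```python
import torch

@torch.jit.script
def last_char(s: str) -> str:
    return s[-1]  # negative indices now behave as in Python

print(last_char("pytorch"))  # 'h'
```
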
e66626ae5c Lift rpc_timeout to RpcAgent, for other RpcAgents to reuse. (#29341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29341

So that other RpcAgents can use this timeout setting as well.

ghstack-source-id: 93481902

Differential Revision: D5681951

fbshipit-source-id: 569c768dc342e8a2d9faf142ceccf696e12e41dc
2019-11-07 17:05:45 -08:00
7da11f4967 Export weight_norm (#28618)
Summary:
Export _weight_norm
Caffe2 tests are inplace

Looks like there is conflicting behavior in torch.nn.utils.weight_norm regarding dim=None:
dim can be negative for backwards axes, but when dim = None, it's overwritten to -1
0c48092b22/torch/nn/utils/weight_norm.py (L10)

For now, this symbolic to matches the current behavior. But this might need to be changed in the torch module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28618

Reviewed By: hl475

Differential Revision: D18354270

Pulled By: houseroad

fbshipit-source-id: 0d64ee9ee1156bb96d36ed0a25b2e8cc5058ce90
2019-11-07 16:55:56 -08:00
782e80e6e7 Make jit.trace_module reentrant (#29411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29411

Fixes https://github.com/pytorch/pytorch/issues/29367

Test Plan: Imported from OSS

Differential Revision: D18380559

Pulled By: jamesr66a

fbshipit-source-id: 5caf606ccbc5dc79dac14e3c28cc02dec19ce695
2019-11-07 16:29:06 -08:00
90f28c2756 enable fast path for TensorIterator for contiguous inputs/no broadcast (#29180)
Summary:
As title. Also allocates outputs with `empty` instead of `empty_strided` in the regular path when possible, thus avoiding resizing of outputs and taking an additional DeviceGuard for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29180

Test Plan: covered by existing tests

Differential Revision: D18327836

Pulled By: ngimel

fbshipit-source-id: e8d925f0fe915f327ec41aba83fd6857b09772b5
2019-11-07 16:23:33 -08:00
8a33f1150d Use nativeloader instead of system loader to load JNI library for soloader compatibility. (#29350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29350

ghstack-source-id: 93491099

Test Plan: P121597890

Reviewed By: dreiss

Differential Revision: D18352773

fbshipit-source-id: 712c3f5d10a3d4c815c5554bb62e1a95563ba7ff
2019-11-07 16:09:29 -08:00
fa66a1498e Simplify _calculate_fan_in_and_fan_out (#29370)
Summary:
The code checking `if dimensions == 2` is not needed
because the case of a 2D tensor (Linear) is already handled
by the statement:
`receptive_field_size = 1`
and this conditional:
`if tensor.dim() > 2:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29370

Differential Revision: D18372987

Pulled By: albanD

fbshipit-source-id: fcb4dddbc76b9f4414c6d88c0aa2fb4435bf3385
2019-11-07 15:53:05 -08:00
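
A sketch of the simplified helper as described above (an illustrative reimplementation, not the exact source):

```python
import torch

def calculate_fan_in_and_fan_out(tensor):
    if tensor.dim() < 2:
        raise ValueError("requires a tensor with at least 2 dimensions")
    num_input_fmaps = tensor.size(1)
    num_output_fmaps = tensor.size(0)
    receptive_field_size = 1          # already correct for 2D (Linear)
    if tensor.dim() > 2:
        receptive_field_size = tensor[0][0].numel()
    return (num_input_fmaps * receptive_field_size,
            num_output_fmaps * receptive_field_size)

print(calculate_fan_in_and_fan_out(torch.empty(16, 3, 5, 5)))  # (75, 400)
```
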
de9a54466d clone should preserve the type of attribute (#29269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29269

Hit this bug with an attribute of type `Optional[Tensor]` that is
initialized to None and reassigned later to some tensor.

Test Plan:
.

Imported from OSS

Differential Revision: D18364338

fbshipit-source-id: d8e1277a84ab7d80331cba83f5639469d398632e
2019-11-07 15:25:20 -08:00
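
The pattern that exposed the bug, as a sketch:

```python
import torch
from typing import Optional

class M(torch.nn.Module):
    bias: Optional[torch.Tensor]

    def __init__(self):
        super().__init__()
        self.bias = None  # later reassigned to a real tensor

    def forward(self, x):
        if self.bias is not None:
            return x + self.bias
        return x

print(torch.jit.script(M())(torch.ones(2)))
```
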
5a44107146 fix pytorch mobile build (#29414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29414

add a missing file and fix a std::to_string call.

Test Plan: buck build //xplat/caffe2:torchAndroid#android-armv7,shared

Reviewed By: ljk53

Differential Revision: D18351498

fbshipit-source-id: 41225bff974058eef485a9991d0cc16c67a4074a
2019-11-07 15:20:04 -08:00
0be2f12ef9 Updating submodules
Summary:
GitHub commits:

f80050fa8f
7acd9b86d2

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 8c0603b72028220d3ac2254b752cfc9c9f6011a4
2019-11-07 14:54:07 -08:00
821f8bfc2f Fix tracing for dynamic quantized LSTM (#29331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331

Closes #27954

This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:

1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-packed on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.

The `forward` of the traced LSTM before and after this change are as follows:

Before
```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  hx, hx0, = argument_2
  _0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_0, (_1, _2))
```

After

```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  _0 = self.cell._all_weight_values
  _1 = getattr(_0, "0").param
  _2 = getattr(_0, "1").param
  hx, hx0, = argument_2
  _3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_3, (_4, _5))

```

Test Plan: Imported from OSS

Differential Revision: D18374904

Pulled By: jamesr66a

fbshipit-source-id: f1a9b58998bc365b9baad38c21fd4bb510dd639c
2019-11-07 13:45:39 -08:00
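
A sketch of the end-to-end flow this fixes (dynamic quantization, then tracing):

```python
import torch

lstm = torch.nn.LSTM(4, 4, num_layers=1)
qlstm = torch.quantization.quantize_dynamic(
    lstm, {torch.nn.LSTM}, dtype=torch.qint8)

x = torch.randn(3, 1, 4)
hx = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))
# packed weights are now traced as attributes, not baked-in constants
traced = torch.jit.trace(qlstm, (x, hx))
out, (h, c) = traced(x, hx)
```
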
5bb35fe923 Updating submodules
Summary:
GitHub commits:

07a0ad3c29
aa35e8c58b

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: fe673a44e0a23ba0a4dc588a9ae036857874f203
2019-11-07 13:42:45 -08:00
1dd3c8e539 Skip flaky test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29403

Test Plan: Imported from OSS

Differential Revision: D18377162

Pulled By: jamesr66a

fbshipit-source-id: 69052a7466d03468146e99da45f1ee2c9e85dfa8
2019-11-07 12:52:47 -08:00
02921e7985 Use cuDNN's handle pool mechanism to manage cublas handles (#29233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/6962

The PR implements the handle pool mechanism for cublas as suggested by mcarilli  in https://github.com/pytorch/pytorch/issues/6962#issuecomment-530563872.

~~I didn't add any unit test here yet because as mcarilli mentioned:~~
> ~~On my local machine, out of curiosity I also rewrote that test to use gemms instead of convolutions. The race condition seemed rarer, but the test did show that cublas use is not thread safe. I can share the script if you want.~~

~~Please share your script with me mcarilli. And if the race condition is rare, would it still be possible for the CI to detect it?~~

cc: colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29233

Differential Revision: D18372007

Pulled By: ezyang

fbshipit-source-id: 3492bf13410598e8452e89cf4e3e63e8df9c8c3d
2019-11-07 12:50:18 -08:00
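
A sketch of the kind of multi-threaded GEMM workload this makes safe (assumes a CUDA device):

```python
import threading
import torch

def worker():
    a = torch.randn(256, 256, device="cuda")
    # each thread checks a cublas handle out of the per-device pool
    for _ in range(10):
        (a @ a).sum().item()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
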
b008b34bd8 explicitly provide memory format when calling to clone() at SparseTensor.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28699

Test Plan: Imported from OSS

Differential Revision: D18333354

Pulled By: ifedan

fbshipit-source-id: e5634cd6f2e5d24867f4bb6730670303e70dea52
2019-11-07 12:26:50 -08:00
09822a1d62 explicitly provide memory format when calling to clone() at SparseTensorMath.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28700

Test Plan: Imported from OSS

Differential Revision: D18333349

Pulled By: ifedan

fbshipit-source-id: 23780c6a60f366de2b8f563b477df35cf52f88b4
2019-11-07 12:15:15 -08:00
564384fe12 Automatic update of fbcode/onnx to fea8568cac61a482ed208748fdc0e1a8e47f62f5 (#29363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29363

Previous import was 2891e1459745933f4bba9a8cb3371cf3c9eb1d16

Included changes:
- **[fea8568c](https://github.com/onnx/onnx/commit/fea8568c)**: minor changes to NonZero and Slice (#2429) <Ashwini Khade>
- **[79bd5042](https://github.com/onnx/onnx/commit/79bd5042)**: fix test bugs for resize op version 11 (#2425) <Ashwini Khade>
- **[3ea3b0e0](https://github.com/onnx/onnx/commit/3ea3b0e0)**: Add shape existence check in GatherElements shape inference logic (#2402) <Hariharan Seshadri>
- **[192ad8c8](https://github.com/onnx/onnx/commit/192ad8c8)**: add invite for next workshop (#2407) <Prasanth Pulavarthi>
- **[eea60812](https://github.com/onnx/onnx/commit/eea60812)**: Fix missing comma in exception message. Causes invalid message depending on what's in memory prior to the constant char string. (#2403) <Scott McKay>
- **[dd082c99](https://github.com/onnx/onnx/commit/dd082c99)**: Add section headers for easier linking (#2400) <Prasanth Pulavarthi>
- **[ca1d5b7e](https://github.com/onnx/onnx/commit/ca1d5b7e)**: Add type check for node inputs (#2367) <RandySheriffH>
- **[e5600091](https://github.com/onnx/onnx/commit/e5600091)**: Update doc loop op (#2337) <G. Ramalingam>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D18365923

fbshipit-source-id: 8ac138e3ff9d4fbc5fdf85d06785190334c346a1
2019-11-07 12:09:17 -08:00
255505f232 Updating submodules
Summary:
GitHub commits:

d752e52a31
d7cc18d7c7
6e26fa9d03
76fcc9469a
1da1f04231
17ff03e136
d8fde1c7fc
8a07c4270c
50fdf05973
affb36ec7c

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: dfcf050e5eaa6e5077ea9b4c21326f127ec6066c
2019-11-07 11:58:24 -08:00
d5d524dadb explicitly provide memory format when calling to clone() at TensorShape.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28698

Test Plan: Imported from OSS

Differential Revision: D18333352

Pulled By: ifedan

fbshipit-source-id: cb31d4bbda50a6bfa7a25d0cae9953bec03e7c46
2019-11-07 11:55:42 -08:00
Jie
fdab1cf0d4 NHWC support in cuDNN BatchNorm & Conv2d (#29361)
Summary:
This reverts the 9a9bb448ee49a1493f22bbbeed4af92b1364fce9

Fixes the broken case that caused the previous commit to be reverted.
Details of the fix:
	modified:   aten/src/ATen/native/Convolution.cpp

Called contiguous() on the 3D input tensor. This prevents the code path from
accidentally recognizing the input as having channels_last strides, due to the
unsqueezing of a permuted 3D tensor.
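
A minimal sketch (not from the PR; shapes are illustrative) of the NHWC path this enables:

```python
import torch

# Channels-last (NHWC) input through cuDNN Conv2d + BatchNorm2d.
x = torch.randn(8, 3, 32, 32, device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda()
bn = torch.nn.BatchNorm2d(16).cuda()
y = bn(conv(x))
print(y.is_contiguous(memory_format=torch.channels_last))
```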
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29361

Differential Revision: D18371964

Pulled By: VitalyFedyunin

fbshipit-source-id: a5985f4687b37e183649fa35b8ccdb50368ebfdf
2019-11-07 10:39:58 -08:00
0aba5ba13c Add unsafeRemoveAttr and unsafeRemoveSlot to ivalue::Object (#29048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29048

In order to support remove_attribute in Module, we need to support
removing a slot in ivalue::Object. It is the caller's responsibility to
guarantee the safety of the remove operation.

Test Plan:
build/bin/test_jit

Imported from OSS

Differential Revision: D18343464

fbshipit-source-id: c1ba3a06afc40d928e59500b7b35c9e6c8720028
2019-11-07 10:35:57 -08:00
abbe6347ff CPU-strided-complex support for ComplexFloat (#29294)
Summary:
Re-submit of https://github.com/pytorch/pytorch/issues/29133

In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes
- [x]  Fixed Vec256 Permute operations for Complex Float
- [x]  Fixed copy_kernel_cast between complex data types
  -  copy_kernel_cast should not call std::real during inter-complex dtype conversion.
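
A hedged illustration (assuming in-tree complex tensor support; the values are made up) of what the copy_kernel_cast fix preserves:

```python
import torch

# Inter-complex dtype conversion must keep the imaginary part intact;
# the buggy path effectively dropped it by calling std::real.
z = torch.tensor([1 + 2j, 3 - 4j], dtype=torch.complex128)
w = z.to(torch.complex64)
print(w)  # imaginary parts preserved: tensor([1.+2.j, 3.-4.j])
```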
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29294

Differential Revision: D18371928

Pulled By: ezyang

fbshipit-source-id: a80a894eeaeb68540054ccfe405c4d0338fa4350
2019-11-07 09:35:19 -08:00
86c64440c9 Make PyTorch Python 3.8 compatible (#29302)
Summary:
PEP 590 replaces the `tp_print` slot with `tp_vectorcall_offset`, which requires a `Py_ssize_t` value.
Passing a `nullptr` caused compatibility issues with Python 3.8.

Changelog:
- Modify all occurrences of `nullptr /* tp_print */` to `0 /* tp_vectorcall_offset */`
- Minor formatting changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29302

Test Plan:
- Local fresh build with Python 3.8 completed successfully.

Fixes https://github.com/pytorch/pytorch/issues/28060.
Fixes https://github.com/pytorch/pytorch/issues/29162.

Supersedes https://github.com/pytorch/pytorch/pull/28364

Differential Revision: D18372022

Pulled By: ezyang

fbshipit-source-id: 8e9a15b0d0f72101ccc69bd489f5efa216b880bb
2019-11-07 09:20:19 -08:00
ca20b569be Move unboxed dispatch decision into dispatcher (#29200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29200

Before, the dispatch key for unboxed operators from native_functions.yaml was generated in codegen and passed to the c10 dispatcher.
Now, we generate it inside of the dispatcher, right next to where the same thing happens for boxed calls.
ghstack-source-id: 93371022

Test Plan: unit tests

Differential Revision: D18282747

fbshipit-source-id: 96a97fe83778d0a9e61b4441d6e2aed10d73209c
2019-11-07 09:03:19 -08:00
43d4d019c4 explicitly provide memory format when calling to clone() at rprop.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28693

Test Plan: Imported from OSS

Differential Revision: D18333379

Pulled By: ifedan

fbshipit-source-id: 4430efc0602a3fc6ef05adac07df845a696449f7
2019-11-07 09:00:37 -08:00
2704af0970 AsyncIf op implementation
Summary:
This diff adds the following:
- An AsyncIf op to support conditional async execution. It assumes that then_net and else_net are async scheduling nets. The op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and their async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.

Test Plan:
New unit tests

With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325

Reviewed By: ilia-cher

Differential Revision: D18051357

fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
2019-11-07 08:51:31 -08:00
b14c5943d4 Handle warning in torchscript (#27154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27154

Fix for #25859

* #28283 Fix clang-tidy errors in csrc/Module.cpp

Test Plan: Imported from OSS

Differential Revision: D18249631

Pulled By: albanD

fbshipit-source-id: 4e9bbad07cc39e7c7f0546ef7587bd4ab2dd644e
2019-11-07 08:35:16 -08:00
0ff1696c75 add pybind version of HANDLE_TH_ERRORS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26614

Test Plan: Imported from OSS

Differential Revision: D18249634

Pulled By: albanD

fbshipit-source-id: 25503f368926e0f3633c5af0f222c9bb4729f342
2019-11-07 08:35:11 -08:00
9b875e1256 Buffer python warning to avoid deadlocks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26613

Test Plan: Imported from OSS

Differential Revision: D18249633

Pulled By: albanD

fbshipit-source-id: 863f52400e1b97943a67a9e1abb09ae8d045e7f0
2019-11-07 08:35:06 -08:00
cb3232fdb9 Fix clang-tidy errors in csrc/Module.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28283

Test Plan: Imported from OSS

Differential Revision: D18249632

Pulled By: albanD

fbshipit-source-id: 0c7c71b3b7c74d338a90850e06c841b399f5709f
2019-11-07 08:34:58 -08:00
528a0cfb96 Allow setting tolerances in testing math functions. (#29297)
Summary:
May be needed by https://github.com/pytorch/pytorch/issues/25287
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29297

Differential Revision: D18371907

Pulled By: ezyang

fbshipit-source-id: 4b90ae2b9867d21401498b780428dd009741b6bc
2019-11-07 08:26:53 -08:00
b05e9d4521 explicitly provide memory format when calling to clone() at lbfgs.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28692

Test Plan: Imported from OSS

Differential Revision: D18333356

Pulled By: ifedan

fbshipit-source-id: ca0de6b721f695893c0756ea1b3b469df1a2b249
2019-11-07 08:20:11 -08:00
5d70b11d36 Fix the issue when NHWC Tensor has height or width larger than max CUDA grid (#28931)
Summary:
When an NHWC tensor has height or width larger than the max CUDA grid size, max_pool fails with error code 0.

The example is: https://github.com/pytorch/pytorch/issues/28714

This change limits the grid size to the CUDA maximum and chunks the input so it can still be processed.
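
A hedged repro sketch (sizes are hypothetical, chosen so one spatial dimension exceeds the 65535 grid-dimension limit):

```python
import torch
import torch.nn.functional as F

# An NHWC tensor whose width exceeds the max CUDA grid dimension;
# before this change, max_pool2d failed with error code 0 on such inputs.
x = torch.randn(1, 8, 4, 70000, device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
y = F.max_pool2d(x, kernel_size=2)
torch.cuda.synchronize()
print(y.shape)
```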
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28931

Differential Revision: D18358892

Pulled By: ifedan

fbshipit-source-id: 2fd65448bd644f1588a0e208edaaea5bcb6a7d52
2019-11-07 08:17:54 -08:00
4926a51010 explicitly provide memory format when calling to clone() at parameter.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28690

Test Plan: Imported from OSS

Differential Revision: D18333355

Pulled By: ifedan

fbshipit-source-id: e02bd556e7b336bb02cd9ec89029a0e5f4f7cbe7
2019-11-07 07:38:44 -08:00
8498a1555b Add some non-contiguous tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28905

Test Plan: Imported from OSS

Differential Revision: D18357843

Pulled By: nairbv

fbshipit-source-id: d411517d702023618dce7f501d3e2a4eea8901ff
2019-11-07 07:10:22 -08:00
9dcf5191d5 explicitly provide memory format when calling to clone() at batchnorm.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28689

Test Plan: Imported from OSS

Differential Revision: D18333368

Pulled By: ifedan

fbshipit-source-id: e440c80ce8a64e1aae709cd935b14c7024a17787
2019-11-07 06:42:14 -08:00
75309b45f3 explicitly provide memory format when calling to clone() at Indexing.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28660

Test Plan: Imported from OSS

Differential Revision: D18333346

Pulled By: ifedan

fbshipit-source-id: 06590205d883a5096388a4ae318389244130972d
2019-11-07 05:38:32 -08:00
78a34d3205 Revert D18350353: dump operator names of a module and its sub-modules.
Test Plan: revert-hammer

Differential Revision:
D18350353

Original commit changeset: 2026c8ab7650

fbshipit-source-id: 401f34cb276c3ea34a5439de4c3415969a04ab2a
2019-11-07 05:28:33 -08:00
58005382c8 fix @property (#28395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28395

Currently property methods are broken in TorchScript because we
basically treat them as attributes in the existing path: we evaluate
the method once and store that as the value forever.

Since the lack of property support is easily worked around (just make it
a method), I've opted to explicitly error out to avoid confusion. If
people want it, they can file an issue and we can look at their use
case.

This also helps us nicely clean up some parts of the ScriptModule conversion
path.
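
A minimal sketch of the suggested workaround (module and names are illustrative):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4))

    # A @property here would now raise an explicit error under scripting;
    # the workaround is to express it as a regular method instead.
    def scaled_weight(self):
        return self.weight * 2.0

    def forward(self, x):
        return x + self.scaled_weight()

scripted = torch.jit.script(M())
print(scripted(torch.randn(4)))
```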

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D18054946

Pulled By: suo

fbshipit-source-id: 7e927836ae687cd2f13a94b9f0af399437fae422
2019-11-06 23:51:07 -08:00
796363147f Implement more of of the nn.Module API (#28828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28828

This updates torch::script::Module to more closely match the behavior
of nn.Module. In particular, it implements the (optionally recursive)
iterators that retrieve submodules, parameters, and buffers and makes
their names match the python versions.

This also removes the individual accessors for Parameter, Module, Buffer, etc.
and replaces them with a single `attr` function which is equivalent to
writing `a.foo` in Python (`setattr` emulates `a.foo = v`).
As we build out the user-facing API for TorchScript values this will end
up matching how an  attribute is accessed on general objects.

This PR preserves the Python bindings for script::Module by emulating the
old API at the binding level. A followup will clean up the usage to more
directly match the C++ API.

Test Plan: Imported from OSS

Differential Revision: D18197611

Pulled By: zdevito

fbshipit-source-id: 7ee4dcbb258605d1c988314b05d938423f1ccee5
2019-11-06 22:58:25 -08:00
509d9630ca Disabling ONNX IR v4 sematics for opset 8 or lower. (#28990)
Summary:
Currently, the `keep_initializers_as_input` argument in the `torch.onnx.export` API can be used to choose whether to export an ONNX model with IR v3 or v4 semantics. The implementation does not check which opset is being used for export. This is an issue because ONNX IR v4 is valid only for opset 9 and above (as listed [here](https://github.com/onnx/onnx/releases/tag/v1.4.0)), so exporting opset 8 or lower with `keep_initializers_as_input=False` will create an illegal ONNX graph.

This change fixes the issue by introducing a check on the opset version when deciding whether to export ONNX IR v3 or v4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28990

Reviewed By: hl475

Differential Revision: D18352523

Pulled By: houseroad

fbshipit-source-id: 7e9055d405c3faf52b80a8de0d04186d4c350c15
2019-11-06 21:57:21 -08:00
4515edfe15 Disable QNNPACK tests on MacOS (#29328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29328

Tests are flaky as seen in issue #29326.
Disable until we fix the kernels.

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D18358200

fbshipit-source-id: 58f1981799fe8253234fcc7b0540e1c0b6babc15
2019-11-06 21:30:11 -08:00
84a6583ba1 Revert D18359880: Fix tracing for dynamic quantized LSTM
Test Plan: revert-hammer

Differential Revision:
D18359880

Original commit changeset: 0ff2cad294a1

fbshipit-source-id: 834cd43b39fb754f90c8b18b8ab9b837f2b511ab
2019-11-06 21:10:33 -08:00
dc7552f9ca dump operator names of a module and its sub-modules. (#29208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29208

A binary to dump operator names from a script model and its sub models.
Usage:
dump_operator_names path/to/script_model.pt path/to/output.yaml

Test Plan: Imported from OSS

Differential Revision: D18350353

fbshipit-source-id: 2026c8ab765069ad059ab2ca44fc27b79315b973
2019-11-06 20:57:28 -08:00
6572d0d174 add a new flag to select machine for op benchmark (#29349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29349

This diff adds a new flag to pick CPU/GPU machines to run op benchmarks. The default is None, which will try to run on all supported devices.

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 124.283
...
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cuda_bwdall
# Input: M: 64, N: 64, K: 128, device: cuda
Backward Execution Time (us) : 176.592

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 121.884

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 26.002

Reviewed By: hl475

Differential Revision: D18363942

fbshipit-source-id: fccd1fd09bcd6d7725e6fa4063559a27d9cc3065
2019-11-06 20:13:25 -08:00
fff4f16e45 Clean up file opening for serialization (#29221)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization
 * https://github.com/pytorch/pytorch/issues/29228 - Expose miniz to Python
 * **https://github.com/pytorch/pytorch/issues/29221 - Clean up file opening for serialization**

This is a small refactor to get things started for zipfile-based serialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29221

Differential Revision: D18330932

Pulled By: driazati

fbshipit-source-id: ce91542faf987ae5aa6dfd322e633a0c7335e678
2019-11-06 18:41:40 -08:00
ae12630508 getFuncName take func_value as argument (#29146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29146

getFuncName takes the Value that represents the function as its argument,
e.g.
for CallFunction(%1, %a, %b, %c), it takes %1 as the argument

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18362840

fbshipit-source-id: fc90ebe7db702aec9b50cec6db454d0eb8ee5612
2019-11-06 18:20:04 -08:00
9a9bb448ee Revert cudnn changes #23861 (#29329)
Summary:
Broken case:

```python
x = torch.randn(192,16,50).cuda()
x = x.permute(0,2,1).contiguous().permute(0,2,1)
m = torch.nn.Conv1d(
       in_channels=16,
       out_channels=32,
       kernel_size=2,
       bias=True,
  ).cuda()

m(x)
```

This reverts commit 8160f390cf678b3b98e0c1f73bd289ee3c96afcb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29329

Differential Revision: D18357674

Pulled By: VitalyFedyunin

fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245
2019-11-06 17:38:46 -08:00
f17e02fd94 Fix tracing for dynamic quantized LSTM (#29331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331

Closes #27954

This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:

1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-packed on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.

The `forward` of the traced LSTM before and after this change are as follows:

Before
```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  hx, hx0, = argument_2
  _0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_0, (_1, _2))
```

After

```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  _0 = self.cell._all_weight_values
  _1 = getattr(_0, "0").param
  _2 = getattr(_0, "1").param
  hx, hx0, = argument_2
  _3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_3, (_4, _5))

```

Test Plan: Imported from OSS

Differential Revision: D18359880

Pulled By: jamesr66a

fbshipit-source-id: 0ff2cad294a1871123015dfc704eaf73a7ac1d9e
2019-11-06 17:02:12 -08:00
6c4fd602ff Revert D18350224: Fixed export for random
Test Plan: revert-hammer

Differential Revision:
D18350224

Original commit changeset: 540a07f7def3

fbshipit-source-id: c5755c819191b858f0de4aab8196cf5a46b8f750
2019-11-06 16:01:49 -08:00
309b28ee3a Trace module calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29261

Test Plan: Imported from OSS

Differential Revision: D18343363

Pulled By: jamesr66a

fbshipit-source-id: 0c6394205e2c0ea8708028d20df83fe17b466ff4
2019-11-06 15:05:49 -08:00
0f4b226afb API for finding a common ancestor block for a pair of nodes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28864

Test Plan: Imported from OSS

Differential Revision: D18219786

Pulled By: jamesr66a

fbshipit-source-id: fb19ed5732dd714cef7a924bc42c156065b926d5
2019-11-06 15:05:45 -08:00
adb7df7117 Consistently use TORCH_CUDA_API for all files that live in cuda targets. (#29158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29158

My plan is to split out libtorch_cuda.so from libtorch.so.  To do this,
I need accurate _API annotations for files in these directories.

I determined the correct set of annotations by looking at
tools/build_variables.py and making sure every file that was a member
of the libtorch_cuda/ATen-cu targets had these annotations.  (torch-cpp-cuda
doesn't count since that's going to be where the stuff that has explicit
USE_CUDA lives, so it's going to be in a separate dynamic library).

As future work, it would be good to setup a lint rule to help people
understand what the correct _API annotation to use in a file is; it
would also be good to reorganize folder structure so that the library
structure is clearer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18309593

Pulled By: ezyang

fbshipit-source-id: de710e721b6013a09dad17b35f9a358c95a91030
2019-11-06 15:02:07 -08:00
a5d356cb39 Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29143

THP_CORE macro is a very old macro that appeared to have served
two purposes:

1. The torch-python equivalent of CAFFE2_BUILD_MAIN_LIB, to toggle
   symbol visibility headers

2. Some sort of ad hoc way of hiding certain definitions from headers
   so external clients can't get at them.

It did (2) in a very confusing manner, because we set THP_CORE in both
torch and torch-python (it shouldn't do anything in torch).  In this
PR I just get rid of use case (2) entirely (so everything shows up in
headers all the time), and then redo (1) using a new THP_BUILD_MAIN_LIB
macro.  This cleans up some of the macro definitions and makes my life
easier for working on #27215.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18309594

Pulled By: ezyang

fbshipit-source-id: adcb6d7cb387cd818480137e2b94e5e761dbfefc
2019-11-06 15:02:02 -08:00
f227530c88 Clean up named tensor propagate_names API (#29239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29239

There were a few main changes, summarized below.

Rename `propagate_names`
----------------------------------------------

There are two main APIs now, `propagate_names_if_nonempty(Tensor&,
ArrayRef<Dimname>)` and `propagate_names(Tensor&, ArrayRef<Dimname>)`

The former propagates names if they are not empty and the latter
unconditionally tries to propagate names.

`names` can be empty if name inference did not occur (see the next
section).

Removed usages of `optional` in name inference
----------------------------------------------

Previously, we used `optional<ArrayRef<Dimname>>` and
`optional<vector<Dimname>>`. `nullopt` represents that no name inference
happened.

The problem with this is that these types are not implicitly convertible
to each other and dealing with them is painful as a result (users have
to manually unwrap `optional<vector>` and convert to
`optional<arrayref>`.

To fix this, I rewrote most name inference functions to use an empty array as an
indicator value:
- If an array is empty, then no name inference occurred.
- If an array is not empty, then name inference occurred.

Removed `vector<Dimname>&&` overloads
----------------------------------------------

These were originally meant for efficiency: instead of copying a vector
of names we could move it directly inside the tensor and replace the old
names. However, looking around the code base, we do copies for
`IntArrayRef` for sizes and strides instead of optimizing them, so the
perf gain is probably not critical. I removed `vector<Dimname>&&` overloads
to stop optimizing prematurely.

Furthermore, one potential design for a faster named inference api is
to construct names directly on a tensor's names object; in this design
there is also no `vector<Dimname>&&` overload.

Plans
----------------------------------------------

After this PR I'll keep attempting to clean up the `propagate_names`
functions. There are a lot of `propagate_names_for_{blah}` functions
that probably don't need to exist.

Test Plan: - `python test/test_namedtensor.py -v`

Differential Revision: D18350090

Pulled By: zou3519

fbshipit-source-id: eb5dd6cbd2d4f1838431db5edbdb207204c5791d
2019-11-06 14:45:39 -08:00
364e525f55 Fixed export for random (#28470)
Summary:
[ONNX] Fixed export for random generator ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28470

Reviewed By: hl475

Differential Revision: D18350224

Pulled By: houseroad

fbshipit-source-id: 540a07f7def335f66808af8c360b72261d15635b
2019-11-06 14:42:20 -08:00
8ed84a9123 skip broken custom op test (#29334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29334

As title

Test Plan: Imported from OSS

Differential Revision: D18358592

Pulled By: suo

fbshipit-source-id: d7afbce52ddd008ae9c42aeda6be24e35086ef01
2019-11-06 14:33:01 -08:00
7d01d5efd7 update op bench readme (#29289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29289

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18350580

fbshipit-source-id: 80f41cbbfda9cbcd8988b451cdfb199f2b89e49b
2019-11-06 14:08:02 -08:00
e51d937e91 move cuda abs to Aten (#25857)
Summary:
VitalyFedyunin, this PR fixes https://github.com/pytorch/pytorch/issues/24531

Benchmark script :
```
import timeit

device = "cuda"
for n, t in [(10, 100000),(1000, 10000)]:
    print('a.abs() (a.numel() == {}) for {} times'.format(n, t))
    for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64', 'torch.float', 'torch.double', 'torch.half'):
        print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
        print(timeit.timeit(f'a.abs()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
```
Device: **Tesla P4**
CUDA version: **9.0.176**

Before this change:
```
a.abs() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.int8, 100000 times           1.8391285985708237
device: cuda, dtype: torch.uint8, 100000 times          1.8831938095390797
device: cuda, dtype: torch.int16, 100000 times          1.8131775446236134
device: cuda, dtype: torch.int32, 100000 times          1.832334715873003
device: cuda, dtype: torch.int64, 100000 times          1.8218239657580853
device: cuda, dtype: torch.float, 100000 times          1.7942761108279228
device: cuda, dtype: torch.double, 100000 times         1.8193779103457928
device: cuda, dtype: torch.half, 100000 times           1.796515878289938
a.abs() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.int8, 10000 times            0.18348361551761627
device: cuda, dtype: torch.uint8, 10000 times           0.1892806850373745
device: cuda, dtype: torch.int16, 10000 times           0.18253886327147484
device: cuda, dtype: torch.int32, 10000 times           0.18509215489029884
device: cuda, dtype: torch.int64, 10000 times           0.18291602283716202
device: cuda, dtype: torch.float, 10000 times           0.1796952784061432
device: cuda, dtype: torch.double, 10000 times          0.18088893592357635
device: cuda, dtype: torch.half, 10000 times            0.18222836777567863
```
After change:
```
a.abs() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.int8, 100000 times           1.7365420907735825
device: cuda, dtype: torch.uint8, 100000 times          1.7433889284729958
device: cuda, dtype: torch.int16, 100000 times          1.7034666128456593
device: cuda, dtype: torch.int32, 100000 times          1.6825932636857033
device: cuda, dtype: torch.int64, 100000 times          1.6896217577159405
device: cuda, dtype: torch.float, 100000 times          1.7211194895207882
device: cuda, dtype: torch.double, 100000 times         1.6823345720767975
device: cuda, dtype: torch.half, 100000 times           1.7027524448931217
a.abs() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.int8, 10000 times            0.17180879414081573
device: cuda, dtype: torch.uint8, 10000 times           0.17316896095871925
device: cuda, dtype: torch.int16, 10000 times           0.16990498825907707
device: cuda, dtype: torch.int32, 10000 times           0.1681906059384346
device: cuda, dtype: torch.int64, 10000 times           0.16994905844330788
device: cuda, dtype: torch.float, 10000 times           0.1719626784324646
device: cuda, dtype: torch.double, 10000 times          0.16886932775378227
device: cuda, dtype: torch.half, 10000 times            0.16957201063632965
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25857

Differential Revision: D18299368

Pulled By: VitalyFedyunin

fbshipit-source-id: 173eb0f6ca5a12a27f3d53466ff373a5f81f1da8
2019-11-06 13:41:32 -08:00
74b2d9ed2e Skips test_equiv_recurrent (#29255)
Summary:
This test is flaky, per issue https://github.com/pytorch/pytorch/issues/10322.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29255

Differential Revision: D18350782

Pulled By: mruberry

fbshipit-source-id: 53a7d33e17428c2484211618cb71e870ce2d6a03
2019-11-06 13:29:23 -08:00
cc457ca30f split remaining "easy" tests (#29249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29249

This splits out all the tests that are "easy", leaving `TestJit`,
`TestScript`, the autogenerated tests, and a small docs test.

Splitting those into reasonable chunks requires more effort and is less
mechanical.

Differential Revision: D18339007

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 69164b9f9a2c379fe8923a846c98dd3c37ccb70e
2019-11-06 13:23:01 -08:00
f93a6e54b9 Add removeAttribute to ClassType (#28984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28984

Support removing an attribute in `ClassType`. `ClassType` is
considered a low-level API, and users of this function
need to guarantee the safety of calling this method.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18253776

fbshipit-source-id: 5814baa3fdf6de6c71d3cc1be225ded9116c961a
2019-11-06 11:49:29 -08:00
7069eee227 update gloo submodule (#29248)
Summary:
Update gloo submodule to use the new APIs introduced in https://github.com/facebookincubator/gloo/pull/232. Done by `cd third_party/gloo && git checkout 7c54124` which is gloo's latest commit.

Next step would be to consume the introduced APIs in `ProcessGroup::Work`. Then we can use this layer to be able to interrupt `ProcessGroupAgent` (only for the gloo backend).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29248

Reviewed By: xush6528

Differential Revision: D18350654

Pulled By: rohan-varma

fbshipit-source-id: e41f7446bbb500087a0ca3919173b2e8379c7ce7
2019-11-06 11:33:50 -08:00
eb46d64740 Remove CollisionChecker from typeids (#29242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29242

I don't really know why this is crashing, but it is crashing on iOS with an EXC_BAD_ACCESS / KERN_INVALID_ADDRESS.
(see attached task).

Removing it.

ghstack-source-id: 93304255

Test Plan: waitforsandcastle

Differential Revision: D18333464

fbshipit-source-id: 166012fabe1e1b1d84c10f3d3dcc2c1e24bff3aa
2019-11-06 11:28:38 -08:00
ab855d06fb Print aars content detailed size info (#28438)
Summary:
Output:
```
Oct 22 20:22:04 + find . -type f -name '*.a'
Oct 22 20:22:04 + xargs ls -lah
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libtorch.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 19:45 ./build_android/install/lib/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 19:44 ./build_android/install/lib/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 19:44 ./build_android/install/lib/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 19:45 ./build_android/install/lib/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 19:45 ./build_android/install/lib/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 19:45 ./build_android/install/lib/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:06 ./build_android/install/lib/libtorch.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 19:45 ./build_android/lib/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 19:44 ./build_android/lib/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 19:44 ./build_android/lib/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 274K Oct 22 19:44 ./build_android/lib/libcpuinfo_internals.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 19:45 ./build_android/lib/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 19:45 ./build_android/lib/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  69K Oct 22 19:45 ./build_android/lib/libnnpack_reference_layers.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  51K Oct 22 19:44 ./build_android/lib/libpthreadpool.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 19:45 ./build_android/lib/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:06 ./build_android/lib/libtorch.a
Oct 22 20:22:05 ++ find . -type f -name '*.aar'
Oct 22 20:22:05
Oct 22 20:22:05 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + ls -lah ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 20K Oct 22 20:22 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 Archive:  ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 Zip file size: 20260 bytes, number of entries: 7
Oct 22 20:22:05 -rw-r--r--  2.0 unx      281 b-      177 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx    81895 b-    14629 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx     4816 b-     4632 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/values/
Oct 22 20:22:05 -rw-r--r--  2.0 unx      128 b-      106 defN 80-Feb-01 00:00 res/values/values.xml
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 values/
Oct 22 20:22:05 7 files, 87120 bytes uncompressed, 19550 bytes compressed:  77.6%
Oct 22 20:22:05
Oct 22 20:22:05 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + ls -lah ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 6.6M Oct 22 20:21 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 Archive:  ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 Zip file size: 6827798 bytes, number of entries: 12
Oct 22 20:22:05 -rw-r--r--  2.0 unx      269 b-      171 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx    81895 b-    14629 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx    16007 b-    14295 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/values/
Oct 22 20:22:05 -rw-r--r--  2.0 unx      116 b-      100 defN 80-Feb-01 00:00 res/values/values.xml
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1017704 b-   326504 defN 80-Feb-01 00:00 jni/x86/libfbjni.so
Oct 22 20:22:05 -rw-r--r--  2.0 unx 22309852 b-  6470885 defN 80-Feb-01 00:00 jni/x86/libpytorch.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 values/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86/
Oct 22 20:22:05 12 files, 23425843 bytes uncompressed, 6826596 bytes compressed:  70.9%
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 + ls -lah ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05
Oct 22 20:22:05 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.2M Oct 22 20:21 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 Archive:  ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 Zip file size: 1172812 bytes, number of entries: 16
Oct 22 20:22:05 -rw-r--r--  2.0 unx      246 b-      162 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx    12582 b-     9896 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/arm64-v8a/
Oct 22 20:22:05 -rw-r--r--  2.0 unx   997768 b-   288617 defN 80-Feb-01 00:00 jni/arm64-v8a/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/armeabi-v7a/
Oct 22 20:22:05 -rw-r--r--  2.0 unx   599848 b-   219234 defN 80-Feb-01 00:00 jni/armeabi-v7a/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1017704 b-   326504 defN 80-Feb-01 00:00 jni/x86/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86_64/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1055384 b-   326713 defN 80-Feb-01 00:00 jni/x86_64/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86_64/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 arm64-v8a/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 armeabi-v7a/
Oct 22 20:22:05 16 files, 3683532 bytes uncompressed, 1171146 bytes compressed:  68.2%
Oct 22 20:22:05 + xargs tar cfvz /var/lib/jenkins/workspace/android/artifacts.tgz
Oct 22 20:22:05 + find . -type f -name '*aar' -print
Oct 22 20:22:05 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28438

Differential Revision: D18153674

Pulled By: IvanKobzarev

fbshipit-source-id: dce51c61e59a8423fe390405d0c71efc8ffa7deb
2019-11-06 11:24:48 -08:00
9c43b16df9 Revert D18171156: Merge Tensor and Variable.
Test Plan: revert-hammer

Differential Revision:
D18171156

Original commit changeset: 5b6a045beba3

fbshipit-source-id: f5581d902c2305018ea49f8473592be2a465560b
2019-11-06 10:57:00 -08:00
6a4b51aec1 Add the intra-op parallelism for equal operator (#28810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28810

Similar to https://github.com/pytorch/pytorch/pull/28464 and https://github.com/pytorch/pytorch/pull/28477, we would like to enable intra-op parallelism for the equal operator. This translates into a parallel performance win for the BERT/RoBERTa model.

Test Plan: CI

Differential Revision: D18165752

fbshipit-source-id: 354cede4c36893acbd69711f49aa6a51dc94397f
2019-11-06 10:30:44 -08:00
9ae6fd2599 explicitly provide memory format when calling to clone() at TensorFactories.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28665

Test Plan: Imported from OSS

Differential Revision: D18333361

Pulled By: ifedan

fbshipit-source-id: 88f19649e708c3e04decc1ca34c7a1faabe6c434
2019-11-06 09:53:55 -08:00
e4c4ff079c group quantized op benchmarks into a new binary (#29288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29288

More quantized operators have been added to the benchmark suite. We want to split them from the un-quantized ones for easier benchmarking.

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_kernel3_G32_H56_OC512_N1_stride2_pad1_W56_IC512
# Input: kernel: 3, G: 32, H: 56, OC: 512, N: 1, stride: 2, pad: 1, W: 56, IC: 512
Forward Execution Time (us) : 5614.996

# Benchmarking PyTorch: QLinear
# Mode: Eager
# Name: QLinear_N6400_IN141_OUT15
# Input: N: 6400, IN: 141, OUT: 15
Forward Execution Time (us) : 2829.075

Reviewed By: hl475

Differential Revision: D18349850

fbshipit-source-id: 5b2fd9c1d5a25068592e5059909bb6c14095f397
2019-11-06 09:48:53 -08:00
114e7382b6 skip cuda test if not on GPU machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29287

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151

Reviewed By: hl475

Differential Revision: D18344574

fbshipit-source-id: 881c857cf901c4539ee1a61171ab41df1c476db7
2019-11-06 09:37:04 -08:00
e86450620d add cuda to all op benchmark (#29285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29285

as title

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151

Reviewed By: hl475

Differential Revision: D18338258

fbshipit-source-id: 944e87d1ec70daadb205faaf2825d4a2202086c5
2019-11-06 09:37:00 -08:00
27115612ab add execution mode to the test name (#29284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29284

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --ai_pep_format true
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "PyTorch_add_M64_N64_K64_cpu_Eager", "metric": "latency", "unit": "ms", "value": "26.64516019518487"}

Reviewed By: hl475

Differential Revision: D18336980

fbshipit-source-id: 1f9d5147a56afeb68cd526a57f7375c5ec39efa4
2019-11-06 09:32:54 -08:00
50fa132bd1 explicitly provide memory format when calling to clone() at SortingKthValue.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28666

Test Plan: Imported from OSS

Differential Revision: D18333371

Pulled By: ifedan

fbshipit-source-id: 11d4bbdaf8e57c97a1c47181ce7e953f2ad5b49e
2019-11-06 09:20:12 -08:00
af45801f0d explicitly provide memory format when calling to clone() at SpectralOps.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28667

Test Plan: Imported from OSS

Differential Revision: D18333358

Pulled By: ifedan

fbshipit-source-id: 6e5d035517e2b9de811c80ec8255dafceb1a511e
2019-11-06 09:17:05 -08:00
d05da7dad3 Fix virtualenv builds on Windows (#29273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29273

Differential Revision: D18349822

Pulled By: ezyang

fbshipit-source-id: c4d76521cc0742d890f22f1d7f32dede5600b651
2019-11-06 09:02:30 -08:00
4e53f3bcfe explicitly provide memory format when calling to clone() at Unique.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28668

Test Plan: Imported from OSS

Differential Revision: D18333364

Pulled By: ifedan

fbshipit-source-id: 9e9ce3287021d63d35c2db8b954f0ae548fd19d4
2019-11-06 08:41:53 -08:00
5e0cf05585 explicitly provide memory format when calling to clone() at TensorTransformations.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28664

Test Plan: Imported from OSS

Differential Revision: D18333351

Pulled By: ifedan

fbshipit-source-id: 8a42b65330c55e23e699f2c1ae58824e74cdd1e1
2019-11-06 08:00:16 -08:00
abe05a16ac Revert D18195868: Implementation of cosine learning rate training policy
Test Plan: revert-hammer

Differential Revision:
D18195868

Original commit changeset: 67bdb0b8dd31

fbshipit-source-id: f26761c82788f4c06f624fbd968fb966db8ecb47
2019-11-06 07:50:04 -08:00
689599d07d explicitly provide memory format when calling to clone() at LinearAlgebra.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28661

Test Plan: Imported from OSS

Differential Revision: D18333362

Pulled By: ifedan

fbshipit-source-id: dcb7a1c63473415654d7a964aa732c8f0d5480ec
2019-11-06 07:45:19 -08:00
81bf73643b Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29227

Test Plan: Imported from OSS

Differential Revision: D18330969

Pulled By: VitalyFedyunin

fbshipit-source-id: 54d75c025b40520866b2480ce86e6483e2dcb002
2019-11-06 07:24:42 -08:00
cc1c0120bc Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29226

Test Plan: Imported from OSS

Differential Revision: D18330965

Pulled By: VitalyFedyunin

fbshipit-source-id: 7029848bc1379a50caba6961c7a6e1d56c1fc0ad
2019-11-06 07:24:38 -08:00
e3e06549c1 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29225

Test Plan: Imported from OSS

Differential Revision: D18330964

Pulled By: VitalyFedyunin

fbshipit-source-id: f357a0cc125bd90a62575bd461722b9e36e75cbf
2019-11-06 07:24:34 -08:00
47f94d5393 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29224

Test Plan: Imported from OSS

Differential Revision: D18330968

Pulled By: VitalyFedyunin

fbshipit-source-id: 42a5553248bfe4c7084b56850df4bcd323bad638
2019-11-06 07:24:30 -08:00
aeae0d8403 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29223

Test Plan: Imported from OSS

Differential Revision: D18330967

Pulled By: VitalyFedyunin

fbshipit-source-id: 25c740dd66c64fb533a0a410801ea2a53905c282
2019-11-06 07:24:25 -08:00
d410fc5a81 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29222

Test Plan: Imported from OSS

Differential Revision: D18330966

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e8da4e826cc43fac9828737ef744606491812a4
2019-11-06 07:24:21 -08:00
a248ef7b9c fix autograd support for torch.mean(tensor, dimname) (#29199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29199

Previously, we called `native::mean_cpu_gpu` inside `mean(Tensor, Dimname)`;
`native::mean_cpu_gpu` is not supported by autograd. This PR replaces
`native::mean_cpu_gpu` with `at::mean(Tensor, int)` so that the dimname
overload can piggyback off of autograd support for `at::mean(Tensor,
int)`.

Also added tests (those didn't exist before) for autograd support for
named tensor reduction functions.
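
A minimal sketch of the behavior this enables (names and shapes are illustrative):

```python
import torch

# mean over a named dimension now routes through at::mean(Tensor, int),
# so gradients flow through the reduction.
t = torch.randn(2, 3, names=('N', 'C'), requires_grad=True)
out = t.mean('C')
out.sum().backward()
print(t.grad.shape)  # torch.Size([2, 3])
```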

Test Plan: - `python test/test_namedtensor.py -v`

Differential Revision: D18334617

Pulled By: zou3519

fbshipit-source-id: 1714eb3fd93714fe860f208831e8d910f01c1c78
2019-11-06 07:21:30 -08:00
ff9d508b88 Remove tools/setup_helpers/cuda.py. (#28617)
Summary:
Except for the Windows default path, everything it does has been done in
FindCUDA.cmake. The search for nvcc in PATH has been added to FindCUDA.cmake (https://github.com/pytorch/pytorch/issues/29160). The Windows default path part is moved to
build_pytorch_libs.py. CUDA_HOME is kept for now because other parts of
the build system are still using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28617

Differential Revision: D18347814

Pulled By: ezyang

fbshipit-source-id: 22bb7eccc17b559ce3efc1ca964e3fbb270b5b0f
2019-11-06 07:12:01 -08:00
bc91e19861 Enable ONNX constant folding for opset 11. (#29011)
Summary:
Currently ONNX constant folding (`do_constant_folding=True` arg in `torch.onnx.export` API) supports only opset 9 and 10 of ONNX. Opset 11 support was recently introduced in the ONNX exporter. For opset 11, it is currently a no-op. This change enables ONNX constant folding for opset 11. Specifically there are three main changes:
1) Turn on constant folding ONNX pass for opset 11.
2) Enable constant folding tests in `test/onnx/test_utility_funs.py` and `test/onnx/test_pytorch_onnx_onnxruntime.py` for opset 11.
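
A minimal usage sketch (the model and filename are illustrative):

```python
import torch

# Exporting at opset 11 with constant folding enabled; before this change,
# do_constant_folding=True was a no-op for opset 11.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  opset_version=11,
                  do_constant_folding=True)
```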
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29011

Reviewed By: hl475

Differential Revision: D18306998

Pulled By: houseroad

fbshipit-source-id: afeed21ca29e01c278612e51dacd93397dd6e2d8
2019-11-05 23:22:39 -08:00
ee21142e40 Move custom passes to last optimization step (#29256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29256

..

Test Plan: ..

Reviewed By: ZolotukhinM

Differential Revision: D18340212

fbshipit-source-id: 30f4850c8a21bdab42c7cf04b4b92b1787449ee2
2019-11-05 20:10:33 -08:00
6ea4219d20 Temporarily disable qnnpack tests on MACOS (#29176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29176

Captured in issue  #27326

Test Plan:
python test/test_quantized.py test_qconv

Imported from OSS

Differential Revision: D18336184

fbshipit-source-id: 7394b04215b6c8b7bc0508f1648f23022bd031cb
2019-11-05 18:52:45 -08:00
ee8d5e5249 Implementation of cosine learning rate training policy (#29017)
Summary:
Implementation of the cosine learning rate policy from https://arxiv.org/pdf/1608.03983.pdf (SGDR: Stochastic Gradient Descent with Warm Restarts).

Mostly inspired by:
https://github.com/pytorch/fairseq/blob/master/fairseq/optim/lr_scheduler/cosine_lr_scheduler.py
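
A minimal sketch of the schedule (not the diff's actual implementation; parameter names follow the test plan below, and the restart bookkeeping is simplified):

```python
import math

def cosine_lr(step, max_lr=0.3, min_lr=0.0, initial_period=20,
              t_mult=0.95, lr_shrink=0.95):
    # Walk the restart cycles: each cycle's length is scaled by t_mult
    # and its peak learning rate is shrunk by lr_shrink.
    period, start = initial_period, 0
    while step >= start + period:
        start += period
        period = max(1, int(period * t_mult))
        max_lr *= lr_shrink
    t = (step - start) / period  # position within the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

print([round(cosine_lr(s), 4) for s in range(0, 60, 10)])
```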
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29017

Test Plan:
buck test -v 2 caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test  -- test_composite_cosine_lr_policy

learning rate log with max_lr=0.3, initial_period=20, t_mult=0.95, lr_shrink=0.95: P120327179

https://pxl.cl/PrcP

full canary: https://fburl.com/fblearner/mw69ylsd

Differential Revision: D18195868

Pulled By: grantlj

fbshipit-source-id: 67bdb0b8dd31d040d16b29d0da3115907bd141ef
2019-11-05 18:19:41 -08:00
d545e4f155 qrelu benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29174

Test Plan: Imported from OSS

Differential Revision: D18319345

Pulled By: z-a-f

fbshipit-source-id: b64f0131296771ed201d85664930cceb7be185bd
2019-11-05 17:20:40 -08:00
13f53d0fea Updating submodules
Summary:
GitHub commits:

de05e0e7ac
b6641eb7fa
ec1aa6936b
80479de3f7

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 815f4c5a06826e1a508e5d5016f2be42e96b7fea
2019-11-05 17:07:23 -08:00
6e38c3b89e Make get_trace_graph private
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29149

Test Plan: Imported from OSS

Differential Revision: D18307559

Pulled By: jamesr66a

fbshipit-source-id: 0b6aec2a1d10810d4e7f6b30b256cca79fc4e854
2019-11-05 17:04:36 -08:00
2f2a0d1607 Disables test_atomic_ops and testInputOrder (#29145)
Summary:
These tests have been flaky for some time, see:

- https://github.com/pytorch/pytorch/issues/28179
- https://github.com/pytorch/pytorch/issues/9064

This PR disables them. The actual tests were added/updated 2+ years ago. It's unclear who, if anyone, would own them now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29145

Differential Revision: D18327937

Pulled By: mruberry

fbshipit-source-id: d02731d662aff3545b581272e5ae8db4e3097d87
2019-11-05 16:53:53 -08:00
30f88bb05a Fix the TestApp (#29247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29247

### Summary

If you run the TestApp using Cocoapods, you'll likely run into an error due to the lack of `config.json` in the main bundle. This PR fixes this crash and updates the README as well.

### Test Plan

- Don't break CIs

Test Plan: Imported from OSS

Differential Revision: D18339047

Pulled By: xta0

fbshipit-source-id: 244cf1ca8729c7ac918258d4eff14d34363e8389
2019-11-05 16:28:51 -08:00
003cb8595b skip more flaky rpc tests (#29157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29157

As reported, these tests are flaky and time out. Skip them
while we investigate further.
ghstack-source-id: 93287663

Test Plan: CI

Differential Revision: D18309204

fbshipit-source-id: 95f0ea5e0c1162b78da412a34db446a01dfc33bf
2019-11-05 15:49:13 -08:00
35f8b450fc explicitly provide memory format when calling to clone() at SobolEngineOps.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28662

Test Plan: Imported from OSS

Differential Revision: D18333374

Pulled By: ifedan

fbshipit-source-id: c8e18e9937b373daba0ead819622350b693c4bfa
2019-11-05 15:45:50 -08:00
9232143d6a explicitly provide memory format when calling to clone() at Sorting.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28663

Test Plan: Imported from OSS

Differential Revision: D18333373

Pulled By: ifedan

fbshipit-source-id: 908880dd58d5e795db661a7249a11028f610c328
2019-11-05 15:35:55 -08:00
6389c18709 C++ parity, nn::CrossMapLRN2d (#29039)
Summary:
yf225 https://github.com/pytorch/pytorch/issues/25883
re-opened pull request because of a rebase mistake!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29039

Differential Revision: D18326829

Pulled By: yf225

fbshipit-source-id: 5ed737f6275e4463efa4951d9b7f45c6f2723c82
2019-11-05 15:27:08 -08:00
492764b18f Enable the intra-op parallelism for layer norm (#28464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28464

We would like to enable intra-op parallelism for layer norm. This translates into a parallel performance win for the BERT/RoBERTa models.

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

Reviewed By: BIT-silence

Differential Revision: D18063407

fbshipit-source-id: c116e744d78ea50b3aadf2e9a819e5b876a944bf
2019-11-05 15:24:32 -08:00
a5aeb37493 Don't throw when type is used in TorchScript (#28053)
Summary:
Type objects in python have an attribute `__abstractmethods__` that throws when it is accessed, so we were failing with an AttributeError whenever a type was used in TorchScript.

This PR prevents that error from happening. We can't just throw when a type is used because it could be used to access a static method: https://github.com/pytorch/pytorch/pull/27163
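
A quick REPL demonstration of the underlying Python behavior described above:

```python
>>> '__abstractmethods__' in dir(int)
True
>>> int.__abstractmethods__   # merely accessing the attribute raises
Traceback (most recent call last):
  ...
AttributeError: __abstractmethods__
```
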
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28053

Differential Revision: D18332347

Pulled By: eellison

fbshipit-source-id: 9c7f2220f92674ad4d903621d9762cecc566ab0d
2019-11-05 15:15:12 -08:00
ac027d30d5 Half test time, test_asymmetric_load_with_join, to avoid flakiness (#29139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29139

Each test has 100 sec timeout.

Currently this test takes 90-110 secs to finish, causing flakiness.

Halve the load so the test is not on the edge of the timeout.
ghstack-source-id: 93203670

Differential Revision: D5644012

fbshipit-source-id: 2a85999cf1ae6d18e9a871cd76ce194e1ce7b3e8
2019-11-05 14:54:19 -08:00
ebf5dd447e Cocoapods 1.3.1 release (#29240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29240

### Summary

The 1.3.1 binary has been uploaded to AWS - https://ossci-ios.s3.amazonaws.com/libtorch_ios_1.3.1.zip. This PR updates the cocoapods version to 1.3.1

### Test Plan

- The 1.3.1 binary works well

Test Plan: Imported from OSS

Differential Revision: D18333750

Pulled By: xta0

fbshipit-source-id: fe6e42c51f3902ad42cab33f473dffb0f6f33333
2019-11-05 14:50:46 -08:00
8a2dcff189 Add cuda version for operators BatchSparseToDense and BatchDenseToSparse (#29166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29166

As titled

Test Plan:
unittest

 buck test  mode/dev-nosan  caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test

Reviewed By: xianjiec

Differential Revision: D18197966

fbshipit-source-id: 7486300c509dd552ddb7484c2d83099f62878278
2019-11-05 13:06:23 -08:00
fd4f22e4ea Generalized LU factorization (#28608)
Summary:
This PR implements support for generalized LU factorization that is required for various algorithms such as PCA (see issue https://github.com/pytorch/pytorch/issues/8049).
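
A small usage sketch of the newly supported rectangular case, assuming the torch.lu API of this era:

```python
import torch

A = torch.randn(4, 6)          # rectangular input, previously unsupported
LU, pivots = torch.lu(A)
print(LU.shape, pivots.shape)  # torch.Size([4, 6]) torch.Size([4])
```
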
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28608

Differential Revision: D18326449

Pulled By: ezyang

fbshipit-source-id: d4011d75710e06e87ddbf5ad9afae42ba3330548
2019-11-05 12:27:40 -08:00
9492994feb submodule swapping via module interface (#28409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28409

This PR enables submodule swapping via module interfaces. A user can
declare a submodule as a module interface type in the ScriptModule;
during compilation we record the module interface type in the
ModuleInfo of ConcreteModuleType, the associated JIT type will have the
correct interface type, and the CppModule will get the correct module list.

Given that we still keep the module interface type in the type system,
the graph is not inlined when we call Module::Attr; it uses
prim::CallMethod to call the method instead. This allows us to swap
modules on any ScriptModule that satisfies the same module interface,
and module swapping is only allowed through the module interface
approach.
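
A minimal sketch of the pattern this enables, assuming the module-interface API introduced in this stack (all names here are hypothetical):

```python
import torch
from torch import nn, Tensor

@torch.jit.interface
class BackboneInterface(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        pass

class ImplA(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        return x + 1

class ImplB(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        return x * 2

class Wrapper(nn.Module):
    backbone: BackboneInterface  # submodule declared by its interface type

    def __init__(self):
        super().__init__()
        self.backbone = ImplA()

    def forward(self, x: Tensor) -> Tensor:
        return self.backbone.forward(x)

m = torch.jit.script(Wrapper())
# Any ScriptModule satisfying the same interface can be swapped in:
m.backbone = torch.jit.script(ImplB())
```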

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D18284309

fbshipit-source-id: 2cb843e4b75fa3fcd8c6020832a81014dbff4f03
2019-11-05 11:31:40 -08:00
f1c78492f8 Revert D18299298: Migrate conv3d from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D18299298

Original commit changeset: 97d53e8c976a

fbshipit-source-id: 33057d5a91d11bca136f69bc2d6ff0699d31492a
2019-11-05 11:26:48 -08:00
eb4189089a README (#28533)
Summary:
Copy of android.md from the site + information about Nightly builds

It duplicates some content from the separate pytorch.github.io repo, but I think more people will find it here, and we can iterate on it faster and keep it in sync with the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28533

Reviewed By: dreiss

Differential Revision: D18153638

Pulled By: IvanKobzarev

fbshipit-source-id: 288ef3f153d8e239795a85e3b8992e99f072f3b7
2019-11-05 11:06:23 -08:00
26f57cbe5e Revert D18309297: CPU-strided-complex support for ComplexFloat
Test Plan: revert-hammer

Differential Revision:
D18309297

Original commit changeset: adf4bc3a45ba

fbshipit-source-id: de45d9d7863a7f530be6773635b05bc4a7251d96
2019-11-05 10:26:30 -08:00
25e261d6d5 assertEquals is deprecated, use assertEqual instead
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28335
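
For reference, the supported spelling; a trivial sketch:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_values(self):
        # assertEqual is the supported name; assertEquals is a deprecated
        # alias that emits a DeprecationWarning under Python 3.
        self.assertEqual(1 + 1, 2)
```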

Differential Revision: D18263456

Pulled By: ngimel

fbshipit-source-id: c0f79071feaa5a4c3c4b20505013bf7c4b5455d5
2019-11-05 09:52:21 -08:00
c99cdfeb7d link to documentation for RNNBase.flatten_parameters() (#29196)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28658

I have added the link to the docs for `flatten_parameters`.

RNNBase is a superclass of the RNN, LSTM and GRU classes. Should I add a link to `flatten_parameters()` in those sections as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29196

Differential Revision: D18326815

Pulled By: ezyang

fbshipit-source-id: 4239019112e77753a0820aea95c981a2c868f5b0
2019-11-05 09:45:21 -08:00
f32ab6157b CPU-strided-complex support for ComplexFloat (#29133)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes
- [x]  Fixed Vec256 Permute operations for Complex Float
- [x]  Fixed copy_kernel_cast between complex data types
  -  copy_kernel_cast should not call std::real during inter-complex dtype conversion.
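
A pure-Python illustration (no torch) of why an inter-complex cast must not go through the real part first:

```python
z = 1 + 2j              # think: a complex64 element being cast to complex128
right = complex(z)      # (1+2j) -- the imaginary part survives the cast
wrong = complex(z.real) # (1+0j) -- what taking std::real first would give
assert right != wrong
```
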
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29133

Differential Revision: D18309297

Pulled By: ezyang

fbshipit-source-id: adf4bc3a45ba2918c8998d59fa94a52f89663e94
2019-11-05 09:17:54 -08:00
21d11e0b64 FindCUDA: Use find_program instead of find_path to find nvcc (#29160)
Summary:
Otherwise nvcc is not found if it is in env PATH but a non-standard
location.

Import from my patch for CMake:
https://gitlab.kitware.com/cmake/cmake/merge_requests/3990

Although we currently do nvcc search in a Python script, it will be removed soon in https://github.com/pytorch/pytorch/issues/28617.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29160

Differential Revision: D18326693

Pulled By: ezyang

fbshipit-source-id: dc7ff3f6026f0655386ff685bce7372e2b061a4b
2019-11-05 08:51:35 -08:00
a02681f804 Cleaned up func removed unused variable (#29179)
Summary:
I don't see `_frames_up` being used anywhere. Just to clean up the code, I thought it should be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29179

Differential Revision: D18319876

Pulled By: suo

fbshipit-source-id: 5e612ff94ccc88fc85288ffc26213e1d11580c36
2019-11-05 08:48:45 -08:00
7434da2c3f value assigned but never used in _recursive.py (#29181)
Summary:
# Description
I'm new to this project and just wanted to start with small bug fixes. I found some unused local variables and I've removed them in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29181

Differential Revision: D18319893

Pulled By: suo

fbshipit-source-id: e4f9f13b6db2ca213015569deb12d3fd9beb74a8
2019-11-05 08:48:41 -08:00
c6d908d491 Support Conv+BatchNorm fusion for 1d/3d (#29113)
Summary:
Support Conv+BatchNorm fusion for 1d/3d by being adaptive to number of dimensions (partially fixes https://github.com/pytorch/pytorch/issues/28757)
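
The folding arithmetic at the heart of the fusion; a minimal sketch (the helper name is hypothetical) showing how reshaping the per-channel scale makes it dimension-agnostic:

```python
import torch

def fuse_conv_bn_weights(conv_w, conv_b, bn_mean, bn_var, bn_eps, bn_gamma, bn_beta):
    # Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv.
    scale = bn_gamma / torch.sqrt(bn_var + bn_eps)
    # Broadcasting the scale over all trailing weight dims makes this
    # work for 1d, 2d, and 3d convolutions alike.
    fused_w = conv_w * scale.reshape(-1, *([1] * (conv_w.dim() - 1)))
    fused_b = (conv_b - bn_mean) * scale + bn_beta
    return fused_w, fused_b
```
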
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29113

Differential Revision: D18298248

Pulled By: soumith

fbshipit-source-id: 2fc75353aecc0e315c90e63476481acef6ebf784
2019-11-05 08:43:51 -08:00
546ae3002d Migrate conv3d from TH to ATen (CPU) (#29007)
Summary:
This is a port of the VolumetricConvolutionMM TH (CPU) implementation to ATen as `slow_conv3d`.

- [x] unfolded3d_copy & unfolded3d_acc
- [x] forward
- [x] backward
- [x] basic sanity cross check with 1.3 impl
- [ ] systematic testing
- [ ] performance comparison & optimization

Script used for performance testing: [benchmark_conv3d.py](https://gist.github.com/andreaskoepf/8865eea4bb05220f78fc6d9d408c49fc)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29007

Differential Revision: D18299298

Pulled By: ezyang

fbshipit-source-id: 97d53e8c976a09aecbc6f05dd8e982cc58cdf6d8
2019-11-05 08:09:20 -08:00
f2a35db2d3 batch_norm_cpu_inference for channel last (#28982)
Summary:
channels last version for batch_norm_cpu_inference_contiguous

Benchmark:
The benchmark uses a fixed batch size n=20, channel counts in [1,3,10,100,1000], and height/width sizes in [1,4,16,64,256]; height and width are always equal in this test.

We use the following code for the benchmark. Each loop iteration times contiguous, channels-last, and non-contiguous tensors and prints the results. It also compares the outputs within each loop to verify the correctness of the new change.

        for c in [1,3,10,100,1000]:
            for hw in [1,4,16,64,256]:
                print('Benchmark n=20 c={0} h={1} w={2}'.format(c, hw, hw))

                m = nn.BatchNorm2d(c, affine=False)
                m.eval()
                input = torch.randn(20, c, hw, hw)
                output = m(input)
                %timeit m(input)

                for name, param in m.named_parameters():
                    if param.requires_grad:
                        if param.data.dim() == 4:
                            param.data = param.data.contiguous(memory_format=torch.channels_last)
                m.eval()
                input = input.contiguous(memory_format=torch.channels_last)
                output1 = m(input)
                %timeit m(input)

                m = nn.BatchNorm2d(c, affine=False)
                m.eval()
                input = input.permute(0,1,3,2)
                output2 = m(input)
                %timeit m(input)
                output2 = output2.permute(0,1,3,2)

        print(output.equal(output1), output.equal(output2))

Sample output:
Benchmark n=20 c=100 h=256 w=256 -> title line
101 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> contiguous tensor
100 ms ± 898 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) -> channels last tensor
1.3 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> non-contiguous tensor
True True -> 1st output vs. 2nd output, and 1st output vs. 3rd output; both expected to be True

**Benchmark Before this change:**
Benchmark n=20 c=1 h=1 w=1
10.1 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.2 µs ± 305 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.7 µs ± 784 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=4 w=4
10.2 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.1 µs ± 98 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.5 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=16 w=16
11 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.3 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=64 w=64
24.2 µs ± 536 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
23.9 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=1 h=256 w=256
539 µs ± 7.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
539 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.42 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=1 w=1
10 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.97 µs ± 93 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.4 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=4 w=4
10.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16.1 µs ± 601 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.1 µs ± 658 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=16 w=16
13.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
25.3 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
32.4 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=64 w=64
51.1 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
159 µs ± 7.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
199 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=256 w=256
1.25 ms ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.95 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.14 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=10 h=1 w=1
9.97 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 852 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.7 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=4 w=4
11.2 µs ± 84.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.7 µs ± 343 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
39.4 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=16 w=16
19.7 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
68.3 µs ± 912 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
90.3 µs ± 4.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=64 w=64
325 µs ± 5.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
918 µs ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
991 µs ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=10 h=256 w=256
9.47 ms ± 73.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
34.7 ms ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
91.5 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=1 w=1
11.8 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.1 µs ± 800 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12 µs ± 533 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=100 h=4 w=4
26.7 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
231 µs ± 8.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
335 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=16 w=16
178 µs ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.45 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.52 ms ± 94.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=64 w=64
6.9 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
30.3 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
27 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=256 w=256
98.9 ms ± 818 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.29 s ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.32 s ± 9.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=1 w=1
18.6 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
18.7 µs ± 947 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
15.8 µs ± 261 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1000 h=4 w=4
111 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.07 ms ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.19 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=1000 h=16 w=16
3.87 ms ± 336 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
25.6 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
27 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=1000 h=64 w=64
70.1 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
467 ms ± 26.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
444 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=256 w=256
2.39 s ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
19.2 s ± 181 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
22.1 s ± 1.13 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True

**Benchmark After this change:**
Benchmark n=20 c=1 h=1 w=1
10.4 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.7 µs ± 237 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=4 w=4
11.8 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=16 w=16
11.9 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.1 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.2 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=64 w=64
27.6 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
32.2 µs ± 8.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
68.9 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=1 h=256 w=256
601 µs ± 49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
597 µs ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.48 ms ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=1 w=1
10.8 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.6 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=4 w=4
11.6 µs ± 551 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.7 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.9 µs ± 340 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=16 w=16
13.7 µs ± 223 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
24.7 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
33.7 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=64 w=64
53.3 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
212 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
204 µs ± 5.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=256 w=256
1.49 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.27 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.08 ms ± 290 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=10 h=1 w=1
10.7 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.8 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.8 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=10 h=4 w=4
11.6 µs ± 129 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.9 µs ± 503 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
43.7 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=16 w=16
20.7 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.2 µs ± 795 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
92.5 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=64 w=64
342 µs ± 9.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
622 µs ± 37.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.03 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=10 h=256 w=256
9.49 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
90.5 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=1 w=1
12 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=100 h=4 w=4
22.3 µs ± 451 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
18.7 µs ± 255 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
323 µs ± 6.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=16 w=16
211 µs ± 22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
222 µs ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.5 ms ± 59.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=64 w=64
7.2 ms ± 1e+03 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.51 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
27.4 ms ± 695 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=256 w=256
101 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100 ms ± 898 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.3 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=1 w=1
16.9 µs ± 589 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
16.5 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16.5 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1000 h=4 w=4
116 µs ± 6.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
67 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.23 ms ± 80 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=1000 h=16 w=16
3.53 ms ± 72.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.53 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
27 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=1000 h=64 w=64
68.6 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
68 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
425 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=256 w=256
2.51 s ± 97.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.84 s ± 471 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
21.5 s ± 933 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True

Channels-last batch normalization gets faster with this change, and the pre-existing code paths are unaffected, based on the benchmarks above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28982

Reviewed By: VitalyFedyunin

Differential Revision: D18253305

Pulled By: glaringlee

fbshipit-source-id: a0fcac65544f10d736141ee70edeab8a3f1b3e02
2019-11-05 07:59:39 -08:00
cb6d9deec6 support for cdist (#29129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29129

cdist(x1, x2) does the following:
- assume x1, x2 are 2-dimensional. Then x1, x2 are each considered to be
a list of vectors.
- The operation returns a matrix that is the pairwise distance between
each vector in x1 and each vector in x2. The matrix has first dimension
size equal to the number of vectors in x1 and second dimension size equal
to the number of vectors in x2.
- cdist also supports arbitrary left-hand broadcastable batch
dimensions. In this case, x1 and x2 are each considered to be a batch
of a list of vectors.

The above leads to the following name inference rule for cdist:
- In the 2D case, propagate x1.names[-2] and x2.names[-2] (because
the final result has size (x1.size[-2], x2.size[-2])).
- In the ND case, unify all the batch dimensions together to produce the
output batch dimensions and then apply the rule for the 2D case.

Furthermore, I moved all of the name checking in the implementation to
occur before name inference because name inference assumes that the
shapes are valid.
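
A small sketch of the resulting name propagation (shapes and names here are hypothetical):

```python
import torch

x1 = torch.randn(5, 3, names=('A', 'D'))
x2 = torch.randn(7, 3, names=('B', 'D'))
out = torch.cdist(x1, x2)
print(out.shape)   # torch.Size([5, 7])
print(out.names)   # ('A', 'B')
```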

Test Plan: - new test: `pytest test/test_namedtensor.py -v -k "cdist"`

Differential Revision: D18311867

Pulled By: zou3519

fbshipit-source-id: 713d7cdda93c8fe92e7f1bd7f7c5c6e20a8138e3
2019-11-05 07:24:23 -08:00
3233a058fa Add TensorNames::checkUnique, operator<< (TensorName) (#29124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29124

TensorNames::checkUnique gives a nice error message if there are
duplicate names.

Adding operator<< on TensorName cleans up some code. A TensorName gets
printed out as: "'H' (index 2 of ['N', 'C', 'H', 'W'])" for example.

Test Plan: - New c++ tests. test with `build/bin/NamedTensor_test`.

Differential Revision: D18311868

Pulled By: zou3519

fbshipit-source-id: 5be197dba227f0328b40d7f66e78fffefe4dbd00
2019-11-05 07:24:19 -08:00
2c3c702d29 Fix poisson_nll_loss with full option (#28637)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/28575.

It seems `poisson_nll_loss` was implemented with an incorrect assumption about `masked_select`, which does not actually return a tensor sharing storage with the input, so the in-place operation used there didn't work as intended.
Here I used `masked_fill` instead.
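
A short demonstration of the behavior at issue:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
sel = x.masked_select(x > 1)  # returns a new tensor (a copy), not a view
sel.zero_()                   # in-place edit does NOT touch x
print(x)                      # tensor([1., 2., 3.])

# masked_fill produces the intended result directly:
print(x.masked_fill(x > 1, 0.0))  # tensor([1., 0., 0.])
```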

Also, the existing test didn't have `reference_fn`, so I added it (although it's not fundamentally useful, since the current cpp `poisson_nll_loss` itself implements exactly the same algorithm as `reference_fn`).

Thanks in advance for reviewing this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28637

Differential Revision: D18299724

Pulled By: albanD

fbshipit-source-id: 1aac5b20e77bf54874b79018207ba8f743766232
2019-11-05 07:10:35 -08:00
49fba35208 Run clang-format for torch/distributed/rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27531

Test Plan: Imported from OSS

Differential Revision: D17808206

Pulled By: pietern

fbshipit-source-id: 7d23327bfba42dab4b60779c9f03b7952ff0db7a
2019-11-05 06:25:30 -08:00
6c3915643b Rename PythonUDF{Call,Resp} (#27530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27530

Per discussion in #27286, the `UDF` part is superfluous.

This makes the naming consistent with the `MessageType` enum.

Test Plan: Imported from OSS

Differential Revision: D17808211

Pulled By: pietern

fbshipit-source-id: 0ff925de26d027951ce285750ad276ed17fee4c6
2019-11-05 06:25:26 -08:00
b4df413712 Scope pybind11 functions to torch.distributed.{autograd,rpc}
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27529

Test Plan: Imported from OSS

Differential Revision: D17808209

Pulled By: pietern

fbshipit-source-id: 1e3e086085167320c3fc369467f5d75ce39fa4ea
2019-11-05 06:25:22 -08:00
69f845cb77 C++ API parity: MarginRankingLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29000

Test Plan: Imported from OSS

Differential Revision: D18271855

Pulled By: pbelevich

fbshipit-source-id: cbafc7f059173306c83673d7be374c2d3700911f
2019-11-05 05:41:40 -08:00
0d056e75e9 Updating submodules
Summary:
GitHub commits:

d70aa3c904

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 2c18456a882470f185946af2749a4e0c2e6f9cde
2019-11-05 02:10:50 -08:00
ca7d0803e9 use fbgemm's 3d group conv fast path (#29085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29085

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/159

Change DNNLOWP operators to use fbgemm's new 3D groupwise convolution (D18192339)

This diff also fixes an issue when column offsets are fused into bias.
In this case, we construct ReQuantizeOutput with col_offsets == 0 and A_zero_point == 0 even if the real A_zero_point is not 0.
In fbgemmGroupwiseConv, when we call dispatchOutputProcessing, we shouldn't pass the original A_zero_point.

Test Plan: https://github.com/pytorch/pytorch/pull/29134

Reviewed By: dskhudia

Differential Revision: D18282373

fbshipit-source-id: 993d584e7fa8e07c74597304c0fd9386f7ed0e41
2019-11-05 00:58:49 -08:00
9e314f557f Fix for torch.save not saving source files (#28965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28965

Fixed the reference to correct object

Test Plan:
Added new unit test test_serialization_save_warnings in test_torch
    Verified by running the test_torch tests

Imported from OSS

Differential Revision: D18306797

fbshipit-source-id: bbdc7a1aa59a395fcbb736bcc7c3f96db45454d3
2019-11-04 23:16:51 -08:00
026fd36c71 Use at::kLong for torch::tensor(integer_value) when dtype is not specified (#29066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29066

This PR is BC-breaking in the following way:

Previously, C++ `torch::tensor` with an integer literal or a braced-init-list of
integer literals produces a tensor with dtype being the type of the integer literal(s). After this PR, it always produces a tensor of dtype `at::kLong` (aka. int64_t), matching Python `torch.tensor` behavior.
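
For reference, the Python behavior being matched:

```python
import torch

print(torch.tensor(42).dtype)         # torch.int64
print(torch.tensor([1, 2, 3]).dtype)  # torch.int64
```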

Test Plan: Imported from OSS

Differential Revision: D18307248

Pulled By: yf225

fbshipit-source-id: 7a8a2eefa113cbb238f23264843bdb3b77fec668
2019-11-04 21:39:10 -08:00
1189f559cc Creating new layer FCWithBootstrap used in bootstrapping uncertainty approach (#29152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29152

Bootstrapping uncertainty approach: bootstrap the last layer before the last fully-connected layer. FCWithBootstrap is a new layer to handle the logic for the bootstrapping process.

Goal:
- return a struct with the bootstrapped indices and bootstrapped predictions from this layer
- separate the functionality in the train_net and eval_net
- save the bootstrapped FC in this object so that the eval_net can use them during prediction time

Reviewed By: wx1988

Differential Revision: D17822429

fbshipit-source-id: 15dec501503d581aeb69cb9ae9e8c3a3fbc7e7b5
2019-11-04 21:18:15 -08:00
56f7415795 L0 norm approx with budget (#29155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29155

Update the L0 norm regularizer with a budget feature to penalize features over this limit

Formula and summary:

{F212248495}

Test Plan: * Unit test located in: ~/fbsource/fbcode/caffe2/caffe2/fb/dper/layer_models/tests/split_1/fsparse_nn_test.py

Reviewed By: un-disclosed, wx1988

Differential Revision: D17458138

fbshipit-source-id: 2ed9ce6f55573b0bfc0fefbfd392f90c7542a0fd
2019-11-04 21:09:53 -08:00
64cbea0fbb Updating submodules
Summary:
GitHub commits:

0432ab3260

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 77234d9f3baf213270258a6cc21bf4e3cb75ca7f
2019-11-04 20:00:42 -08:00
974702fba0 Removing quantization from the dispatcher. Changing the message.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29054

Test Plan: Imported from OSS

Differential Revision: D18276881

Pulled By: z-a-f

fbshipit-source-id: 3adee0bc784b13f2e00a643d2a96447cb666806d
2019-11-04 17:31:20 -08:00
0d9dc469cc Introduce math_compat.h for older Android versions (#28567)
Summary:
When building with Android NDK platforms prior to android-21,
and when building for Android with libstdc++, there are some
gaps in the C and C++ standard libraries.  We use both for our
internal 32-bit builds, so we need PyTorch to support this platform.

All of the gaps are filled with this math_compat.h header, which
needs to be included in any file that uses one of the functions
that are not properly defined on Android.  The file is a bit
hack-tastic, but it is only used on a platform that is not receiving
updates, so there shouldn't be a risk of breakage in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28567

Test Plan: Internal android build.

Differential Revision: D18099513

Pulled By: dreiss

fbshipit-source-id: 020aab19c6fa083206310b018925d92275d4a548
2019-11-04 17:26:17 -08:00
cb72c9f5b1 Make caffe2/fb folder compatible with AMD (#29131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29131

caffe2_pb2.CUDA --> workspace.GpuDeviceType
workspace.NumCudaDevices() --> workspace.NumGpuDevices()

Also added the totalGlobalMem into get_device_properties(), which is needed by multi_gpu_utils.py

Test Plan:
sandcastle

f148921769

Reviewed By: bddppq

Differential Revision: D18290090

fbshipit-source-id: bde7c175d1fb6ff59a062266c1b17de39d113b24
2019-11-04 16:40:29 -08:00
02e34919ae Bring back the stack #28426 with Windows build fixed (#28843)
Summary:
ezyang This brings back the stack https://github.com/pytorch/pytorch/pull/28426 with the Windows build hopefully fixed. Let's wait for the CI to see what happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28843

Differential Revision: D18224616

Pulled By: ezyang

fbshipit-source-id: e13051e9ff9cb8d437a733b2c89b4172a379cafc
2019-11-04 16:32:56 -08:00
df22e4c157 Remove Unicode characters from header, fixing lint. (#29126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18300420

Pulled By: ezyang

fbshipit-source-id: d9b3ec75098cdb54624e4f98d4c66db1f4ff62bd
2019-11-04 15:07:37 -08:00
379f3ae3ea Double fetch depth. (#29030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29030

Might fix #27648

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18300252

Pulled By: ezyang

fbshipit-source-id: 542c16b6c1e78c2f9cc45e567f2e0cd1d4272ee3
2019-11-04 15:04:48 -08:00
25261a4776 Merge Tensor and Variable. (#28620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28620

All Tensors are Variables now; they just happen to have requires_grad=False. Tensors ALWAYS have `VariableTensorId` in their type set.

When constructing this patch, I had to make decisions about what I would fix in this patch, and what I would leave for follow up PRs. Here is the cleanup that happens in this patch:

- The `is_variable` property is removed from TensorOptions. I removed this immediately because unlike Tensor::is_variable, TensorOptions::is_variable doesn't respect our VariableTensorId thread-local state. This means that there were a bunch of places where TensorOptions::is_variable was false, which is obviously bogus in the world when tensor and variable are merged. Instead of keeping the method as a function that always returns true, I just opted to remove it entirely (it's not public API.) All places we set `is_variable` are deleted.
  - Knock on effect: there is no longer a separate DeprecatedTypeProperties for the variable and non-variable versions of type.
  - Knock on effect: instead of asserting on TensorOptions::is_variable, instead we just test `at::impl::variable_is_excluded()`
- There is now only one copy of the cuDNN RNN dropout cache, not two (I'm not sure why we had two to begin with)

Some cleanup that doesn't happen in this patch:
- Eliminating unnecessary uses of `make_variable`
- Eliminating `Tensor::is_variable`

The most subtle part of this patch is retaining tracing behavior: the fact that everything is a Variable means that more code gets routed to VariableType than before; this can change traces. I identified two places where we didn't appropriately turn off VariableType, mostly factory functions:

- `torch.tensor` must turn off VariableType before invoking `at::empty` to construct the tensor, as it subsequently does direct data access
- `tensor_slow` (invoked when you pass a Python scalar to a tensor argument) must turn off VariableType before calling `scalar_to_tensor` so the scalar gets traced as constant, rather than as a call to `scalar_to_tensor`.

Honestly, these are all giant hacks, and should be replaced with a more specialized guard that just toggles tracing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D18171156

Pulled By: ezyang

fbshipit-source-id: 5b6a045beba37492647e350190f495114e86504d
2019-11-04 14:59:57 -08:00
215ac1065a Print which output didn't have dependence. (#29047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29047

When a tuple is returned, it is helpful to know specifically
which output was the culprit.

Actually, it was somewhat /more/ helpful to actually see the
contents of the tensor which didn't have dependence (or, e.g.,
the backtrace of the code that populated it), but that seemed
a step too far.
ghstack-source-id: 93091993

Test Plan:
manually tested because I was debugging an incorrect
trace and looked to see that the output number was indeed identifying
the correct tensor.

Reviewed By: dreiss

Differential Revision: D18274323

fbshipit-source-id: f1551bb03a3cdfa58b9e7f95736d53f317f53d5e
2019-11-04 14:59:53 -08:00
150357c887 Updating submodules
Summary:
GitHub commits:

f746854f94
99e8fc1fc4
1c0794abd7
126d5bb8c5

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 51255fdf8b51237d08afa5362d9dc19e6961ea28
2019-11-04 14:56:13 -08:00
fd0f9811ad add timeout for RPC futures, and ability to set timeout when initializing rpc (#28392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28392

Per #25531, we want to clean up futures when we detect that there are
failures/timeouts. As a first step, this diff adds timers to the future object,
provides functionality to check if a future is timed out, and allows
specification of the timeout when initializing rpc. A future diff will check for these timeouts and mark the future completed with an exception indicating that it has timed out.
ghstack-source-id: 93192622

Test Plan: Added unit tests.

Differential Revision: D18025163

fbshipit-source-id: 195fb50c736caf5c7b2bada9a5f6116bb106ed33
2019-11-04 14:43:03 -08:00
60cb56d128 Refactor iterables (#29138)
Summary:
Refactor list comprehensions so they go through the same path as other for loops, making list comprehensions work with ModuleLists, and also fixing https://github.com/pytorch/pytorch/issues/27255
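
A minimal sketch of the pattern this enables in TorchScript:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The comprehension over a ModuleList is unrolled at compile time,
        # just like an explicit for loop over it.
        outs = [layer(x) for layer in self.layers]
        return torch.stack(outs).sum(dim=0)

scripted = torch.jit.script(Net())
```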

Replacing https://github.com/pytorch/pytorch/pull/28296 which was gh-poisoned and previously accepted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29138

Differential Revision: D18303432

Pulled By: eellison

fbshipit-source-id: 8e4c0ba6f800142d5c4d921d56917cfae0c74655
2019-11-04 14:39:22 -08:00
7560b8c5a7 Modify ONNX constant folding test point in test_utility_funs.py for clarity (#28861)
Summary:
This is a minor update to the test point `TestUtilityFuns.test_constant_fold_concat` in `test/onnx/test_utility_fun.py` for clarity. Unlike before, the test model forward() method now uses the input `x`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28861

Differential Revision: D18306881

Pulled By: houseroad

fbshipit-source-id: dda8b4123e7646c2e416ce914a4698f9b96e2a6c
2019-11-04 14:37:01 -08:00
7102aceaf8 Default to not build Caffe2 operators on Windows. (#29061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29061

It looks like we are too close to the maximum library size on
Windows.  Kill Caffe2 operators to get us lower again.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D18281083

Pulled By: ezyang

fbshipit-source-id: 8a11f9059dbf330f659bd96cc0cc2abc947723a8
2019-11-04 14:32:47 -08:00
044ff91950 reduce predefined_min_secs for execution time (#29142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29142

As titled.

Test Plan:
```
Before this diff:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 122.965

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 229.735

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 950.455

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 826.893

After this diff:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test;
Parsing buck files: finished in 0.7 sec
Building: finished in 02:35.7 min (100%) 7281/7281 jobs, 1 updated
  Total time: 02:36.4 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.021

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 244.076

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 946.280

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 863.835
```

Reviewed By: hl475

Differential Revision: D18305676

fbshipit-source-id: d382084e39b87c554084891f87701b87cd2d3800
2019-11-04 14:29:00 -08:00
20e8634999 pass more arguments to Int8ConvPackWeight op in unit tests (#29086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29086

For Int8ConvPackWeight to decide which convolution implementation should be used, we need to pass more arguments.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D18286931

fbshipit-source-id: d178cc6d696d0e83aad18bb34eb071f44b0c2015
2019-11-04 13:55:24 -08:00
7fb2ccaed8 Update type definitions for nn.Identity (#29135)
Summary:
Updated PR instead of https://github.com/pytorch/pytorch/issues/29114

Running mypy on the following code is throwing an error, Module has no attribute Identity:
```
import torch.nn as nn
layer = nn.Identity()
```
Using the following instead does not give an error:
```
import torch
layer = torch.nn.Identity()
```

CC: ezyang soumith (Sorry for causing the revert previously! Hope this one works fine!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29135

Differential Revision: D18306331

Pulled By: ezyang

fbshipit-source-id: f10be8a0cccecef423184d009bad8be6d54098a5
2019-11-04 13:27:13 -08:00
e01324d058 Port l1_loss to Aten (#26795)
Summary:
VitalyFedyunin, this PR ports L1 loss to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
loss = nn.L1Loss(reduction = 'sum')
if torch.cuda.is_available():
    device = "cuda"
    loss = loss.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(1000):
        output = loss(input, target)
        output.backward()

#get running time
for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = loss(input, target)
        t2 = _time()
        output.backward()
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P100.

**Performance:**
Before:
```
GPU:
reduction=’mean’
nput size(128, 100) forward time is 0.31 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 0.33 (ms); backwad avg time is 0.14 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.31 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 0.34 (ms); backwad avg time is 0.14 (ms).

CPU:
reduction=’mean’
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 1.92 (ms); backwad avg time is 2.96 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 1.96 (ms); backwad avg time is 2.79 (ms).

num_threads = 1:
reduction=’mean’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.67 (ms); backwad avg time is 2.50 (ms).
reduction=’sum’:
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.67 (ms); backwad avg time is 2.51 (ms).
```
After:
```
GPU:
reduction=’mean’
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 0.11 (ms); backwad avg time is 0.17 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.08 (ms).
input size(128, 10000) forward time is 0.11 (ms); backwad avg time is 0.16 (ms).

CPU:
reduction=’mean’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.14 (ms); backwad avg time is 0.18 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.15 (ms); backwad avg time is 0.17 (ms).

num_threads = 1:
reduction=’mean’:
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.06 (ms).
input size(128, 10000) forward time is 1.05 (ms); backwad avg time is 1.72 (ms).
reduction=’sum’:
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.03 (ms); backwad avg time is 1.71 (ms).
```

How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`

echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"

export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0

numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run `./run.sh 1 L1loss.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26795

Differential Revision: D18140434

Pulled By: VitalyFedyunin

fbshipit-source-id: d0b976ec36797f2e6b4e58fbbac89688d29e736f
2019-11-04 13:20:07 -08:00
ebc216a076 Opset 11 updates (#28225)
Summary:
This PR contains:
1- pad updates for opset11 symbolic
2- Updated avg_pool for opset11
3- TopK updates for opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28225

Reviewed By: hl475

Differential Revision: D18282928

Pulled By: houseroad

fbshipit-source-id: aff2cabca9a155a9b475e35fed69a678544d6669
2019-11-04 12:16:12 -08:00
669662cd2f Updating submodules
Summary:
GitHub commits:

c72ce78355
3c1420258d

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 48e6a2175e768dbbaf67dfec557c7741808a9458
2019-11-04 12:08:18 -08:00
7190789f58 Handling of failing and terminal async cpu ops (#29052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052

Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.

Test Plan: AsyncIf tests

Reviewed By: yyetim

Differential Revision: D18276401

Pulled By: ilia-cher

fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
2019-11-04 12:01:21 -08:00
19ac5929e2 Remove definitions of acosh and asinh from TH (#28696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28696

They are not used anywhere.

Test Plan: Imported from OSS

Differential Revision: D18302769

Pulled By: VitalyFedyunin

fbshipit-source-id: 8680951cbceb607ef545f92cbfa9204ce8f7ac4a
2019-11-04 11:56:25 -08:00
24d43750ee Updating submodules
Summary:
GitHub commits:

b4d85028d8

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 0417fe507561cde8b7739ae289a8d16d1429bea5
2019-11-04 11:04:26 -08:00
69b1d71427 Fix GELU module docs (#29112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29112

Fix GELU module docs

Test Plan: unittest

Reviewed By: hl475

Differential Revision: D18297681

fbshipit-source-id: 6b86a1a58c62fbb3b1395639271ee16c4043d03d
2019-11-04 10:45:00 -08:00
00a561a23a Fix build error caused by recent commits. (#29056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29056

There are a couple of recently published diffs that break the internal pytorch build, so fix it here.
ghstack-source-id: 93101569

Test Plan:
buck install -r aidemos-android
buck install -r fb4a

Reviewed By: iseeyuan

Differential Revision: D18236331

fbshipit-source-id: e1cecae8c30fd9b23b6bf379f652b4926542618d
2019-11-04 10:13:09 -08:00
93acd1998f Revert D18249048: Moved VonMises distribution with sampling upstream from Pyro.
Test Plan: revert-hammer

Differential Revision:
D18249048

Original commit changeset: 3e6df9006c7b

fbshipit-source-id: 001666e4b5b9879d36147bacfc761ea661ded900
2019-11-04 09:50:50 -08:00
0a4433750e Updating submodules
Summary:
GitHub commits:

9588c8bbf9

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: b882fe57643c1983e9859290e2dddec198a78ed0
2019-11-04 09:24:25 -08:00
fdeef45852 Add Support For Module Containers as Iterables (#28255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28255

Add support for treating Sequentials, ModuleLists, and ModuleDicts as iterables.

As before, when emitting a for loop over a module container we unroll the loop over all elements. We require that any SugaredValue in an iterable alongside a module container has a statically determinable length.

Otherwise, if you zipped over a list of varying length and an nn.Sequential that alternated between returning a Tensor and a Dictionary, the output type would change based on the length of the list.
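
A minimal sketch of iteration over a Sequential, which is unrolled over its statically known length at compile time:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each unrolled iteration is specialized to the concrete
        # submodule type (Linear, ReLU, Linear).
        for layer in self.seq:
            x = layer(x)
        return x

scripted = torch.jit.script(Net())
```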

Fix for #17179
And https://github.com/pytorch/pytorch/issues/27401
and https://github.com/pytorch/pytorch/issues/27506

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D18278124

Pulled By: eellison

fbshipit-source-id: aca336a5b8da89c756b1f0884883649510cbde3c
2019-11-04 09:19:40 -08:00
Jie
8160f390cf (#23861)
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:
1. tensor with NCHW where C = 1.
   we could use stride of C as a hint to tell the intended memory format.
2. tensor with NCHW where H == W == 1.
   there's no way to identify the intended memory format from strides.

Currently we fall back to NCHW whenever we see a contiguous tensor, hence avoiding
ambiguity for some of the special cases.
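
A quick demonstration of the ambiguity for case 2 (H == W == 1), where both contiguity checks pass and strides alone cannot reveal the intended layout:

```python
import torch

x = torch.randn(2, 3, 1, 1)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True
```
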
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861

Differential Revision: D18263434

Pulled By: VitalyFedyunin

fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
2019-11-04 09:11:50 -08:00
Jie
70f3f23e3a (#29016)
Summary:
Adding limitation on launch config for grid size
Test added in test_cuda;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29016

Differential Revision: D18293788

Pulled By: ngimel

fbshipit-source-id: 44de308b05a4fe44bfffc2f3713fd9fa67ef74fa
2019-11-04 08:50:18 -08:00
0f97e08a36 Moved VonMises distribution with sampling upstream from Pyro. (#17168)
Summary:
At the encouragement of Pyro developers and https://github.com/pytorch/pytorch/issues/13811, I have opened this PR to move the (2D) von Mises distribution upstream.
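
A minimal usage sketch of the distribution being upstreamed:

```python
import torch
from torch.distributions import VonMises

d = VonMises(loc=torch.tensor(0.0), concentration=torch.tensor(1.0))
samples = d.sample((5,))     # angles on the circle
log_p = d.log_prob(samples)
```
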
CC: fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17168

Differential Revision: D18249048

Pulled By: ezyang

fbshipit-source-id: 3e6df9006c7b85da7c4f55307c5bfd54c2e254e6
2019-11-04 08:44:11 -08:00
7ff39d2942 LayerNorm: Handling if batch size is zero (#28614)
Summary:
Handling of an empty example was giving a CUDA error.
Added a getLastError check to make sure CUDA errors are attributed to the
correct function (previously the error was attributed to the next
CUDA operator).
Added a special case for batch-size zero, also on the CPU side to keep
things consistent.
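
The empty-batch case this change handles; it should now produce an empty output rather than a CUDA error:

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(8)
out = ln(torch.empty(0, 8))
print(out.shape)   # torch.Size([0, 8])
```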

Resubmit of D18085429 without stacked commits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28614

Test Plan: test included

Differential Revision: D18122212

Pulled By: ggoossen

fbshipit-source-id: 8c6741a157a9fbbc82685d81a6f8021452b650d4
2019-11-04 08:37:19 -08:00
23695ab23f Moving python allgather_coalesced impl from Py to C. (#29059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29059
This is a resubmit of reverted diff D18209289 ( PR #28857 ).

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: pietern

Differential Revision: D18277097

fbshipit-source-id: aecfd7206d70829f0cac66182bf02fccee410fed
2019-11-04 08:34:34 -08:00
a1386bd950 Fix smoketests by running them with postnightly job. (#28994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28994

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18273476

Pulled By: ezyang

fbshipit-source-id: de59faa49c13198c18e61fdb05ab1d3d7cc16e08
2019-11-04 08:30:17 -08:00
0fbce15828 Retry conda installation on OS X. (#28979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28979

Fixes #28969

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18273477

Pulled By: ezyang

fbshipit-source-id: 9bcc10034a4ad7d55709dd54735d60500043da65
2019-11-04 08:30:13 -08:00
a90389f20e Port cuda sigmoid to Aten(CUDA) (#26643)
Summary:
VitalyFedyunin, this PR ports CUDA sigmoid to ATen: https://github.com/pytorch/pytorch/issues/24624; the TH/THC sigmoid code can't be removed yet because sigmoid_backward in THNN/THCUNN relies on it. I will port sigmoid_backward to ATen next, including CPU and CUDA, which will remove the sigmoid code in TH/THC.

Test script:
```
import timeit

 device = "cuda"
 for n, t in [(10, 100000),(1000, 10000)]:
     print('a.sigmoid() (a.numel() == {}) for {} times'.format(n, t))
     for dtype in ('torch.float', 'torch.double', 'torch.half'):
         print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
         print(timeit.timeit(f'a.sigmoid()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
```

Device: **Tesla P40**

Before:
```
a.sigmoid() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.float, 100000 times          1.2853778750286438
device: cuda, dtype: torch.double, 100000 times         1.2787265420192853
device: cuda, dtype: torch.half, 100000 times           1.2610833930084482
a.sigmoid() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.float, 10000 times           0.1274153349804692
device: cuda, dtype: torch.double, 10000 times          0.13953313598176464
device: cuda, dtype: torch.half, 10000 times            0.1265286349807866
```
After:
```
a.sigmoid() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.float, 100000 times          1.275270765996538
device: cuda, dtype: torch.double, 100000 times         1.285128042974975
device: cuda, dtype: torch.half, 100000 times           1.2761492819990963
a.sigmoid() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.float, 10000 times           0.12851508799940348
device: cuda, dtype: torch.double, 10000 times          0.13738596899202093
device: cuda, dtype: torch.half, 10000 times            0.12715664599090815
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26643

Differential Revision: D17666550

Pulled By: VitalyFedyunin

fbshipit-source-id: 376479d94d0649c171fd0b2557699bbdd050fec3
2019-11-04 07:40:06 -08:00
bbada862dc Revert D18298225: Update modules/__init__.pyi.in to include Identity
Test Plan: revert-hammer

Differential Revision:
D18298225

Original commit changeset: b271bf000868

fbshipit-source-id: 77667adf6817a242f4f2e4eaa7ea8190f5090c49
2019-11-04 07:28:56 -08:00
a0dc060682 Update modules/__init__.pyi.in to include Identity (#29114)
Summary:
Running mypy on the following code throws the error `Module has no attribute Identity`:

```
import torch.nn as nn
layer = nn.Identity()
```

Using the following instead does not give an error:

```
import torch
layer = torch.nn.Identity()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29114

Differential Revision: D18298225

Pulled By: soumith

fbshipit-source-id: b271bf00086876cca8d63ae0cde6cebf69a7051e
2019-11-04 06:33:03 -08:00
2460dced8f Add torch.nn.GELU for GELU activation (#28944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28944

Add torch.nn.GELU for GELU activation
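
A minimal usage sketch (illustrative, not part of the original commit):

```python
import torch

gelu = torch.nn.GELU()
x = torch.randn(4)
y = gelu(x)  # elementwise x * Phi(x), where Phi is the standard normal CDF
```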

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 6284b30def9bd4c12bf7fb2ed08b1b2f0310bb78
2019-11-03 21:55:05 -08:00
3bffb730b6 Add note about when to install typing package (#29103)
Summary:
Was just trying to build pytorch from source and had a small hiccup because the instructions say to `conda install typing`. Because `typing` is a built-in module in recent Python 3 versions, conda interpreted that to mean that I want Python 2. So I added a note to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29103

Differential Revision: D18294139

Pulled By: soumith

fbshipit-source-id: 621a2f62ebe870520197baec8f8bcdc1a0c57de9
2019-11-03 19:38:55 -08:00
e95dc9814e introduce module interface declaration (#28408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28408

This enables an interface to be defined on an nn.Module, and InterfaceType
now has a field is_module_ to distinguish whether it is a module interface
or a normal interface (similar to how ClassType distinguishes between
modules and TorchScript classes).

A module interface can be assigned any ScriptModule that has
compatible signatures on its schemas. A normal object that is not a
ScriptModule cannot be assigned to a module interface and
will error out when the user explicitly does so. Assigning a ScriptModule
to a class interface makes it available only in attribute_list, not
module_list. More details on the subtyping relationship are documented in
jit_type.h.

If you declare a module interface inside an nn.Module that is being
compiled to a ScriptModule, the behavior of our internal compilation will
be:

1. ConcreteModuleType will record it as a module attribute and add it to
   the attributes_ list.
2. The JitType created from the ConcreteModuleType will record it as an
   attribute and pre-generate the slot. The slot will still be marked as
   EntityType::MODULE to make sure JitType records it as a Module
   slot.
3. cpp_module will also register it as a Module, as the Slot type is the
   source of truth.

Since JitType records it as an attribute and stores its type, it
behaves the way a class interface attribute behaves now. This means
the submodule assigned to this module interface is not inlined
into the graph the way a normal `Module::attr` is; instead, it generates an
interface callMethod and allows us to later swap it for another
ScriptModule that implicitly implements this module interface.
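
A minimal sketch of a module interface declaration, assuming the `torch.jit.interface` decorator spelling; class names are illustrative:

```python
import torch

@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Impl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class Wrapper(torch.nn.Module):
    proxy: ModuleInterface  # module interface attribute; swappable after scripting

    def __init__(self):
        super().__init__()
        self.proxy = Impl()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # generates an interface callMethod rather than inlining Impl's graph
        return self.proxy.forward(x)

scripted = torch.jit.script(Wrapper())
```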

Test Plan: Imported from OSS

Differential Revision: D18284311

fbshipit-source-id: e0b8f6e8c34b2087fab337a969e5ea3fb37ec209
2019-11-02 16:39:00 -07:00
1e904049ca guard against inheritance on torchscript classes (#28407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28407

Given that we do not yet have support for inheritance or any polymorphism
strategy, we should guard against users using it until we get
full support, so that users won't be confused by the weird behaviors.

Test Plan: Imported from OSS

Differential Revision: D18284310

fbshipit-source-id: f55a224f4190d57926d91ed98f6168d787387eb8
2019-11-02 16:38:56 -07:00
73d77626b8 Check device connection before running xcodebuild (#28996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28996

### Summary

It'd be frustrating to realize the device is not connected only after waiting for the build to finish. This PR checks the device connection status before running xcodebuild.

### Test Plan

- Don't break `bootstrap.sh`

Test Plan: Imported from OSS

Differential Revision: D18258348

Pulled By: xta0

fbshipit-source-id: dda90e7194114e99b2774a3b64ed41f78221f827
2019-11-02 14:38:08 -07:00
0c5e738cf7 Updating submodules
Summary:
GitHub commits:

612ae995a6
a6f5d4d621
799f6a8c0d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 365832bf627e855c0aa15e083a873894368b0cfd
2019-11-02 14:38:04 -07:00
496d23224f Updating submodules
Summary:
GitHub commits:

7cb2e01c52
0d91a981e9
adeb2b0e38

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9dbe4512464819932e1a82ae08c3ab37e7f7c1ff
2019-11-01 19:03:44 -07:00
1345dabb1d Only set CCACHE_WRAPPER_PATH in the build scripts if it is not already passed in.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29002

Test Plan: Imported from OSS

Differential Revision: D18277225

Pulled By: AshkanAliabadi

fbshipit-source-id: eb70607790754cd5d214133967404242c05dd5d5
2019-11-01 18:39:12 -07:00
e8e7d93293 Additional autograd unit tests for Python UDFs. (#29041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29041

1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 93098679
ghstack-source-id: 93098679

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D18273041

fbshipit-source-id: 85d3932fed6337668a812367fdfce233c1b3ff8e
2019-11-01 18:30:09 -07:00
a68c1e109e C++ API: torch::nn::BatchNorm{2,3}d (#28936)
Summary:
Add torch::nn::BatchNorm{2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #28176

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28936

Differential Revision: D18274584

Pulled By: yf225

fbshipit-source-id: 3784eee9f8947f6c7c9f1699544a3d36a1a019b7
2019-11-01 17:50:33 -07:00
23193c155f Quantized Tensor support copy (#28612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28612

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18255247

fbshipit-source-id: 814b12640fdf9d79b27482ee642ce430dbaeea68
2019-11-01 17:40:17 -07:00
41e42c34d6 Revert D17989951: Move unboxed dispatch decision into dispatcher
Test Plan: revert-hammer

Differential Revision:
D17989951

Original commit changeset: b343d9650deb

fbshipit-source-id: 0d2f470bab47e40fcffd5ec23f88549da15af873
2019-11-01 14:11:59 -07:00
cddda17394 ParallelWorkersTest.testParallelWorkersInitFun is flaky (#29045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29045

Addressing an issue seen in GitHub https://github.com/pytorch/pytorch/issues/28958

It seems sometimes the workers in this test don't stop cleanly.  The purpose of this test is to check that the init_fun in init_workers works as expected, which is captured by the assertEqual in the for loop in the test.  The behavior of stop() is not really important here.

The fact that it returns false probably indicates that a worker is getting blocked, but that doesn't affect the correctness of the test.

Test Plan: Ran the test 100 times, it consistently succeeds.

Reviewed By: akyrola

Differential Revision: D18273064

fbshipit-source-id: 5fdff8cf80ec7ba04acf4666a3116e081d96ffec
2019-11-01 13:59:02 -07:00
314066bd74 Making torch/csrc/cuda nccl usage safe for nccl 2.5 (#29014)
Summary:
Thanks to AddyLaddy ptrblck for tracking this fix down.

In torch/csrc/cuda/nccl.cpp and torch/csrc/cuda/python_nccl.cpp, construction of the `AutoNcclGroup` guard (which calls `ncclGroupStart()`) [precedes](https://github.com/pytorch/pytorch/pull/29014/files#diff-3b6a42619dd44000cf58c0328b679a1cL239-L241) a possible call to `get_communicators`, which may call `ncclCommInitAll()`.  Calling `ncclCommInitAll()` within a `ncclGroupStart()/End()` is incorrect according to our Nccl people.

It seemed ok (relevant tests were silently passing) as long as Pytorch was compiled/linked against Nccl 2.4.x (which is currently what's locked into your third_party/nccl subrepo).  However, when we tried to compile and link against Nccl 2.5.x in internal builds, we began to see test hangs (TestAutogradDeviceTypeCUDA.test_unused_output_device_cuda was what initially brought it to our attention).

The present PR fixes those hangs, as far as we know, and will prevent a nasty future surprise when you start building against nccl 2.5.

The backend affected by this PR is exposed via https://github.com/pytorch/pytorch/blob/master/torch/cuda/nccl.py.  I'm not sure if the exposure is actually used anywhere (I think the distributed frontend is now backed by ProcessGroupNCCL in torch/lib/c10d).  So this PR may affect code that is already dead or dying, but still tested, it seems.

I skimmed ProcessGroupNCCL.cpp for potential similar vulnerabilities and didn't spot anything obvious.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29014

Differential Revision: D18274799

Pulled By: ezyang

fbshipit-source-id: c5f88cf187960d61736be14458be01e3675c6702
2019-11-01 13:53:31 -07:00
d8d7af0811 Fix CUDA shared memory out of bound access in findPattern (#28989)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/28789

Only the first two elements of `smem` are used in this function, but at the beginning it resets all `C10_WARP_SIZE` elements to 0. When `scalar_t` is 64-bit, this writes past the total shared memory size, which is `sizeof(int) * C10_WARP_SIZE`, although it does not lead to any failure in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28989

Differential Revision: D18271598

Pulled By: ngimel

fbshipit-source-id: 38cc863722509892646f719efb05e2730a7d9ae1
2019-11-01 13:50:25 -07:00
bace0c8d7a remove a redundant move preventing a copy elision
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29040

Differential Revision: D18272902

Pulled By: Krovatkin

fbshipit-source-id: 23d4546aeb8945b7c7a5d472f543171699fc08b9
2019-11-01 13:13:00 -07:00
b693c5d6a0 replace add benchmark with add_ (#29050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29050

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 31475.766
```

Reviewed By: hl475

Differential Revision: D18265767

fbshipit-source-id: 7aaa04f5fa5b2dd58bbc1aa045693314032e0ff0
2019-11-01 13:08:27 -07:00
1e2049c566 #26426 fixed (#28715)
Summary:
This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426
houseroad bddppq soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715

Reviewed By: hl475

Differential Revision: D18146731

Pulled By: houseroad

fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130
2019-11-01 12:53:01 -07:00
4a94eaa60b C++ API parity: PoissonNLLLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28755

Test Plan: Imported from OSS

Differential Revision: D18202436

Pulled By: pbelevich

fbshipit-source-id: a7a27d5f3cdbcbbd9bbbffa02b576609d5fdc9b3
2019-11-01 12:35:59 -07:00
7ea83120df Fixing the shape calculation for pool tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28853

Test Plan: Imported from OSS

Differential Revision: D18212290

Pulled By: z-a-f

fbshipit-source-id: 44a41f3192c8b168a8a0fb68eb33b68400917c7a
2019-11-01 12:29:27 -07:00
5ac3df7712 Minor fix and turn off fold_convbn (#27403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27403

In the fold_convbn pass, we need to recompute the parameters (weight, bias) for
conv, update the attributes of conv, and update the access of bias in conv,
because if the original conv has no bias, the `self.bias` access will be
inlined and replaced by the Constant node `None = prim::Constant()`; we need to
update this to use `GetAttr[name="bias"]` to make this work. But there is
also some work going on to handle constants, so we'll fix this pass after
that is done.
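
For reference, the parameter recomputation is the standard conv+BN folding identity; a sketch (not the actual pass code):

```python
import torch

def fold_conv_bn(w, b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    scale = bn_gamma / torch.sqrt(bn_var + eps)
    w_folded = w * scale.reshape(-1, 1, 1, 1)  # scale each output channel
    if b is None:                              # the "conv has no bias" case discussed above
        b = torch.zeros_like(bn_mean)
    b_folded = (b - bn_mean) * scale + bn_beta
    return w_folded, b_folded
```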

Test Plan:
.

Imported from OSS

Differential Revision: D18182918

fbshipit-source-id: bba510bc41ab58e0eb76f7b77335b6e3ffe2862d
2019-11-01 12:15:38 -07:00
d690521cf6 Add e2e test for conv+bn (#27348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27348

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18182920

fbshipit-source-id: 40edc4d85903f979cd4755d6785d2842faa4d566
2019-11-01 11:28:47 -07:00
9041e29d94 Revert D18209289: Moving python allgather_coalesced impl from Py to C
Test Plan: revert-hammer

Differential Revision:
D18209289

Original commit changeset: c5a4c4a1aaa0

fbshipit-source-id: d4865e3f8c4eeee285c711e5c2250b8c9f9b0d25
2019-11-01 11:23:41 -07:00
dbbb2fc9e5 Remove the linkage to CUDA libraries when ROCM is used. (#29009)
Summary:
Currently when ROCm is used, CUDA libraries are still linked. There has
been no error because USE_CUDA is set to OFF by a preliminary check in
tools/setup_helper/cuda.py, and no CUDA variable is set. Hence, these
lines can pass simply because those variables are always undefined and thus expand to empty strings. But this
cannot be safely relied on, and it is causing https://github.com/pytorch/pytorch/issues/28617 to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29009

Differential Revision: D18273472

Pulled By: ezyang

fbshipit-source-id: b8b6580e8a44d874ac678ed9073412d4d2e393ee
2019-11-01 11:18:21 -07:00
a49a656264 Updating submodules
Summary:
GitHub commits:

efdfedc749
15a29e620b

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: df306a4b693299f76d904bf15f24bb2cf367ab30
2019-11-01 11:11:58 -07:00
71be5fe54e add support for {ones,zeros,full,rand,randn}_like ops (#28981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28981

This PR adds support for calling those functions on named tensors. The
implementation is not the nicest: in the future we have plans to merge
names into TensorOptions, at which point we won't need the extra
branches that check if the tensor has names. Right now, however, these
functions are very useful to have (in particular, ones_like is used by
autograd to generate gradients).
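
A sketch of the expected behavior per this commit (dimension names are illustrative):

```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = torch.ones_like(x)
print(y.names)  # ('N', 'C') -- names propagate to the new tensor
```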

Test Plan: - Added tests for each of these

Differential Revision: D18270937

Pulled By: zou3519

fbshipit-source-id: 720739ff0474449a960b81728345a4250becbfc3
2019-11-01 11:04:42 -07:00
0a101bf8d5 Improve name inference API by introducing a TensorName helper struct (#28904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28904

Motivation
============

Before this PR, a core problem with writing name inference rules was
that each rule needed to handle misalignment by itself. A misaligned
name occurs when we are matching None with a non-None name, but the
non-None name already exists in the first tensor.

For example, `A` is misaligned in `Tensor[A, None] + Tensor[None, A]`.

Each op handled this in a custom way
- align_from_right (used by broadcasting) handles misalignment
- compute_matmul_outnames checks for misalignment across batch and
feature dimensions.

We can actually codify "misalignment" into something more rigorous by
folding it into the definition of `match` and eliminate special handling
of "misalignment". That is what this PR attempts to do.

Approach
============

Definition: Two names in two tensors *match* if they are equal, or if at
least one of them is a wildcard that can be *refined* to the other name.

With this new definition, to check if two names match, we need to know
about the names list that each name came from to determine if a wildcard
can successfully be *refined* to the other name.

For example, consider the following:
```
tensor: Tensor[A, None]
other: Tensor[None, A]
```
when unifying `tensor.names[-1]` with `other.names[-1]`, we see that
`tensor.names[-1]` is None and `other.names[-1]` is A. Then we check to
see if `tensor.names[-1]` can be refined to `A`; it can't be refined if
there is already an `A` in `tensor.names`.

Enter `TensorNames`.
A TensorName represents a Dimname associated with some DimnameList
(that came from a Tensor).

`TensorNames` is a list of such TensorName objects with some helper
functions attached.

One can perform the following operations:
- unify two `TensorName` objects
- unify two `TensorNames` objects with right alignment.

Plan
============

This PR changes `compute_matmul_outnames` to use `TensorNames` to
demonstrate how they make writing name inference rules easier. In the
future I'll convert other name inference rules to use `TensorNames` as
well.

Test Plan
- run all tests

Test Plan: Imported from OSS

Differential Revision: D18270666

Pulled By: zou3519

fbshipit-source-id: 3ec96cc957747eb4cfe4ea17fd02ef3d8828a20c
2019-11-01 11:01:48 -07:00
d0204ea92a Remove dead includes in caffe2/binaries
Reviewed By: ezyang

Differential Revision: D18136357

fbshipit-source-id: df357c9d4b344b5621b838c2a2657658e10f7000
2019-11-01 10:58:42 -07:00
bbea34f283 Revert D18266918: C++ API: torch::nn::BatchNorm{2,3}d
Test Plan: revert-hammer

Differential Revision:
D18266918

Original commit changeset: f432904c7298

fbshipit-source-id: 0e1c596b2e2f13b59082ff422c67ba025df4be07
2019-11-01 10:46:49 -07:00
88a34ef690 Move unboxed dispatch decision into dispatcher (#28251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28251

Before, the dispatch key for unboxed operators from native_functions.yaml was generated in codegen and passed to the c10 dispatcher.
Now, we generate it inside of the dispatcher, right next to where the same thing happens for boxed calls.
ghstack-source-id: 93085152

Test Plan: unit tests

Differential Revision: D17989951

fbshipit-source-id: b343d9650debc62bfcff84cf4d6bdaf9dacc9d16
2019-11-01 10:37:52 -07:00
22a346ee34 Moving python allgather_coalesced impl from Py to C
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28857

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: mrshenli

Differential Revision: D18209289

fbshipit-source-id: c5a4c4a1aaa07286a05a7c842dda428eeb46f696
2019-11-01 10:34:23 -07:00
b7c5b3d398 C++ API: torch::nn::BatchNorm{2,3}d (#28936)
Summary:
Add torch::nn::BatchNorm{2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #28176

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28936

Differential Revision: D18266918

Pulled By: yf225

fbshipit-source-id: f432904c72985d52ec52cb992cceb372b6ff0244
2019-11-01 09:28:58 -07:00
c447941bda Migrate conv2d from TH to ATen (CPU) (#28793)
Summary:
This is a port of the SpatialConvolutionMM TH (CPU) implementation to ATen as `slow_conv2d`. In practice it is invoked for ungrouped, non-dilated, non-float32 convolutions (e.g. float64, long, bfloat16).

- [x] unfolded_copy & unfolded_acc
- [x] forward
- [x] backward
- [x] basic sanity cross check with 1.3 impl
- [x] systematic testing
- [x] performance comparison & optimization

File used for performance testing: [benchmark_conv2d.py](https://gist.github.com/andreaskoepf/c2777b2e5e9d11610f9fc74372930527)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28793

Differential Revision: D18256451

Pulled By: ezyang

fbshipit-source-id: d09e84eef11ccf8a6178dfad485fe6fd0ddf0c86
2019-11-01 08:17:53 -07:00
31c932d9ab fixed replicate typo in torch/nn/parallel/__init__.pyi (#29005)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/29004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29005

Differential Revision: D18264637

Pulled By: pbelevich

fbshipit-source-id: 03013f668235deca35a58f70732111b53d792de5
2019-11-01 08:00:41 -07:00
a5d65d1f8f Fix embedding renormalization on cpu (#28546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28546

Fix #28370 repro

Test Plan: Imported from OSS

Differential Revision: D18251533

Pulled By: albanD

fbshipit-source-id: cd9ab609797b8c887ec9128752cc6a2f58a9aee6
2019-11-01 07:37:15 -07:00
7776d5bfe9 Update parallel_for/reduce doc (#28545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28545

* **#28545 Update parallel_for/reduce doc**

Test Plan: Imported from OSS

Differential Revision: D18251534

Pulled By: albanD

fbshipit-source-id: e743e4acfe1a4b5a329c11f7d03efd34d19efda8
2019-11-01 07:37:11 -07:00
dd288d3b21 support addcmul, addcdiv (#28975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28975

TensorIterator supports propagating names so we just needed to enable
them with support_named_tensor: True
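
A sketch of the enabled behavior (dimension names are illustrative):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
a = torch.randn(2, 3, names=('N', 'C'))
b = torch.randn(2, 3, names=('N', 'C'))
out = torch.addcmul(t, a, b, value=0.5)  # t + 0.5 * a * b
print(out.names)  # ('N', 'C') -- names propagate via TensorIterator
```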

Test Plan:
- really basic tests to test that each variant (outplace, inplace, out=)
supports named tensors.

Differential Revision: D18252421

Pulled By: zou3519

fbshipit-source-id: ea7fb59dcf8c708b6e45d03b9c2ba27fa6b6ce98
2019-11-01 07:11:58 -07:00
08860721ad Revert D18195584: Additional autograd unit tests for Python UDFs.
Test Plan: revert-hammer

Differential Revision:
D18195584

Original commit changeset: b795daf644ba

fbshipit-source-id: 413dac34f1a28e0a591893f43e116f006fd3f2be
2019-11-01 06:59:54 -07:00
72b9bda9e5 Smooth L1 loss (#27661)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `SmoothL1Loss` module and `smooth_l1_loss` functional.
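
For reference, a minimal Python-side usage of the module this C++ addition mirrors:

```python
import torch

loss = torch.nn.SmoothL1Loss()
pred = torch.randn(3, requires_grad=True)
target = torch.randn(3)
out = loss(pred, target)  # quadratic near zero, L1 far from zero
out.backward()
```
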
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27661

Differential Revision: D18002332

Pulled By: yf225

fbshipit-source-id: b382df8becb0de14986ec16ee0dc953d7b10e917
2019-10-31 23:41:35 -07:00
1c8ef29ac5 Remove copy-pasted code in THCTensorTopK.cuh (#28995)
Summary:
This is independent from https://github.com/pytorch/pytorch/pull/28989, but once https://github.com/pytorch/pytorch/issues/28989 lands, this fixes https://github.com/pytorch/pytorch/issues/28792 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28995

Differential Revision: D18265797

Pulled By: soumith

fbshipit-source-id: 6dd7cffd05aa65e4b366f1c40b8bda0a633e3154
2019-10-31 21:26:50 -07:00
cd3ed4db76 Update README.md (#28971)
Summary:
Fixed some grammar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28971

Differential Revision: D18265791

Pulled By: soumith

fbshipit-source-id: 778ab3e8a31f5f520a048c089c719c618427eaa6
2019-10-31 21:04:21 -07:00
aa30176c68 Add C++ API clip_grad_value_ for nn:utils (#28736)
Summary:
Adds the C++ API clip_grad_value_ for the torch::nn::utils module.
Also fixes a for-loop indentation error in the original test/test_nn.py.
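
For reference, a minimal sketch of the Python-side utility this mirrors:

```python
import torch

p = torch.nn.Parameter(torch.randn(4))
p.grad = torch.tensor([0.5, -3.0, 2.0, -0.1])
torch.nn.utils.clip_grad_value_([p], clip_value=1.0)
print(p.grad)  # every entry clamped to the range [-1.0, 1.0]
```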

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28736

Differential Revision: D18263807

Pulled By: yf225

fbshipit-source-id: 29282450bd2099df16925e1d0edd3d933f6eeb9b
2019-10-31 19:11:54 -07:00
8a1f42b81e Speed up threshold on CPU. (#27155)
Summary:
This is a small fix, but the runtime improvement does seem consistent (a bit less than 10%):

Benchmark (no turbo, Release build, gcc 8.3, RHEL 7.7, Intel(R) Core(TM) i7-8850H):

```python
import timeit

for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
    print(f'dtype={dtype}')
    for n, t in [(70_000, 200000),
                (700_000, 20000)]:
        print(f'torch.nn.Threshold(0.1, 20)(a), numel() == {n} for {t} times')
        print(timeit.timeit(f'm(a)', setup=f'import torch; m=torch.nn.Threshold(0.1, 20); a = torch.arange({n}, dtype={dtype})', number=t))
```

Before:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.88117562699972
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.525143070000013
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.673380930000349
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.677610996000112
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.957677209999929
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.8512293700005102
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.624350482999944
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.670380037000541
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.86375758200029
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.468234717999621
```

After:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.64173036200009
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.456986365000375
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.431988049000211
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.446968590000324
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.743787463999979
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.823233144000369
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.42801834400052
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.4600211680008215
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.562551314000302
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.37924196699987
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27155

Differential Revision: D17790768

Pulled By: VitalyFedyunin

fbshipit-source-id: 3281eaff77ddddd658048c9e73824dd68c548591
2019-10-31 17:47:11 -07:00
d3cd64d71d PyRRef.owner() to return WorkerInfo (#28909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28909

This allows chaining calls on RRef, as exemplified in the newly added test case.
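
A minimal sketch of the chaining this enables, assuming an initialized RPC worker group (worker names are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

# assumes rpc.init_rpc(...) has been called on a group containing "worker1"
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
info = rref.owner()          # a WorkerInfo, not just a name string
print(info.name, info.id)
rpc.rpc_sync(info, torch.add, args=(torch.ones(2), 2))  # WorkerInfo usable as destination
```
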
ghstack-source-id: 92996018

Test Plan: unit test.

Differential Revision: D18231081

fbshipit-source-id: deeac044ef6d63f18ea241760ac17a3e644cb3d7
2019-10-31 17:11:24 -07:00
59c5de4d0e Don't permute in quantized::conv2d pattern (#27347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27347

It's already done in the op, so we don't need to permute again.

Test Plan:
test_jit.py
we'll test in e2e tests

Imported from OSS

Differential Revision: D18182919

fbshipit-source-id: 04dd2a19a719828fbc7b62e451b81752187e0fcb
2019-10-31 15:58:28 -07:00
ba6defeb07 Revert D18254898: Revert D18202646: [pytorch][PR] Use aten's GRAIN_SIZE for TH Tensor ops
Test Plan: revert-hammer

Differential Revision:
D18254898

Original commit changeset: df19992db610

fbshipit-source-id: 4da5b3b2c4f6fb8f490a319cce50d619d54af0e1
2019-10-31 14:45:59 -07:00
3bba751cd6 Additional autograd unit tests for Python UDFs. (#28824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28824

1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 92972494

Test Plan: waitforbuildbot

Differential Revision: D18195584

fbshipit-source-id: b795daf644ba1816fdec484545192ab55a2f71e7
2019-10-31 14:03:00 -07:00
579ffb647d Add HashStore to c10d (#28921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28921

This implementation is quite similar to the HashStore in gloo -
an ephemeral in-process store with a lock and unordered_map<>.

There are a few tweaks/differences based on c10d vs gloo:
  - c10d expects add/check methods
  - c10d get() use cases expect to wait up to super::timeout_ if the value isn't present
  - c10d set() isn't expected to throw if the value is present.
  - c10d uses uint8_t vs char

It's potentially a better choice for some cases than FileStore when we
don't need cross-process access, or care about the backing file.
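
A minimal sketch, assuming HashStore is exposed to Python like the other c10d stores (`FileStore`/`TCPStore`):

```python
import torch.distributed as dist

store = dist.HashStore()     # ephemeral, in-process
store.set("key0", "value0")  # set() does not throw if the key already exists
store.add("counter", 1)      # the add/check style methods c10d expects
value = store.get("key0")    # get() waits up to the store timeout if the key is absent
```
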
ghstack-source-id: 92992341

Test Plan:
buck build mode/dev-nosan caffe2/torch/lib/c10d/...
    buck-out/dev/gen/caffe2/torch/lib/c10d/HashStoreTest

Differential Revision: D18233713

fbshipit-source-id: ab23f3f93d3148c1337f2cc6a8f2aff4aa6549f3
2019-10-31 13:55:22 -07:00
4654795d13 Revert D18202646: Use aten's GRAIN_SIZE for TH Tensor ops
Test Plan: revert-hammer

Differential Revision:
D18202646

Original commit changeset: ab30e5ef24e6

fbshipit-source-id: df19992db61055541fc0131426421038dea32a48
2019-10-31 13:42:40 -07:00
f63cbf3ae2 change op benchmark forward_only flag (#28967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28967

Change the forward_only flag to take True or False so it can be integrated with PEP.

Test Plan:
```
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only True  --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 152.489

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 236.608

[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only False   --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 147.174

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 253.437

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1044.082
```

Reviewed By: hl475

Differential Revision: D18247416

fbshipit-source-id: 1c6cff1ac98233d4f0ca298e0cb4a0d3466e5834
2019-10-31 13:28:58 -07:00
fcd6a8252c add shapes for fill benchmark (#28966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28966

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:fill_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: fill_
# Mode: Eager
# Name: fill__N1024_cpu_dtypetorch.int32
# Input: N: 1024, device: cpu, dtype: torch.int32
Forward Execution Time (us) : 2.008
```

Reviewed By: hl475

Differential Revision: D18241521

fbshipit-source-id: 6eb6e1ab7e8a2f461c6fc537f5bb971d12f594c3
2019-10-31 13:28:49 -07:00
9034762a7d add more operators to benchmark_all_test (#28968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28968

Add fill and as_strided operators.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --list_ops
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of Operators to run:
# round_
# exponential_
# QLinear
...
```

Reviewed By: hl475

Differential Revision: D18241522

fbshipit-source-id: aade1d68a68a660d19d8dfd980eb4d5d0891488b
2019-10-31 13:28:39 -07:00
4bfe2f0900 Fix jit outplace tracing and reapply changes to *_like operators. (#28839)
Summary:
Reapply reverted and fix files `gen_variable_type.py` `test_jit.py`

https://github.com/pytorch/pytorch/issues/27891 Cleanup testing of _like operators
https://github.com/pytorch/pytorch/issues/27890 Add memory format support to randn_like operator
https://github.com/pytorch/pytorch/issues/27889 Add memory format support to randint_like operator
https://github.com/pytorch/pytorch/issues/27562 Add memory format support to zeros_like operator
https://github.com/pytorch/pytorch/issues/27561 Add memory format support to rand_like operator
https://github.com/pytorch/pytorch/issues/27270 Add memory format support to ones_like operator
https://github.com/pytorch/pytorch/issues/27262 Add memory format support to full_like operator
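
A usage sketch of the memory format argument these issues add to the `_like` operators (values are illustrative):

```python
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = torch.zeros_like(x, memory_format=torch.preserve_format)   # keeps channels_last layout
z = torch.rand_like(x, memory_format=torch.contiguous_format)  # forces standard layout
```
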
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28839

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

buck test mode/dev //language_technology/neural_mt/os/pytorch_translate/test:test_onnx -- 'test_forced_decoder_export_vocab_reduction \(language_technology\.neural_mt\.os\.pytorch_translate\.test\.test_onnx\.TestONNX\)'

Differential Revision: D18203397

Pulled By: VitalyFedyunin

fbshipit-source-id: eea41cbd4c232cf5a54172b1e1b16b173798f298
2019-10-31 13:23:08 -07:00
0e441dd386 flip the "don't inline" switch (#26706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26706

This has been ready for some time, just waiting on services to push with
the new code.

#forceTDhashing

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D17543304

fbshipit-source-id: baad22f4abc5af724ebde8507e948bee3e8bf6d4
2019-10-31 13:02:32 -07:00
595209bddc Fix bugs in torch::tensor constructor (#28523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28523

New features:
1. Previously, `torch::tensor({true, false, true})` throws `"tensor_cpu" not implemented for 'Bool'`. After this PR, it produces the correct bool tensor, matching the Python API behavior.
2. Tensors with zero-size dimensions are now supported, e.g. `torch::tensor({{}, {}})` produces a tensor with sizes `{2, 0}`, matching the Python API behavior.

BC-breaking bug fixes:
1. Previously, `torch::tensor({{1}, {2}})` produces a tensor of sizes `{2}`. After this PR, it produces a tensor of sizes `{2, 1}`, matching the Python API behavior.
2. Fixed semantics of `torch::tensor(1.1)`: it now returns a 0-dim tensor instead of a 1-dim tensor, matching the Python API behavior.
3. Previously, when passed a non-dtype `TensorOptions` to the `torch::tensor` constructor, it always produces a tensor of dtype `float`. After this PR, it produces tensor of different dtypes based on the dtype of the braced-init-list, matching the behavior of the no-options case.
```cpp
// Previously:
torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({{1, 2, 3}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({1., 2., 3.}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({{1., 2., 3.}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float

// Now:
torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> int
torch::tensor({{1, 2, 3}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> int
torch::tensor({1., 2., 3.}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> double
torch::tensor({{1., 2., 3.}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> double

// As comparison, currently:
torch::tensor({1, 2, 3}).dtype() -> int
torch::tensor({{1, 2, 3}}).dtype() -> int
torch::tensor({1., 2., 3.}).dtype() -> double
torch::tensor({{1., 2., 3.}}).dtype() -> double
```

Notes:
1. From now on, the behavior of `at::tensor(scalar_value)` (which produces a 1-dim tensor) would be different from `torch::tensor(scalar_value)` (which produces a 0-dim tensor). I will fix the behavior of `at::tensor(scalar_value)` in a follow-up PR.
2. From now on, the behavior of `at::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/))` (which produces a `float` tensor) would be different from `torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/))` (which produces an `int` tensor). I will fix this behavior of the `at::tensor` constructor in a follow-up PR.

Context for the changes in this PR:

The motivation comes from fixing the "`torch::tensor({{1}, {2}})` gives tensor of wrong sizes" bug - in order to fix it, I had to move the handling of `at::ArrayRef` and `std::vector` into `InitListTensor` (see below for why we need to do this) and rename `InitListTensor` to `TensorDataContainer`. After these changes, support for bool values comes out of the box without extra effort, and support for tensors with zero-size dimensions only requires adding a default constructor for `TensorDataContainer`, so I added those two in this PR.

For the semantic change of `torch::tensor(1.1)`, it's actually more effort to preserve the original wrong behavior (i.e. we need to check the sizes of the tensor converted from `TensorDataContainer` and reshape any scalar tensor to a 1-D tensor). I think preserving the original wrong behavior doesn't give us much value, and since the above changes naturally fix the problem, we should just start using the right behavior instead.

For the "constructor with non-dtype options behavior" fix, the code looks simpler and easier to reason about with the fix, so I included it in this PR.

--------

Why we need to move the handling of `at::ArrayRef` and `std::vector` into `TensorDataContainer`:

`torch::tensor({{1}, {2}})` can match this function overload:
`torch::tensor(at::ArrayRef<int> values)`, because `{1}` and `{2}` can be treated as
a list-initialization of an `int` value. However, this will produce a Tensor with sizes `{2}`,
but we actually want a Tensor with sizes `{2, 1}`. In order to avoid matching this function overload,
we removed the function overload and moved the ability to convert `at::ArrayRef<T>`
(and similarly `std::vector<T>`) into `TensorDataContainer`, and since for braced-init-list the
`TensorDataContainer(std::initializer_list<TensorDataContainer>)` constructor is always preferred over all other constructors, it will take the `std::initializer_list` path, and all is good.

Test Plan: Imported from OSS

Differential Revision: D18234625

Pulled By: yf225

fbshipit-source-id: 0f3f6912e82e2117d2103e31b74e7e97baaa8693
2019-10-31 12:53:06 -07:00
c8771f5a44 Port mse_lose to ATen (#26529)
Summary:
VitalyFedyunin, this PR ports MSE loss to ATen:

**Test script:**
```
import torch
import torch.nn as nn
import time

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
loss = nn.MSELoss(reduction = 'sum')
if torch.cuda.is_available():
    device = "cuda"
    loss = loss.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(1000):
        output = loss(input, target)
        output.backward()

#get running time
for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = loss(input, target)
        t2 = _time()
        output.backward()
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Test Device:** CPU: skx-8180, GPU: Tesla P40.

### Performance:

**Before:**
```
GPU:
reduction='mean'
input size(128, 100) forward time is 0.08 (ms); backward avg time is 0.14 (ms).
input size(128, 10000) forward time is 0.12 (ms); backward avg time is 0.21 (ms).
reduction='sum'
input size(128, 100) forward time is 0.09 (ms); backward avg time is 0.15 (ms).
input size(128, 10000) forward time is 0.11 (ms); backward avg time is 0.20 (ms).

CPU:
OMP_NUM_THREADS=56
reduction='mean'
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.09 (ms).
input size(128, 10000) forward time is 3.49 (ms); backward avg time is 3.23 (ms).
reduction='sum'
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.09 (ms).
input size(128, 10000) forward time is 3.49 (ms); backward avg time is 3.23 (ms).

OMP_NUM_THREADS=1
reduction='mean'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 1.41 (ms); backward avg time is 1.66 (ms).
reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 1.44 (ms); backward avg time is 1.68 (ms).
```

**After:**
```
GPU:
reduction='mean'
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.13 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.20 (ms).

reduction='sum'
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.14 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.20 (ms).

CPU:
OMP_NUM_THREADS=56
reduction='mean'
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.14 (ms); backward avg time is 0.30 (ms).

reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.30 (ms).

OMP_NUM_THREADS=1
reduction='mean'
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.85 (ms); backward avg time is 1.27 (ms).
reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.83 (ms); backward avg time is 1.27 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26529

Differential Revision: D18225144

Pulled By: VitalyFedyunin

fbshipit-source-id: ce837a297c70398a3ffa22f26ee9e812cf60d128
2019-10-31 12:37:54 -07:00
42faf961c8 Update fbjni submodule to new upstream and latest version
Summary:
The central fbjni repository is now public, so point to it and
take the latest version, which includes support for host builds
and some condensed syntax.

Test Plan: CI

Differential Revision: D18217840

fbshipit-source-id: 454e3e081f7e3155704fed692506251c4018b2a1
2019-10-31 11:48:25 -07:00
80b46ca35a Null AutogradMeta optimization (#28610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28610

The basic idea is, in some cases where we stored a pointer to a full AutogradMeta object, instead store a nullptr. We let a nullptr represent a default-constructed AutogradMeta object, and simply populate it with a real AutogradMeta if there is ever a situation where we need to modify it.

The primary technical contrivance in this diff is I have to use AutogradMetaFactory to lazily initialize the AutogradMeta, as it is not available in the dynamic library that TensorImpl is in. (I spent a while trying to put them in the same compilation unit, but gave up in the end as it pushed us over the Windows linking binary size limit. Eep.)

Some other notes:
- `set_autograd_meta` now unconditionally turns a tensor into a variable. I audited all call sites and observed there are no occurrences where nullptr is passed (after this patch, there are now!)
- `copy_tensor_metadata` is updated to unconditionally preserve the VariableTensorId-ness of the destination tensor. I think this is the more correct semantics; we can't do the old semantics anymore.
- There's a bunch of places in the API where we return const references to objects. This is pretty weird to me, but I didn't feel like cleaning it up. But sometimes I don't conveniently have something that's the right lifetime, so I introduced a number of singletons to handle this correctly.

You might wonder why I'm doing the optimization before the variable-tensor dynamic merge. The reason is simple: this change is semantics preserving, while variable-tensor dynamic merge is not. So it is easier to get right, and prevents us from regressing performance if we do it the other way.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171162

Pulled By: ezyang

fbshipit-source-id: 580df729e4d04881b2b9caa0f0c00785b3afbb92
2019-10-31 11:45:16 -07:00
85e72edf3e Delete dead TensorImpl::detach_autograd_meta (#28609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28609

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171159

Pulled By: ezyang

fbshipit-source-id: 509061ca56186c7762da9634abecbafad0277d94
2019-10-31 11:45:12 -07:00
b52ceec80b Remove unused gradient_edge argument from make_variable_view (#28602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28602

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171163

Pulled By: ezyang

fbshipit-source-id: 3f3d4cf0bd05c302f502795a04ecace0fc064255
2019-10-31 11:45:07 -07:00
335bfa24e0 Add an AutogradMeta factory. (#28593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28593

When I turn on Variable everywhere, I will need to be able to construct
AutogradMetas from TensorImpl.  But I cannot call the constructor directly
as it lives in another dynamic library. So I need another virtual factory interface
to be able to do this.

I also adjust the AutogradMeta constructor so that the TensorImpl argument is
optional. This argument is only needed if `requires_grad == True`, as we use it
to test if the variable is valid (only floating point tensors can have requires grad true).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171161

Pulled By: ezyang

fbshipit-source-id: 3f2e86720899b3bda36ddd90244c2624645cc519
2019-10-31 11:45:03 -07:00
18f2efa997 Unfriend Variable factory functions. (#28601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28601

In the process, I moved AutogradMeta out of the Variable class. The
intent here is that I'm going to delete Variable class entirely,
so I had better not be putting stuff in it!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171160

Pulled By: ezyang

fbshipit-source-id: 9c0bcdc82797eca0577d1b0745b4a2ae962f3010
2019-10-31 11:44:58 -07:00
9643f066cf Move all autograd_meta_ manipulating operations out-of-line. (#28592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28592

These aren't perf critical, and putting them in a cpp file makes it easier to
work on them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171158

Pulled By: ezyang

fbshipit-source-id: 4aad434ad4aecba7ed46761f676df6bbec37733e
2019-10-31 11:44:54 -07:00
a844809a2c Test TensorTypeSet instead of autograd_meta_ for variable-ness. (#28543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28543

By the current autograd_meta_ <=> type_set_ invariant (now explicitly documented
in the right place!), these are equivalent.  But when I introduce null
autograd_meta_ optimization, they won't be equivalent anymore: TensorTypeSet is
going to give me the right information no matter what.

In the long run, this patch will be a wash, because everything will "be a variable"
in the long term.  But I am making this change now to make sure that the invariant
actually holds.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171157

Pulled By: ezyang

fbshipit-source-id: cbba8fd5df9e6873a8757925db5f578fecbd2486
2019-10-31 11:44:50 -07:00
38388b9b3c Updating submodules
Summary:
GitHub commits:

41e219e542
1a853c0fb4
727113485b
0bf264f1fc

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 771f54489bc6680b15df540dbfb789615a1a4a3f
2019-10-31 11:41:01 -07:00
00bd9eae33 Fix typo in Dataset and IterableDataset docs (#28960)
Summary:
Replaced "overrite" with "overwrite".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28960

Differential Revision: D18246411

Pulled By: soumith

fbshipit-source-id: dc0979a44b7c621a316823061760e0358c227727
2019-10-31 11:34:52 -07:00
b1bf595e54 Update generated test model
Summary:
The Java and Python code were updated, but the test currently fails
because the model was not regenerated.

Test Plan: Ran test.

Reviewed By: xcheng16

Differential Revision: D18217841

fbshipit-source-id: 002eb2d3ed0eaa14b3d7b087b621a6970acf1378
2019-10-31 11:03:20 -07:00
c60bf2704a Support Offline Tensors through ONNXIFI layer
Summary:
Previous import was b2ec1a8041879b7be98d81387a14cae895f952f4

Included changes:
- **[97fe555](https://github.com/houseroad/foxi/commit/97fe555)**: Add deferred weight reader pointer when initializing the graph (#15) <Yinghai Lu>
- **[ba2faf7](https://github.com/houseroad/foxi/commit/ba2faf7)**: Add status and timeout to events (#14) <Jack Montgomery>

Test Plan: kicksandcastle

Reviewed By: ipiszy

Differential Revision: D18231697

fbshipit-source-id: 7566e2438d2b57f0feaadcd51f55a03552adeab9
2019-10-31 10:33:42 -07:00
05e88dc4fe skip additional flaky rpc tests (#28934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28934

These tests are flaky; skip them while we investigate the root cause
ghstack-source-id: 92945898

Test Plan: tests pass

Differential Revision: D18235766

fbshipit-source-id: 9bff65653954b767e32bcc1d25c65b0cea2c4331
2019-10-31 10:12:59 -07:00
275adb143e fix printing a node header (a kind wasn't being printed)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28887

Differential Revision: D18226435

Pulled By: Krovatkin

fbshipit-source-id: b8edf8bb52ff45ab625ccedf66263d3ab5895faa
2019-10-31 09:55:02 -07:00
fca99e96e8 Move cuda functions to cuda folder.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28818

Differential Revision: D18232782

Pulled By: ailzhang

fbshipit-source-id: 936a0635bccc7c759bbbff438f43f3812e34faed
2019-10-31 09:41:56 -07:00
c63e15aef8 Revert D18241759:
Test Plan: revert-hammer

Differential Revision:
D18241759

Original commit changeset: 8f2535bb0bc4

fbshipit-source-id: 870ac8e860e31f32138d42d470321e225a19990d
2019-10-31 07:54:26 -07:00
1dcf1b8938 Update pinverse doc for recent commit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28877

Differential Revision: D18225510

Pulled By: albanD

fbshipit-source-id: 698af06ac9e4259eed93d146edb3a7fb13e39242
2019-10-31 07:36:35 -07:00
fe8804695b Use aten's GRAIN_SIZE for TH Tensor ops (#28770)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28198 in my tests on a 24 core AMD threadripper.

Profiling the benchmark showed that most of the slowdown in https://github.com/pytorch/pytorch/issues/28198 was from `THFloatTensor_fill` not being distributed across threads. It internally uses `TH_TENSOR_APPLY_CONTIG` which is a thin wrapper around `at::parallel_for` and uses `TH_OMP_OVERHEAD_THRESHOLD` or 100,000 as the grain size.

Here I've changed it to use `at::internal::GRAIN_SIZE` which is 32,768 so ~1/3 of the old value. I think it makes sense to unify these two values so any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`       | `addmm`    |
|:----------:|---------------:|-------------:|---------:|---------:|
| master     | 3505.85 ms/it  | 184.302 ms   | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it  |   184.456 ms | 5.810 ms | 5.069 ms |
| this PR    | 3453.23 ms/it  |   184.526 ms | 5.824 ms | 5.202 ms |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28770

Differential Revision: D18202646

Pulled By: ezyang

fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
2019-10-31 07:18:46 -07:00
9630b78c49 Pow() : Use lightweight operations for predefined scalar exponent values (#28903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28903

Use predefined, less compute-intensive functions instead of pow() for predefined scalar exponent values.
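
An illustrative sketch of the special-casing idea in Python (the actual change is in the kernels, not this code):

```python
import torch

def pow_scalar(x, exponent):
    # dispatch common exponents to cheaper dedicated ops
    if exponent == 2.0:
        return x * x
    if exponent == 0.5:
        return x.sqrt()
    if exponent == -1.0:
        return x.reciprocal()
    return x.pow(exponent)  # general fallback
```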

Test Plan: automated tests

Reviewed By: jspark1105

Differential Revision: D18227280

fbshipit-source-id: 0a443832c3ff8372e64dbe04de4f7fb4ce7c0740
2019-10-31 05:39:39 -07:00
Jie
1b1e3d565c (#28927)
Summary:
This is to fix https://github.com/pytorch/pytorch/issues/22526

Adds a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports.
Test added in test_cuda.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28927

Differential Revision: D18241759

Pulled By: soumith

fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999
2019-10-31 01:00:47 -07:00
9fb0079036 merge some of the lint checks (#28933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28933

Merge all the things that don't add annotations into a single
`quick-checks` job. This helps reduce concurrency and clutter
at the top of the status check page.

This doesn't touch the actually important items (flake8 + clang-tidy),
but those are a little trickier to handle because of how annotations are
added.

Test Plan: Imported from OSS

Differential Revision: D18235396

Pulled By: suo

fbshipit-source-id: 8fba44f3f5d398b1dce0f39f51d6652f3e0c1bf7
2019-10-30 23:02:34 -07:00
d071ca2972 Improve reshape backward when the op is a view (#28901)
Summary:
Currently, `reshape` does an `as_strided` when the geometry is viewable. However, `as_strided` backward is not very optimized and cannot always detect such cases. Improvements are planned in https://github.com/pytorch/pytorch/pull/8965, and I will finish it some day. But the current situation is that, in these cases, backward through `reshape` will copy the gradient while a simple `view` will not. This is unnecessary.

Notably this affects `flatten` and a whole bunch of other ops implemented on top of `reshape`.

```py
In [15]: x = torch.randn(3, 4, requires_grad=True)

In [16]: y = x.reshape(x.shape)

In [17]: assert y._base is not None

In [18]: gy = torch.randn_like(y)

In [20]: gx = torch.autograd.grad(y, x, gy)[0]

In [21]: gx
Out[21]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

In [22]: gx._base  # not gy
Out[22]:
tensor([ 0.2189,  0.3396, -0.1108,  1.7703,  1.0737, -0.1222,  1.0765, -1.3363,
        -1.3798, -0.2950,  0.0800,  0.2501])

In [23]: gy.zero_()
Out[23]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [24]: gx  # not sharing storage with gy
Out[24]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

# but everything is optimized with view, which should be equivalent to reshape in this case
In [25]: y = x.view(x.shape)

In [26]: assert y._base is not None

In [27]: gy = torch.randn_like(y)

In [28]: gx = torch.autograd.grad(y, x, gy)[0]

In [29]: gx
Out[29]:
tensor([[-2.4463,  1.1446,  0.1501,  0.1212],
        [-1.1125,  1.4661,  0.9092, -0.2153],
        [-0.1937, -0.3381, -1.3883, -0.7329]])

In [30]: gy.zero_()
Out[30]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [31]: gx  # sharing storage with gy
Out[31]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28901

Differential Revision: D18240868

Pulled By: ezyang

fbshipit-source-id: 28fdaa0c7014a9dae6731dfe8b67784d38fc27f0
2019-10-30 22:38:41 -07:00
47301a153b Eliminate unnecessary Tensor ref count bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28773

Differential Revision: D18229349

fbshipit-source-id: 4d0bc22ae827d8f207a08f9f08d8fe13ad700656
2019-10-30 21:14:13 -07:00
64c7ac233e Disable flaky remote tests in dist_autograd_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28920

Test Plan: Imported from OSS

Differential Revision: D18233625

Pulled By: mrshenli

fbshipit-source-id: d4b04ea3629d0828756ebb118f5763677d62729b
2019-10-30 18:43:10 -07:00
fd5c68b5e4 Revert D18231741: Enable PyTorch Probot as a GitHub Action.
Test Plan: revert-hammer

Differential Revision:
D18231741

Original commit changeset: d49711ad41d7

fbshipit-source-id: f390ec3ca8c55bfc308d8eacad5dd7dfae36500e
2019-10-30 18:24:40 -07:00
5e94e66c6f unify unary ops benchmark (#28913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28913

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 90.233

...
```

Reviewed By: hl475

Differential Revision: D18231641

fbshipit-source-id: 3093db47d0356b927768f15dc63af6ad8aadd430
2019-10-30 17:46:13 -07:00
2ffc4cca67 unify split benchmark (#28912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28912

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:split_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.434
```

Reviewed By: hl475

Differential Revision: D18231542

fbshipit-source-id: 84898db55996aa3faf156d4fb14f32d6db780e7a
2019-10-30 17:46:09 -07:00
94d2599d77 unify softmax benchmark (#28911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28911

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256_cpu
# Input: N: 4, C: 3, H: 256, W: 256, device: cpu
Forward Execution Time (us) : 17929.381
...
```

Reviewed By: hl475

Differential Revision: D18231517

fbshipit-source-id: 61f35849e1f4cf44cf09e60a7b618f8e9fc67b9c
2019-10-30 17:46:05 -07:00
ed4a978d79 unify pool benchmark (#28898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28898

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:pool_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: MaxPool1d
# Mode: Eager
# Name: MaxPool1d_kernel3_stride1_N8_C256_L256_cpu
# Input: kernel: 3, stride: 1, N: 8, C: 256, L: 256, device: cpu
Forward Execution Time (us) : 7133.492
```

Reviewed By: hl475

Differential Revision: D18228351

fbshipit-source-id: 47af93d5dd3776384f89b1289fbbe01c572ba9fc
2019-10-30 17:46:01 -07:00
f5e99b3249 unify matmul benchmark (#28899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28899

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:matmul_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: matmul
# Mode: Eager
# Name: matmul_M128_N128_K128_trans_aTrue_trans_bFalse_cpu
# Input: M: 128, N: 128, K: 128, trans_a: True, trans_b: False, device: cpu
Forward Execution Time (us) : 39.535
```

Reviewed By: hl475

Differential Revision: D18228271

fbshipit-source-id: 681ed2745c25a122997346a23acdbc67e55e5ec4
2019-10-30 17:45:57 -07:00
28be2d4994 Better error message for quantized dispatch (#28635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28635

Fixes #28518

Test Plan: Imported from OSS

Differential Revision: D18132566

Pulled By: z-a-f

fbshipit-source-id: 08acc3033b12a0b79b43a5346b7af100416ffa94
2019-10-30 16:51:22 -07:00
6e1c18303b unify linear benchmark (#28897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28897

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N4_IN256_OUT128_cpu
# Input: N: 4, IN: 256, OUT: 128, device: cpu
Forward Execution Time (us) : 39.275
```

Reviewed By: hl475

Differential Revision: D18228070

fbshipit-source-id: 9c209eb74e574c6ef85ebcd78b824ef7d5e65dde
2019-10-30 16:25:48 -07:00
a7b235f968 unify gather benchmark (#28895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28895

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
```

Reviewed By: hl475

Differential Revision: D18227757

fbshipit-source-id: 493dd81108848fe3d48fb5ad940eb6aef84b639c
2019-10-30 16:25:43 -07:00
6e4147c72c unify conv benchmark (#28894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28894

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
```

Reviewed By: hl475

Differential Revision: D18227626

fbshipit-source-id: 1ae768f529aa888415840ca10197323407e47d39
2019-10-30 16:25:39 -07:00
dbf8f535fc unify chunk benchmark (#28892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28892

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:chunk_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunks
# Mode: Eager
# Name: chunks_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 4.098
```

Reviewed By: hl475

Differential Revision: D18227499

fbshipit-source-id: 72268b7fe94a7d92d6e47f58f33902a33367c68b
2019-10-30 16:25:35 -07:00
15deee25bc Fix aten::format regex for clang8 (#28916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28916

The previous regex caused a `std::regex_error` under clang8 complaining about `error_brack`, which is strange because the square brackets are balanced. Seems like a stdlib bug to me. So to work around this, I've switched to the older regex with a non-greedy match in the inner atom.

Test Plan: Imported from OSS

Differential Revision: D18232654

Pulled By: jamesr66a

fbshipit-source-id: f82a9a24acf090010b03f23454d2b0f7a1e3589e
2019-10-30 16:14:46 -07:00
88b2bfd706 unify cat benchmark (#28893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28893

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:cat_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0_cpu
# Input: M: 256, N: 512, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 78.607
```

Reviewed By: hl475

Differential Revision: D18227341

fbshipit-source-id: d383709a5aab600f99b37d07e4d4393645289101
2019-10-30 15:53:37 -07:00
aa30b37d2e unify batchnorm benchmark (#28889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28889

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:batchnorm_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 276.192
```

Reviewed By: hl475

Differential Revision: D18227180

fbshipit-source-id: d8abe56237bb84903315332a5ecdaa1dff613110
2019-10-30 15:53:33 -07:00
740474838f unify as_strided benchmark (#28890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28890

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:as_strided_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0_cpu
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0, device: cpu
Forward Execution Time (us) : 2.792
...
```

Reviewed By: hl475

Differential Revision: D18227052

fbshipit-source-id: e17d9335ec89b47706a363bdb31451a01d4cbc5b
2019-10-30 15:53:29 -07:00
db15c2ba20 unify add benchmark format (#28891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.279
...
```

Reviewed By: hl475

Differential Revision: D18226789

fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95
2019-10-30 15:53:25 -07:00
d6f1e49c4a C++ API parity: CTCLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28654

Test Plan: Imported from OSS

Differential Revision: D18202437

Pulled By: pbelevich

fbshipit-source-id: a4b80a57e65da84f3988002a026c648fa52a0fde
2019-10-30 14:35:02 -07:00
2466dc8544 Migrate nll_loss from TH to ATen (CPU) (#28270)
Summary:
This is a port of the TH negative log likelihood loss implementation to ATen; it is used by `torch.nn.functional.nll_loss()` for 2d inputs (N, C).
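
For reference, the call path this port covers (a usage sketch):

```py
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5, requires_grad=True)  # N=8 samples, C=5 classes
log_probs = F.log_softmax(logits, dim=1)        # nll_loss expects log-probabilities
target = torch.randint(0, 5, (8,))
loss = F.nll_loss(log_probs, target)            # scalar (mean reduction)
loss.backward()
```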

## Performance Impact

I measured no significant performance difference between the port and the original implementation when using this [benchmark test script](https://gist.github.com/andreaskoepf/3c8e3698607773db2788dfd8885a9ed9).

### WITH PR applied:
```
CPU forward 1000 took 2.5290995836257935e-05
CPU forward 10000 took 5.757302278652787e-05
CPU forward 100000 took 0.0004873779835179448
CPU forward 1000000 took 0.0051894880016334355
CPU forward 10000000 took 0.026263039995683357
CPU forward TOTAL time 0.8068871730065439
CPU for- & backward 1000 took 0.00018794499919749796
CPU for- & backward 10000 took 0.0002642899926286191
CPU for- & backward 100000 took 0.0011828370043076575
CPU for- & backward 1000000 took 0.01250307000009343
CPU for- & backward 10000000 took 0.11453165800776333
CPU for- & backward TOTAL time 0.824805997981457
```

### Original TH version:
```
CPU forward 1000 took 2.1958985598757863e-05
CPU forward 10000 took 6.608400144614279e-05
CPU forward 100000 took 0.0004632119962479919
CPU forward 1000000 took 0.005477247992530465
CPU forward 10000000 took 0.02681165697867982
CPU forward TOTAL time 0.8073387439944781
CPU for- & backward 1000 took 0.00020634100656025112
CPU for- & backward 10000 took 0.00031720998231321573
CPU for- & backward 100000 took 0.0011843869870062917
CPU for- & backward 1000000 took 0.010876987013034523
CPU for- & backward 10000000 took 0.09893897600704804
CPU for- & backward TOTAL time 0.8271351839939598
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28270

Differential Revision: D18009584

Pulled By: ezyang

fbshipit-source-id: 77daf47c61a9dd9bb3b5a8d3e48585bbb665e979
2019-10-30 14:12:09 -07:00
732a3d8f8c Fix UNICODE conflict on Windows (#28782)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27568.
cc IlyaOvodov.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28782

Differential Revision: D18201449

Pulled By: ezyang

fbshipit-source-id: 404e7c0abdfeef52a0e81ab2acd1b61e86c28f39
2019-10-30 14:09:31 -07:00
e3a24ba6d5 Enable PyTorch Probot as a GitHub Action. (#28879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28879

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18231741

Pulled By: ezyang

fbshipit-source-id: d49711ad41d7ff7e527326c68fd8db86da10a818
2019-10-30 13:59:23 -07:00
f5edb62a7f Clean extending autograd doc for output size 1 (#28860)
Summary:
Fix https://github.com/pytorch/pytorch/issues/28583
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28860

Differential Revision: D18224497

Pulled By: albanD

fbshipit-source-id: 0fa4eacce6f6092d555e509dc23bd75206f78d41
2019-10-30 13:57:10 -07:00
5821b9bf0f Remove error logging of high empty range ratio
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28854

Reviewed By: xianjiec

Differential Revision: D18206695

fbshipit-source-id: 4ce471f0236b2ceaf54ba1b1ce96e193feca720b
2019-10-30 12:55:25 -07:00
1d3d9ec7d4 C++ API Parity: functional::fold and Fold::pretty_print (#28732)
Summary:
Adds `torch::nn::functional::fold` support and updates `Fold::pretty_print` in the C++ API for more thorough Python parity.
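
For reference, the Python counterpart that the new C++ functional mirrors:

```py
import torch
import torch.nn.functional as F

# fold combines sliding local blocks back into a tensor: input is
# (N, C * kH * kW, L); with output_size (4, 5) and kernel (2, 2) there
# are L = 3 * 4 = 12 block positions.
inp = torch.randn(1, 3 * 2 * 2, 12)
out = F.fold(inp, output_size=(4, 5), kernel_size=(2, 2))
assert out.shape == (1, 3, 4, 5)
```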

Note: Small updates in source files to maintain consistency elsewhere.

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28732

Differential Revision: D18219955

Pulled By: yf225

fbshipit-source-id: fd2e9be8f17db77c1b1f384c0d2e16cc34858c0c
2019-10-30 11:37:39 -07:00
807fbf8816 Disable flaky tests in dist_autograd_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28876

Test Plan: Imported from OSS

Differential Revision: D18224445

Pulled By: mrshenli

fbshipit-source-id: 4de2c24ac6e9ffb004457e2dc43730dc7e478e5a
2019-10-30 11:34:35 -07:00
a465b033fd Local response norm (#28759)
Summary:
Implemented LocalResponseNorm and some initial tests for modules and functional. Reference https://github.com/pytorch/pytorch/issues/25883
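
For reference, the Python module this mirrors:

```py
import torch

# Normalizes each element across a local neighborhood of `size` channels
# (AlexNet-style local response normalization).
lrn = torch.nn.LocalResponseNorm(size=2)
x = torch.randn(4, 8, 16, 16)  # (N, C, H, W)
assert lrn(x).shape == x.shape
```
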
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28759

Differential Revision: D18219745

Pulled By: yf225

fbshipit-source-id: e6aad568a8b1e81f54752decaefd4f9044029da9
2019-10-30 11:31:00 -07:00
3073785f4c Fix when giving jit format warning about unsupported options (#28616)
Summary:
The current regex also matches strings containing a bare '{}', so the warning is always given.
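
The intended distinction, sketched with Python regexes (not the actual C++ pattern):

```py
import re

# Warn only when the braces carry a format option such as "{:.2f}",
# not for a plain "{}" placeholder.
def has_unsupported_option(fmt):
    return re.search(r"\{[^{}]+\}", fmt) is not None

assert not has_unsupported_option("value: {}")
assert has_unsupported_option("value: {:.2f}")
```
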
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28616

Test Plan: The previous code gave a warning about unsupported options; these warnings disappeared. When adding something inside '{}', the warning came back.

Differential Revision: D18039443

Pulled By: ggoossen

fbshipit-source-id: bb3a2892d5707a32030d43250c40f3058aa1d18b
2019-10-30 11:24:14 -07:00
50fd20b64a fix bug on setup.py to include header files on caffe2/utils/math (#28869)
Summary:
This problem comes from issue [https://github.com/pytorch/pytorch/issues/28753](https://github.com/pytorch/pytorch/issues/28753).

The header files in the `math` and `threadpool` directories should be included in the built package because they are referenced by other header files, such as `torch/include/caffe2/utils/math.h`:
```
#include "caffe2/core/common.h"
#include "caffe2/core/types.h"
#include "caffe2/utils/math/broadcast.h"
#include "caffe2/utils/math/elementwise.h"
#include "caffe2/utils/math/reduce.h"
#include "caffe2/utils/math/transpose.h"
#include "caffe2/utils/math/utils.h"
```
But `setup.py` on the `master` branch does not include these header files. The header files currently in the `utils` directory of a built `torch` package are the following:
```
> ls include/caffe2/utils
bench_utils.h  conversions.h  eigen_utils.h    map_utils.h    murmur_hash3.h   proto_wrap.h      smart_tensor_printer.h
cast.h         cpuid.h        filler.h         math-detail.h  proto_convert.h  signal_handler.h  string_utils.h
cblas.h        cpu_neon.h     fixed_divisor.h  math.h         proto_utils.h    simple_queue.h    zmq_helper.h
```
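
The shape of the fix, sketched (the actual `setup.py` change may differ; the `package_data` globs here are assumptions):

```py
# Extend the header globs shipped with the wheel so the caffe2/utils
# subdirectories are included as well.
package_data = {
    "torch": [
        "include/caffe2/utils/*.h",
        "include/caffe2/utils/math/*.h",        # previously missing
        "include/caffe2/utils/threadpool/*.h",  # previously missing
    ],
}
```
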
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28869

Differential Revision: D18226319

Pulled By: soumith

fbshipit-source-id: 51575ddc559181c069b3324aa9b2d1669310ba25
2019-10-30 11:11:15 -07:00
331e09eca4 Make FileStore not segfault with concurrent accesses. (#28812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28812

FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file operations from this specific Store instance.
This protects `cache_`, `pos_`, and the overall integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.
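
A sketch of the serialization idea in Python terms (the real change uses a mutex inside the C++ FileStore):

```py
import threading

class FileStoreLike:
    """Guard every file operation with a per-instance lock so concurrent
    set()/get() calls on one Store cannot corrupt cache_ or pos_."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}
        self._pos = 0

    def set(self, key, value):
        with self._lock:
            # ... append to the backing file, then update _pos and _cache
            self._cache[key] = value

    def get(self, key):
        with self._lock:
            # ... re-read the file from _pos to refresh the cache first
            return self._cache.get(key)
```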

ghstack-source-id: 92874098

Test Plan:
buck test mode/dev-nosan caffe2/...
  Actual observed failures were in ThreadRpcAgentTest

Differential Revision: D18187821

fbshipit-source-id: 67c765da74c836a9ac9f887cdf1a28a75247e04b
2019-10-30 11:03:00 -07:00
e0009fdeb1 Migrate sinh and sinh_ from the TH to Aten (CUDA) (#28527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28527

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc 7.4):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sinh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.sinh(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.sinh(a) a.numel() == 10000 for 20000 times torch.half
0.3807680979998622
torch.sinh(a) a.numel() == 10000 for 20000 times torch.float
0.37430476099962107
torch.sinh(a) a.numel() == 10000 for 20000 times torch.double
1.0580407639999976
torch.sinh(a) a.numel() == 100000 for 20000 times torch.half
0.7996397469996737
torch.sinh(a) a.numel() == 100000 for 20000 times torch.float
1.010930432999885
torch.sinh(a) a.numel() == 100000 for 20000 times torch.double
7.310400856999877
```

After:

```
torch.sinh(a) a.numel() == 10000 for 20000 times torch.half
0.3720399889998589
torch.sinh(a) a.numel() == 10000 for 20000 times torch.float
0.3694016069994177
torch.sinh(a) a.numel() == 10000 for 20000 times torch.double
1.0551542660004998
torch.sinh(a) a.numel() == 100000 for 20000 times torch.half
0.7431191599998783
torch.sinh(a) a.numel() == 100000 for 20000 times torch.float
0.9953043630002867
torch.sinh(a) a.numel() == 100000 for 20000 times torch.double
7.3146168890007175
```

Close #24628

Test Plan: Imported from OSS

Differential Revision: D18124732

Pulled By: VitalyFedyunin

fbshipit-source-id: 054b0c0884ac12de2dd1a92c5de916aaf047f9e9
2019-10-30 10:57:14 -07:00
a7166ae448 Migrate asin and asin_ from the TH to Aten (CUDA) (#28482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28482

Benchmark (RHEL 7.3, Release, P1000, gcc 8.3):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.asin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.asin(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.475854377997166
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.4772826389998954
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6297428649995709
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5475849750000634
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6156488769993302
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.728912709000724
```

After:

```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.5107104659982724
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.509122366001975
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6929216960015765
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5914848840002378
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6518679289983993
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.916458261999651
```

Close #24537

Test Plan: Imported from OSS

Differential Revision: D18089074

Pulled By: VitalyFedyunin

fbshipit-source-id: f27515dd1ee73b6e2391ebcc0004af28bcb82234
2019-10-30 10:57:10 -07:00
d0bd8a3afc Migrate sin and sin_ from the TH to Aten (CUDA) (#28237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28237

Benchmark (RHEL 7, gcc 8.3.1, P1000):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.sin(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4649172620011086
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.4616892600006395
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5166665920005471
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5376560490003612
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6207812359989475
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.873208982999131
```

After:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4796977340010926
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.48329569199995603
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5380683220009814
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5299932739999349
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6144487999990815
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.8838113630008593
```

Close #24627

Test Plan: Imported from OSS

Differential Revision: D18089072

Pulled By: VitalyFedyunin

fbshipit-source-id: 4824804960309fe7fdb16073d021388704986993
2019-10-30 10:57:06 -07:00
2526f97464 Include hierarchy information in C++ API loading error messages (#28499)
Summary:
Before, we would only give the key we were looking for (i.e., typically
just "No such serialized tensor 'weight'"), no matter which submodule's
weight we were looking for.
Now we error with "No such serialized tensor '0.conv1.weight'" or
similar.
The analogous information is added to missing module error messages.

I threw in a test, and it saved me already...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28499

Differential Revision: D18122442

Pulled By: yf225

fbshipit-source-id: a134b6d06ca33de984a11d6fea923244bcd9fb95
2019-10-30 08:41:37 -07:00
726f0ce946 Increase verbosity of Hypothesis on CI. (#28799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28799

When the verbosity is quiet, hypothesis no longer prints the real
error when it finds multiple falsifying examples: it just says
that there are two failures. This is supremely unhelpful. Make
it print more.
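
For reference, the knob being turned (standard Hypothesis API):

```py
from hypothesis import settings, Verbosity

# A verbose profile prints each falsifying example instead of only
# reporting that multiple failures were found.
settings.register_profile("ci", verbosity=Verbosity.verbose)
settings.load_profile("ci")
```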

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18206936

Pulled By: ezyang

fbshipit-source-id: 03bb60ba24cee28706bb3d1f0858c32b6743a109
2019-10-30 08:28:20 -07:00
496f740824 Connect with clip range gather operator (#28866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28866

While working on the fix for int32 instead of int64, we also need to take care of ClipRangesGatherSigridHash, since this is the operator that actually gets used during inference.

Test Plan: Added unittest to cover for the new case

Reviewed By: ipiszy

Differential Revision: D17147237

fbshipit-source-id: 2b562b72a6ae8f7282e54d822467b8204fb1055e
2019-10-29 23:32:08 -07:00
eb00af37bd insert_prepack_unpack for conv (#27346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27346

att

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18182915

fbshipit-source-id: d646ae76ce44f5d12e974c776a3e92e5e163493c
2019-10-29 22:03:07 -07:00
790563b374 Add OfflineTensor (#28855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28855

Resubmit:
OfflineTensor will be a shell to just carry the shape and dtype. No data will be stored. This should help us plumb through the onnxifi process.

Test Plan:
```
buck test caffe2/caffe2/fb/opt:onnxifi_with_offline_tensor_test
```

Reviewed By: ipiszy, ChunliF

Differential Revision: D18212824

fbshipit-source-id: 5c8aaed2ef11d719dfa2a2901875efd66806ea56
2019-10-29 21:59:57 -07:00
a8b63cacbc Updating submodules
Summary:
GitHub commits:

4b2da87ee6
b997eec151
9f34d1f643
a3960fc875
541c404784
b2438faaf0
06335bac7c
2ac6f45e20

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: a6fb756d2d210d0505c889ba6c0e207e6a2d074d
2019-10-29 19:54:36 -07:00
043530a9b9 Support remote for Python UDF in distributed autograd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28656

Test Plan: Imported from OSS

Differential Revision: D18138561

Pulled By: mrshenli

fbshipit-source-id: 798e7c00465b5a299f7b4642683bc407895bc7da
2019-10-29 19:39:04 -07:00
400293fcc6 Support remote for builtin operators in distributed autograd (#28630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28630

This includes:
1. Respect autograd context in rpc.remote for builtin ops
2. Force setting autograd context in RRef.to_here() even if the
message for to_here() does not contain any tensor.
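
A usage sketch of what this enables (worker names are placeholders, and `rpc.init_rpc` is assumed to have been called already):

```py
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

with dist_autograd.context() as context_id:
    t = torch.ones(2, 2, requires_grad=True)
    # rpc.remote for a builtin op now records the autograd context, and
    # to_here() attaches it even if the reply carries no tensors.
    rref = rpc.remote("worker1", torch.add, args=(t, t))
    result = rref.to_here()
```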

Test Plan: Imported from OSS

Differential Revision: D18138562

Pulled By: mrshenli

fbshipit-source-id: a39ec83e556d19130f22eb317927241a017000ba
2019-10-29 19:39:00 -07:00
ec81cd55fc Migrate implementations of triu and tril to a separate file (#28750)
Summary:
Having them in BatchLinearAlgebra.cpp/.cu seemed out of place, since they are more general-purpose, and the code was interspersed with the LAPACK and MAGMA wrappers.

Changelog:
- Move tril* / triu* to TriangularOps.cpp/.cu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28750

Test Plan:
- Builds should complete successfully to ensure that the migration is error-free
- Tests should pass to ensure that the front-end is unaffected.

Differential Revision: D18205456

Pulled By: soumith

fbshipit-source-id: 41966b9ddfe9f196f4d7c6a5e466782c1985d3d9
2019-10-29 19:24:05 -07:00
1c436ded44 Remove test_quantizer.py and reuse one of its test in test_quantization.py (#27269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27269

Remove `test_quantizer.py`; add and rewrite one of its tests
in `test_quantization.py`.
The conv test is removed for now since the conv pattern is still broken; we'll add another test
later.
ghstack-source-id: 92869823

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18182916

fbshipit-source-id: 325b5d8e877228d6a513e3ddf52c974479250d42
2019-10-29 19:04:21 -07:00
dfe7b25eaf Add nn::Flatten to C++ Frontend (#28072)
Summary:
Adds torch::nn::Flatten module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28072

Differential Revision: D18202778

Pulled By: yf225

fbshipit-source-id: 43345dcbdf2f50d75746bf9a0ba293b84df275ab
2019-10-29 17:52:47 -07:00
57c9b1cefc Enabling inplace relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28710

Test Plan: Imported from OSS

Differential Revision: D18146120

Pulled By: z-a-f

fbshipit-source-id: d8f0982f5a2ae35f7deb34e67cdb64be700a9d6c
2019-10-29 17:33:48 -07:00
cbc234bceb C++ API: torch::nn::BatchNorm1d (#28176)
Summary:
Add torch::nn::BatchNorm1d function/module support for the C++ API.
torch::nn::BatchNorm{2,3}d will be added after this PR is merged.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225

I would like to discuss the items below.

* Necessity of `num_batches_tracked` in `BatchNormImplBase`
  * `num_batches_tracked` is needed to calculate the effective `momentum` when the `momentum` argument is not given in the Python API (see the sketch after this list). But in the C++ API, the `momentum` argument has a default value.
  * `num_batches_tracked` is otherwise only used to count `BatchNorm1d::forward()` calls. I think it is no longer necessary for the user.
* The design of `BatchNorm{1,2,3}dOptions`
  * We already have `BatchNormOptions`, used for the deprecated `BatchNorm` module. However, it is hard to reuse for `BatchNorm{1,2,3}dOptions` because the arguments of the modules disagree.
  * In this PR, I introduce a `BatchNormOptionsv2` template class for `BatchNorm{1,2,3}dOptions`, but I'm not sure whether this design is good.
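
A sketch of the Python-side behavior under discussion (simplified from how `_BatchNorm` computes its averaging factor; details here are assumed):

```py
# When momentum is None, the Python BatchNorm modules fall back to a
# cumulative moving average driven by num_batches_tracked (which is
# incremented before this factor is computed).
def average_factor(momentum, num_batches_tracked):
    if momentum is None:
        return 1.0 / float(num_batches_tracked)  # cumulative average
    return momentum                              # exponential average
```
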
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28176

Differential Revision: D18196843

Pulled By: yf225

fbshipit-source-id: 667e2b5de4150d5776c41b9088c9e6c2ead24cd4
2019-10-29 17:29:42 -07:00
8f1564b8ab Add enum type to rpc registry for consolidating RPC initialization code path (#28628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28628

Consolidate code paths of ProcessGroupAgent construction and other RPC Backend construction.
ghstack-source-id: 92845348

Differential Revision: D5516188

fbshipit-source-id: 151d9b7b74f68631d6673fecc74dec525949b8f0
2019-10-29 17:26:15 -07:00
b1ea19ca17 Update the misleading comments for zero_points and scale in dynamic quant linear module [1/2] (#28767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28767

The scale and zero_point are for the output activation tensor, not for the weight tensor. We removed them here because we don't need the zero points and scales for the output tensors in dynamic quantization.

ghstack-source-id: 92807318

Test Plan: CI

Differential Revision: D18164949

fbshipit-source-id: 0f9172bfef615c30dc28e1dd4448a9f3cc897c2e
2019-10-29 17:20:32 -07:00
4e56455b09 whitelist autogradanynonzero (#28852)
Summary:
prim::AutogradAnyNonZero is optimized away under normal circumstances (a graph executor specializes tensor arguments and runs `specializeAutogradZero`), so the change should be backward compatible for as long as we are running the original executor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28852

Differential Revision: D18213118

Pulled By: Krovatkin

fbshipit-source-id: 223f172c59e5f2b05460db7de98edbadc45dd73d
2019-10-29 17:00:27 -07:00
f1f86994bc Fix implementation of F::kl_div / F::mse_loss / F::binary_cross_entropy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28806

Test Plan: Imported from OSS

Differential Revision: D18202859

Pulled By: yf225

fbshipit-source-id: 1aa19111cd5111dd5f2779f7f00f07f2f2e16d4d
2019-10-29 16:54:27 -07:00
d201ff8925 Factor out insertPrepackUnpackForLinear (#27239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27239

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_prepack_unpack'

Imported from OSS

Differential Revision: D18182913

fbshipit-source-id: 7cbaac9159520d9e873079d10bf80764f2ec27ae
2019-10-29 16:06:16 -07:00
80e270a76c Add support for host build to pytorch_android native code (#27664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27664

When ANDROID_ABI is not set, find libtorch headers and libraries from
the LIBTORCH_HOME build variable (which must be set by hand), place
output under a "host" directory, and use dynamic linking instead of
static.

This doesn't actually work without some local changes to fbjni, but I
want to get the changes landed to avoid unnecessary merge conflicts.

Test Plan: Imported from OSS

Differential Revision: D18210315

Pulled By: dreiss

fbshipit-source-id: 685a62de3c2a0a52bec7fd6fb95113058456bac8
2019-10-29 16:04:18 -07:00
34455c68b5 Remove unnecessary BUILD_DIR variable in Android CMake build (#27663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27663

CMake sets CMAKE_BINARY_DIR and creates it automatically.  Using this
allows us to use the -B command-line flag to CMake to specify an
alternate output directory.

Test Plan: Imported from OSS

Differential Revision: D18210316

Pulled By: dreiss

fbshipit-source-id: ba2f6bd4b881ddd00de73fe9c33d82645ad5495d
2019-10-29 16:04:13 -07:00
c9423c30b3 Add host build for pytorch_android (#27662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27662

This adds a new gradle subproject at pytorch_android/host and tweaks
the top-level build.gradle to only run some Android bits on the other
projects.

Referencing Java sources from inside the host directory feels a bit
hacky, but getting host and Android Gradle builds to coexist in the same
directory hit several roadblocks.  We can try a bigger refactor to
separate the Android-specific and non-Android-specific parts of the
code, but that seems overkill at this point for 4 Java files.

This doesn't actually run without some local changes to fbjni, but I
want to get the files landed to avoid unnecessary merge conflicts.

Test Plan: Imported from OSS

Differential Revision: D18210317

Pulled By: dreiss

fbshipit-source-id: dafb54dde06a5a9a48fc7b7065d9359c5c480795
2019-10-29 16:04:09 -07:00
793e2914e4 Support full id interactions (#28769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28769

Support full id interaction.

Test Plan:
* unit-tests
  * buck test caffe2/caffe2/python/operator_test:pack_ops_test --
  * buck test caffe2/caffe2/fb/dper/layer_models/tests:sparse_nn_attention_test -- test_sparse_nn_full_id

* canary
  * apply SUM + full id with max_length as 20 on SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID: f147253340 (v1: f146340704)

# of embeddings for this feature is 20:
{F219139816}

The corresponding ops: two lookups, which is as expected.
```
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_0/Repeat_0/sparse_lookup/w"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:values"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:lengths"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_0/Repeat_0/sparse_lookup/output"
  name: ""
  type: "SparseLengthsSum"
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/w"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:values"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/output"
  name: ""
  type: "Gather"
}
op {
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:lengths"
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/output"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/PackSegments/embedding_packed"
  name: ""
  type: "PackSegments"
  arg {
    name: "max_length"
    i: 20
  }
  arg {
    name: "pad_minf"
    i: 0
  }
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/PackSegments/embedding_packed"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/reshaped_record"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/old_shape"
  name: ""
  type: "Reshape"
  arg {
    name: "shape"
    ints: -1
    ints: 1280
  }
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/reshaped_record"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_0"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_1"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_2"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_3"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_4"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_5"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_6"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_7"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_8"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_9"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_10"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_11"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_12"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_13"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_14"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_15"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_16"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_17"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_18"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_19"
  name: ""
  type: "Split"
  arg {
    name: "axis"
    i: 1
  }
}
```

Reviewed By: chonglinsun

Differential Revision: D18083520

fbshipit-source-id: f592fb7734dd4e3e712ba42dc0afcd0b32a4afa0
2019-10-29 14:56:18 -07:00
aa949b12b3 InsertObservers (#27238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27238

att

Test Plan:
test_jit.py insert_observers

Imported from OSS

Differential Revision: D18182914

fbshipit-source-id: 718300f259a2e38e730d3e7cc6308813fd1112af
2019-10-29 14:24:16 -07:00
4045d6c3fa Revert D18187208: Add OfflineTensor
Test Plan: revert-hammer

Differential Revision:
D18187208

Original commit changeset: 57c70f6f9897

fbshipit-source-id: d13b089ceb645b2a9852923cd21a752a2f45a15b
2019-10-29 14:20:46 -07:00
e33b4b6761 Use c10::variant-based enums for Reduction
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27942

Test Plan: Imported from OSS

Differential Revision: D18202857

Pulled By: yf225

fbshipit-source-id: 0303ce2508e3b7665c6a91ae270a7d0ef0e45900
2019-10-29 14:15:48 -07:00
d8c368bd62 CPU-strided-complex support for compare and pointwise ops (#28735)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

These changes optimize complex Vec256 math kernels so that they are within 2X of real-number performance on average. [Benchmarks are here](https://docs.google.com/spreadsheets/d/17pObcrSTpV4BOOX9FYf1vIX3QUlEgQhLvL1IBEyJyzs/edit#gid=0)

Changes so far:

- [x]  Added complex support for eq, neq, max, and min ops.
   - max/min ops need to compare the absolute value for complex numbers (using zabs); see the sketch after this list.
- [x] Added complex support for is_nonzero and where.
   - where op compares the absolute value for complex numbers (using zabs).
- [x] Added complex support for linear interp and and pointwise ops.
- [x] Added complex support for check_convert and Linspace/Logspace.
   - std::complex does not support ++operator.
   - All compilers (clang, g++, c++) on aarch64 and x86 produce the same assembly code when using `+= 1` instead of `++`. [example for loop](https://godbolt.org/z/O6NW_p)
- [x] Added complex support for log, log2, log10.
- [x] Optimized Vec256 operators using various logarithmic identities.
  - `asin()`, `acos()`, `atan()` is optimized using a `ln()` identity.
  - `sqrt()` is optimized by splitting the computation into real and imag parts.
  - several `_mm256_mul_pd` are avoided by using `_mm256_xor_pd` ops instead.
- [x] Added complex support for pow.
  - exp is cast to `std::complex<double>`.
  - no special optimization is added when the `exp` is real because the `std::pow()` operator expects a std::complex number.
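
A minimal sketch of the zabs-based ordering (complex numbers have no natural ordering, so max/min compare moduli; the tie-breaking rule here is an assumption):

```py
# Compare complex values by absolute value (|z|, i.e. zabs).
def zmax(a, b):
    return a if abs(a) >= abs(b) else b

assert zmax(1 + 1j, 3 + 0j) == 3 + 0j  # |1+1j| ~= 1.41 < |3+0j| == 3.0
```
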
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28735

Differential Revision: D18170691

Pulled By: ezyang

fbshipit-source-id: 6f167398e112cdeab02fcfde8b543cb6629c865a
2019-10-29 13:37:01 -07:00
22d70bc1ec Add OfflineTensor
Summary: OfflineTensor will be a shell to just carry the shape and dtype. No data will be stored. This should help us plumb through the onnxifi process.

Test Plan:
```
buck test caffe2/caffe2/fb/opt:onnxifi_with_offline_tensor_test
```

Reviewed By: ChunliF, zrphercule

Differential Revision: D18187208

fbshipit-source-id: 57c70f6f9897a5fc66580c81295db108acd03862
2019-10-29 13:04:00 -07:00
6b5bfd4cfc Make inserted child module names unique (#27237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27237

Making inserted observer module and wrapper module names unique

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18182917

fbshipit-source-id: 77aa5997fbf024c73085866550372b5e68ad9ae1
2019-10-29 12:30:49 -07:00
7e8c48bff5 argmax for half datatype fix (#28787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28787

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#28787 argmax for half datatype fix**

Test Plan: Imported from OSS

Differential Revision: D18194420

Pulled By: pbelevich

fbshipit-source-id: d2abec1ea8a9ce3a93aec5a2c5bba57d163197e6
2019-10-29 12:25:43 -07:00
e57a119773 Remove autograd copy_ specific isFloatingPoint (#28279)
Summary:
Remove autograd copy_ specific isFloatingPoint and use
c10's isFloatingType (and isComplexType).
Before this, .to or .copy_ would drop requires_grad for bfloat16,
as the only recognized floating types were double, float,
and half.
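
A small repro of the behavior being fixed (expected to pass after this change):

```py
import torch

a = torch.randn(3, requires_grad=True)
b = a.to(torch.bfloat16)
# Previously b.requires_grad came back False because bfloat16 was not
# counted as a floating type; with isFloatingType it is preserved.
assert b.requires_grad
```
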
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28279

Differential Revision: D18176084

Pulled By: izdeby

fbshipit-source-id: 8a005a6105e4a827be5c8163135e693a7daae4f4
2019-10-29 12:25:39 -07:00
83331bf123 don't overspecify required python version (#28842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28842

We don't care which Python version, and GitHub Actions has changed the
versions available, breaking our CI. So just pin it to 3-something to
make it more future-proof.

Test Plan: Imported from OSS

Differential Revision: D18205349

Pulled By: suo

fbshipit-source-id: bf260dc29a138dd8bf8c85081a182aae298fe86d
2019-10-29 12:08:47 -07:00
b7d472a109 Some fixes for jit overview doc (#28112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28112

att

Test Plan:
reading

Imported from OSS

Differential Revision: D18173102

fbshipit-source-id: d8574758288bfce08eaf0f4f6163284defb56d6e
2019-10-29 12:08:42 -07:00
efbaa8a563 added a check for zero stride
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28784

Differential Revision: D18178889

Pulled By: anjali411

fbshipit-source-id: 976810bf3f9def3a8f5ca6885b1e049b831f06f3
2019-10-29 12:08:38 -07:00
607defa8a9 print per block avg time when running on AI-PEP machines (#28838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28838

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test -- --ai_pep_format true
  Total time: 02:36.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  """
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.83197245048359"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.839232977246866"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.7970924858236685"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.708389271399938"}
# Benchmarking PyTorch: Softmax
...
```

Reviewed By: hl475

Differential Revision: D18202504

fbshipit-source-id: 4a332763432b3b5886f241bb2ce49d4df481a6f3
2019-10-29 12:08:33 -07:00
0a68e8bab0 fix op bench runtime error when use_jit is enabled (#28837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837

The JIT code used in the op bench is not compatible with the latest JIT code path. This diff aims to resolve that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit
Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated
  Total time: 02:30.3 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: JIT
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 118.052
```

Reviewed By: hl475

Differential Revision: D18197057

fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8
2019-10-29 12:08:28 -07:00
ac4c72db3b add DNNLOWP static qparam choosing to pybind
Summary: as title

Test Plan: test in stacked diff

Reviewed By: csummersea

Differential Revision: D18123726

fbshipit-source-id: ce75db1e6f314a822a94ebdfc11988fab50ee836
2019-10-29 12:05:33 -07:00
f42768f8c0 Add scripts to run cuda-memcheck (#28127)
Summary:
This PR adds scripts that could be used for https://github.com/pytorch/pytorch/issues/26052

Example output:

```
Success: TestTorchDeviceTypeCPU.test_advancedindex_big_cpu
Success: TestTorchDeviceTypeCPU.test_addcmul_cpu
Success: TestTorchDeviceTypeCPU.test_addbmm_cpu_float32
Success: TestTorchDeviceTypeCPU.test_advancedindex_cpu_float16
Success: TestTorchDeviceTypeCPU.test_addmv_cpu
Success: TestTorchDeviceTypeCPU.test_addcdiv_cpu
Success: TestTorchDeviceTypeCPU.test_all_any_empty_cpu
Success: TestTorchDeviceTypeCPU.test_atan2_cpu
Success: TestTorchDeviceTypeCPU.test_advancedindex_cpu_float64
Success: TestTorchDeviceTypeCPU.test_baddbmm_cpu_float32
Success: TestTorchDeviceTypeCPU.test_atan2_edgecases_cpu
Success: TestTorchDeviceTypeCPU.test_add_cpu
Success: TestTorchDeviceTypeCPU.test_addr_cpu_bfloat16
Success: TestTorchDeviceTypeCPU.test_addr_cpu_float32
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28127

Differential Revision: D18184255

Pulled By: mruberry

fbshipit-source-id: 7fd4bd9faf9f8b37b369f631c63f26eb965b16e7
2019-10-29 12:05:29 -07:00
4703854321 change softmax input shape (#28836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28836

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 56 more. See logs for all changes
Parsing buck files: finished in 6.2 sec
Creating action graph: finished in 8.8 sec
Building: finished in 05:42.6 min (100%) 28336/28336 jobs, 23707 updated
  Total time: 05:57.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  """
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256
# Input: N: 4, C: 3, H: 256, W: 256
Forward Execution Time (us) : 18422.487
```

Reviewed By: hl475

Differential Revision: D18202335

fbshipit-source-id: 0bb376cb465d998a49196e148d48d436126ae334
2019-10-29 12:05:25 -07:00
ef5a6b2262 Avoid the misleading zero_point and scale [2/2] (#28827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28827

When we print the `DynamicLinear` module, we don't want to print the scale and zero point, as they are not needed for dynamic quantization.

Let's take the output of RoBERTa model as an example:

Before this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```

After this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```
ghstack-source-id: 92807317

Test Plan: CI

Differential Revision: D18197022

fbshipit-source-id: e41635330cfdfb008a0468d6a8ff67a06f7e1c59
2019-10-29 12:02:45 -07:00
47faee2fae Switching tests to ProfilingExecutor (rebased)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28535

Differential Revision: D18197932

Pulled By: Krovatkin

fbshipit-source-id: 2639b205e899f800787ee57c157447d54e4669c3
2019-10-29 11:41:42 -07:00
eb55104185 Updating submodules
Summary:
GitHub commits:

214b370edb

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: aa03f9a37d316c232fdf2e4289c32ec68a22b469
2019-10-29 10:13:21 -07:00
5fbec1f55d Revert D18170996: Move type casting to c10/util/TypeCast.h
Test Plan: revert-hammer

Differential Revision:
D18170996

Original commit changeset: 41658afd5c0a

fbshipit-source-id: 394e84bbc52bdd708609304261ffa1513a771d57
2019-10-29 07:43:01 -07:00
0301f5f30b Revert D18170997: Make TensorIterator stop promoting types by copying
Test Plan: revert-hammer

Differential Revision:
D18170997

Original commit changeset: 9c82c1c89583

fbshipit-source-id: 8862d9628864d23a087f2895870386772a634e45
2019-10-29 07:42:56 -07:00
dff159804f Revert D18170995: Simplify copy kernel
Test Plan: revert-hammer

Differential Revision:
D18170995

Original commit changeset: 461b53641813

fbshipit-source-id: 1ebb119325d746a153982ac3209d3570a7e18d88
2019-10-29 07:42:52 -07:00
f6692146e7 Add Conv3dInt8 (#28768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28768

Add Conv3dInt8

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: jianyuh

Differential Revision: D18023661

fbshipit-source-id: 8fc7a4350baf29271dfd6fa3c1c4b10e60e2fdbf
2019-10-28 23:28:11 -07:00
295401f04c Updating submodules
Summary:
GitHub commits:

edee4921c4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: b69770ac1a801b372fba0e112124b25ad1572821
2019-10-28 22:24:35 -07:00
a0339c8d8f bootstrap.sh refactor (#28809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28809

### Summary

This PR adds an interactive mode to `bootstrap.sh`. Instead of passing the credential information via command-line parameters (`-t`, `-p`), we now ask the user to enter that information and save it to a config file, so that next time you don't have to enter it again. All you need now is a one-line command

```shell
./bootstrap
```

### Test Plan

- TestApp.ipa can be installed on any devices
- Don't break CI jobs

Test Plan: Imported from OSS

Differential Revision: D18194032

Pulled By: xta0

fbshipit-source-id: a416ef7f13fa565e2c10bb55f94a8ce994b4e869
2019-10-28 22:20:29 -07:00
097da55249 Fix BC check CI (#28816)
Summary:
Skip the functions which were reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28816

Reviewed By: hl475

Differential Revision: D18196628

Pulled By: houseroad

fbshipit-source-id: 30d43fcd57efb21b870c6a630b7ee305604dc603
2019-10-28 21:47:18 -07:00
52dd587123 C++ API parity: Upsample (#28413)
Summary:
Adds `interpolate` functional and `Upsample` module support for the C++ API.

**Issue**: https://github.com/pytorch/pytorch/issues/25883

**Reviewer**: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28413

Differential Revision: D18165014

Pulled By: yf225

fbshipit-source-id: ecae2f432a301b1f4afa7c038b2d104cbad139f2
2019-10-28 21:34:44 -07:00
1e3e1f5bf9 Fix build error in VariableTypeManual
Summary:
build error in internal pt mobile build

```
xplat/caffe2/torch/csrc/autograd/VariableTypeManual.cpp:118:49: error: address of function 'requires_grad' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
      autograd::utils::requires_grad_leaf_error(requires_grad)
      ~~~~~~~~                                  ^~~~~~~~~~~~~
xplat/caffe2/torch/csrc/autograd/VariableTypeManual.cpp:118:49: note: prefix with the address-of operator to silence this warning
```

I think the variable name in requires_grad_leaf_error is wrong.

Test Plan: mobile build works

Reviewed By: pbelevich

Differential Revision: D18192663

fbshipit-source-id: a3d3ebb9039022eb228c1d183a1076f65f9e84e0
2019-10-28 17:55:41 -07:00
c6ad68cf10 Updating submodules
Summary:
GitHub commits:

724e939772
f4fb4266c0
95d4b19724
8b8131450e
ac8faa6528
5487e2b1a2

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9b9f4cccd869638215c17111361a6f6c480c73af
2019-10-28 17:55:35 -07:00
6f90567e0c Add the unittest import for test_fake_quant.py (#28815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28815

Add the unittest import
ghstack-source-id: 92789329

Test Plan: CI

Differential Revision: D18191989

fbshipit-source-id: c54e0309e21156c33e4fec01bfba17a1c30463c9
2019-10-28 17:52:57 -07:00
949678bd9e Small fixes for torchbind (#28800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28800

Fix up namespaces and emit a friendly error message when a registered class doesn't inherit from the right base

Test Plan: Imported from OSS

Differential Revision: D18175067

Pulled By: jamesr66a

fbshipit-source-id: 5c7cf3a49fb45db502d84eb3f9a69be126ee59fb
2019-10-28 16:45:24 -07:00
f63cf96c4d Update C++ parity table for torch::nn::Linear (#28804)
Summary:
Since we have merged https://github.com/pytorch/pytorch/pull/27382 (thanks pbelevich!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28804

Differential Revision: D18185714

Pulled By: yf225

fbshipit-source-id: 1148f5837fbf578843b989fc53fd334519943cdd
2019-10-28 14:55:25 -07:00
5c5b2c68db Simplify copy kernel (#28428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28428

Using the new type promotion and dynamic casting added to
`TensorIterator`, the copy kernels could be greatly simplified.

Benchmark on CUDA:
```python
import torch
import timeit
import pandas
import itertools
from tqdm.notebook import tqdm
import math
print(torch.__version__)
print()

_10M = 10 * 1024 ** 2

d = {}

for from_, to in tqdm(itertools.product(torch.testing.get_all_dtypes(), repeat=2)):
    if from_ not in d:
        d[from_] = {}
    a = torch.empty(_10M, dtype=from_, device='cuda')
    min_ = math.inf
    for i in range(100):
        torch.cuda.synchronize()
        start = timeit.default_timer()
        a.to(to)
        torch.cuda.synchronize()
        end = timeit.default_timer()
        elapsed = end - start
        if elapsed < min_:
            min_ = elapsed
    d[from_][to] = int(min_ * 1000 * 1000)

pandas.DataFrame(d)
```

original:
![image](https://user-images.githubusercontent.com/1032377/67623519-e3e6dd80-f7da-11e9-86ea-9cc9f237123b.png)

new:
![image](https://user-images.githubusercontent.com/1032377/67623527-fc56f800-f7da-11e9-82bd-dc1ff9821b68.png)

Test Plan: Imported from OSS

Differential Revision: D18170995

Pulled By: ezyang

fbshipit-source-id: 461b53641813dc6cfa872a094ae917e750c60759
2019-10-28 14:49:06 -07:00
b9f099ed93 Make TensorIterator stop promoting types by copying (#28427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28427

Fixes: https://github.com/pytorch/pytorch/issues/26401

This PR fixes the issue by using the newly added dynamic cast inside
`TensorIterator`: instead of converting the type at the beginning
(which generates extra kernel launches), the `TensorIterator` does a
load-cast-compute-store for each element while looping, so there is only
one read and one write of memory.

**nvprof:**
```python
import torch

_100M = 100 * 1024 ** 2
r = torch.randn(_100M, dtype=torch.float32, device='cuda')
d = torch.randn(_100M, dtype=torch.float64, device='cuda')
torch.cuda.synchronize()
torch.cuda.profiler.start()
r.add_(d)
torch.cuda.profiler.stop()
torch.cuda.synchronize()
```

```
==11407== NVPROF is profiling process 11407, command: /home/xgao/anaconda3/bin/python simple.py
==11407== Profiling application: /home/xgao/anaconda3/bin/python simple.py
==11407== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.0611ms         1  2.0611ms  2.0611ms  2.0611ms  _ZN2at6native18elementwise_kernelILi512ELi1EZNS0_15gpu_kernel_implIZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE1_clEvEUlddE_EEvS4_RKT_EUliE_EEviT1_
      API calls:  100.00%  1.05006s         1  1.05006s  1.05006s  1.05006s  cudaLaunchKernel
                    0.00%  2.7740us         2  1.3870us     673ns  2.1010us  cudaGetDevice
                    0.00%  2.3730us         1  2.3730us  2.3730us  2.3730us  cudaSetDevice
                    0.00%     830ns         1     830ns     830ns     830ns  cudaGetLastError
```

**benchmark**
```python
import torch
print(torch.__version__)
print(torch.version.git_version)

_100M = 100 * 1024 ** 2
r = torch.randn(_100M, dtype=torch.float32, device='cuda')
d = torch.randn(_100M, dtype=torch.float64, device='cuda')
torch.cuda.synchronize()
%timeit r.add_(d); torch.cuda.synchronize()
```

original
```
1.4.0a0+7d277b0
7d277b0670eb1f9098a7e098e93b20453e8b5c9f
6.83 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

after
```
1.4.0a0+f0f2f65
f0f2f654cba9b8c569f0bcd583732bbc891f80b2
2.08 ms ± 139 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

For more benchmark, see: https://github.com/pytorch/pytorch/pull/28344

Test Plan: Imported from OSS

Differential Revision: D18170997

Pulled By: ezyang

fbshipit-source-id: 9c82c1c89583f3e6202c5d790b9b73ad9f960fad
2019-10-28 14:49:02 -07:00
688a9dbe3c Move type casting to c10/util/TypeCast.h (#28426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28426

Type casting is used in copy, and will also be used in the tensor iterator
in the next stacked diff. I moved it to c10 so it can serve as a common
util for different things.

I also added two dynamic casting functions:
- fetch_and_cast
- cast_and_store

fetch_and_cast fetches a value whose dynamic type is specified by a ScalarType
from a void pointer and casts it to a static type.

cast_and_store casts a statically typed value into a dynamic type specified
by a ScalarType, and stores it into a void pointer.

Test Plan: Imported from OSS

Differential Revision: D18170996

Pulled By: ezyang

fbshipit-source-id: 41658afd5c0ab58c6b6c510424893d9a2a0c059e
2019-10-28 14:48:57 -07:00
f33813d589 Return NotImplemented from all binary math ops (#27423)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26333

Fixes the operators missed in https://github.com/pytorch/pytorch/issues/26507 and includes a test for all operators.
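
A sketch of the interoperability this enables for user-defined types (the class here is illustrative):

```python
import torch

class Wrapper:
    # Tensor.__add__ now returns NotImplemented for operand types it does
    # not recognize, so Python falls back to this reflected method.
    def __radd__(self, other):
        return f"absorbed a tensor of shape {tuple(other.shape)}"

torch.ones(2) + Wrapper()  # 'absorbed a tensor of shape (2,)' instead of a TypeError
```
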
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27423

Differential Revision: D17835390

Pulled By: ezyang

fbshipit-source-id: 7a1351c7ccc8ad11454dbaa00d3701dcee4f06a8
2019-10-28 14:28:33 -07:00
9e64c54c01 Add the warning message for API with linear modules (#28766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28766

Add a warning message that explicitly asks users to migrate from the deprecated `torch.jit.quantized` API to the new `torch.quantization.quantize_dynamic` API.
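
For reference, a minimal sketch of the suggested migration (the model and dtype here are illustrative):

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# New API: dynamically quantize the Linear submodules to int8.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.randn(2, 16))
```
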
ghstack-source-id: 92711620

Test Plan: CI

Differential Revision: D18164903

fbshipit-source-id: e6aff2527f335c2d9f362e6856ce8597edb52aaa
2019-10-28 14:24:44 -07:00
02d318461e Temporarily disable test_numerical_consistency_per_channel due to failure (#28807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28807

`FAIL: test_numerical_consistency_per_channel (__main__.TestFakeQuantizePerChannel)`

This test is failing consistently on master; we can't find a clean blame.
ghstack-source-id: 92763176

Test Plan: CI

Differential Revision: D18181496

fbshipit-source-id: 5948af06c4cb7dea9a8db1366deb7c12f6ec1c72
2019-10-28 13:51:10 -07:00
9f890a9218 make sure clang-tidy is diffing against the right thing (#28788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28788

Okay, my last fix was wrong because it turns out that the base SHA is
computed at PR time using the actual repo's view of the base ref, not
the user's. So if the user doesn't rebase on top of the latest master
before putting up the PR, the computed diff is against the wrong base anyway.

This PR fixes the issue by not relying on any of these API details and
just getting the merge-base of the base and head refs, which should
guarantee we are diffing against the right thing.

This solution taken from https://github.com/github/VisualStudio/pull/1008

Test Plan: Imported from OSS

Differential Revision: D18172391

Pulled By: suo

fbshipit-source-id: 491a50119194508b2eefa5bd39fe813ca85f27b1
2019-10-28 13:47:51 -07:00
5804e54c81 Deprecate torch::nn::modules_ordered_dict API (#28774)
Summary:
I finally found a way to get the following API to work for constructing a list of named submodules for `Sequential`:
```cpp
Sequential sequential({
  {"m1", MyModule(1)},
  {"m2", MyModule(2)}
})
```
```
which was actually our original proposed design and much simpler than our current API:
```cpp
Sequential sequential(modules_ordered_dict({
  {"m1", MyModule(1)},
  {"m2", MyModule(2)}
}));
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28774

Differential Revision: D18174013

Pulled By: yf225

fbshipit-source-id: 3a18c2d36b6a65a07bee7346a7516780567c7774
2019-10-28 13:01:13 -07:00
87c98acf5d Back out "Add memory format support to full_like operator" (#28803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28803

Original commit changeset: 1761a9939aa7
ghstack-source-id: 92748946

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175282

fbshipit-source-id: d3f537bbed50a4524797edd96b210b8455ef1bcc
2019-10-28 12:44:53 -07:00
d828fef8ac Back out "Add memory format support to ones_like operator" (#28802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28802

Original commit changeset: 5da9530f6b23
ghstack-source-id: 92748794

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175303

fbshipit-source-id: ac36c7d345cba901bc2b64dc22661b8d0f179f13
2019-10-28 12:44:49 -07:00
266c1652e6 Back out "Add memory format support to rand_like operator" (#28801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28801

Original commit changeset: 2a1d47571268
ghstack-source-id: 92748792

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175304

fbshipit-source-id: ffd61f6e42f256b39b80a6b42d989c238228f25d
2019-10-28 12:44:45 -07:00
648749b203 C++ API: torch::nn::LPPool2d (#28492)
Summary:
Add torch::nn::LPPool2d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #27800

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28492

Differential Revision: D18109401

Pulled By: yf225

fbshipit-source-id: 5cedecb895d9d44c2167cdb3f6f758f3426b3497
2019-10-28 12:28:25 -07:00
052046b18e Enabling intra-op parallelism for dynamic quantized Linear operator (#28477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28477

Similar to https://github.com/pytorch/pytorch/pull/26692, we would like to enable the intra-op parallelism for dynamic Linear op.
ghstack-source-id: 92419573

Test Plan:
CI

Test Benchmark:
```
import time
import torch

K, N = 1024, 1024

print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')

for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)

        NITER = 20

        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)

        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER

        print("{:0.2f}".format(2.0*M*N*K/elapsed_per_iter_dyn_quant/1E9), end=', ')
    print("\n", end='')
```
Before this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```

After this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```

Differential Revision: D18074757

fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5
2019-10-28 12:13:35 -07:00
9f44a04613 separate PT and C2 to reduce build time (#28731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731

as title

Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
  Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid

With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
  Total time: 06:52.1 min
```

Reviewed By: hl475

Differential Revision: D18152071

fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
2019-10-28 11:10:47 -07:00
0c7537c409 Fix obviously-broken .clang-tidy files (#28547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28547

Pull Request resolved: https://github.com/pytorch/glow/pull/3672

See D18090864 for more background.  The issue i addressed there is more widespread, so i'm fixing all the other `.clang-tidy` files clearly not working as intended.

Perhaps this means it's time to lint the linter config :-)

Test Plan:
Here's the resulting output for `~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy` related to each file touched:

`fbcode/admarket/intent/.clang-tidy`: P119723794
`fbcode/caffe2/.clang-tidy`: P119723978
`fbcode/glow/glow/.clang-tidy`: P119724081
`fbcode/ice_palace/.clang-tidy`: P119724774
`fbcode/unified_graph/aggregator/.clang-tidy`: P119724375
`xplat/caffe2/.clang-tidy`: P119724464
`xplat/mcfcpp/.clang-tidy`:
```
[billfarner@devvm2187.ftw3 ~/fbsource/xplat/mcfcpp]  ~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy -explain-config
'readability-identifier-naming' is enabled in the /home/billfarner/fbsource/xplat/mcfcpp/.clang-tidy.
```

`xplat/wa-msys/mcfcpp/.clang-tidy`:
```
[billfarner@devvm2187.ftw3 ~/fbsource/xplat/wa-msys/mcfcpp]  ~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy -explain-config
'readability-identifier-naming' is enabled in the /home/billfarner/fbsource/xplat/wa-msys/mcfcpp/.clang-tidy.
```

Reviewed By: soumith

Differential Revision: D18092684

fbshipit-source-id: 951307d125c0346322cb2c636c0300004a48d7a9
2019-10-28 09:54:34 -07:00
f5ea2ca34a Reduce logging frequency for empty range tolarence
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28704

Reviewed By: xianjiec

Differential Revision: D18138828

fbshipit-source-id: 4f3c376502cb6e30b931217702c4ca537c9eb644
2019-10-28 09:52:17 -07:00
7ed9a3ec48 Change reorder_dimensions behavior to favor output writting sequence (#28615)
Summary:
reorder_dimensions() currently iterates over all the operands when determining the dimension order in the TensorIterator: it tries to move a dimension to the front if any operand has a larger stride for that dimension.

reorder_dimensions() does respect the case where a stride is zero, but I did not see a reason why it needs to keep probing each operand in regular cases.

This changes the behavior a little bit. Since the operands are ordered with output tensors first, followed by input tensors, I favor making writes to the outputs as sequential as possible. This could make copies between tensors with different memory formats faster.

Please correct me if this change is wrong, thanks.
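
A small illustration of the kind of copy this aims to speed up (shapes are illustrative, not from the PR):

```python
import torch

src = torch.randn(8, 3, 32, 32).contiguous(memory_format=torch.channels_last)
dst = torch.empty(8, 3, 32, 32)  # default contiguous layout

# With this change, the iterator's dimension order favors sequential
# writes into dst even though src uses a different memory format.
dst.copy_(src)
```
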
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28615

Reviewed By: VitalyFedyunin

Differential Revision: D18122474

Pulled By: glaringlee

fbshipit-source-id: f36467489fe6c6514b14ce9dcc439628d5d5ad0e
2019-10-28 08:50:03 -07:00
82f31e02a3 Remove the redundant calculation of derivative of power function (#28651)
Summary:
Hi, I noticed that PyTorch has the same issue as HIPS/autograd#541.
I tried to solve it; I hope it can help.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28651

Reviewed By: gchanan

Differential Revision: D18137163

Pulled By: albanD

fbshipit-source-id: 888bef65c72c4c15c2acdd4b13d5041008b1354e
2019-10-28 08:37:04 -07:00
4230132baf Added docs for context method mixins. Fixes issue #27365 (#28643)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27365 .

This PR:
1. Makes Context method docs available.
2. Links [Extending torch autograd](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd) notes to Context method docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28643

Differential Revision: D18170089

Pulled By: albanD

fbshipit-source-id: a1119ea8e2f8a71f0d1aadf416f2f98343aa9b7b
2019-10-28 08:31:35 -07:00
0e86c99bfb Updating submodules
Summary:
GitHub commits:

a3277b4e50
9ac4d71072
98141ffe1b

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: e1d9ec467e9d72774fda12ca1e8ca2e740fbe5c7
2019-10-28 08:29:12 -07:00
5835ad07cb provide memory format as Contiguous explicitly when calling to clone() (#28029)
Summary:
provide the memory format explicitly when calling clone():
```
clone(MemoryFormat::Contiguous); // instead of clone()
```

This change is based on https://github.com/pytorch/pytorch/pull/27106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28029

Differential Revision: D17937468

Pulled By: ifedan

fbshipit-source-id: 0a6a600af76fc616f88893e5db16aabd7981ce14
2019-10-28 08:21:39 -07:00
6eaea39867 Kill _th_zero binding, just use a simple native function instead.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28597

Test Plan: Imported from OSS

Differential Revision: D18116721

Pulled By: gchanan

fbshipit-source-id: f93b968333042700c31e37f434080b200754dddc
2019-10-28 08:17:46 -07:00
8e67b78d9b Kill THTensor_(match), which isn't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28596

Test Plan: Imported from OSS

Differential Revision: D18116718

Pulled By: gchanan

fbshipit-source-id: a038eaad0f6cf951a5d412078cfcba3ae534ea95
2019-10-28 08:17:43 -07:00
1e5b2559ac Write out some set_ overloads instead of relying on code binding generation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28595

Test Plan: Imported from OSS

Differential Revision: D18116720

Pulled By: gchanan

fbshipit-source-id: a917e03aeb8d5513ad3882163642b800ae35dabe
2019-10-28 08:17:39 -07:00
45dab56153 adding python all_gather coalesced functionality and testing. (#28634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28634

caveat 1: this only works in sync mode.
caveat 2: this is going to go away and be replaced by a C++ implementation

Test Plan: buck test caffe2/test:distributed_gloo -- test_all_gather_coalesced

Reviewed By: mrshenli

Differential Revision: D18123422

fbshipit-source-id: cfb9950d5d54c6181a5240e7cc9fed88ed47f5d9
2019-10-28 08:12:36 -07:00
aea94de067 Exclude more files in torch/csrc/distributed when USE_DISTRIBUTED=0 (#28621)
Summary:
Changelog:
- Guard inclusion of certain files in torch/csrc/distributed included in caffe2/CMakeLists.txt when USE_DISTRIBUTED=0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28621

Test Plan:
- Builds should be successful
- Tests should pass

Differential Revision: D18145330

Pulled By: ezyang

fbshipit-source-id: 7167a356b03ae783e6b0120f2ad3552db2b3ed86
2019-10-28 08:03:30 -07:00
4cf7277d62 Explain how to specify library location for MKL (#28779)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24334.

I'm still kind of confused why `FindMKL.cmake` was unable to locate my MKL libraries. They are in the standard `/opt/intel/mkl` installation prefix on macOS. But at least with this more detailed error message, it will be easier for people to figure out how to fix the problem.

zhangguanheng66 xkszltl soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28779

Differential Revision: D18170998

Pulled By: soumith

fbshipit-source-id: 47e61baadd84c758267dca566eb1fb8a081de92f
2019-10-28 08:00:54 -07:00
5da932ad72 Return None correctly from Tensor.names (#28659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28659

Previously, we would return None from `Tensor.names` without bumping the
refcount. This is a bug; the Python API requires the developer to
increment the refcount on new references to None. This is because None
is a singleton object and does not automatically have its reference
count bumped when one uses Py_None (which is a pointer to the actual
None singleton object).

See the following for Python documentation on this:
- https://docs.python.org/3/c-api/none.html#c.Py_RETURN_NONE
- https://docs.python.org/3/extending/extending.html#back-to-the-example

Fixes https://github.com/pytorch/pytorch/issues/28646

Test Plan: - New test.

Differential Revision: D18140593

Pulled By: zou3519

fbshipit-source-id: 302a09021b68229e2e7b1b584b3549b30506bdab
2019-10-28 07:01:22 -07:00
c60dee271d addmm: Fix handling of case with empty tensor (#28613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28613

addmm: Fix handling of the case with an empty tensor.
Currently these cases cause an error.
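
A hypothetical repro (the commit does not give the exact shapes):

```python
import torch

bias = torch.randn(0, 3)
mat1, mat2 = torch.randn(0, 2), torch.randn(2, 3)
torch.addmm(bias, mat1, mat2)  # now returns an empty (0, 3) tensor instead of erroring
```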

Recreation of D18085389 without stacked diffs

Test Plan: test included

Differential Revision: D18122004

fbshipit-source-id: 71513c02ace691902553bea5ce9dc2538cca4c99
2019-10-28 05:52:50 -07:00
c89340f068 Extend HasElements to support multiple inputs (#28717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28717

Make HasElements support multiple inputs: if any input has elements, return true.

Test Plan: to be added

Reviewed By: BIT-silence

Differential Revision: D17972759

fbshipit-source-id: 3ecdea74a30fcfaaa6490fef1debc6cde68db922
2019-10-27 23:00:07 -07:00
7df3366f8d Eliminate some unnecessary tensor ref count bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28695

Differential Revision: D18144971

fbshipit-source-id: 3d6ee1343a458a8363707ce468f4d9eab2784ebb
2019-10-27 20:02:34 -07:00
f43194ed9e Move mode_t declaration in PadOptions (#28760)
Summary:
Based on the discussion in https://github.com/pytorch/pytorch/pull/28413#discussion_r338839489, putting anything that's not tagged as `public:` under a `TORCH_ARG` line would hide it under `private:`. To get around this problem, we should move the `mode_t` declaration to the top of the PadOptions declaration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28760

Differential Revision: D18165117

Pulled By: yf225

fbshipit-source-id: cf39c0a893822264cd6a64cd887729afcd84dbd0
2019-10-27 15:51:39 -07:00
d5afd97569 Refactor qconv_prepack and qconv_unpack to support conv3d (#28481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28481

Refactor qconv_prepack and qconv_unpack to support conv3d

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: dskhudia

Differential Revision: D18023651

fbshipit-source-id: 8cbc9fe68f93bc4b247a4f41423c6d8c30a5ef90
2019-10-27 14:43:16 -07:00
764e0ee882 Improve Tensor type hints (#28578)
Summary:
I've typed some attributes from ee920b92c4/torch/csrc/autograd/python_variable.cpp (L490) that were not included in the stubs so that MyPy will be aware of them. I made sure to only add those attributes that are mentioned somewhere in the documentation. If there are attributes mentioned in the documentation that are not meant to be part of the public API (or the opposite), please let me know. I've also made sure that attributes that can't be set are typed as read-only properties. If setting `dtype`, `shape`, `device` or `names` directly is not part of the public API, let me know and I'll make them properties as well.

I've also added `__len__`, `__iter__` and `__contains__`, which means MyPy will no longer complain about `len(t)`, `t1 in t2` and `for t1 in t2`.
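
A quick illustration of the behavior these stubs now describe:

```python
import torch

t = torch.arange(6).reshape(2, 3)
len(t)                     # 2: the size of the first dimension
rows = [row for row in t]  # __iter__ yields sub-tensors along dim 0
4 in t                     # True: __contains__ membership test
```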

Shameless plug: I have another typing-related PR here that needs review: https://github.com/pytorch/pytorch/pull/27445

Fixes https://github.com/pytorch/pytorch/issues/28457
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28578

Reviewed By: lerks

Differential Revision: D18113954

Pulled By: fmassa

fbshipit-source-id: 0b69a2966d22054d8d87392f19ec5aa3918773bc
2019-10-27 04:43:51 -07:00
440b192078 Type hints: Return Iterator instead of Iterable from __iter__ (#27445)
Summary:
`__iter__` methods are supposed to return iterators (https://docs.python.org/3/reference/datamodel.html#object.__iter__), but some of them are typed to return iterables, which is too general. This results in error messages such as `"Iterable[Module[Any]]" has no attribute "__next__"` from Mypy. Technically this should also have caused a type error [here](8f7020bbdb/torch/nn/modules/container.py (L115)), but due to a bug in Mypy, type checking isn't working correctly in untyped methods (this will be fixed in the next release though: https://github.com/python/mypy/pull/7530).
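
A minimal sketch of the distinction (the class is illustrative):

```python
from typing import Iterator, List

class Container:
    def __init__(self, items: List[int]) -> None:
        self._items = items

    # Annotating the return type as Iterator (not Iterable) is what lets
    # Mypy accept __next__ calls on the result of iter(...).
    def __iter__(self) -> Iterator[int]:
        return iter(self._items)

it = iter(Container([1, 2, 3]))
next(it)  # fine for Mypy: Iterator has __next__, Iterable does not
```
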
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27445

Reviewed By: lerks

Differential Revision: D18113966

Pulled By: fmassa

fbshipit-source-id: c6261ac866f86df4328e6d2fdfca0625aa2d2492
2019-10-27 04:40:55 -07:00
f782500ee0 Abstract tracer::enter and tracer::exit into a function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28473

Test Plan: Imported from OSS

Differential Revision: D18121007

Pulled By: jamesr66a

fbshipit-source-id: 4c4a4344ad9bcc4630b945d2a645a0b05928933c
2019-10-26 18:41:14 -07:00
7ff272c6da Back out D17980308-D17980313 (#28748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28748

Found D17980313 to break unit tests, backed out descendants too to avoid conflicts.

Test Plan:
Failed on master:

   buck test mode/dev-nosan language_technology/neural_mt/fb/pytorch_translate/test:test_onnx

Passes with this diff.

Differential Revision: D18157588

fbshipit-source-id: e2b56eac8c5bfccf3ce9a3a2993f6332ab1471e7
2019-10-26 13:08:49 -07:00
e96ea288a8 Automation scripts for perf testing (#28622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28622

### Summary

As discussed in #28405, this is the third PR. The `bootstrap.sh` script is mainly for those who want to do perf testing on iOS but don't want to touch Xcode or any iOS code. It does require valid iOS dev credentials installed on your machine. (You can easily get these from any experienced iOS developer; it takes only 5 minutes to set up.)

 All you need to do is run

```shell
./bootstrap -t ${TEAM_ID} -p ${PROFILE}
```

The testing app will be automatically installed on your device. The log of the benchmark function will be displayed on the screen.

### Test plan

Don't break any CI jobs unless they're flaky.

Test Plan: Imported from OSS

Differential Revision: D18156178

Pulled By: xta0

fbshipit-source-id: cd7ba8d87bf26db885262888b9d6a5fd072309d1
2019-10-25 19:50:24 -07:00
dbf1996f79 Support MultiheadedAttention module (#28555)
Summary:
This makes MultiheadAttention TorchScript compatible.

It also breaks BC-compatibility for old models that do not have `_qkv_same_embed_dim` as an attribute.
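
A minimal sketch of the newly supported usage (dimensions are illustrative):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
scripted = torch.jit.script(mha)  # previously failed to compile

q = k = v = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)
attn_out, attn_weights = scripted(q, k, v)
```
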
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28555

Pulled By: driazati

Differential Revision: D18124746

fbshipit-source-id: 5c5042fc6fc0e557db859a8ae05174cba5fce6a9
2019-10-25 17:28:53 -07:00
e886450863 report p50 time instead of avg (#28722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28722

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid
iters: 200, 462.6029555220157
iters: 400, 441.04792759753764
iters: 800, 441.81562116136774
iters: 1600, 440.79964311094955
iters: 3200, 436.3108493271284
iters: 6400, 440.87966314691585
iters: 12800, 452.29464218209614
# Mode: Eager
# Name: sigmoid_M512_N512
# Input: M: 512, N: 512
Forward Execution Time (us) : 441.048
```

Reviewed By: hl475

Differential Revision: D18149525

fbshipit-source-id: 5fe70a35b790ee7ad3ff57c0cb0b1c29cb609b83
2019-10-25 17:22:27 -07:00
60d606094c Export Meshgrid (#26037)
Summary:
Exporting meshgrid op in opset 9 symbolics
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26037

Reviewed By: hl475

Differential Revision: D17452325

Pulled By: houseroad

fbshipit-source-id: d556b78e46594a232cdefd8c257cccd8b98221d6
2019-10-25 16:59:22 -07:00
0c48092b22 Resets rnn _flat_weights on _apply (#28562)
Summary:
Currently when _apply() is called on RNNBase (or one of its children, like LSTM), the _flat_weights attribute may or may not be updated. In particular, when using .to() and sending a module like LSTM to XLA, a third party device type, the tensors in _flat_weights will not be updated and will remain on CPU. This causes the LSTM forward to fail since the forward call receives a mix of XLA and CPU tensors.

This occurs because third-party device types, like XLA, may not be shallow-copy compatible with native tensors. When this is the case and _apply is called, Module parameters are replaced, not updated in place. RNNBase would not sync _flat_weights with its params in this case, so the references in _flat_weights did not reflect the module's current params.

This small change forces a resync of _flat_weights and the actual params on each _apply. This lets .to('xla') work for LSTMs, for example. A test will be added to PyTorch/XLA (which runs in our CI) to validate this behavior after the change appears in PyTorch.
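
A sketch of the pattern this fixes, with CUDA standing in for a third-party device such as XLA (where the stale references actually caused failures):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=8)
lstm = lstm.to('cuda')  # _apply now re-syncs _flat_weights with the moved params
x = torch.randn(4, 2, 8, device='cuda')
out, (h, c) = lstm(x)   # forward sees a consistent set of device tensors
```
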
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28562

Differential Revision: D18138863

Pulled By: mruberry

fbshipit-source-id: 284092cbe4ecff9dd334a9413c330cacdd5e04fd
2019-10-25 16:02:19 -07:00
0eeda56632 Add nn.ReLU6 to default mapping (#28516)
Summary:
https://discuss.pytorch.org/t/quantized-hard-sigmoid/59013
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28516

Differential Revision: D18128717

Pulled By: jerryzh168

fbshipit-source-id: 4d06d1b54cf9f84a610d79fbadde2c8ef38c33f8
2019-10-25 14:52:44 -07:00
2049e45999 Kill zero_dim_tensor_only codegen, it's not used anymore. (#28514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28514

Verified that there are no generated code changes after applying the diff.

Test Plan: Imported from OSS

Differential Revision: D18086966

Pulled By: gchanan

fbshipit-source-id: 86c660ca78dfeeda2c888947d557cee2c4df08aa
2019-10-25 14:24:16 -07:00
24f0bca8e2 Remove zero_dim_tensors only from _th_masked_fill_. (#28513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28513

This just ensures that (since we only have a Scalar implementation), if you pass in a Tensor that's not zero-dim you get a nice error message.

Instead of doing this with codegen, we do this in code at the ATen level.

Test Plan: Imported from OSS

Differential Revision: D18086969

Pulled By: gchanan

fbshipit-source-id: 83fe2c16046e243d573e033d033aa3844b03930a
2019-10-25 14:24:12 -07:00
b0b852459e Remove zero_dim_tensors only from _th_index_fill_. (#28512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28512

This just ensures that (since we only have a Scalar implementation), if you pass in a Tensor that's not zero-dim you get a nice error message.

Instead of doing this with codegen, we do this in code at the ATen level.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D18086965

Pulled By: gchanan

fbshipit-source-id: f3853bbbb0cf5816803a00877a2e94aa89e32c3b
2019-10-25 14:24:08 -07:00
d37c2d7c8d Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test
Test Plan: revert-hammer

Differential Revision:
D17495965

Original commit changeset: 3e8dbe8943f5

fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f
2019-10-25 13:58:16 -07:00
110a931752 Change from HTTP to HTTPS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28333

Differential Revision: D18143824

Pulled By: soumith

fbshipit-source-id: 613fd2219814addc850c3b9fe7ebfd8510a5e5c8
2019-10-25 13:13:30 -07:00
4996e3aca2 TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426)
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on the CMake changes: they are needed in order to import the onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426

Reviewed By: hl475

Differential Revision: D17495965

Pulled By: houseroad

fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
2019-10-25 13:01:57 -07:00
b19bbde561 Migrate all the Caffe2 Centos builds to explicity use devltoolset (#28465)
Summary:
Continues https://github.com/pytorch/pytorch/pull/28431 with a new branch name that can trigger all the CI

https://github.com/pytorch/pytorch/issues/28059

pytorch/ossci-job-dsl@b2c823a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28465

Differential Revision: D18104647

Pulled By: bddppq

fbshipit-source-id: 24decf44bdf73bd8a9c64d5fcaf34eec7a356f6e
2019-10-25 12:35:26 -07:00
0253e23d3f Remove unused USE_ROCM environment variable (#28641)
Summary:
All USE_ROCM logic has been moved to cmake now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28641

Differential Revision: D18139209

Pulled By: bddppq

fbshipit-source-id: bbf0931aa6a3be963b7e0d09b6f99f088c92c94d
2019-10-25 12:33:06 -07:00
1322daa506 Improve error handling for distributed autograd engine. (#27940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27940

1) If we receive an error for outstanding rpcs, we enqueue an appropriate error
on the local autograd engine.
2) Add an `exit_on_error` mode for the local autograd engine, where the
computation stops if we see an error.
ghstack-source-id: 92603377

Test Plan: Added unit tests to test failures.

Differential Revision: D17916844

fbshipit-source-id: 199a7832f1033c36a9bbcc1e80d86576c04965d0
2019-10-25 12:07:27 -07:00
dc17a2ecc5 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28433
Differential Revision: D18138240

Pulled By: anjali411

fbshipit-source-id: 314e5902f103be1feb4cacde47c90204b3d353cc
2019-10-25 11:44:28 -07:00
3f119a5f52 Port of multilabel_margin_loss from TH to ATen (CPU) [2nd try] (#28504)
Summary:
This is a port of the CPU version of the TH MultiLabelMarginCriterion to ATen.

This reverts the revert of the previous PR https://github.com/pytorch/pytorch/issues/28205, which caused a Windows build to fail; please see the comments in the original PR. I refactored the code so that the lambda bodies of the forward & backward in the AT_DISPATCH macro were extracted into separate functions; similar code can be found in several places in the ATen code base. Since I was not yet able to successfully compile PyTorch on Windows (due to another compile error), it would be great if somebody could launch a Windows test build for this PR to see if it now compiles successfully. Thanks in advance!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28504

Differential Revision: D18115598

Pulled By: ezyang

fbshipit-source-id: b62b6367966e0f6786794213b94eb0820092e572
2019-10-25 10:17:39 -07:00
68ab162099 Don't clobber pytorch image with libtorch build. (#28581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28581

Fixes #28305

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18124450

Pulled By: ezyang

fbshipit-source-id: 0d4bb99a6bdff9ddbfb4d25cc0f67cc261ed26ba
2019-10-25 10:13:57 -07:00
42423854f0 add test to ensure that dist autograd contexts are cleaned up incase of nested rpcs (#28485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28485

This diff adds a test to ensure that when we have multiple nested RPCs
inside a dist autograd context, the context that is created as a result of a
nested rpc is cleaned up after the node creating the context exits the context
manager. For example, worker 0 might send an rpc to worker 1 that results in an
rpc to worker 2, so worker 2 will have 0's context, even though worker 0 never
directly talked to 2. This test ensures that the context on 2 would also be
cleaned up.
ghstack-source-id: 92611018

Test Plan: Ran the unit test.

Differential Revision: D18079212

fbshipit-source-id: d49f0cda0bf2908747546e5c8a967256c848c685
2019-10-25 10:10:02 -07:00
aac3998c27 msvc error C4805 fix (#28156)
Summary:
Fixes MSVC error message
```
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(173): error C4805: '|=': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(173): error C4805: '|': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(186): error C4805: '|=': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(186): error C4805: '|': unsafe mix of type 'bool' and type 'int' in operation
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28156

Differential Revision: D18115151

Pulled By: ezyang

fbshipit-source-id: ed67a2b1330dfd4c12858ae9ca181163c0c72e51
2019-10-25 09:24:25 -07:00
e212543681 Improve float pickling speed. (#28553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28553

This change improves double pickling in a 1M-double-list
microbenchmark by roughly 40% (33 msec -> 20 msec).

The main benefit is avoiding per-byte bounds checks, so
we only bounds-check 2 times rather than 9 times.

Unpickle is already doing something reasonable, so no need to change.

fwiw, putting the swapping logic in a separate func/lambda provided
roughly 20% better results, consistently when microbenchmarking.
Looking at the objdump disassembly, gcc somehow generates better code
when it's separated.
ghstack-source-id: 92585739

Test Plan:
Benchmarks: buck build mode/opt experimental/jeremyl/c2:SerializationBench
               buck-out/opt/gen/experimental/jeremyl/c2/SerializationBench --bm_regex=.*Float.*
   Correctness: buck build mode/dev-nosan caffe2/test/...

Differential Revision: D18089481

fbshipit-source-id: a5f39e5d38c432893844241a7cce244831037e1f
2019-10-25 08:14:07 -07:00
9732c81da4 Cleanup testing of _like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27891

Test Plan: Imported from OSS

Differential Revision: D17980308

Pulled By: VitalyFedyunin

fbshipit-source-id: 268b6a0875c8970885604498eb0991a8cd410b21
2019-10-25 07:29:28 -07:00
69b0e06a49 Add memory format support to randn_like operator (#27890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27890

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.
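
A short illustration of these rules (shapes are illustrative):

```python
import torch

x = torch.empty(8, 3, 32, 32).contiguous(memory_format=torch.channels_last)

y = torch.randn_like(x, memory_format=torch.preserve_format)
assert y.stride() == x.stride()  # the channels-last strides are kept

z = torch.randn_like(x, memory_format=torch.contiguous_format)
assert z.is_contiguous()         # an explicit format overrides 'preserve'
```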

Test Plan: Imported from OSS

Differential Revision: D17980314

Pulled By: VitalyFedyunin

fbshipit-source-id: a2cf3b1b2df1a4956da971fd47ce69487b2c09e9
2019-10-25 07:29:24 -07:00
02917dd1f4 Add memory format support to randint_like operator (#27889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27889

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980307

Pulled By: VitalyFedyunin

fbshipit-source-id: f1766c2bcb015ef870bfb92c16b4cd363b3cbc14
2019-10-25 07:29:20 -07:00
c258cd039a Add memory format support to zeros_like operator (#27562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27562

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980313

Pulled By: VitalyFedyunin

fbshipit-source-id: 9ca8453dc1a554ceea93c6949e01263cc576384b
2019-10-25 07:29:16 -07:00
04f5325583 Add memory format support to rand_like operator (#27561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980316

Pulled By: VitalyFedyunin

fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0
2019-10-25 07:29:12 -07:00
2c339a24ec Add memory format support to ones_like operator (#27270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27270

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980312

Pulled By: VitalyFedyunin

fbshipit-source-id: 5da9530f6b239306dbb66d1dfeefe88237f13bbd
2019-10-25 07:29:08 -07:00
85d5aee863 Add memory format support to full_like operator (#27262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27262

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980309

Pulled By: VitalyFedyunin

fbshipit-source-id: 1761a9939aa7c5ab23e927b897e25e225089a8e7
2019-10-25 07:29:04 -07:00
baf8488dbd Add memory format support to empty_like operator (#27244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27244

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D17980310

Pulled By: VitalyFedyunin

fbshipit-source-id: 00a39b40daa4b8ee63c32e60d920222f8be2d6a1
2019-10-25 07:29:00 -07:00
bfbb3e0579 Kill _th_fill binding, which isn't used anymore. (#28511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28511

We still keep the function in TH, since it's called from within TH.

Test Plan: Imported from OSS

Differential Revision: D18086967

Pulled By: gchanan

fbshipit-source-id: de026fbb076c8bf9d054ed4cf93eba9c7bcfb161
2019-10-25 07:12:32 -07:00
c6628b29a7 unfold: turn off device_guard (#28510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28510

It was off in TH, it can be off in ATen.

Test Plan: Imported from OSS

Differential Revision: D18086968

Pulled By: gchanan

fbshipit-source-id: 9be9a61da1dc82224f04a22008629db982f65230
2019-10-25 07:12:27 -07:00
7ab0a28b21 Port TH/THC implementation of unfold to ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28475

Test Plan: Imported from OSS

Differential Revision: D18074672

Pulled By: gchanan

fbshipit-source-id: 32e44330bf67728af47a6652b1fb70733a06ba20
2019-10-25 07:12:23 -07:00
2793d41a9c Fix scalar handling of unfold. (#28462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28462

Unfold is implemented in TH (as _th_unfold) and uses the standard scalar checks. That means that even though torch.tensor(5).unfold(dim=0, size=1, step=1) should produce torch.tensor([5]), it actually produces torch.tensor(5) because the scalar_check infers it's a scalar.

We can fix this by just turning off the scalar_check.
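
In short, the behavior after the fix:

```python
import torch

t = torch.tensor(5)  # zero-dim tensor
t.unfold(0, 1, 1)    # tensor([5]) after the fix; previously tensor(5)
```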

Test Plan: Imported from OSS

Differential Revision: D18074671

Pulled By: gchanan

fbshipit-source-id: 5db09d614692830d66d6e6d8aba799ebe8144cf5
2019-10-25 07:12:18 -07:00
1a5d32d894 Updating submodules
Summary:
GitHub commits:

59613a5631
ff6fbc6607
4dd4f00512

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 5637007829adcab490bef01db9b1bd60fd856405
2019-10-24 22:28:09 -07:00
2181dd516e fix handling of function attributes. (#28569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28569

Previously, the inclusion of function attributes would "poison" a
ConcreteModuleType, because we did not have a way of checking whether
they are actually the same function. This PR uses the Python function
object to perform that check. This improves our ability to reuse JIT
types between modules.

Also this PR fixes a bug where we weren't properly adding modules as
attributes when converting from ConcreteType -> JIT type (we were adding
them after the fact--another reason to switch from using `register_x` to
`set_x` during module construction, which is on my to-do list after
this).

Fixes https://github.com/pytorch/pytorch/issues/28559

Test Plan: Imported from OSS

Differential Revision: D18111331

Pulled By: suo

fbshipit-source-id: ec2cccf832d3ddd4cd4d28fe19cb265f1275325a
2019-10-24 22:23:37 -07:00
01aea1f268 Delete ATenDispatch (#28468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28468

We don't need this anymore.
ghstack-source-id: 92595388

Test Plan: unit tests

Differential Revision: D18073339

fbshipit-source-id: d0ef1332c83e47117fe0a5eadc8faedb259cfba0
2019-10-24 22:15:00 -07:00
ed503596ce Remove c10->ATen registration forwarding (#28186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28186

Since now all ops are on c10, we don't need to forward any registrations to globalATenDispatch anymore.
ghstack-source-id: 92586962

Test Plan: waitforsandcastle

Differential Revision: D17969011

fbshipit-source-id: 30e6cb072c934b3d24089055754ed3695f8ea693
2019-10-24 22:14:56 -07:00
d04973beda Use c10::variant-based enums for EmbeddingBag mode (#28330)
Summary:
This PR is BC-breaking in the following way:

Previously, we required the use of `std::string` to specify the mode for `EmbeddingBag`. After this PR, we use variant-based enums such as `torch::kSum` / `torch::kMean` / `torch::kMax` to specify the mode for `EmbeddingBag`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28330

Differential Revision: D18127116

Pulled By: yf225

fbshipit-source-id: 15cd86c764777f4d399587be92cda15b6ce8524b
2019-10-24 17:47:42 -07:00
60a1efe138 Eliminate some unnecessary tensor refcount bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28355

Differential Revision: D18129161

fbshipit-source-id: 493cf0c1d754a375ec6c73dd57cd985639c849b7
2019-10-24 17:19:07 -07:00
4182c1183b Add custom op documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28557

Differential Revision: D18127241

Pulled By: smessmer

fbshipit-source-id: 684e1dde15520d08aeab603623614dedd1e0cbfc
2019-10-24 16:18:14 -07:00
1dfb8752a6 Define std::strtoll for older Android (#28603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28603

This symbol isn't available in older Android configs, so import it
from the global namespace in the same file as the rest of our
Android string compatibility hacks.

Test Plan: Internal android build.

Reviewed By: jerryzh168

Differential Revision: D18099515

fbshipit-source-id: f8b0c80ea7344e05975a695afb359b339b6d9404
2019-10-24 15:52:09 -07:00
da6b8a905a Use c10::to_string in more places (#28605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28605

This was added because std::to_string isn't available in libstdc++
on Android.  Use it in more places to get the PyTorch Android
build working with libstdc++.

Test Plan: Internal android build.

Reviewed By: jerryzh168

Differential Revision: D18099520

fbshipit-source-id: 17a2b617c2d21deadd0fdac1db849823637981fc
2019-10-24 15:52:05 -07:00
df81cb22b8 Delete move constructor from TaggedStringStream (#28604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28604

This isn't used anywhere, and it doesn't work with older libstdc++
because std::ostringstream is not copyable or movable.

Test Plan: Internal android build.

Reviewed By: jamesr66a

Differential Revision: D18099511

fbshipit-source-id: 1ffb49303aa5d7890ca7f057b21886f88c04ce20
2019-10-24 15:52:01 -07:00
52e0a94661 Fix spelling in some comments
Test Plan: CI

Reviewed By: xcheng16, linbinyu

Differential Revision: D18099518

fbshipit-source-id: 3fbf654dc30261eb27b923db0974d8088a3a5783
2019-10-24 15:51:56 -07:00
261a13a84b Enable dist autograd tests (#28606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28606

Without passing setup_model_parallel=True to dist_init, the
decorator actually takes the function object as the value for the
flag.

Test Plan: Imported from OSS

Differential Revision: D18120507

Pulled By: mrshenli

fbshipit-source-id: afbaa381647e8f284e28fa9dbdd2a7c411073b3f
2019-10-24 15:30:27 -07:00
70e4548fd7 Compute correct strides after type promotion (#28253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28253

Instead of trying to fix strides after changing dtypes, wait until after
promotion to set them.

fixes: https://github.com/pytorch/pytorch/issues/27824
fixes: https://github.com/pytorch/pytorch/issues/28502
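
A minimal sketch of the kind of case this affects (an assumed example, not taken from the PR's test suite):

```python
import torch

a = torch.randn(4, 3).t()                    # non-contiguous float32 view
b = torch.randn(3, 4, dtype=torch.float64)   # contiguous float64
out = a + b                                  # type-promotes to float64
print(out.dtype, out.stride())               # strides chosen after promotion
```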

Test Plan: Imported from OSS

Differential Revision: D18124950

Pulled By: nairbv

fbshipit-source-id: e4db90b2a6bb0f5d49cb388e0cd1971303c6badd
2019-10-24 15:18:01 -07:00
e885ce6130 C++ parity, grid_sample functional (#28354)
Summary:
https://github.com/pytorch/pytorch/issues/25883
I put grid_sample in vision.h with affine grid.

I have a question about the string arguments (interpolation mode, padding mode):
I reuse torch::native::detail::GridSamplerInterpolation in GridSampler.h instead of using strings.
This follows the way reduction enums are used in loss functions.
I am not sure this is right.

yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28354

Differential Revision: D18109333

Pulled By: yf225

fbshipit-source-id: 1bf972b671b107464f73b937bbe0de76fb259fbf
2019-10-24 15:14:37 -07:00
92b39434a2 C++ nn::ConstantPad{1,2,3}d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28541

Test Plan: Imported from OSS

Differential Revision: D18115607

Pulled By: yf225

fbshipit-source-id: 736df791ddc3cd30ad9af89eacfb4a0c6b53f2cd
2019-10-24 15:10:27 -07:00
5cf644157c Speed up fill for half and bfloat16 on CPU. (#28397)
Summary:
This is done by replacing Vec<uint16_t> with Vec<int16_t>, which has all
sorts of AVX optimizations available.

Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.bfloat16', 'torch.half'):
    for n, t in [(40_000, 600000),
                (400_000, 60000)]:
        print(f'a.fill_(10) for {t} times, a=torch.empty({n}, dtype={dtype})')
        print(timeit.timeit(f'a.fill_(10)', setup=f'import torch; a=torch.empty({n}, dtype={dtype})', number=t))
```

Before:

```
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.bfloat16)
11.064065577999827
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.bfloat16)
10.618151295000189
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.half)
10.989039544000207
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.half)
10.602233665999847
```

After:

```
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.bfloat16)
1.530125006000162
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.bfloat16)
1.4807136570002513
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.half)
1.3946152990001792
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.half)
1.457788402999995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28397

Differential Revision: D18125171

Pulled By: ezyang

fbshipit-source-id: bfb2da13f10bc582e9848073e428af9e36656b13
2019-10-24 15:03:11 -07:00
7f9941c4ea C++ nn::ZeroPad2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28540

Test Plan: Imported from OSS

Differential Revision: D18115610

Pulled By: yf225

fbshipit-source-id: ced7c0917f4712838e753cd2e9fc4fa79fd5d310
2019-10-24 14:23:57 -07:00
d762ad09df Enable Interpolate Tests for ONNX Opset 11 (#28560)
Summary:
- Enable tests for Interpolate in opset 11 for nearest and linear2d modes (linear1d/3d not implemented yet)
- Fix bugs found after enabling tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28560

Reviewed By: hl475

Differential Revision: D18110680

Pulled By: houseroad

fbshipit-source-id: 7f8811e40dc5cedaba6389460dcca52daa048f5f
2019-10-24 14:21:13 -07:00
a783563738 Skip ProcessGroupNCCLTest if CUDA is not available (#28393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28393

We should skip this test if CUDA is not available and alert the user.
Previously, if this test was run on CPU it would fail with:
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (3) : This binary is linked with CUDA lazy stubs and underlying .so files were not loaded. CUDA functionality is disabled. Set env variable CUDA_LAZY_DEBUG to get messages during startup
```

Test Plan:
Build on CPU and verify that there are no errors when running; we should get the message:
`CUDA not available, skipping test`. Previously, we would get an error:
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (3) : This binary is linked with CUDA lazy stubs and underlying .so files were not loaded. CUDA functionality is disabled. Set env variable CUDA_LAZY_DEBUG to get messages during startup. at caffe2/aten/src/THC/THCGeneral.cpp:54
```

Differential Revision: D18054369

fbshipit-source-id: f1d06af88b780a24ca3373a7a133047a2cfe366e
2019-10-24 14:02:09 -07:00
46f96d1538 C++ API parity: at::Tensor::requires_grad_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26332

Test Plan: Imported from OSS

Differential Revision: D17427575

Pulled By: pbelevich

fbshipit-source-id: 5500169a4fa0ef9cc2a7272e13b6e2d89df09260
2019-10-24 13:24:18 -07:00
78039627ae Minor followup on stringstream cleanups (#28300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28300

  - Remove trivial stringstream from ScriptModuleSerializer::writeCode;
    I didn't include this in earlier changes to avoid a merge conflict
    with an earlier change.
  - Remove underscore from QualifiedName var ref; no difference in
    current use, but more correct.
ghstack-source-id: 92206909

Test Plan:
Benchmark: buck build mode/opt experimental/jeremyl/c2:
   Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18012511

fbshipit-source-id: 7db057d77741cf69c4f2fed560771c3201da19ed
2019-10-24 13:05:46 -07:00
303527d733 C++ nn::ReplicationPad{1,2,3}d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28539

Test Plan: Imported from OSS

Differential Revision: D18115609

Pulled By: yf225

fbshipit-source-id: 15f4ab6a114279bb06bf62f1265b62aa12f8700f
2019-10-24 12:49:41 -07:00
78375c02b8 C++ nn::ReflectionPad1d and nn::ReflectionPad2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28538

Test Plan: Imported from OSS

Differential Revision: D18115608

Pulled By: yf225

fbshipit-source-id: 3a48d8c11721f013076db2965f5f75b71662c78e
2019-10-24 12:02:51 -07:00
Jie
e263dd3853 (#24396)
Summary:
Initial kernel support added for optimized NHWC tensor.

TODO: currently backwards kernel spits out tensor with NHWC stride.
Unfortunately autograd restores grad to contiguous (in either copy or add). This
makes real perf tuning annoying to do. (since I cannot easily measure end-to-end
time in my python script)

My current kernel is blazing fast compared to the original NCHW kernel in fp16,
since I avoided atomicAdd. I'll finish perf tuning after we merged some future
PR expanding NHWC support in the core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24396

Differential Revision: D18115941

Pulled By: VitalyFedyunin

fbshipit-source-id: 57b4922b7bf308430ffe1406681f68629baf8834
2019-10-24 11:57:15 -07:00
2020cc0cd1 Fix compute_non_overlapping_and_dense() (#28551)
Summary:
There are some cases where compute_non_overlapping_and_dense() doesn't work properly:
Example:
```
Tensor t = at::tensor(1).expand({1, 3, 2});
EXPECT_FALSE(t.is_contiguous());
EXPECT_FALSE(t.is_non_overlapping_and_dense()); //FAIL!!!
```
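
For reference, a roughly equivalent check from Python (a sketch, not part of the PR's tests):

```python
import torch

t = torch.tensor(1.).expand(1, 3, 2)  # all strides are 0, so elements overlap
print(t.is_contiguous())              # False: an expanded view is not dense
```
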
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28551

Differential Revision: D18115570

Pulled By: ifedan

fbshipit-source-id: 35b1a9473a28037d41f7177a8de23ffefa7faa13
2019-10-24 11:53:52 -07:00
8de8cab247 Migrate remaining ops to the c10 dispatcher (#27978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27978

-
ghstack-source-id: 92469187

Test Plan: waitforsandcastle

Differential Revision: D17929697

fbshipit-source-id: 01f4f67cd676c719d9d1fb13bdd43aca3dfa1c8a
2019-10-24 11:40:57 -07:00
d8c66c1576 autograd/profiler: make python record_function use JIT methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28264

Test Plan: buck test caffe2/test:autograd caffe2/test/cpp/jit:jit

Reviewed By: bddppq

Differential Revision: D17997612

fbshipit-source-id: 8a29ae50c28ce905f63c732fe0aa49edfc9d99e3
2019-10-24 10:28:32 -07:00
f8b758b141 CPU-Strided-Complex Support for reduce ops and linpack ops (#27653)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes so far:

- [x]  Renamed references to the variable "I" that may be confused with "I" defined in complex.h. I did this to avoid crazy CI failure messages, as complex.h is included by more source files.
     - aten/src/ATen/native/cpu/Loops.h (Renamed I to INDEX)
     - aten/src/ATen/native/cuda/Loops.cuh (Renamed I to INDEX)
     - aten/src/ATen/core/ivalue_inl.h (Renamed I to INDEX)
     - c10/util/Array.h (Renamed I to INDEX)
     - c10/util/C++17.h (Renamed I to INDEX)
    - c10/util/Metaprogramming.h (Renamed I to INDEX)
    - c10/util/SmallVector.h (custom renaming)
- [x]  Added complex support of Linear Algebra Ops.
     - SVD needed to be modified to support mixed data types
     - Example: U(std::complex<double>), S(double), V(std::complex<double>)
     - See before and after benchmark below (No observable change in performance).
- [x]  Added complex support of Reduce Ops.
     - var/std computations could have been faster if it was possible to interpret std::complex<double> Tensor as a double Tensor.
- [x]  Added complex derivative support for autograd functionality.
     - derivatives are the same as defined by numpy autograd library for real(), imag(), conj(), angle(). These functions only affect complex numbers.
     - derivative of abs() has not been modified to not interfere with existing code.
     - Autograd defines abs() for complex numbers and fabs() for real numbers. I will look into this further down the road.

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks Before Changes
----------------------------------------
Tag : short

Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162339.425
Forward Execution Time (us) : 162517.479
Forward Execution Time (us) : 162847.775

----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks After Changes
----------------------------------------
Tag : short

Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162032.117
Forward Execution Time (us) : 161943.484
Forward Execution Time (us) : 162513.786
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27653

Differential Revision: D17907886

Pulled By: ezyang

fbshipit-source-id: a88b6d0427591ec1fba09e97c880f535c5d0e513
2019-10-24 09:31:06 -07:00
136bb07a93 torch.histc: added a finite range check to resolve segfaults if the tensor has inf; also added checks for nan values and min>max (#27712)
Summary:
https://github.com/pytorch/pytorch/issues/27464
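
A minimal sketch of the fixed behavior (the exact error wording is an assumption, not quoted from the PR):

```python
import torch

x = torch.tensor([1.0, 2.0, float('inf')])
try:
    torch.histc(x, bins=4)  # used to segfault; now range validation kicks in
except RuntimeError as err:
    print(err)  # a RuntimeError about the non-finite range, not a crash
```
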
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27712

Differential Revision: D18064544

Pulled By: anjali411

fbshipit-source-id: c9c6d8eb4d55f2b5320409ba238bf44b0be8902e
2019-10-24 09:28:45 -07:00
ae05e48fe8 Kill TH(C)Tensor_squeeze which isn't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28435

Test Plan: Imported from OSS

Differential Revision: D18066779

Pulled By: gchanan

fbshipit-source-id: b58180151a92999386085618ff00b56b993b41bb
2019-10-24 09:15:06 -07:00
4f0a3504e1 Port is_set_to from TH/THC to ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28425

Test Plan: Imported from OSS

Differential Revision: D18063328

Pulled By: gchanan

fbshipit-source-id: 86af01a630d88c30947b8c85d1fac86dd7b40585
2019-10-24 09:15:03 -07:00
139fec2d14 remove type information from docstrings of quantization functions (#28556)
Summary:
Following from https://github.com/pytorch/pytorch/issues/28479 let's remove the type information from the docstrings of these functions as well, making them valid python signatures matching the other signatures in the docstrings for the torch API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28556

Differential Revision: D18115641

Pulled By: ezyang

fbshipit-source-id: e4c3d56981b16f5acabe8be7bfbe6ae506972d7f
2019-10-24 08:13:48 -07:00
dd277e9086 C++ API parity: Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27382

Test Plan: Imported from OSS

Differential Revision: D17766735

Pulled By: pbelevich

fbshipit-source-id: c7a66daeb17550eb9a5d26944427723d4ebdc6c8
2019-10-24 07:11:51 -07:00
59402f51cf Make init_method url appending step re-usable by both init_process_group and init_model_parallel(init_rpc) (#28226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28226

# Goal

The rendezvous step should be the first step not only for `init_process_group` but also for `init_model_parallel`.

The roadblock is that there is a special step in `init_process_group` where the arguments `rank` and `world_size` passed to `init_process_group(..)` are appended to the `init_method` URL string.

We need to make this argument appending step common and re-usable for both `init_process_group` and `init_model_parallel`.

# Solution

- Put the argument-appending step inside the `rendezvous` function (as sketched below).
- Remove manual `init_method` url construction. Delegate the responsibility to the `rendezvous` function.
- Use the `rendezvous` function for any `RpcAgent`.
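
A minimal sketch of the user-facing call this affects (the query-string form shown in the comment is an assumption):

```python
import torch.distributed as dist

# rank and world_size passed here are appended to the init_method URL by the
# shared rendezvous() helper, e.g. something like
# "tcp://127.0.0.1:23456?rank=0&world_size=1", for both init_process_group
# and the RpcAgent initialization path.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
)
```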

Test Plan:
```
buck test mode/dev-nosan caffe2/test:c10d
```

```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_invalid_names

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_worker_id
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc -- test_sync_rpc
```

```
buck test mode/dev-nosan caffe2/torch/fb/rendezvous:zeus_test
```

```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling -- test_single_trainer_multiple_pss
```

Differential Revision: D5524494

fbshipit-source-id: 50be58ec3c928621b0874b044ef4a1640534d8ef
2019-10-23 21:51:08 -07:00
e31adeb4f3 Make RRef::LocalValue return Future (#28025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28025

Add a PyFuture type which is a wrapper of either an OwnerRRef or a
jit::Future. The difference between PyFuture and jit::Future is that
PyFuture can return a custom py::object type.

Test Plan: Imported from OSS

Differential Revision: D17936746

Pulled By: mrshenli

fbshipit-source-id: a7451af3993d98aeab462ffd5318fc6d28f915c8
2019-10-23 17:07:16 -07:00
58873776ff Make RRef::toHere() return a jit::Future (#27943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27943

This is step 1 to make PyRRef::toHere() non-blocking on caller.

Test Plan: Imported from OSS

Differential Revision: D17936747

Pulled By: mrshenli

fbshipit-source-id: 7cf60e5804e72bdc28f0135fed4d7fdce05ea38a
2019-10-23 17:07:11 -07:00
9c345473d8 Updating submodules
Summary:
GitHub commits:

8ac79dbfad
f97c8b2a91
686dbde63b
6a32e3b562
9e79c99421

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: e50b9ecdf91e98932bd82faa210c012cc8b9d48f
2019-10-23 16:52:47 -07:00
61d40b80d3 static initialization order with mutex (#28243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28243

When building the static-libs version of pytorch 1.3 on Windows (MSVC v141), the program crashes with a bad memory reference because `fusion_backends_lock_` has not been initialized yet.

Test Plan:
sandcastle green,
tested locally on MSVC static builds that this fixes initialization.

Differential Revision: D17985919

fbshipit-source-id: ebd6178dedf5147d01c2c1754a0942a1bbbc7e34
2019-10-23 16:30:19 -07:00
8008322336 workaround for raw string bug in VS2019 (#28349)
Summary:
reported the problem to microsoft [Developer Community](https://developercommunity.visualstudio.com/content/problem/782476/e-preprocess-to-stdout-cant-handle-raw-string-corr.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28349

Differential Revision: D18074620

Pulled By: mingbowan

fbshipit-source-id: 89c2583a0301b1e3055b1f8cd9d493fdb2567b42
2019-10-23 16:30:15 -07:00
896b5d9113 Scripts for setting up benchmark projects (#28469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28469

### Summary

As described [here](https://github.com/pytorch/pytorch/pull/28405), this PR is the second one; it contains the scripts for setting up the benchmark projects.

### Test Plan

Don't break CI jobs unless they are flaky.

Test Plan: Imported from OSS

Differential Revision: D18097248

Pulled By: xta0

fbshipit-source-id: 6f9d1275a07aecae21afd81d5e90a89a75d0270f
2019-10-23 16:16:57 -07:00
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` without putting `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
76d262d4b7 export group_norm (#27071)
Summary:
Updated group_norm symbolic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27071

Reviewed By: hl475

Differential Revision: D17792249

Pulled By: houseroad

fbshipit-source-id: 08be6071952ca2c256d2c6a0a6bbc19a8442f1fe
2019-10-23 15:14:31 -07:00
d081de67cf fix the document of kaiming initialization (#25638)
Summary:
Based on https://github.com/pytorch/pytorch/issues/25549, I modified the comments for kaiming initialization in torch.nn.init.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25638

Differential Revision: D17915392

Pulled By: vincentqb

fbshipit-source-id: 40f60c65d14790696ec03d7d91c764875efd6cf1
2019-10-23 14:19:38 -07:00
cbddc77ac5 fix docs for lr (#28026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28026

Documentation for learning rate does not render well. #27730.

Test Plan: Imported from OSS

Differential Revision: D17953395

Pulled By: vincentqb

fbshipit-source-id: 9e84df3e7de43f11399a67bc99c76ef241b1120f
2019-10-23 13:49:34 -07:00
bee4aca259 is_set_to: unify TH/THC implmentation and genericize test_is_set_to. (#28422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28422

The TH implementation had two differences:
1) It explicitly checked for null storages; this isn't supported anymore so can be removed.
2) It collapsed all empty tensors to the same shape for the purpose of checking.  This was introduced to keep BC when we introduced N-dimensional empty tensors,
but since it's been quite a long time since we've had N-dimensional empty tensors and the CUDA implementation didn't support this, we should get rid of it.

Test Plan: Imported from OSS

Differential Revision: D18061916

Pulled By: gchanan

fbshipit-source-id: 1a54cf9ea4fcb35b358a9ab57f84eff059ff1e7b
2019-10-23 13:46:52 -07:00
09ad464d68 Change activation modules in C++ from using Tensor& to Tensor (#28501)
Summary:
Sequential does not like modules added to it that take Tensor&
(const Tensor& and Tensor are both OK).
Functional and others use Tensor when they want to potentially
change things in-place.
This changes ReLU and friends to also do that.

Unfortunately, this seems to be BC breaking on the ABI level.
On the other hand, use of the module ReLU seems rare enough outside
Sequential (in particular in C++ models, the standard seems to be
to use torch::relu instead).

Is the BC breaking OK here? (yf225 or anyone else)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28501

Differential Revision: D18089978

Pulled By: yf225

fbshipit-source-id: ac9aba6dc2081117dece57cd8a15bafe14ec8f51
2019-10-23 13:42:22 -07:00
1c53a74e26 Fixed behavior of div_factor parameter in optim.lr_scheduler.OneCycleLR (#28217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28217

Differential Revision: D18070759

Pulled By: vincentqb

fbshipit-source-id: ed032190c0e3eab834fc9a8f408b75b56f0f35ec
2019-10-23 13:39:05 -07:00
76c70559c9 Updating submodules
Summary:
GitHub commits:

3d32597779
2c45426e8b
db7733cb24
9df8a63117
61a3c68470
596b1a7c1c
06751015b3

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 4a48ce9ed3124fc5a37e02c1eb3081a358bb1fb6
2019-10-23 12:42:01 -07:00
657430e1f0 Return 0-numel empty tensor from symeig when eigenvectors=False (#28338)
Summary:
Changelog:
- Changes the behavior to return a 0-numel empty tensor (instead of an all-zeros tensor) when eigenvectors=False, matching the behavior of torch.eig
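
A minimal sketch of the new behavior (an assumed example, not from the PR's test):

```python
import torch

a = torch.randn(4, 4)
a = a + a.t()                                  # make the input symmetric
e, v = torch.symeig(a, eigenvectors=False)
print(e.shape)    # torch.Size([4]): eigenvalues are still returned
print(v.numel())  # 0: an empty tensor instead of an all-zeros placeholder
```
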
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28338

Test Plan: - test_symeig has been modified appropriately for this change

Differential Revision: D18085280

Pulled By: ezyang

fbshipit-source-id: 43129a96dd01743997157974100e5a7270742b46
2019-10-23 11:44:57 -07:00
e4f40bf3b2 Add multiplicative lr. (#27254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254

`MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
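
A minimal usage sketch (the concrete factor 0.95 is just an illustration):

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Like LambdaLR, but the lambda returns a factor multiplied into the
# *current* lr each step, rather than a factor of the initial lr.
sched = MultiplicativeLR(opt, lr_lambda=lambda epoch: 0.95)
for _ in range(3):
    opt.step()
    sched.step()
    print(opt.param_groups[0]['lr'])  # 0.095, 0.09025, 0.0857375
```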

Test Plan: Imported from OSS

Differential Revision: D17728088

Pulled By: vincentqb

fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9
2019-10-23 11:38:45 -07:00
d1d2358d31 Correct math formatting for lr scheduler (#28467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28467

Correcting a formatting error from #27874. Also making the size of the parentheses more natural.

![Screen Shot 2019-10-22 at 5 38 22 PM](https://user-images.githubusercontent.com/3047868/67336492-76ddfa00-f4f3-11e9-9d79-70a0aa4f6d29.png)

Closes #27874

Test Plan: Imported from OSS

Differential Revision: D18076085

Pulled By: vincentqb

fbshipit-source-id: cb7c52b347d6d11ea4a2d3c94d00a42f849c0a83
2019-10-23 11:11:25 -07:00
9d767db493 remove extraneous type information from torch.matrix_rank documentation (#28479)
Summary:
The types don't appear in the docstrings for other functions in the `torch` namespace so I think this was included here because of a copy/paste error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28479

Differential Revision: D18086150

Pulled By: ezyang

fbshipit-source-id: 2481bccba6df36b12779a330f8c43d4aea68495f
2019-10-23 11:08:30 -07:00
e80e42cb2c Updating submodules
Summary:
GitHub commits:

c535a02822
cf7a2fb510

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: f74d6f4de7a2a4ffe6d9f3689a7e08a429e79ae7
2019-10-23 09:42:44 -07:00
2f16284231 change empty range tolerance logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28489

Differential Revision: D18067322

fbshipit-source-id: 2096d1cce820f4ebe28db0045a2ddacc022e07da
2019-10-23 09:39:39 -07:00
e9336b04fc Update Dockerfile
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28358

Differential Revision: D18041193

Pulled By: ngimel

fbshipit-source-id: d96ce2a01af9c06bd831ddb85fe8807fabacb8a3
2019-10-23 09:29:55 -07:00
e28e38e851 Update C++ torch::nn parity table for LayerNorm (#28484)
Summary:
Now that we have merged https://github.com/pytorch/pytorch/pull/28032 (thanks anjali411!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28484

Differential Revision: D18085844

Pulled By: yf225

fbshipit-source-id: 4be972687addea8f57f48dfe9707837196593062
2019-10-23 09:25:41 -07:00
e280f93e31 Prepack folding for conv2d (#27119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27119

att

Test Plan:
python test/test_jit.py 'TestJit.test_fold_prepack'

Imported from OSS

Differential Revision: D17717636

fbshipit-source-id: 97e9f8d927f7eacedf09f47b8ae1bf8216b8cad4
2019-10-23 09:03:14 -07:00
be3808d3b1 Migrate smooth_l1_loss from the TH to Aten (CPU & CUDA) (#27962)
Summary:
This is a port of the TH `SmoothL1Criterion` to ATen using TensorIterator. The forward implementation has been placed in BinaryOpsKernel.cpp/.cu while the backward version was added to PointwiseOpsKernel.cpp/.cu. CPU performance has improved for both the forward & backward paths. With CUDA, the performance of the forward pass has slightly degraded compared to the TH
implementation (see benchmark results).

### Questions:
1. Is the storage location of the implementation ok (I followed https://github.com/pytorch/pytorch/pull/26529) or should we create a separate .cpp/.h file pair for each operator implementation (e.g. to keep things together)?
2. The GPU forward-pass now seems to take consistently longer than the old version. Any ideas what we could try to bring it on par with the old impl?

## WITH patch benchmark result:
```
CPU warmup 1000 took 0.00018124299822375178
CPU warmup 10000 took 0.00021713999740313739
CPU warmup 100000 took 0.0016273759974865243
CPU warmup TOTAL time 0.0020758909959113225
CPU forward 1000 took 6.229899736354128e-05
CPU forward 10000 took 0.00013340599980438128
CPU forward 100000 took 0.0008730469999136403
CPU forward 1000000 took 0.011010036003426649
CPU forward 10000000 took 0.11133221499767387
CPU forward 100000000 took 1.0425375220002024
CPU forward TOTAL time 1.1660894790038583
CPU for- & backward 1000 took 0.0002662249971763231
CPU for- & backward 10000 took 0.00023712700203759596
CPU for- & backward 100000 took 0.002531945996452123
CPU for- & backward 1000000 took 0.010394354998425115
CPU for- & backward 10000000 took 0.23814761800167616
CPU for- & backward 100000000 took 1.2651235049997922
CPU for- & backward TOTAL time 1.516897434994462

GPU warmup 1000 took 0.00020941899856552482
GPU warmup 10000 took 8.128300396492705e-05
GPU warmup 100000 took 8.551499922759831e-05
GPU warmup TOTAL time 0.0004199420000077225
GPU forward 1000 took 7.060499774524942e-05
GPU forward 10000 took 7.116600318113342e-05
GPU forward 100000 took 9.825800225371495e-05
GPU forward 1000000 took 0.000499356996442657
GPU forward 10000000 took 0.002032470001722686
GPU forward 100000000 took 0.018638986002770253
GPU forward TOTAL time 0.02148268099699635
GPU for- & backward 1000 took 0.00035967300209449604
GPU for- & backward 10000 took 0.00032710300001781434
GPU for- & backward 100000 took 0.0003689270015456714
GPU for- & backward 1000000 took 0.0007732619997113943
GPU for- & backward 10000000 took 0.02127284000016516
GPU for- & backward 100000000 took 0.2022330649997457
GPU for- & backward TOTAL time 0.2254496300010942
```

## WITHOUT patch benchmark result:
```
CPU warmup 1000 took 0.00011545199959073216
CPU warmup 10000 took 0.00016227000014623627
CPU warmup 100000 took 0.0013456509987008758
CPU warmup TOTAL time 0.001648657998885028
CPU forward 1000 took 2.627600042615086e-05
CPU forward 10000 took 0.00015939700097078457
CPU forward 100000 took 0.001139313004387077
CPU forward 1000000 took 0.013769682998827193
CPU forward 10000000 took 0.13163026500114938
CPU forward 100000000 took 1.321879123999679
CPU forward TOTAL time 1.4687001089987461
CPU for- & backward 1000 took 0.0002569290008977987
CPU for- & backward 10000 took 0.00033315900509478524
CPU for- & backward 100000 took 0.0016096779945655726
CPU for- & backward 1000000 took 0.014474845003860537
CPU for- & backward 10000000 took 0.1564881520025665
CPU for- & backward 100000000 took 1.5787935900007142
CPU for- & backward TOTAL time 1.7521004869995522

GPU warmup 1000 took 0.00025611399905756116
GPU warmup 10000 took 0.00014123699656920508
GPU warmup 100000 took 0.00012580600014189258
GPU warmup TOTAL time 0.0005591579974861816
GPU forward 1000 took 0.00031183200189843774
GPU forward 10000 took 0.00011483799607958645
GPU forward 100000 took 0.00010807999933604151
GPU forward 1000000 took 0.0007842139966669492
GPU forward 10000000 took 0.0017624700049054809
GPU forward 100000000 took 0.01519905700115487
GPU forward TOTAL time 0.018341148999752477
GPU for- & backward 1000 took 0.00047569099842803553
GPU for- & backward 10000 took 0.0003539700046530925
GPU for- & backward 100000 took 0.000808880002296064
GPU for- & backward 1000000 took 0.001639469999645371
GPU for- & backward 10000000 took 0.021154599002329633
GPU for- & backward 100000000 took 0.19268552300491137
GPU for- & backward TOTAL time 0.2172460189976846
```

### Code used for performance testing
```
import torch
import torch.nn.functional as F
import torch.nn as nn

from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')

loss_fn = F.smooth_l1_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    y = None
    a = None
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        y = fn(a, b)
        y.cpu() # get result (potentially wait for gpu)
        if a.grad is not None:
            a.grad.cpu()
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)
    return out

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()
    return out

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.randn(16, requires_grad=True, device=device)
    b = torch.randn(16, device=device) * 2
    out = loss_fn(a, b)
    print('out', out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
if torch.cuda.is_available():
    sanity_check('GPU', gpu)
print()

run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()

if torch.cuda.is_available():
    run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
    run_benchmark('GPU forward', 6, False, gpu, fwd_only)
    run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27962

Differential Revision: D18061942

Pulled By: ezyang

fbshipit-source-id: 0d1fc528b59d47d4773b03240c3368db021cb9db
2019-10-23 07:56:57 -07:00
ee920b92c4 Move complex extension test to c10 (#28208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28208

Backend extensions should call torch::RegisterOperators, not globalATenDispatch().
If the op is still on globalATenDispatch, then torch::RegisterOperators will do the right thing and forward it to globalATenDispatch.
ghstack-source-id: 92436988

Test Plan: waitforsandcastle

Differential Revision: D17975369

fbshipit-source-id: 0d4bd5e4e5b86e6dcfba527a7d11c25508896ac1
2019-10-23 01:33:47 -07:00
0f556b62e0 Fix codegen for out operators (#28184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28184

Out overloads of operators have a different `name` and `operator_name`. Fix the codegen for them.
ghstack-source-id: 92436987

Test Plan: A diff stacked on top enables `use_c10_dispatcher` for out operators. Doesn't work without but works with this diff.

Differential Revision: D17969013

fbshipit-source-id: 7b1118c9a4a36997e7375fac8d870ff08e7ff453
2019-10-23 01:33:43 -07:00
b47d658d04 Allow migrating factory methods to c10 (#28183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28183

-
ghstack-source-id: 92436986

Test Plan: waitforsandcastle

Differential Revision: D17969015

fbshipit-source-id: 0e2eac09c9622fc6c6e90bb80d2a250f37bbd148
2019-10-23 01:33:39 -07:00
005d6ea495 Fix overload names (#28182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28182

They haven't been unique. Fixing it...
ghstack-source-id: 92436985

Test Plan: waitforsandcastle

Differential Revision: D17969010

fbshipit-source-id: 1aacbfb3c18a75ca6743b03cc2eea5fc4d3685c9
2019-10-23 01:33:35 -07:00
a94bf1d326 Add unsupported types to schema type parser (#28181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28181

These types are needed to parse the schemas from native_functions.yaml.

Note: This doesn't actually add the functionality to JIT, it only makes the parser pass.
ghstack-source-id: 92436989

Test Plan: waitforsandcastle

Differential Revision: D17969014

fbshipit-source-id: 41ebe256baec81ed8fb165e7b7cffa5160d285c3
2019-10-23 01:33:31 -07:00
b05d0fa671 Updating submodules
Summary:
GitHub commits:

4341008007
d3174ece89
88db55e055
d32d4344ec
00dfc2c82e

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 5d1b8300a428c65bc35f222ae19c656585ba897b
2019-10-22 23:32:58 -07:00
4beaf1cf1c add typing runtime dependency for py2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28442

Test Plan: Imported from OSS

Differential Revision: D18075498

fbshipit-source-id: 075f63b1ed2c83d9a64eb81224e0d67c6a63b22c
2019-10-22 22:02:08 -07:00
0d4009d777 Fix avx for c++14 (#28207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28207

Enabling C++14 causes these lines to fail with the error "error: the last argument must be an 8-bit immediate".
So let's make them an 8-bit immediate before we enable C++14.
ghstack-source-id: 92419812

Test Plan: Enabling C++14 before this PR shows the error, after this PR does not.

Differential Revision: D17975236

fbshipit-source-id: aa53cdb2d38d89ede2212ed7374fedeb5896f254
2019-10-22 21:49:07 -07:00
0c4878d550 Update index.rst 2019-10-22 21:43:58 -07:00
d2eb08d17b Fix tracing slice/select with dynamic inputs (#26549)
Summary:
Fix Slice/Select trace arguments. This PR stashes arguments to functions in order to avoid tracing them as constants.
This PR depends on a fix for select op in PR:
https://github.com/pytorch/pytorch/pull/25273
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26549

Reviewed By: hl475

Differential Revision: D17623851

Pulled By: houseroad

fbshipit-source-id: ae314004266688d2c25c5bada2dcedbfc4f39c5b
2019-10-22 17:09:40 -07:00
9705d60a2f get rid of deprecated thread.isAlive() to use py2.6 modern form is_alive()
Summary:
Codemod to remove all thread.isAlive() calls, since it throws a warning that is breaking some tests that monitor the output of their CLIs.

is_alive() was added in Python 2.6, so this is super safe.

This is a codemod; I don't care whether the code supports Python 3, just that it's Python code.
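
For illustration, a minimal sketch of the modern spelling (the thread target here is just an example):

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.1,))
t.start()
print(t.is_alive())  # the modern spelling, available since Python 2.6
t.join()
# t.isAlive() is the deprecated camelCase alias this codemod removes.
```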

Test Plan: unittests

Reviewed By: cooperlees

Differential Revision: D18069520

fbshipit-source-id: 4ca4dcb541c0b0debeb194aba5d060152ad0ef0e
2019-10-22 15:37:31 -07:00
177c95e9bc Migrate return type void to () for native functions. (#28290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28290

ghstack-source-id: 92368250

Test Plan:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28290
ghstack-source-id: 92368250

Differential Revision: D17565528

fbshipit-source-id: f4870bb9ee4f4e7c48df4d68508b512d25ed277c
2019-10-22 15:23:20 -07:00
f94b6cef43 Use FunctionSchema instead of char* for dispatch (#28295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28295

Previous PR was landed in a broken state

Test Plan: Imported from OSS

Differential Revision: D18066217

Pulled By: bwasti

fbshipit-source-id: 665de7b28145885d6b01f5f212897ac3f8f6270f
2019-10-22 14:38:43 -07:00
2cc0f1bbc6 Run pytorch mobile benchmark in PEP (#28437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28437

Add target to build speed_benchmark_torch for PEP.
Added a new argument `--report_pep` to print total runtime information for PEP. Can add per-op stats under this later.

Test Plan: https://our.intern.facebook.com/intern/aibench/details/664440309179004

Reviewed By: hl475

Differential Revision: D18062059

fbshipit-source-id: ca80e980ce8e48604782a15ac44dd8d403832817
2019-10-22 14:21:49 -07:00
5f1563296b remove AutoNonVariableTypeMode from jit-op-registry (#28402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28402

Revert PR #27274 as it's absorbed by PR #28398.

Test Plan: - make sure all mobile models can load and run

Differential Revision: D18055993

Pulled By: ljk53

fbshipit-source-id: 0d0ffdf2cfae18577189d3b69de15fa892210916
2019-10-22 14:08:58 -07:00
d0d8b8c31c change detach() & detach_() to no-op for USE_STATIC_DISPATCH mode (#28400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28400

This is yet-another fix to issue #26764.

Some mobile models call tensor.detach() which won't work with
static-dispatch mode. We disable autograd for static-dispatch / mobile
build anyway so it seems fine to make these op-ops.

Test Plan: - With stacked PRs, confirmed it can run failed models now.

Differential Revision: D18055852

Pulled By: ljk53

fbshipit-source-id: bff3a55fee2ca68ac5333fb4978c11fd18dfcc91
2019-10-22 14:08:54 -07:00
04bfc213ab remove AutoNonVariableTypeMode guard around forward() call (#28399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28399

This is also to address issue #26764

Turns out it's incorrect to wrap the entire forward() call with the
NonVariableTypeMode guard, as some JIT passes have an is_variable() check and
can be triggered within the forward() call, e.g.:
jit/passes/constant_propagation.cpp

Since we now toggle NonVariableTypeMode per method/op call, we can
remove the guard around forward().

Test Plan: - With stacked PRs, verified it can load and run previously failed models.

Differential Revision: D18055850

Pulled By: ljk53

fbshipit-source-id: 3074d0ed3c6e05dbfceef6959874e5916aea316c
2019-10-22 14:08:49 -07:00
38433e33a1 Make static dispatch turn off variable before entering the kernel (#28398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28398

Redo PR #26908 for issue #26764

Test Plan: - make sure quantized mobilenetv2 no longer suffers from perf regression

Differential Revision: D18055851

Pulled By: ljk53

fbshipit-source-id: d533bc8979b1d2892adfb39924678a3f9b591855
2019-10-22 14:08:45 -07:00
a5354adb08 Eliminate the use of CUDA_HOME in setup.py. (#28373)
Summary:
Variables read from CMakeCache.txt are more reliable.

Close https://github.com/pytorch/pytorch/issues/28365
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28373

Differential Revision: D18061855

Pulled By: ezyang

fbshipit-source-id: c550a365e23464411d75eca167f7e6e053f94872
2019-10-22 14:04:54 -07:00
30712f6e30 Move the CUDA implementation of sqrt to ATen. (#27372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27372

Fix #24638

Test Plan: Imported from OSS

Differential Revision: D18037944

Pulled By: VitalyFedyunin

fbshipit-source-id: d3dbbc167954c7bbee25be13b5b669433bca6ee5
2019-10-22 14:01:07 -07:00
19aeb472aa Move the CUDA implementation of log1p to ATen. (#26923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26923

Fix #24588

Test Plan: Imported from OSS

Differential Revision: D17984184

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bc2be4f08e800b1de274940f2bd3d5b418b45ee
2019-10-22 14:00:59 -07:00
4f70b5a4de Export det (#26958)
Summary:
Added symbolic to export det in opset 11
Updating ONNX submodule is required for det export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26958

Reviewed By: hl475

Differential Revision: D17844887

Pulled By: houseroad

fbshipit-source-id: 224ae3ff82939dc7ae8584c5a30a31fe6afa05f6
2019-10-22 13:33:15 -07:00
456d9a0dbe Enable Scatter/Gather ORT Test for opset 11 (#27876)
Summary:
Enable ONNX Runtime Test for scatter in opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27876

Reviewed By: hl475

Differential Revision: D18063347

Pulled By: houseroad

fbshipit-source-id: f26104770b9c0d0dfe6a4111189436bea13e9460
2019-10-22 13:27:00 -07:00
2a2cdc8aeb Revert D18001407: Port of multilabel_margin_loss from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D18001407

Original commit changeset: 68cbd9ce0aac

fbshipit-source-id: b43a83bfa087ea017b2b8bd09050c78c725ecd9e
2019-10-22 13:26:56 -07:00
636fbcdd0a add benchmark code to iOS TestApp (#28405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28405

### Summary

As discussed with AshkanAliabadi  and ljk53, the iOS TestApp will share the same benchmark code with Android's speed_benchmark_torch.cpp. This PR is the first part which contains the Objective-C++ code.

The second PR will include the scripts to setup and run the benchmark project. The third PR will include scripts that can automate the whole "build - test - install" process.

There are many ways to run the benchmark project. The easiest way is to use CocoaPods: simply run `pod install`. However, that will pull the 1.3 binary, which is not what we want, but we can still use this approach to test the benchmark code. The second PR will contain scripts to run custom builds that we can tweak.

### Test Plan
- Don't break any existing CI jobs  (except for those flaky ones)

Test Plan: Imported from OSS

Differential Revision: D18064187

Pulled By: xta0

fbshipit-source-id: 4cfbb83c045803d8b24bf6d2c110a55871d22962
2019-10-22 12:52:30 -07:00
7b59174882 torch::nn::LayerNorm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28032

Differential Revision: D18047371

Pulled By: anjali411

fbshipit-source-id: fb61aea52d6622a67ec1d84950e17e85686461ae
2019-10-22 12:50:22 -07:00
3fce612cb1 preserve original tensoriterator behavior when not explicitly promoting (#28231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28231

Fix: https://github.com/pytorch/pytorch/issues/28010

A mixed-type index assignment that would have been an error in 1.2 was unintentionally made possible (with incorrect results) in 1.3. This PR restores the original behavior.

This is BC-breaking because:
```
        a = torch.ones(5, 2, dtype=torch.double)
        b = torch.zeros(5, dtype=torch.int)
        a[:, [1]] = b.unsqueeze(-1)
```
now raises an error (as in 1.2) whereas it did not in 1.3.
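
One way to keep such an assignment working under the restored rules is an explicit cast (a sketch):

```python
import torch

a = torch.ones(5, 2, dtype=torch.double)
b = torch.zeros(5, dtype=torch.int)
a[:, [1]] = b.unsqueeze(-1).to(a.dtype)  # explicit cast: allowed and correct
```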

Test Plan: Imported from OSS

Differential Revision: D18049637

Pulled By: nairbv

fbshipit-source-id: 11a37dc98364ae70aac0e9dbc090d2a500aa7ccc
2019-10-22 10:38:27 -07:00
6d689e27c7 clean up NamedTuple creation API (#28189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28189

This makes it a separate createNamed function. The existing API resulted
in poor usage in fbcode, which in turn caused bugs in TorchScript programs.

Test Plan: Imported from OSS

Differential Revision: D17970220

Pulled By: zdevito

fbshipit-source-id: 59b082a726f56bec1c8d10d410db829f4aa271ea
2019-10-22 10:18:07 -07:00
03d24dba6c Fix static linking cuDNN without static CUDA (#28378)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/27887#issuecomment-544649765

The logs show that `USE_STATIC_CUDNN` is used but not `CAFFE2_STATIC_LINK_CUDA`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28378

Differential Revision: D18061841

Pulled By: ezyang

fbshipit-source-id: 3b9b49953094e02f808ff12107ba4226688d9986
2019-10-22 10:08:09 -07:00
682da8eb43 Port of multilabel_margin_loss from TH to ATen (CPU) (#28205)
Summary:
This is a port of the CPU version of TH MultiLabelMarginCriterion to ATen.

Benchmark results ([source of script used](https://gist.github.com/andreaskoepf/ce96eedb09e9480ae2263d31822ef26e)):

Slightly slower forward (probably acceptable), slightly faster forward & backward combination.

###  WITH patch:
```
CPU forward 1000 took 0.0002544010058045387
CPU forward 10000 took 0.0022866200015414506
CPU forward 100000 took 0.02240650000749156
CPU forward 1000000 took 0.22985397902084514
CPU forward 10000000 took 2.227811124001164
CPU forward TOTAL time 4.282580643019173
CPU for- & backward 1000 took 0.0006969539972487837
CPU for- & backward 10000 took 0.004804529016837478
CPU for- & backward 100000 took 0.07736711099278182
CPU for- & backward 1000000 took 0.5985556179948617
CPU for- & backward 10000000 took 4.761040163983125
CPU for- & backward TOTAL time 7.318476865999401
```

### WITHOUT patch:
```
CPU forward 1000 took 0.00026982801500707865
CPU forward 10000 took 0.002569925010902807
CPU forward 100000 took 0.024335263995453715
CPU forward 1000000 took 0.2151200629887171
CPU forward 10000000 took 2.114590842014877
CPU forward TOTAL time 4.184845258976566
CPU for- & backward 1000 took 0.0007158009975682944
CPU for- & backward 10000 took 0.005468863993883133
CPU for- & backward 100000 took 0.05931608600076288
CPU for- & backward 1000000 took 0.5732014369859826
CPU for- & backward 10000000 took 5.2500802429858595
CPU for- & backward TOTAL time 7.7646528169861995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28205

Differential Revision: D18001407

Pulled By: ezyang

fbshipit-source-id: 68cbd9ce0aacf99dd8c44fb4da9c09b3ffc1e59a
2019-10-22 09:37:59 -07:00
c1bb2676f3 Update C++ torch::nn parity table (#28419)
Summary:
This PR updates `test/cpp_api_parity/parity-tracker.md` to reflect changes in https://github.com/pytorch/pytorch/issues/25883.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28419

Differential Revision: D18061479

Pulled By: yf225

fbshipit-source-id: dbdc2e44e835f6125a42cf11e59723ef61903cff
2019-10-22 09:34:10 -07:00
30d6cf7bc1 Updating submodules
Summary:
GitHub commits:

913ad446c7

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 3eba9fdc3c588489b516e6f87bee4954f4295da6
2019-10-22 09:14:24 -07:00
5e73e1fff8 Enabled torch.unique for bool tensors (#28374)
Summary:
Enabled torch.unique for bool tensors.
Tested via unit tests

[issue](https://github.com/pytorch/pytorch/issues/27691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28374

Differential Revision: D18043413

Pulled By: izdeby

fbshipit-source-id: 295ff03b9b61d33bbd2e05e6211c4f35a0ee23ea
2019-10-22 09:09:46 -07:00
373e9096c2 Revert D18012804: Use FunctionSchema instead of char* for dispatch
Test Plan: revert-hammer

Differential Revision:
D18012804

Original commit changeset: 9b6acdeb0656

fbshipit-source-id: ca2c89c87dc3757083bae8466e6c9ab17266f07f
2019-10-22 08:18:18 -07:00
73c1030328 Support logging tensorboard embedding visualizations to generic filesystem (#27716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27716

This uses the gfile filesystem abstraction that allows for writing to any filesystem that satisfies the interface (including S3).
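
A minimal usage sketch; the `s3://` path is hypothetical and assumes the underlying gfile implementation supports that scheme:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# With the gfile abstraction the log_dir may live on any filesystem that
# implements the interface, not only the local disk.
writer = SummaryWriter(log_dir='s3://my-bucket/runs/exp1')
writer.add_embedding(torch.randn(10, 5), global_step=0)
writer.close()
```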

Test Plan: Tested with local files and using internal S3 equivalent.

Reviewed By: natalialunova

Differential Revision: D17530694

fbshipit-source-id: c1f88c035fc03d91186b39092e42489f1c03d2cd
2019-10-22 08:12:25 -07:00
95650b152a remove deprecated torch.Tensor in test_distributed.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28316

Test Plan: Imported from OSS

Differential Revision: D18019147

Pulled By: mrshenli

fbshipit-source-id: eb0fb08031d810ea85fb6ea54b1b25791178131b
2019-10-22 07:47:36 -07:00
db298732c1 remove deprecated torch.Tensor in test_c10d.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28315

Test Plan: Imported from OSS

Differential Revision: D18019148

Pulled By: mrshenli

fbshipit-source-id: 9aff891c6df0b1cfa5ff01e7551973a16d512909
2019-10-22 07:47:33 -07:00
079b3cc02c Add C++ nn::functional pad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26601

Test Plan: Imported from OSS

Differential Revision: D17517468

Pulled By: yf225

fbshipit-source-id: 9ee8b93b88a60f91f2ae78c242f9eaa246b3293c
2019-10-21 22:20:38 -07:00
94757e035d Do not insert observers for empty sequential modules (#28384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28384

ghstack-source-id: 92340259

Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details

 buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D18047293

fbshipit-source-id: 7e18b1aa76cc0fd26e8ee48a70c3a45688e73549
2019-10-21 20:32:13 -07:00
d403410e0d Fastlane update (#28356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28356

### Summary

I'm working on setting up a benchmark test project for iOS, which will reuse this Fastlane file. This PR moves the "cert install" code from "before_all" to a standalone lane target.

### Test Plan

- don't break any existing CI jobs

Test Plan: Imported from OSS

Differential Revision: D18053675

Pulled By: xta0

fbshipit-source-id: e4760a8494916c410af19ca43f040fc463551d11
2019-10-21 19:31:55 -07:00
783c9c8445 Adding docstring to the observers (#27791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27791

This is the first part of the change. The next ones will amend more :)

Test Plan: Imported from OSS

Differential Revision: D17889913

Pulled By: z-a-f

fbshipit-source-id: ff74007903dd789d4c68684e83b50c0c86a25149
2019-10-21 19:09:50 -07:00
0ddb50010e enable test_invalid_names test in rpc_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28376

Test Plan: Imported from OSS

Differential Revision: D18045158

Pulled By: mrshenli

fbshipit-source-id: 42821ef40afbdff8662abacd447e307ccf4853d3
2019-10-21 18:43:37 -07:00
d9bca33d2c Use FunctionSchema instead of char* for dispatch (#28295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28295

Previous PR was landed in a broken state

Test Plan: Imported from OSS

Differential Revision: D18012804

Pulled By: bwasti

fbshipit-source-id: 9b6acdeb0656d2d7911b0ed63f4d47ecca5473b9
2019-10-21 18:24:52 -07:00
6335d91c38 Disable tsan for test_c10d multiprocess test cases. (#28385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28385

TSAN doesn't work with multiprocessing with fork() since we end up
forking in a multithreaded environment which is dangerous. As a result, I'm
disabling TSAN in this change.

Similar to https://github.com/pytorch/pytorch/pull/27410 and
https://github.com/pytorch/pytorch/pull/25005
ghstack-source-id: 92319347

Test Plan: waitforbuildbot

Differential Revision: D18047778

fbshipit-source-id: 6c4e251639f74f4c772bd09bc6f2dfa83cf18fad
2019-10-21 18:14:38 -07:00
07a181da1d Add more logging in net modifier
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28327

Test Plan:
Failed as expected and the full protobuf is logged
f145060005

Reviewed By: ffjiang, wx1988

Differential Revision: D17975560

fbshipit-source-id: 5375acffc1f9dede16622b06eb58b6c3a26ebe5a
2019-10-21 17:53:00 -07:00
4e033b0040 split TestLogging, TestDict, TestList
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28038

Test Plan: Imported from OSS

Differential Revision: D17954441

Pulled By: suo

fbshipit-source-id: 4703fb577adea3aa00fabb13c577b055e9ab4d7c
2019-10-21 17:15:15 -07:00
f36497e687 split test_type_sharing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28037

Test Plan: Imported from OSS

Differential Revision: D17954442

Pulled By: suo

fbshipit-source-id: 6edee4d7dee0e52b58e71d3b520c0503fb7bd0ed
2019-10-21 17:15:11 -07:00
0a364108d2 use base sha in clang-tidy instead of base ref (#28388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28388

The clang-tidy script diffs the PR head ref against the base ref so that
it works only on changed lines. If the base ref is a stale `master`,
then the script will fetch upstream `master` and potentially report
unrelated changes in the diff.

Use the base sha instead of the ref so that the revision that the script
diffs against is stable.

Test Plan: Imported from OSS

Differential Revision: D18051363

Pulled By: suo

fbshipit-source-id: 80ead2f837e2d6244245ed7b576e84a99f0ea035
2019-10-21 17:07:57 -07:00
06bb74ce96 Tolerate small amount of embedding corruptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28371

Reviewed By: xianjiec

Differential Revision: D18031155

fbshipit-source-id: a51d2a62a919f032dc04372b30cf9071aa2dd629
2019-10-21 16:23:25 -07:00
70e9ef518f c10::string_view (#26616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616

Implement C++17 std::string_view for C++11.

This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` with it throughout our codebase.
ghstack-source-id: 92100314

Test Plan: unit tests

Differential Revision: D17518992

fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
2019-10-21 16:10:40 -07:00
9ea42f8d7c C++ API: torch::nn::LPPool1d (#27800)
Summary:
Add torch::nn::LPPool1d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27800

Differential Revision: D18045040

Pulled By: yf225

fbshipit-source-id: e61fefe9efec3423f7a93dd1e946f3e380122927
2019-10-21 15:33:51 -07:00
a3902c901a Revert "Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)" (#28310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28310

This reverts commit 3d3bff5ff1bc277306d15a3caa96c2a6fdb924bb.

Test Plan: Imported from OSS

Differential Revision: D18042859

Pulled By: ezyang

fbshipit-source-id: cded781dda6fcc04199af6abd07ac09fdc0405de
2019-10-21 14:45:17 -07:00
ba59d720cd Change error message for torch.linspace(). (#28274)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/25810

Basically moves the error checking from the device-specific function to the native function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28274

Differential Revision: D18032189

Pulled By: ezyang

fbshipit-source-id: 9072b5980aa2057274e79bc7241db853bfc36f11
2019-10-21 13:03:02 -07:00
bc57967e07 max_pool2d cuda should have channel last optimized kernels[Performance improvement] (#24872)
Summary:
max_pool2d_with_indices_cuda and max_pool2d_with_indices_backward_cuda should have channel last optimized kernels(https://github.com/pytorch/pytorch/issues/23815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24872

Differential Revision: D16964577

Pulled By: ifedan

fbshipit-source-id: 296dfef8e511a7ae2ed423e34e902d5401b3becb
2019-10-21 11:28:12 -07:00
4d9c017dee Fix the padding issue of quantized average pool operator (#28260)
Summary:
This is actually a bug in both testing and the average pool implementation.
In testing, we used the quantized value as the float input and failed to pad the value with zero_point.
In the op implementation, the size used for averaging is not correct in the padding case when count_include_pad is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28260

Differential Revision: D18039960

Pulled By: lly-zero-one

fbshipit-source-id: 7b5d34498b60f5d574a276a22798c9f576944734
2019-10-21 11:06:31 -07:00
d9b4788e5d cleanup dist autograd context on other nodes when it is released on one node (#27951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27951

We want to clean up the distributed autograd context across the other nodes when a single node is done (here "done" means it has exited the context manager `with dist_autograd.context() as context_id: ...`; see the sketch after the list below).

This PR does a few things to implement the above:
1) Add classes to encapsulate messages for requesting this context release and the response
2) Handling of this request in `request_callback_impl.cpp`. When we receive this request, we get the context from a given context_id and release it.
3) RPC call in `DistAutogradContainer::releaseContext` to send this command. This currently does not wait for an ack or implement any sort of retrying. We send the RPC to all the workerIds we have come into contact with (implemented in https://github.com/pytorch/pytorch/pull/26324)
4) Relevant unit tests

In follow-up PRs, we will add error checking and retries for this call.
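For reference, a minimal sketch of the context manager whose exit triggers this release (assumes the RPC framework has already been initialized on the worker):

```python
import torch.distributed.autograd as dist_autograd

# Sketch only: assumes torch.distributed.rpc.init_rpc() has already run.
with dist_autograd.context() as context_id:
    # RPCs issued here carry autograd metadata tied to context_id.
    pass
# Exiting the block releases the context; with this PR, the release is
# also propagated to every other worker this node contacted.
```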

ghstack-source-id: 92269279

Test Plan: Added/modified unit tests in `test/dist_autograd_test.py`

Differential Revision: D17920137

fbshipit-source-id: 7403512ab5fcbc28d21c548b2e45319dd472e26a
2019-10-21 07:34:08 -07:00
f6c0a89acc Updating submodules
Summary:
GitHub commits:

c8a45b6945

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: ad3b35d4ac5a168a316ee60450d1e825760e1433
2019-10-20 18:55:40 -07:00
e8165f4b00 Updating submodules
Summary:
GitHub commits:

c2ee2f1935

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: b8c9176484ed9670583574e9465b5517cef1b71b
2019-10-20 18:55:36 -07:00
6301d62e0b Updating submodules
Summary:
GitHub commits:

a797bf1e3d
963d2bf4c4

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: f17da4f8eb54b7f317714139770ccd08fdb4dab6
2019-10-20 11:48:28 -07:00
15be189f0d Add quantized torch mean implementation (#27675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27675

This leverages QNNPACK global average pooling to perform torch.mean on input feature maps.
Currently this can only support mean along the HxW plane of an NCHW tensor.
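A minimal usage sketch (assumes a build with the QNNPACK engine available):

```python
import torch

torch.backends.quantized.engine = "qnnpack"  # assumes QNNPACK is built in
x = torch.rand(1, 3, 4, 4)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
qmean = torch.mean(qx, dim=(2, 3))  # mean over the HxW plane
print(qmean.dequantize())
```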

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_mean

Imported from OSS

Differential Revision: D17989336

fbshipit-source-id: 8d4cbcbed5f146290b1580d26e5b45359d293761
2019-10-19 19:20:59 -07:00
29f56eb920 Revert D17937850: Tolerate small amount of embedding corruptions
Test Plan: revert-hammer

Differential Revision:
D17937850

Original commit changeset: e9c633768d98

fbshipit-source-id: 5c2c837c7867504392b19965d91a60cadd3b8101
2019-10-19 14:17:01 -07:00
56eb4f7daa Add autograd hook for python rpc call (#28312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28312

1. Currently, if the autograd context is valid, an RPC is still sent with autograd metadata even when the tensors do not require grad and no grad functions are attached. This is not ideal.
This diff makes sure that an RPC with autograd metadata is sent only if the autograd context is valid and the tensors require grad.

2. Meanwhile, create a utility to attach autograd info and functions as needed.

3. Add autograd send/recv functions for python rpc calls.

4. Make changes to support nested python rpc calls.

5. Disallow nested dist autograd contexts (was landed in #27022)
ghstack-source-id: 92240367

Test Plan: unit tests

Differential Revision: D18017554

fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
2019-10-19 07:38:14 -07:00
6fcefc917e Minor tweaks to rpc message api (#28326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28326

 - Message::type() should return a MessageType, not const MessageType&,
   since MessageType is just an enum.
 - Add moveTensors() method for parallelism with movePayload().
ghstack-source-id: 92236443

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18021692

fbshipit-source-id: 5b2f5806f104a221de8df0282f3e395d15e5bfe4
2019-10-18 23:18:26 -07:00
99271ad411 Split out data_parallel tests from test_nn.py into a separate (#28297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28297

Splitting data parallel tests out of test_nn.py since it's easier to
manage and track these tests separately, and failures can be routed to
the appropriate POCs.

Test Plan: waitforbuildbot

Differential Revision: D18011663

fbshipit-source-id: 17ebf7c04e7dc7ff4c8d38458daab5b911bed75d
2019-10-18 17:48:40 -07:00
eb4bb00a9c Use c10::variant-based enums for Nonlinearity and FanMode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27933

Test Plan: Imported from OSS

Differential Revision: D18009044

Pulled By: yf225

fbshipit-source-id: e88229ee30badf7a699f62af61d1e88debc0dc7d
2019-10-18 17:48:34 -07:00
a1e14a6626 PixelShuffle module and functional (#28140)
Summary:
Added `PixelShuffle` module and functional https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28140

Differential Revision: D18008474

Pulled By: yf225

fbshipit-source-id: f482495bb56998701c79a61ef065a121bf5a5154
2019-10-18 15:54:14 -07:00
ca6ba06f95 Tolerate small amount of embedding corruptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28299

Reviewed By: Wakeupbuddy

Differential Revision: D17937850

fbshipit-source-id: e9c633768d9819fd734ddd59017c33688ebbdcca
2019-10-18 14:59:06 -07:00
9cb003a94f Add typing check of alpha for torch.sub and code clean up.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28298
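A sketch of the kind of typing check involved (`alpha` scales the second operand before subtraction; the exact error wording is an assumption):

```python
import torch

a = torch.tensor([5, 7])
b = torch.tensor([1, 2])
print(torch.sub(a, b, alpha=2))  # tensor([3, 3])

# With integral tensors, a floating-point alpha should be rejected.
try:
    torch.sub(a, b, alpha=0.5)
except RuntimeError as err:
    print("rejected:", err)
```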

Differential Revision: D18017923

Pulled By: nairbv

fbshipit-source-id: 2c4b3f96eb005dfb70e1b7ff87d28eb79b9300dd
2019-10-18 14:49:42 -07:00
b4db590e3b Fix type promotion of complex32 and complex32 (#27929)
Summary:
torch.promote_types(torch.complex32, torch.complex32) reports
RuntimeError.
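A minimal verification sketch (assumes a build exposing the complex32 dtype):

```python
import torch

# Before this fix the call below raised a RuntimeError; after it,
# promoting complex32 with itself simply returns complex32.
print(torch.promote_types(torch.complex32, torch.complex32))
```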
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27929

Differential Revision: D18013017

Pulled By: nairbv

fbshipit-source-id: 14de1adb7e81694d0f1463b11f8d4c284b25502b
2019-10-18 14:45:25 -07:00
0aa694ebe5 Move Method::lowered_graph to a separate pass out of the Method class. (#28242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28242

There is no reason to have it in a general API of Module/Method - it's
just another graph pass. It was there because some time ago modules were
not first class and all graphs were lowered. After that changed, this
API was added for easier transition, but now we don't need it anymore.

Test Plan: Imported from OSS

Differential Revision: D17986724

Pulled By: ZolotukhinM

fbshipit-source-id: 279a1ec450cd8fac8164ee581515b09f1d755630
2019-10-18 12:48:40 -07:00
c813503f05 Update hyperlink syntax for XLA, torchaudio, torchtext, and C++ API (#28019)
Summary:
Tested locally. Should render as such:

![image](https://user-images.githubusercontent.com/8042156/66861657-4373fc00-ef44-11e9-8a5b-52abc3ddcd51.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28019

Differential Revision: D18012303

Pulled By: brianjo

fbshipit-source-id: 4b3bd9f63f5d94d474ab13bb06220a112185e924
2019-10-18 12:15:17 -07:00
af88537483 Back out "Add autograd hook for python rpc call"
Summary: Original commit changeset: 070324c57312

Test Plan: revert

Reviewed By: pritamdamania87

Differential Revision: D18011308

fbshipit-source-id: 4185e4c6f51c1d11b23b8ab44e6e958b09f27c53
2019-10-18 11:53:39 -07:00
243298668c Remove confusing torch::jit::RegisterOperators for custom ops (#28229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28229

We have `torch::RegisterOperators` for custom ops. `torch::jit::RegisterOperators` had a dual state of being able to register custom ops if called one way and being able to register pure JIT ops if called another way.
This is confusing because you end up in different operator libraries depending on which API exactly you're using.

This PR removes the ability for torch::jit::RegisterOperators to register custom ops and forces people to use the new torch::RegisterOperators.

This was already deprecated before but we now remove it.
ghstack-source-id: 92137305

Test Plan: unit tests

Differential Revision: D17981895

fbshipit-source-id: 0af267dfdc3c6a2736740091cf841bac40deff40
2019-10-18 10:46:31 -07:00
d2eceee54b Fix hub when branch name contains slash. (#27960)
Summary:
fixes https://github.com/pytorch/pytorch/issues/27844
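A sketch of the call shape this fixes (the repo name is real; the branch name after the colon is hypothetical and may now contain slashes):

```python
import torch

# Everything after the colon is the git ref; slashes are now handled.
model = torch.hub.load('pytorch/vision:release/0.5', 'resnet18', pretrained=False)
```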
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27960

Differential Revision: D17964360

Pulled By: ailzhang

fbshipit-source-id: f5054fc251d2ebbf09ea4ea9fa4d1ce87db5fc52
2019-10-18 10:18:12 -07:00
109c467559 Add generate-wrapper.py with its generated wrapper files. (#28285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28285

1. Add generate-wrapper.py to route different code paths based on the platform.
2. Add all the wrapper files generated by running generate-wrapper.py; they will be used in the next diff for buck build targets.
ghstack-source-id: 92071247

Test Plan: Will be tested in the next diff when these files are linked.

Reviewed By: dreiss

Differential Revision: D17967339

fbshipit-source-id: 8af88af9e8d2e4640bcf9d29c4daf10666aa88dc
2019-10-18 10:13:54 -07:00
56c4215fcc Add autograd hook for python rpc call (#27576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27576

1. Currently, if the autograd context is valid, an RPC is still sent with autograd metadata even when the tensors do not require grad and no grad functions are attached. This is not ideal.
This diff makes sure that an RPC with autograd metadata is sent only if the autograd context is valid and the tensors require grad.

2. Meanwhile, create a utility to attach autograd info and functions as needed.

3. Add autograd send/recv functions for python rpc calls.

4. Make changes to support nested python rpc calls.

5. Disallow nested dist autograd contexts (was landed in #27022)
ghstack-source-id: 92154535

Test Plan: unit tests

Differential Revision: D17819153

fbshipit-source-id: 37d8a85855bf591f2f2da48d475a06e870a30ea1
2019-10-18 10:11:45 -07:00
46fefc98e2 Change dper3 loss module to match dper2 (#28265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28265

Fix the difference between dper3 and dper2 when regressionLoss is used.

Test Plan:
test using dper2 model id f134632386
Comparison tool output before change:
```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['supervision:label']
OP outputs ['sparse_nn/regression_loss/mean_squared_error_loss/ExpandDims:0']
===============================
Finished all dper3 ops, number of good ops 11, bad ops 1, skipped 26
run_comparison for dper2 / dper3 nets running time: 0.0020143985748291016
result type: <class 'NoneType'> result: None
```

After change:

```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['sparse_nn_2/regression_loss_2/mean_squared_error_loss_8/Squeeze:0_grad']
OP outputs ['sparse_nn_2/over_arch_2/linear_2/FC_grad']
===============================
Finished all dper3 ops, number of good ops 19, bad ops 1, skipped 16
run_comparison for dper2 / dper3 nets running time: 0.0017991065979003906
result type: <class 'NoneType'> result: None
```

dper2  label part of net P111794577
dper3  label part of net after change P116817194

Reviewed By: kennyhorror

Differential Revision: D17795740

fbshipit-source-id: 9faf96f5140f5a1efdf2985820bda3ca400f61fa
2019-10-18 10:08:38 -07:00
bd6f9e1d6c torch.nn.functional.gumbel_softmax #27078 (#28121)
Summary:
**Comments:**
* Grad check from 848d1ba13a/test/test_nn.py (L8898) not added
* Double data type as seen in 848d1ba13a/test/test_nn.py (L8916) not tested

**Issue:**
https://github.com/pytorch/pytorch/issues/27078
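A minimal usage sketch of the functional under test:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
soft = F.gumbel_softmax(logits, tau=1.0, hard=False)  # differentiable relaxation
hard = F.gumbel_softmax(logits, tau=1.0, hard=True)   # one-hot via straight-through
print(soft.sum(dim=-1))  # rows sum to 1
print(hard.sum(dim=-1))  # exactly one 1 per row
```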
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28121

Differential Revision: D18008515

Pulled By: yf225

fbshipit-source-id: 9363fe9430df0f2bfd337cc788b11ac93adaa360
2019-10-18 09:41:40 -07:00
3629974c1e Fix quantized avg_pool2d test to support non-zero padding (#28246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28246

Updated the reference fp32 implementation to use the dequantized input tensor to correctly take padded values into account

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17989334

fbshipit-source-id: 848ce78713280f529f71ff48e930db8de18abc62
2019-10-18 09:14:54 -07:00
4b64ada531 Fix typo (#28281)
Summary:
I know this is really a minor one and the list of people to mention will be significantly larger in the future. Nevertheless I would love to see my name written in correct international spelling (the strange German o-umlaut in my name becomes oe).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28281

Differential Revision: D18007518

Pulled By: ezyang

fbshipit-source-id: 1d03065636d7f65ac6b376690256c0d021482958
2019-10-18 08:51:12 -07:00
3d745508eb String optimizations related to serialization. (#28230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28230

This change improves the pickling small data benchmark by roughly 30%.
(25.8usec -> 18.05usec).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.
   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.
ghstack-source-id: 92153103

Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
  Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17982181

fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
2019-10-18 07:39:30 -07:00
ac61adb5ef String opts related to deserialization. (#28263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28263

When looking at profiles of deserializing small data from torch::load(),
we found some straightforward string-related changes that in aggregate
improve the base time by 25%.

One of the main problems was over-use of std::stringstream - the
constructors alone were 18%+ of the time spent. This change improves
unpickling/deserializing by converting a handful of the hottest
usecases from the profiles:

 - unpickler's readString() goes from 10.3% of time to mostly out of the picture
 - QualifiedName constructor (particularly the Join call) was 8.9% of time,
   but afterwards disappears from the profiles.
 - getRecordID/hasRecord were ~5% each, but also get somewhat smaller.
ghstack-source-id: 92158727

Test Plan:
Benchmark in buck build mode/opt experimental/jeremyl/c2:SerializationBench
  Correctness in buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17997056

fbshipit-source-id: fc6d6c7da7557ff23c8e8c7dbe4c060abf860018
2019-10-18 07:36:17 -07:00
a1ac15081e Implement lerp's derivative w.r.t. weight (#28219)
Summary:
Closes https://github.com/pytorch/pytorch/issues/22444.
It seemed low priority, but the necessary change seems trivial, so I made this PR anyway.
Thanks in advance for reviewing this.
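A quick sketch verifying the new derivative, since d/dw lerp(s, e, w) = e - s:

```python
import torch

start = torch.randn(3)
end = torch.randn(3)
weight = torch.rand(3, requires_grad=True)  # previously this errored in backward

torch.lerp(start, end, weight).sum().backward()
print(torch.allclose(weight.grad, end - start))  # True
```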
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28219

Differential Revision: D17989123

Pulled By: ezyang

fbshipit-source-id: d122b50e90b63dc5d2eeb7689b5ea29d973424ed
2019-10-18 07:18:07 -07:00
91a260cef9 Adding MSELoss, KLDivLoss and BCELoss to C++ front-end (#27156)
Summary:
This PR adds ```MSELoss```, ```KLDivLoss``` and ```BCELoss```. The tests for ```BCELoss``` fail with the following error:
```
unknown file: Failure
C++ exception with description "autograd_meta() INTERNAL ASSERT FAILED at /home/shahriar/Contrib/pytorch/c10/core/TensorImpl.h:533, please report a bug to PyTorch. set_requires_grad is not implemented for Tensor (set_requires_grad at /home/shahriar/Contrib/pytorch/c10/core/TensorImpl.h:533)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27156

Differential Revision: D17960323

Pulled By: yf225

fbshipit-source-id: 84b8431064f2f573679c03a8d7994e3e2f81a4d1
2019-10-17 22:07:01 -07:00
9c41b61e3f Disable blobs_queue_db_test in ROCm CI (#28268)
Summary:
Flaky failures on master:

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/41550/
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/41512/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28268

Differential Revision: D18000538

Pulled By: bddppq

fbshipit-source-id: 23a13724eeafb915d6f1e1f2da9bd87be0c498b2
2019-10-17 21:41:53 -07:00
53d9456adf Clean up the stale item in bc white list (#28269)
Summary:
Remove one stale item
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28269

Reviewed By: hl475, BIT-silence

Differential Revision: D18000957

Pulled By: houseroad

fbshipit-source-id: bc50f80453ce9c675928e6db784d5ebe05861f2a
2019-10-17 21:35:30 -07:00
5c768ec380 Minor: add static_assert to Pickler buffering. (#28114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28114

This is followup on the pickler buffering change.
ghstack-source-id: 92019521

Test Plan: This just adds an static assert, hence if it builds, we're good.

Differential Revision: D17955006

fbshipit-source-id: d7fd69935d23f39db18029703f63c8f18d23047a
2019-10-17 21:16:48 -07:00
d7ff34c0f8 In torch::save() avoid zip compressing small header records. (#28180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28180

ScriptModuleSerializer::writeCode() is the only place during torch::save()
serialization where we attempt to zip compress records.

This change avoids compressing these string records if they are
sufficiently small - e.g. in the example I looked at:
  - the strings were 123 and 28 bytes, respectively.
  - the cost in the compression routines was 16.5% of the torch::save() cost.
    (we're building a huffman table for a 28 byte string).

We'd save time and not significantly affect the space if we add these
1-line conditional compressions, rather than making it unconditional.
ghstack-source-id: 92104517

Test Plan:
Benchmark: experimental/jeremyl/c2:SerializationBench
  Correctness: normal buck mode/dev-nosan caffe2/test/...

Differential Revision: D17967995

fbshipit-source-id: 7ff934388533645dc987e105c814ffe6324f4596
2019-10-17 21:10:07 -07:00
5498a15d10 Add tests for libtorch macOS binary (#25208)
Summary:
This PR adds basic and dependency tests for libtorch macOS binary, so that we don't have issues like https://github.com/pytorch/pytorch/issues/14727 in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25208

Differential Revision: D18001189

Pulled By: yf225

fbshipit-source-id: 89be1947b5bc094fcc02b0f268b9d8ebaf0f6700
2019-10-17 20:39:09 -07:00
2e7dd54796 Fix RNN nonlinearity (#28058)
Summary:
This was referenced in the `RNN` docs but wasn't actually assigned.
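A sketch of the fixed behavior:

```python
import torch.nn as nn

# The constructor argument is now actually stored on the module.
rnn = nn.RNN(input_size=10, hidden_size=20, nonlinearity="relu")
print(rnn.nonlinearity)  # "relu"
```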
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28058

Pulled By: driazati

Differential Revision: D17945867

fbshipit-source-id: 0f0dc2633183a7e67a12352a2a7ac0545284666a
2019-10-17 16:46:09 -07:00
0b243e9c4c Disable c10d test_sync_params_with_buffers on ROCm (#28190)
Summary:
Failed runs on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2097/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2144/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2154/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2167/

```
19:59:03 ======================================================================
19:59:03 FAIL: test_sync_params_with_buffers (__main__.DistributedDataParallelTest)
19:59:03 ----------------------------------------------------------------------
19:59:03 Traceback (most recent call last):
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 130, in wrapper
19:59:03     self._join_processes(fn)
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 211, in _join_processes
19:59:03     self._check_return_codes(elapsed_time)
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 235, in _check_return_codes
19:59:03     self.assertEqual(first_process.exitcode, 0)
19:59:03   File "/var/lib/jenkins/workspace/test/common_utils.py", line 748, in assertEqual
19:59:03     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:59:03 AssertionError: 10 not less than or equal to 1e-05 :
19:59:03
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28190

Differential Revision: D17971146

Pulled By: bddppq

fbshipit-source-id: d3f527c14ca81073c1c236d5b3bb07a6ef1dde51
2019-10-17 15:09:50 -07:00
12dde7f58a cdist performance improvement for euclidean distance (#25799)
Summary:
jacobrgardner in https://github.com/pytorch/pytorch/issues/15253#issuecomment-491467128 proposed a way to speed up the Euclidean distance calculation. This PR implements that solution for both the normal and batch versions.

Also simonepri provided performance metrics https://github.com/pytorch/pytorch/issues/15253#issuecomment-502363581
![image](https://user-images.githubusercontent.com/12058312/64460756-44a24580-d0c9-11e9-9f7f-a5942f4c832d.png)

The current implementation has a further speedup compared to jacobrgardner's approach:
![image](https://user-images.githubusercontent.com/12058312/64461495-5553bb00-d0cb-11e9-87e6-302b8cc7e12b.png)
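For reference, a sketch of the matmul-based identity behind this kind of speedup, ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y (this illustrates the math, not the internal kernel):

```python
import torch

x = torch.randn(128, 64)
y = torch.randn(256, 64)
x2 = (x * x).sum(-1, keepdim=True)             # (128, 1)
y2 = (y * y).sum(-1)                           # (256,)
d2 = (x2 + y2 - 2.0 * x @ y.t()).clamp_min(0)  # guard tiny negatives
print(torch.allclose(d2.sqrt(), torch.cdist(x, y), atol=1e-4))
```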
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25799

Differential Revision: D17964982

Pulled By: ifedan

fbshipit-source-id: bf7bd0dbfca51fd39e667da55139347480f30a2f
2019-10-17 14:56:54 -07:00
7c1df06efa default caffe2_tvm_min_ops to 10 (#28250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28250

We've been running canaries with this setting for a while

Test Plan: build, sanity canary

Reviewed By: yinghai

Differential Revision: D17872108

fbshipit-source-id: fb7f0373eac1c8aaae007a17f6ffb91482952813
2019-10-17 14:35:22 -07:00
07b5666a87 Add default arg to prepare_qat mapping. (#28193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28193

Fixes #28015

Test Plan: Imported from OSS

Differential Revision: D17973121

Pulled By: z-a-f

fbshipit-source-id: 03b3f70c70b89060c1f03d7ed8ab6002fe60bd49
2019-10-17 14:11:54 -07:00
7ebe8328e1 Address review comments on #28011.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28109

Differential Revision: D17966067

fbshipit-source-id: 9e4a03a1813835b67cd614d8fac18524f5b36cc5
2019-10-17 14:07:58 -07:00
95922c90b5 Export update for arange and _dim_arange (#26875)
Summary:
Export arange and _dim_arange using onnx::range in opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26875

Reviewed By: hl475

Differential Revision: D17623848

Pulled By: houseroad

fbshipit-source-id: 41f0066ca1c42882ccc051a3ee5448dca25ee5d2
2019-10-17 13:55:45 -07:00
a5ac7f6387 Changing observer name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27779

Test Plan: Imported from OSS

Differential Revision: D17886605

Pulled By: z-a-f

fbshipit-source-id: 68c50b482e65015336ff27171fd730da493525b6
2019-10-17 11:36:03 -07:00
86e7e872bf Port of multi_margin_loss from TH to ATen (CPU) (#28062)
Summary:
This is a port of the existing TH CPU C MultiMarginCriterion to function multi_margin_loss for ATen. ~~The ATen/C++ version is unfortunately significantly slower than the original. It is currently unclear to me what causes the performance degradation since the Tensor access is raw-pointer based similar to the original C implementation. (A first implementation I had created using TensorAccessor was even about 2x slower than the one in this PR).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28062

Differential Revision: D17980636

Pulled By: ezyang

fbshipit-source-id: bba27a13436adff5e687d95cc984ec2386ce7a73
2019-10-17 11:16:51 -07:00
618cb40e30 Add doc copy-edits from review (#26322)
Summary:
Add edits from doc review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26322

Pulled By: driazati

Differential Revision: D17859654

fbshipit-source-id: f3a116cddb5393bdfbef670c56efb2ee62ccf252
2019-10-17 11:12:35 -07:00
5c2bf8abe5 change linear benchmark shapes (#28228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28228

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N32_IN1024_OUT256
# Input: N: 32, IN: 1024, OUT: 256
Forward Execution Time (us) : 1501.918

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N64_IN256_OUT100
# Input: N: 64, IN: 256, OUT: 100
Forward Execution Time (us) : 1175.672
```

Reviewed By: hl475

Differential Revision: D17980463

fbshipit-source-id: c8aaf6fa4d847037accb1e5b9ee04900690fd6ae
2019-10-17 11:09:10 -07:00
21c3997974 Disable schema inference for unboxedOnly kernels (#27977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27977

The only remaining reason why we couldn't move some ops from globalATenDispatch to the c10 dispatcher was that schema inference didn't support some use cases.
But actually, we don't need schema inference for these ops. By disabling it, we can move the remaining ops from globalATenDispatch to the c10 dispatcher.
ghstack-source-id: 92104807

Test Plan: waitforsandcastle

Differential Revision: D17929696

fbshipit-source-id: 05ec65b615487fde784293e3b533fa3ec09cf234
2019-10-17 10:49:56 -07:00
8fff54ec39 Enables non-default CUDA stream in test_nn (#28192)
Summary:
Per title. Several stream fixes have gone in that may make this pass in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28192

Differential Revision: D17974219

Pulled By: mruberry

fbshipit-source-id: 543d000789c83711a8b4bef169a87635fda7508b
2019-10-17 10:19:49 -07:00
951dd03037 Add memory format support to typecasting shortcuts byte,char,double,bool,half,int,long,short,float,bfloat16 (#27228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory.
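A usage sketch of the new keyword on the dtype shortcut methods:

```python
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

y = x.half(memory_format=torch.preserve_format)        # keeps channels-last strides
print(y.is_contiguous(memory_format=torch.channels_last))  # True

z = x.double(memory_format=torch.contiguous_format)    # forces standard contiguous
print(z.is_contiguous())  # True
```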

Test Plan: Imported from OSS

Differential Revision: D17980315

Pulled By: VitalyFedyunin

fbshipit-source-id: fd5615621bc4968aa4ef2a26430c492c552ed671
2019-10-17 09:16:25 -07:00
15df371934 Add memory format support to typecasting shortcuts byte,char,double,bool,half,int,long,short,float,bfloat16 (#27228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory.

Test Plan: Imported from OSS

Differential Revision: D17980128

Pulled By: VitalyFedyunin

fbshipit-source-id: b2646bab72c4475b7a82bb271d204a9d96d28bd4
2019-10-17 09:16:21 -07:00
c36552c4cb Fixing dispatch error in windows debug builds (#24360)
Summary:
nullptr initialization values for dispatch pointers were overwriting values set using the REGISTER_DISPATCH macro.

Relevant issue: https://github.com/pytorch/pytorch/issues/22681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24360

Differential Revision: D17952241

Pulled By: ezyang

fbshipit-source-id: 4bf86dc24153e504bbeacb526c58fd8230bb972a
2019-10-17 09:13:19 -07:00
e1be08fcf5 out-variant for torch.batch_norm_elemt (#27621)
Summary:
Following dicussion with ezyang in https://github.com/pytorch/pytorch/issues/26288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27621

Differential Revision: D17978858

Pulled By: ezyang

fbshipit-source-id: f843b691a67f1dc48b87ed6a633007d193150cf7
2019-10-17 09:09:46 -07:00
4e71be449e Remove tools/setup_helpers/nvtoolext.py (do not seem to be used) (#28125)
Summary:
`git grep nvtoolext` shows nothing (meaning that it is never imported).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28125

Differential Revision: D17979164

Pulled By: ezyang

fbshipit-source-id: 7cfe770c9f7140c8ad58676f912037e6226647d3
2019-10-17 09:07:09 -07:00
4cc368e3a6 Declare the LAPACK and MAGMA dispatchers instead of defining them with a default error (#28133)
Summary:
This clears a lot of dead code that isn't reachable due to `AT_DISPATCH`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28133

Test Plan: - All existing tests should pass to ensure that the change is valid.

Differential Revision: D17978803

Pulled By: ezyang

fbshipit-source-id: 8fdaa74f9addb1d7987c5d625557b8a463a25500
2019-10-17 09:04:56 -07:00
076b116a41 In ProcessGroupAgent, use non-iostream torch::load()/save(). (#28063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28063

Avoid using the iostream versions of torch::load()/torch::save(), which
incur at least one additional full data copy.
ghstack-source-id: 92059608

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17945206

fbshipit-source-id: ba24376c13762a28e569530e3b1a939ac6f72f43
2019-10-17 07:39:30 -07:00
4a69d048e0 Move the CUDA implementation of log2 to ATen. (#26769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26769

Fix #24589

Test Plan: Imported from OSS

Differential Revision: D17960122

Pulled By: VitalyFedyunin

fbshipit-source-id: 58dff236886bbf3a0a152d7422aa8a5c478ee1de
2019-10-17 07:27:55 -07:00
6923b93ebc Revert D17972725: [pytorch][PR] Update onnx-tensorrt
Test Plan: revert-hammer

Differential Revision:
D17972725

Original commit changeset: 01933b3f9e2b

fbshipit-source-id: 43f3560a7a3922dd676678b61d6cce7f2006b3f1
2019-10-17 07:07:04 -07:00
bb0e46b65a Remove preallocation of type ids (#28024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872

Test Plan: unit tests

Differential Revision: D17936165

fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
2019-10-16 23:08:11 -07:00
58ed8ca9e1 clean up exported source format (#28129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28129

The previous PR in the stack removed the need to order classes/functions
or have correct import statements. This resolved circular dependency issues
that can arise when class constructors like ModuleList put new instances
of themselves in a common namespace.

This PR changes our export format to no longer produce this information.
By doing so we can make the logic significantly simpler, since we just
keep track of an individual PythonPrint object per file.

Notes:
* PythonPrint was changed to manage its own stream/list of ranges. It
was doing this anyway internally; this just makes the API clearer.
* Since we are changing the serialization format, I also removed op_version_set.
It is now replaced with the VERSION number that is written in the zip archive.
This further simplifies the code emission process.
* A test of op_version_set was removed since there is no longer any behavior
to test.

Test Plan: Imported from OSS

Differential Revision: D17961610

Pulled By: zdevito

fbshipit-source-id: ada362c4ca34d05393a1a7e799c94785ab9d9825
2019-10-16 22:47:24 -07:00
aad5071206 Use torch::variant for enums in C++ API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26837

Test Plan: Imported from OSS

Differential Revision: D17579438

Pulled By: yf225

fbshipit-source-id: 9ac59df28a317fdb3be2cc02c65962ad99117127
2019-10-16 22:40:57 -07:00
de0f9567a3 Add quantized avg_pool2d for pytorch mobile (#27631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27631

Add support for performing avg_pool2d on mobile; tested using the existing avg_pool2d python tests.
Uses the QNNPACK backend, which currently only supports 4-dim inputs.
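A minimal usage sketch (assumes a build with the QNNPACK engine available):

```python
import torch
import torch.nn.functional as F

torch.backends.quantized.engine = "qnnpack"  # assumes QNNPACK is built in
x = torch.rand(1, 2, 8, 8)  # 4-dim input, as required
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
qy = F.avg_pool2d(qx, kernel_size=2, stride=2)
print(qy.dequantize().shape)  # torch.Size([1, 2, 4, 4])
```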

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17973792

fbshipit-source-id: 95ffffb2da656ed911a618b9cb68d6b728c16c74
2019-10-16 22:02:23 -07:00
62e281fbcf Add CI builds (#27925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27925

Add extra CI builds for TBB and native builds

Test Plan: check CI

Differential Revision: D17914952

Pulled By: ilia-cher

fbshipit-source-id: 16995038909d17eb6f9c69b9bddd8f12981ad36b
2019-10-16 21:53:40 -07:00
19956b200d Relax set_num_threads restriction in parallel native case (#27947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27947

Don't throw an exception if the requested size is the same as the
currently used one.

Test Plan:
ATEN_THREADING=NATIVE python setup.py develop --cmake

Imported from OSS

Differential Revision: D17919416

fbshipit-source-id: 411f7c9bd6a46e7a003b43a200c2ce3b76453a2e
2019-10-16 21:53:36 -07:00
2265cddbd2 Cleanup torch::jit::script::Module API for accessing attributes/parameters/submodules. (#27260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27260

This PR has the following changes:
- Slot class is removed. In all use cases except `lower_graph` we really
just needed the attribute name and thus having an extra layer of
abstraction through Slot only made the code harder to understand.
- get_parameters, get_attributes, get_modules, and get_slots now return
a list of <name, item> pairs instead of a list of Slots.

Differential Revision: D17728910

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 94781611752dd88e7fddfe8b8e0252d6ec32ba68
2019-10-16 21:32:08 -07:00
d083b443b4 Fix LayerNorm Bug (#28196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28196

Fix LayerNorm Bug

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: okhonko, houseroad

Differential Revision: D17973451

fbshipit-source-id: 865e4f295b8d6c0438ec8872da0b43d3c5d3d3c6
2019-10-16 20:38:46 -07:00
edc28676ef Adds @overridePrecision decorator (#28131)
Summary:
Adds the overridePrecision decorator, which allows device-generic tests to specify per-dtype precision overrides.

Precision is overridden on the test class instance itself, and so is thread-local (so that running multiple tests in parallel will not conflict). It can be accessed directly from a test with self.precision, as before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28131

Differential Revision: D17969774

Pulled By: mruberry

fbshipit-source-id: c4e0b71afac6bdc7cbf4e799f3054922de764820
2019-10-16 19:47:55 -07:00
35a5df8c94 Update onnx-tensorrt (#28158)
Summary:
We need https://github.com/onnx/onnx-tensorrt/pull/290 to be able to switch PyTorch to C++14. This PR updates the onnx-tensorrt dependency so we have that fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28158

Differential Revision: D17972725

Pulled By: smessmer

fbshipit-source-id: 01933b3f9e2b6f79a00ef919ab1633a8c63571dd
2019-10-16 19:20:29 -07:00
f279b68a48 Update gloo (#28174)
Summary:
We need https://github.com/facebookincubator/gloo/pull/225 to be able to switch PyTorch to C++14. This PR updates the gloo dependency so we have that fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28174

Differential Revision: D17973079

Pulled By: smessmer

fbshipit-source-id: 887996d1c2850bb97bf2eb081544b67ca5c9ae5f
2019-10-16 19:02:05 -07:00
86e93bde90 Back out "Use FunctionSchema instead of char* for dispatch"
Summary: Original commit changeset: cb8e21d4b8d2

Test Plan: revert

Reviewed By: jerryzh168

Differential Revision: D17971815

fbshipit-source-id: 92ca62b4ca20c3d083d1fc87e0080b988a981cc8
2019-10-16 18:57:30 -07:00
d9de2e0ba9 Back out "Revert D17936166: [wip] Constexpr type ids" (#28155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28155

Original commit changeset: 92c63a96dedd
ghstack-source-id: 92051874

Test Plan: unit tests

Differential Revision: D17964410

fbshipit-source-id: 1d989d28b3e1de6d43c915f122f2b65a77a332eb
2019-10-16 18:24:04 -07:00
ff00e8c9eb Fix pushLong() issue in pickler. (#28057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28057

For pushLong() in Pickler, it looks like we only use it for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (which expects a 4-byte length) but pushing 8 bytes. We could harden this handling, but rather than improving codepaths that we never expect to use, this change simply removes the incorrect codepath and adds an assert.

ghstack-source-id: 92048325

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17934174

fbshipit-source-id: ecc1ca37dbcc87151fc5bf2ffb6b05dff91d3667
2019-10-16 18:07:26 -07:00
aa6c394e39 Use FunctionSchema instead of char* for dispatch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27159

Test Plan: Imported from OSS

Differential Revision: D17693481

Pulled By: bwasti

fbshipit-source-id: cb8e21d4b8d29dcc1cd75cb6b681986679b835fe
2019-10-16 17:14:28 -07:00
3214f134b6 fix python rpc handler exit crash (#27251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27251

Explicitly clean up py::objects to avoid segmentation faults when py::objects are cleaned up by CPython later, at program exit.

See similar issues reported https://github.com/pybind/pybind11/issues/1598
and https://github.com/pybind/pybind11/issues/1493.

Our local tests also caught these segmentation faults when py::objects are cleaned
up at program exit. The explanation is: CPython cleans up most critical
utilities before cleaning up the PythonRpcHandler singleton, so when the
PythonRpcHandler singleton cleans up py::objects and calls dec_ref(), it
crashes.

The solution is to clean up py::objects earlier, when the RPC agent joins.
Note that py::objects cannot be cleaned up when the RPC agent is destroyed
either, as the RPC agent is a global variable and would hit the same issue as
PythonRpcHandler.

close #27182
ghstack-source-id: 92035069

Test Plan: unit tests on python 3.6 and python 3.5

Differential Revision: D17727362

fbshipit-source-id: c254023f6a85acce35528ba756a4efabba9a519f
2019-10-16 16:57:38 -07:00
7d277b0670 Multi Label Margin loss (#27659)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `MultiLabelMarginLoss` module and `multilabel_margin_loss` functional.
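For reference, the Python module with the same semantics:

```python
import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
# Targets are class indices, terminated by -1: classes 3 and 0 here.
y = torch.tensor([[3, 0, -1, 1]])
print(loss(x, y))  # tensor(0.8500)
```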
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27659

Differential Revision: D17931905

Pulled By: yf225

fbshipit-source-id: 3642f75c79843dda55ac38de9f6f970f3e237847
2019-10-16 15:44:38 -07:00
cbcb70f84c print last 50 runs when using ai_pep_format (#28128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28128

as title

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.4950008392334"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.172897338867188"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.27255630493164"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.549837112426758"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.63113784790039"}
...
```

Reviewed By: hl475

Differential Revision: D17957611

fbshipit-source-id: 4e70ba2070b97fbbca0d6d4295abbead2ac356d4
2019-10-16 15:22:23 -07:00
97257e257e clean up test_cat_empty (#28115)
Summary:
Remove spurious parts from test_cat_empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28115

Test Plan: no additional tests needed.

Differential Revision: D17956669

Pulled By: ngimel

fbshipit-source-id: cffcfa9e5b50afba62c6dbc8ca5d9de95d0c020e
2019-10-16 14:42:14 -07:00
cbb4c87d43 Improve the doc and test of logical_xor (#28031)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/27248, per a suggestion by gchanan.
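A behavior sketch matching the updated docs:

```python
import torch

print(torch.logical_xor(torch.tensor([True, False, True]),
                        torch.tensor([True, True, False])))
# tensor([False,  True,  True])

# Non-bool inputs are treated as zero / non-zero.
print(torch.logical_xor(torch.tensor([0, 1, 2]), torch.tensor([1, 0, 0])))
# tensor([True, True, True])
```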
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28031

Differential Revision: D17962226

Pulled By: gchanan

fbshipit-source-id: 788e4e1fc78b1cfc7915aedaa10c8656b19edc4d
2019-10-16 13:57:53 -07:00
3523e5427a Add master to OSS RPC test (#27776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27776

I think it's not worth equipping the other `RPCAgent`s with collective communication capability, i.e. either 1) embedding Gloo in `RPCAgent`, or 2) implementing ::barrier() and ::drain() on top of RPC messaging.

The only use case that does not have a master is the OSS unit test suite, caffe2/test/rpc_test.py.

I think having those unit tests use a master is simpler than equipping `RPCAgent` with collective communication capability.

Differential Revision: D5445858

fbshipit-source-id: 56ee24703abd8c5b366829430bef657e0f1dfeba
2019-10-16 13:45:45 -07:00
174e1ba3b8 Small fixes to improve TensorIterator overhead for the common case of inputs and outputs of the same type (#27457)
Summary:
1) Short-circuits computing common type and type promotion logic for the common case of operands and result of the same type
2) Improves performance of checking memory overlap by returning MemoryOverlap::FULL if tensors are the same, skips the call
from TensorIterator when tensors are the same
3) Changes the default size of DimVector from 5 to 6, allowing it to avoid being resized in the common case of a binary operation. The `strides`
DimVector is forced to have at least 2*num_tensors elements, which for an operation with 2 inputs and one output is 6.
4) If `offset` is 0 (the common non-broadcasting case), don't fill the `strides` vector with 0s, because all the values will subsequently be overwritten.

These changes combined improve the overhead from 1.02 us to .74 us for a simple in-place operation.
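A rough way to observe this per-call overhead from Python (a sketch; absolute numbers are machine-dependent):

```python
import timeit
import torch

x = torch.ones(1)
per_call = timeit.timeit(lambda: x.add_(1.0), number=100_000) / 100_000
print(f"{per_call * 1e6:.2f} us per in-place add")
```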
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27457

Test Plan: should be covered by existing tests

Differential Revision: D17784532

Pulled By: ngimel

fbshipit-source-id: e6a8ee58be5de14461bdbc2e2b0b6d16a96c309f
2019-10-16 13:06:20 -07:00
3ac4267763 Force building with GCC 5 (#28098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28098

Make sure that we're building with GCC 5 everywhere
ghstack-source-id: 92013998

Test Plan: waitforsandcastle

Differential Revision: D17953640

fbshipit-source-id: 26d978c60fc973c787383297d730b45d40fa300b
2019-10-16 12:49:59 -07:00
dc8785a022 Refactoring names for consistency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27670

Test Plan: Imported from OSS

Differential Revision: D17846269

Pulled By: z-a-f

fbshipit-source-id: ed3c7441c185bf11b2e62879aa3ecbc654aa2d4e
2019-10-16 12:18:26 -07:00
9540f6c3fe Soft Margin loss (#27660)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `SoftMarginLoss` module and `soft_margin_loss` functional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27660

Differential Revision: D17958325

Pulled By: yf225

fbshipit-source-id: c14422765e6e1fdabf6c9687080e6d5ff490d300
2019-10-16 12:04:08 -07:00
c67d3533a7 Update C++ torch::nn parity table, and temporarily disable C++ API parity test (#28117)
Summary:
This PR updates `test/cpp_api_parity/parity-tracker.md` to reflect our progress on C++ `torch::nn` parity. It also disables the C++ API parity test temporarily, and as the next step I will refactor the parity test to make it simpler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28117

Differential Revision: D17957948

Pulled By: yf225

fbshipit-source-id: 1dd836c25665f57ba8efc6d1abf671a95c03eff7
2019-10-16 11:54:13 -07:00
735463f210 ONNX Export Scripted Interpolate Op (#27566)
Summary:
We currently support exporting traced interpolate ops to ONNX.

Scripting interpolate op invokes aten::__interpolate in the Torch IR (instead of aten::upsample_[mode][dim]d), which we do not support yet.
This PR implements the ONNX symbolic for __interpolate() to support exporting interpolate in scripting scenarios.

Related open issue: https://github.com/pytorch/pytorch/issues/25807
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27566

Reviewed By: hl475

Differential Revision: D17817731

Pulled By: houseroad

fbshipit-source-id: e091793df503e2497f24821cf2954ff157492c75
2019-10-16 11:22:22 -07:00
5136ed0e44 Remove attempToRecoverType (#26767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26767

Now that we have tagged IValues, we can accurately recover the type with
`ivalue.type()`. This removes the other half-implemented pathways that
were created because we didn't have tags.

Test Plan: Imported from OSS

Differential Revision: D17561191

Pulled By: zdevito

fbshipit-source-id: 26aaa134099e75659a230d8a5a34a86dc39a3c5c
2019-10-16 11:07:13 -07:00
fb4517132f Allow 'Any' to appear as a type argument. (#26572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26572

Combined with isinstance specialization, this allows a degree of polymorphism
in functions without needing to use our weirder overload hacks.

We do not define any operators on Any, so the only thing you can do with it
is to put it in containers or type refine it using an isinstance check.
Any is restricted from appearing in non-argument position because we
cannot restore type tags if it ends up as a field in a class.
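A sketch of the kind of polymorphic function this enables (the function name is illustrative):

```python
import torch
from typing import Any

@torch.jit.script
def describe(x: Any) -> int:
    # `Any` is only usable in argument position; isinstance refines it.
    if isinstance(x, int):
        return x + 1
    elif isinstance(x, torch.Tensor):
        return x.numel()
    return 0

print(describe(41), describe(torch.zeros(2, 3)))  # 42 6
```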

Test Plan: Imported from OSS

Differential Revision: D17530643

Pulled By: zdevito

fbshipit-source-id: f06f78ce84819f7773953a492f3d4c49219ee94c
2019-10-16 11:07:08 -07:00
97b39a296f Fix error report highlight for unmatched type annotation (#27195)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/25801 (see there for my verbose analysis).

As an example, for the following code:

```
import torch

torch.jit.script
def f1(x):
    # type: (int, int) -> None
    pass
```

this PR will change error message from this:

```
RuntimeError:
Number of type annotations (2) did not match the number of function parameters (1):
# type: (int, int) -> None
```

to this:

```
RuntimeError:
Number of type annotations (2) did not match the number of function parameters (1):
at __scratch__/example.py:4:0
torch.jit.script
def f1(x):
~~~~~~~~ <--- HERE
    # type: (int, int) -> None
    pass
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27195

Differential Revision: D17910902

Pulled By: driazati

fbshipit-source-id: af5c6353069d005752d6c7f0bd6a0c6db8437e55
2019-10-16 10:39:36 -07:00
8cdc262063 Add support for @staticmethod (#27163)
Summary:
Resolve static methods as functions

Fixes #26792
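A minimal sketch of the now-supported pattern (`MyModule` is hypothetical, and this assumes the static method is called through `self`, as in the linked issue):

```python
import torch

class MyModule(torch.nn.Module):
    @staticmethod
    def double(x: torch.Tensor) -> torch.Tensor:
        return x * 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.double(x)

scripted = torch.jit.script(MyModule())
print(scripted(torch.ones(3)))  # tensor([2., 2., 2.])
```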
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27163

Pulled By: driazati

Differential Revision: D17695094

fbshipit-source-id: 4671cae1a92526a35c83b8d9c12a50aa5442412b
2019-10-16 10:36:38 -07:00
e3e54282cd Updating submodules
Summary:
GitHub commits:

509dd6da09
6b95a33c60
90debac03b

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 20fbd2548722418516e602c9a538d0a541a19fee
2019-10-16 10:22:22 -07:00
2d2fe14a60 Install CUDA for clang-tidy (#27967)
Summary:
fixes: https://github.com/pytorch/pytorch/issues/28009

clang-tidy is reporting `'cuda_runtime_api.h' file not found` when a PR modifying some file including this header.

Installation script take from official site:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27967

Differential Revision: D17952383

Pulled By: ezyang

fbshipit-source-id: 85807d93bd46eb902a84b2126784349ce3a01cfa
2019-10-16 10:02:19 -07:00
94c1ff4388 Devirtualize allow_tensor_metadata_change() getter/setter. (#27667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27667

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886548

Pulled By: ezyang

fbshipit-source-id: b99db2e163e5621920f12b150709f0defbce13da
2019-10-16 09:57:31 -07:00
4f4c69b1de Make set_grad_accumulator private (friend class SavedVariable) (#27666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886544

Pulled By: ezyang

fbshipit-source-id: b9ff845cb1e5ec6f7cb4f2fa171403d555014248
2019-10-16 09:57:27 -07:00
e1f58b7c4c Make AutogradMeta a private struct in Variable. (#27654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27654

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886547

Pulled By: ezyang

fbshipit-source-id: ea0c5b40a5f34bc37657ed5d3bce9140063ddcbb
2019-10-16 09:57:23 -07:00
34522c212a Add trailing underscore to member variable. (#27651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27651

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886546

Pulled By: ezyang

fbshipit-source-id: b8f7c74b1004d35690a815b0c7671a07ca612e94
2019-10-16 09:57:19 -07:00
f38beff800 Add nn.Bilinear to C++ Frontend (#26082)
Summary:
Adds support for the Bilinear layer to the C++ frontend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26082

Differential Revision: D17954148

Pulled By: yf225

fbshipit-source-id: 5e746bdea29b00e25969cd7a22044b8059b53687
2019-10-16 09:54:01 -07:00
3ed9a6e2ab Buffer in Pickler to improve performance. (#27720)
Summary:
This change adds a small fixed-size buffer to Pickler to
avoid calling writer_() and the associated downstream checks
on a per-opcode/per-byte basis.

We end up still doing a bounds check in the common case,
but the memcpy() is a fixed size. And we reduce the number
of backend calls.

In practice, this change speeds up the Pickle1MInts benchmark
for me locally from roughly 56msec to 22msec.

Additionally, in this change we convert a few pushIValue() calls on
typed lists, where we know the type to be double/int/bool, into
pushInt() to bypass a bit of logic.

We should additionally change the Unpickler, though we're keeping
that separate, since the std::function<> prototype needs to be
changed for this to work (i.e. return size_t rather than bool).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27720

Test Plan:
buck test mode/dev-nosan caffe2/test:...
  Benchmark in experimental/jeremyl/c2/SerializationBench.cpp (run in mode/opt)

Differential Revision: D17847174

Pulled By: jjlilley

fbshipit-source-id: 22e5e5fd33f1a369c124ea5aac7880538e2bf6a0
2019-10-16 09:37:15 -07:00
3d3bff5ff1 Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607

As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard codes the path into the distributed cmake files.

Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the  dependency on `libculibos` into the cuda import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files and so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the windows issue from https://github.com/pytorch/pytorch/issues/15828.  I don't have a windows setup to confirm though.

Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors as the hard coded include directory was wrong. `nccl` is built into `build` not `third_party/build`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887

Differential Revision: D17929440

Pulled By: ezyang

fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
2019-10-16 09:21:47 -07:00
4f1f084d22 Make layer_norm dispatch from yaml file to fix XLA test (#28051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28051

Make layer_norm dispatch from yaml file to fix XLA test

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: houseroad

Differential Revision: D17939919

fbshipit-source-id: 384b6a8008dabfc1aaeb0357c1bd195be68f1edb
2019-10-16 07:29:38 -07:00
5c153de26b Nicer promotion error message when pr. (#27941)
Summary:
Instead of an abstruse "unsupported scalarType", we print more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27941

Differential Revision: D17933972

Pulled By: ezyang

fbshipit-source-id: 51e0e1c11e530606612482e24ff28898323e54fc
2019-10-16 07:04:13 -07:00
1819fade35 Revert D17936166: [wip] Constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17936166

Original commit changeset: 68cfa926c721

fbshipit-source-id: 92c63a96dedd8764e342c6437c6ea308d93d29b2
2019-10-16 06:47:10 -07:00
054239dc0e Updating submodules
Summary:
GitHub commits:

4727542db2

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 02d786c67323b3b9aa5822ebcd7c497798424ef7
2019-10-15 23:40:10 -07:00
08f4a244d3 Eliminate unnecessary Tensor refcount bump.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28011

Differential Revision: D17936915

fbshipit-source-id: 457ecd09bbe9af4f1fa8ede66ba1265763dc70dd
2019-10-15 22:50:00 -07:00
2e0294cb39 Make JIT Serialization support arbitrary std::function<> IO (#28039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28039

Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. Similar for torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream apis, gives users the
flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.

We also provide a parallel api for the load(), though this one is
slightly more complex due to the need to do arbitrary position reads.

Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17939034

fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7
2019-10-15 22:12:04 -07:00
9cc4405dc9 Constexpr type ids (#28023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28023

ghstack-source-id: 91987335

Test Plan: waitforsandcastle

Differential Revision: D17936166

fbshipit-source-id: 68cfa926c721e5fbc96e083eb47e784bf34a9df4
2019-10-15 21:21:20 -07:00
e9a91756cd Back out "[pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)"
Summary: Original commit changeset: 9ddffe4dbbfa

Test Plan: ci

Reviewed By: yf225

Differential Revision: D17939581

fbshipit-source-id: 44a3b843bf1e7059fec57b9e3d12ed4886816145
2019-10-15 21:12:10 -07:00
ab50abca5c Export masked_select and masked_scatter in opset 11 (#25949)
Summary:
- masked_select is exported as ONNX::GatherND
- masked_scatter is exported as ONNX::ScatterND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25949

Reviewed By: hl475

Differential Revision: D17465489

Pulled By: houseroad

fbshipit-source-id: 4c3732617733ca2024a5e306ffa9f6bfcf9725d5
2019-10-15 21:09:37 -07:00
705958be5b Update GCC for CentOS build (#28059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28059

ghstack-source-id: 91987332

Test Plan: waitforsandcastle

Differential Revision: D17945780

fbshipit-source-id: 044a0d24837545eab6d637d6cbe644bb694f318f
2019-10-15 19:04:02 -07:00
d2c2501eb3 Minor improvements in RPC api docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28030

Test Plan: Imported from OSS

Differential Revision: D17937426

Pulled By: mrshenli

fbshipit-source-id: 74e03542ab40abcd71441a188215cb1562b558df
2019-10-15 19:00:46 -07:00
e4f5224ebd Revert D17935286: Update GCC for centos CI builds
Test Plan: revert-hammer

Differential Revision:
D17935286

Original commit changeset: 12f584d4a240

fbshipit-source-id: ecc49bbf1d6f78752bdb834b8a1b145a359c8240
2019-10-15 17:51:08 -07:00
59cd0faeff Defer pg agent listener thread until contexts are initialized (#28013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28013

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Test Plan: Imported from OSS

Differential Revision: D17932271

Pulled By: mrshenli

fbshipit-source-id: 3b408477594d4d19319e7cd08dd6f383a7ed7670
2019-10-15 17:45:43 -07:00
00a2b36188 improve error handling in getNCCLVersion in NCCLUtils (#27883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27883

Returns early if the NCCL version code returned to us is < 100, to prevent
division errors. This shouldn't actually happen, since the NVIDIA NCCL version is well past 0.1.0, but it is a nice safeguard to have.
ghstack-source-id: 91861083

Test Plan: Follow same process as https://github.com/pytorch/pytorch/pull/27068. Also force version to be < 100 and ensure that "Unknown NCCL Version" is returned.

Differential Revision: D17903234

fbshipit-source-id: c4df63bb1c18f1b2ef9e4cd434d4ca6c5ac556df
2019-10-15 17:33:09 -07:00
871b1419de Test graceful termination of RPCAgent with asymmetric load (#27761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27761

# Problem

`rpc_test` currently only has test cases that put an equal amount of work on every worker node.
As a result, even if `RpcAgent::sync` is implemented as an empty method, no termination misbehavior is detected.

# Solution

Add at least one test with an imbalanced load.
ghstack-source-id: 91785984

Differential Revision: D5361435

fbshipit-source-id: 92d1f7cad61b27cdeadc2825ceab6e88d5e4b459
2019-10-15 16:45:21 -07:00
7b06f958cf Update GCC for centos CI builds (#28018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28018

We need a newer GCC; GCC 4 is discontinued.
ghstack-source-id: 91953133

Test Plan: waitforsandcastle

Differential Revision: D17935286

fbshipit-source-id: 12f584d4a240453c62a854438b8579c1cbfd1e94
2019-10-15 16:37:56 -07:00
cf43aa3e16 add type refinements for isinstance checks (#27772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27772

This replaces unchecked_unwrap_optional with unchecked_cast. This
enables the generalization of type refinement so that it works for
isinstance checks as well. This also removes unchecked_unwrap_optional from
code we generate, which is good because it is a hard op to serialize well
since it doesn't directly encode the Optional[T] being unwrapped. In contrast,
unchecked_cast always explicitly lists the type.
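
A minimal sketch of the kind of refinement this enables (hedged; exact supported patterns are per this change):

```python
import torch
from typing import Optional

# Inside the isinstance branch, `x` is refined from Optional[int] to int,
# so integer arithmetic type-checks without an explicit unwrap.
@torch.jit.script
def bump(x: Optional[int]) -> int:
    if isinstance(x, int):
        return x + 1
    return 0
```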

Test Plan: Imported from OSS

Differential Revision: D17885424

Pulled By: zdevito

fbshipit-source-id: ce81077d6fbeaf2a802a2e0b17349aca85670466
2019-10-15 16:00:42 -07:00
5d26ba08b7 Remove unnecessary Node* closures in operator registration
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27007

Test Plan: Imported from OSS

Differential Revision: D17696525

Pulled By: zdevito

fbshipit-source-id: b329b77afa0e6dbe9cb920a98cf07bb329d01023
2019-10-15 16:00:38 -07:00
3de34744b3 Make PythonPrint a class (#26787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26787

A follow up PR will remove the need to issue import statements,
or write classes in order since they are no longer needed.
 This change allows the same PythonPrint class
to be used for an entire file which will be needed in that patch.

Test Plan: Imported from OSS

Differential Revision: D17566440

Pulled By: zdevito

fbshipit-source-id: 1ee896da0cdfe6a003298e1d4b0238403b9ed6dd
2019-10-15 16:00:34 -07:00
f62c8f48e8 remove dead LEGACY_PythonPrint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26786

Test Plan: Imported from OSS

Differential Revision: D17566439

Pulled By: zdevito

fbshipit-source-id: ae42b67fc00f9b1bb6ceb81bf278d213636c7f07
2019-10-15 16:00:30 -07:00
2aa84d927b Revert D17939700: Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17939700

Original commit changeset: 4fc6156ba388

fbshipit-source-id: dded0a2140d2c14cd2f2a574987ecc164b0e5bfe
2019-10-15 15:24:36 -07:00
c44e33b578 Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17889288

Original commit changeset: 9ddffe4dbbfa

fbshipit-source-id: 4fc6156ba38834512b2f735ac0d03e34e69b7286
2019-10-15 14:35:28 -07:00
5797f5dd27 Update 'einsum' docstring to conform to PEP-484 (#27563)
Summary:
[PEP-484](https://www.python.org/dev/peps/pep-0484/#arbitrary-argument-lists-and-default-argument-values) specifies that arbitrary argument lists, here `*operands`, should be annotated with the type of the single arguments, i.e. not indicating that the whole thing is wrapped into a `list` (which is a Python internal anyway). The previous docstring caused problems with type checkers for IDEs such as PyCharm ([see here](https://youtrack.jetbrains.com/issue/PY-38035)).
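
For illustration, the annotation style PEP-484 prescribes (a hypothetical wrapper, not the actual stub):

```python
import torch
from torch import Tensor

# Per PEP 484, *operands is annotated with the element type (Tensor),
# not with List[Tensor].
def einsum_wrapper(equation: str, *operands: Tensor) -> Tensor:
    return torch.einsum(equation, *operands)
```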
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27563

Differential Revision: D17904748

Pulled By: soumith

fbshipit-source-id: 0a7fcbbb12e388e6fc40d48bf533652a96024757
2019-10-15 14:35:24 -07:00
a6cbbd2196 Revert D17843468: Save Docker image to workspace instead of pushing to ECR.
Test Plan: revert-hammer

Differential Revision:
D17843468

Original commit changeset: c3f549e562c6

fbshipit-source-id: abb61692238c8b3ad54d31ef6bffe42ecc2f090e
2019-10-15 14:32:22 -07:00
964d3d8b38 Revert D17822962: [pytorch][PR] Make JIT Serialization support arbitrary std::function<> IO
Test Plan: revert-hammer

Differential Revision:
D17822962

Original commit changeset: d344a7e59707

fbshipit-source-id: ba153a2110faf91d103bd0f8dea4e9613bd6b0da
2019-10-15 13:55:11 -07:00
fd3d6587e6 Make TripletMarginLossImpl subclass from Cloneable (#27956)
Summary:
Continuing from https://github.com/pytorch/pytorch/pull/27770 to make all `torch::nn` layers subclass from `torch::nn::Cloneable`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27956

Differential Revision: D17936555

Pulled By: yf225

fbshipit-source-id: 75f7982e7893675cf6da0f5419224b92af579818
2019-10-15 13:38:39 -07:00
d39ab0312a Add memory_format support to and type operators (#27107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27107

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.
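
A minimal sketch of rule (1), assuming the keyword names used in this PR:

```python
import torch

# A channels-last tensor is non-overlapping and dense, so 'preserve'
# keeps its strides through a dtype conversion:
x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = x.to(torch.float64, memory_format=torch.preserve_format)
assert y.stride() == x.stride()
```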

Test Plan: Imported from OSS

Differential Revision: D17931062

Pulled By: VitalyFedyunin

fbshipit-source-id: 2c5dd3dd05bf58a9a29f25562cd45190b009c3f9
2019-10-15 12:55:56 -07:00
cbe5ab1109 Make JIT Serialization support arbitrary std::function<> IO (#27586)
Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.

We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary-position reads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27586

Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17822962

Pulled By: jjlilley

fbshipit-source-id: d344a7e59707f3b30d42280fbab78f87399e4d10
2019-10-15 12:39:58 -07:00
d482ed44f5 Fix test_docs_coverage.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27888

Test Plan: unit tests.

Reviewed By: ezyang

Differential Revision: D17911956

fbshipit-source-id: 141e2f883176a2c743514b9b3ab5272e5ea230e4
2019-10-15 12:20:13 -07:00
182abb2580 accept -1 in iterations and warmup iterations (#28014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28014

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations -1 --warmup_iterations -1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 30827.046
...
```

Reviewed By: hl475

Differential Revision: D17932071

fbshipit-source-id: e4d9d256a0a4958110f61af13afdde70fc0f746c
2019-10-15 11:55:37 -07:00
f461184505 Use grad_out for cudnn CTC loss (#27039)
Summary:
Using grad_out for CuDNN CTC loss fixes: https://github.com/pytorch/pytorch/issues/26797, https://github.com/pytorch/pytorch/issues/25833.

We also fix a CuDNN-incompatible change that surfaced during testing: as of CuDNN 7.6, the semantics of the CTC loss gradients are different.
This leads us to disable CuDNN CTC for CuDNN < 7.6. To mitigate the impact on users, we fall back to the native implementation (converting the parameters) when CuDNN isn't applicable; previously this would give an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27039

Differential Revision: D17910815

Pulled By: ngimel

fbshipit-source-id: 465b33612d3402f10c355aa7026a7e1ffaef3073
2019-10-15 11:36:37 -07:00
7e8420b7f6 Buffer to speed Unpickler (#27727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27727

This change uses a small buffer in the Unpickler to avoid
calling reader_() byte-by-byte. Particularly, the unpickler has a
tight loop reading 1-byte opcodes.

This can be more efficient because we avoid the variable-sized
memcpy (due to templating) and std::function indirection for the
common fast path.

This improves the unpickle-1m-ints benchmark by ~20%.

This change requires changing the std::function<> interface
to Unpickler to return size_t rather than bool, but there are
only a few uses of this api.

Test Plan:
buck test caffe2/test/...
benchmark in experimental/jeremyl/c2/SerializationBench

Differential Revision: D17869980

fbshipit-source-id: 37e752744d19e12b7282252c8963355970bd4feb
2019-10-15 11:32:53 -07:00
b65540cc27 Remove named tensor builds from CI (#27762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27762

These are now unnecessary because all of the named tensor tests run on
regular CI.

Test Plan: - verify that there are no named tensor builds on this PR.

Differential Revision: D17915432

Pulled By: zou3519

fbshipit-source-id: 64b0c0bc41af65762fa953b273c64f1b338b80ca
2019-10-15 11:24:27 -07:00
1054ab213d improve error message for scatter in processGroupGloo (#27458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27458

Same as the previous diff - improve error message by passing back the
size discrepancy.
ghstack-source-id: 91864213

Test Plan: `python test/test_c10d.py`

Differential Revision: D17785296

fbshipit-source-id: f939b8091aede768ea215f69df2c83e438c430cf
2019-10-15 11:09:47 -07:00
3397d41b8a Wrapping namespace Reduction in namespace at (#26606) (#27422)
Summary:
1) Wrapped namespace `Reduction` in namespace `at`
2) Prefixed `at::` wherever `Reduction::` is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27422

Differential Revision: D17913759

Pulled By: yf225

fbshipit-source-id: 8f00ca01cad2e7f673d316b128abf59c026e216c
2019-10-15 11:05:40 -07:00
e6a71405a0 Let logical_xor support non-bool tensors (again) (#27248)
Summary:
f362a5a04b3708355b08e5c1285e46ca1b537ad6 reverted
5ca612b55ec1205f98e6bc5d5e64b1bf35f3b3cd due to build-time concerns (also
see https://github.com/pytorch/pytorch/issues/25254). Now we come back to this by reusing the underlying code in
comparison operators: Logical operators on non-bool variables are
essentially comparison operators that semantically output bool
values. Compared with the previous implementation, we compromise by
always applying XOR on the same input type, while output can be either
the input type or the bool type.
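
A minimal sketch of the resulting behavior (assuming the semantics described above):

```python
import torch

a = torch.tensor([0, 1, 10, 0])
b = torch.tensor([4, 0, 1, 0])

# Default output is bool:
print(torch.logical_xor(a, b))  # tensor([ True,  True, False, False])

# The output can also be the input type, via an explicit `out` tensor:
out = torch.empty(4, dtype=torch.int64)
torch.logical_xor(a, b, out=out)
print(out)  # tensor([1, 1, 0, 0])
```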
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27248

Differential Revision: D17929356

Pulled By: ezyang

fbshipit-source-id: dbac08c7614b36f05d24c69104fee9df9ca523d5
2019-10-15 10:56:32 -07:00
76bf8f62f7 fix loss_weight for self_supervision
Summary: previously loss_weight is not used correctly for self-supervision branch

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/fb/dper/layer_models/models/experimental/tests:tum_test

Reviewed By: xianjiec

Differential Revision: D17862312

fbshipit-source-id: 554b793a5caa3886946c54333c81a0d8a10230d9
2019-10-15 10:40:48 -07:00
801b6cd0bd Allow passing undefined Tensor to Module::register_parameter (#27948)
Summary:
C++ API `Module::register_parameter` should accept undefined Tensor as parameter, which is equivalent to `module.register_parameter("param", None)` in Python API.

This unblocks https://github.com/pytorch/pytorch/pull/26082 and https://github.com/pytorch/pytorch/pull/27156.
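
For reference, a sketch of the Python behavior being matched:

```python
import torch

m = torch.nn.Module()
m.register_parameter('bias', None)   # explicitly registered as absent
print(m.bias)                        # None
print(list(m.named_parameters()))    # [] -- None parameters are skipped
```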
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27948

Differential Revision: D17931739

Pulled By: yf225

fbshipit-source-id: 21bdfc88e66e3dc39f3caf608a6a3de48c510fa9
2019-10-15 10:10:42 -07:00
70838ad08b Fix typo in TransformerEncoder and TransformerEncoderLayer documentation (#26230)
Summary:
Fixes a few small typos in the documentation, changing "endocder" to "encoder" and "sequnce" to "sequence"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26230

Differential Revision: D17910820

Pulled By: vincentqb

fbshipit-source-id: 58c63f8dbbd8e2079201d4485a0d4ef323ecfb49
2019-10-15 10:07:22 -07:00
ef8bcfe2c7 Revert D17488861: constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17488861

Original commit changeset: ce7b059d7c86

fbshipit-source-id: 426fca9abe7122190fc17ac6976bc6bcbd5718df
2019-10-15 09:59:21 -07:00
1865f31efa Revert D17490109: Remove preallocation of type ids
Test Plan: revert-hammer

Differential Revision:
D17490109

Original commit changeset: 800c340d9d35

fbshipit-source-id: a3e39bbce53c828fe553379d9f2b66dc8a07c982
2019-10-15 09:59:17 -07:00
cf01f53b5a Remove preallocation of type ids (#26509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918

Test Plan: unit tests

Differential Revision: D17490109

fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
2019-10-15 08:47:13 -07:00
6f865c1e37 constexpr type ids (#26502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502

Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.

This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.

ghstack-source-id: 91896920

Test Plan: unit tests

Differential Revision: D17488861

fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
2019-10-15 08:47:09 -07:00
f1d4e887e0 Updating submodules
Summary:
GitHub commits:

62569d9749
bc04fdfec2
6e9db9ddcf

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 77f4dda3bef09cfb6049a3bf5715821390cdecc1
2019-10-15 08:45:02 -07:00
9033ace9c4 Migrate soft_margin_loss from the TH to Aten (CUDA+CPU) (#27673)
Summary:
Replaces fused TH kernels with a 2-liner of regular Tensor functions.
Benchmarking revealed that performance improves compared to PyTorch 1.2.

Refs: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
VitalyFedyunin

### Benchmarking results on my laptop:

## 1.4.0a0+f63c9e8 output
```
PyTorch version: 1.4.0a0+f63c9e8
CPU Operator sanity check:
tensor(0.5926, grad_fn=<MeanBackward0>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<MeanBackward0>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 9.025700273923576e-05
CPU warmup 10000 took 0.0009383050055475906
CPU warmup 100000 took 0.0015631120040779933
CPU warmup TOTAL time 0.0026368020044174045
CPU forward 1000 took 6.919399311300367e-05
CPU forward 10000 took 0.00014462800754699856
CPU forward 100000 took 0.0011234670091653243
CPU forward 1000000 took 0.014555767003912479
CPU forward 10000000 took 0.13409724000666756
CPU forward 100000000 took 1.246048310000333
CPU forward TOTAL time 1.3961777170043206
CPU for- & backward 1000 took 0.0003219560021534562
CPU for- & backward 10000 took 0.00037290599721018225
CPU for- & backward 100000 took 0.001975035003852099
CPU for- & backward 1000000 took 0.02621342398924753
CPU for- & backward 10000000 took 0.2944270490115741
CPU for- & backward 100000000 took 1.6856628700043075
CPU for- & backward TOTAL time 2.0091958299890393

GPU warmup 1000 took 0.0002462909906171262
GPU warmup 10000 took 9.991199476644397e-05
GPU warmup 100000 took 0.00034347400651313365
GPU warmup TOTAL time 0.0007382350013358518
GPU forward 1000 took 9.67290106927976e-05
GPU forward 10000 took 9.349700121674687e-05
GPU forward 100000 took 9.384499571751803e-05
GPU forward 1000000 took 0.0004975290066795424
GPU forward 10000000 took 0.0017606960027478635
GPU forward 100000000 took 0.003572814996005036
GPU forward TOTAL time 0.006185991995153017
GPU for- & backward 1000 took 0.00035818999458570033
GPU for- & backward 10000 took 0.0003240450023440644
GPU for- & backward 100000 took 0.0003223370003979653
GPU for- & backward 1000000 took 0.00036740700306836516
GPU for- & backward 10000000 took 0.0003690610028570518
GPU for- & backward 100000000 took 0.0003672500024549663
GPU for- & backward TOTAL time 0.002197896988946013
```

## 1.2 output
```
PyTorch version: 1.2.0
CPU Operator sanity check:
tensor(0.5926, grad_fn=<SoftMarginLossBackward>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<SoftMarginLossBackward>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 8.422900282312185e-05
CPU warmup 10000 took 0.00036992700188420713
CPU warmup 100000 took 0.003682684007799253
CPU warmup TOTAL time 0.004169487991021015
CPU forward 1000 took 5.521099956240505e-05
CPU forward 10000 took 0.00036948200431652367
CPU forward 100000 took 0.003762389998883009
CPU forward 1000000 took 0.03725024699815549
CPU forward 10000000 took 0.3614480490068672
CPU forward 100000000 took 3.6139175269927364
CPU forward TOTAL time 4.016912263003178
CPU for- & backward 1000 took 0.0002734809968387708
CPU for- & backward 10000 took 0.0006605249946005642
CPU for- & backward 100000 took 0.005437346000690013
CPU for- & backward 1000000 took 0.051245586000732146
CPU for- & backward 10000000 took 0.5291594529990107
CPU for- & backward 100000000 took 5.23841712900321
CPU for- & backward TOTAL time 5.8253340990049765

GPU warmup 1000 took 0.0005757809994975105
GPU warmup 10000 took 0.0004058420017827302
GPU warmup 100000 took 0.0003764610009966418
GPU warmup TOTAL time 0.0013992580061312765
GPU forward 1000 took 0.0003543390048434958
GPU forward 10000 took 0.0003633670130511746
GPU forward 100000 took 0.0004807310033356771
GPU forward 1000000 took 0.0005875999922864139
GPU forward 10000000 took 0.0016903509967960417
GPU forward 100000000 took 0.014400018990272656
GPU forward TOTAL time 0.0179396449966589
GPU for- & backward 1000 took 0.0006167769897729158
GPU for- & backward 10000 took 0.0006845899915788323
GPU for- & backward 100000 took 0.000631830989732407
GPU for- & backward 1000000 took 0.0010741150035755709
GPU for- & backward 10000000 took 0.0017265130009036511
GPU for- & backward 100000000 took 0.014847910992102697
GPU for- & backward TOTAL time 0.01965981800458394
```

### Code used for performance test
```
import torch
import torch.nn.functional as F
import torch.nn as nn

from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')

loss_fn = F.soft_margin_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        fn(a, b)
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.rand(10, requires_grad=True, device=device)
    b = torch.rand(10, device=device)
    out = loss_fn(a,b)
    print(out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
sanity_check('GPU', gpu)
print()

run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()

run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
run_benchmark('GPU forward', 6, False, gpu, fwd_only)
run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27673

Differential Revision: D17889288

Pulled By: ezyang

fbshipit-source-id: 9ddffe4dbbfab6180847a8fec32443910f18f0a9
2019-10-15 08:44:57 -07:00
498ca083a6 adding IterableDataset to dataset.pyi (#27966)
Summary:
this shall fix https://github.com/pytorch/pytorch/issues/27820
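
A minimal sketch of the class the stub now covers:

```python
from torch.utils.data import DataLoader, IterableDataset

class Counter(IterableDataset):
    """Streams 0..n-1 instead of supporting random access."""
    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

print([batch.tolist() for batch in DataLoader(Counter(5), batch_size=2)])
# [[0, 1], [2, 3], [4]]
```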
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27966

Differential Revision: D17929633

Pulled By: ezyang

fbshipit-source-id: ff3e0fb7f998b0771183288200c0859eb5f381dd
2019-10-15 08:41:59 -07:00
ba7919601f Save Docker image to workspace instead of pushing to ECR. (#26720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26720

I'm trying to get rid of the need for CI jobs to have write access to ECR. Right now, they require write access because they push intermediate build results, which then get sucked down by downstream jobs. Instead of pushing back to ECR, we could just save them to CircleCI workspace. (There are some downsides to this approach: in particular, we save ALL layers to the workspace, not the new layers.) My original idea was to save to `~/built_image.tgz` and then load it.

Unfortunately, the Android tests have a substantially more complicated Docker structure which means my simple idea doesn't work. The current structure is that there are instantiations of `pytorch_linux_build` per configuration (e.g., `x86_32`, `x86_64`, ...). Then `gradle_build` collates these Docker images together and combines them to publish. To handle this case, the upstream jobs have to save Docker images to distinct filenames in the workspace for the load to work correctly. This is achieved by adding a new parameter to `pytorch_linux_build`, `saved_docker_filename`, which specifies where to put the image. Additionally, to pass this parameter to the jobs, I stopped using configuration generation for this case, as I couldn't figure out how to get the generator to conditionally add another line to the YAML for this case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17843468

Pulled By: ezyang

fbshipit-source-id: c3f549e562c691b8f3f447705d4717c1fbb64040
2019-10-15 08:39:05 -07:00
817cb4182e Fix Sphinx warning about '_images' not existing (#27927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27927

This fixes
`WARNING: html_static_path entry '_images' does not exist`
by removing '_images' from conf.py. As far as I can tell, '_images' in
`html_static_path` is only necessary if images already exist in the
`_images` folder; otherwise, sphinx is able to auto-generate _images
into the build directory and populate it correctly.

Test Plan: - build and view the docs locally.

Differential Revision: D17915109

Pulled By: zou3519

fbshipit-source-id: ebcc1f331475f52c0ceadd3e97c3a4a0d606e14b
2019-10-15 07:50:26 -07:00
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
ad47788647 Add Polygamma to the docs (#27696)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25347
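
For reference, a quick sketch of what the newly documented function computes:

```python
import math
import torch

# polygamma(n, x) is the n-th derivative of the digamma function;
# polygamma(1, 1) equals pi**2 / 6.
print(torch.polygamma(1, torch.tensor([1.0])))  # tensor([1.6449])
print(math.pi ** 2 / 6)                         # 1.6449...
```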
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27696

Differential Revision: D17916790

Pulled By: ezyang

fbshipit-source-id: ac2635a300b1ef0ab437e3ffac152239754fe828
2019-10-15 07:00:57 -07:00
f10ea7a2e1 Add test for requires_process_group_agent decorator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27879

Test Plan: Imported from OSS

Differential Revision: D17924096

Pulled By: mrshenli

fbshipit-source-id: 91aaad12daf985768dfb05fb9630cee21a81a366
2019-10-15 06:57:34 -07:00
19d83ab800 Updating submodules
Summary:
GitHub commits:

f40e2d0d42

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: a36a52442fb8a578d89e298bf1059ace42d7959a
2019-10-14 21:29:45 -07:00
8b87f9a510 Add fused layer norm impl on CUDA in PyTorch (#27634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27634

Add fused layer norm impl on CUDA in PyTorch

Performance benchmark compare to apex.FusedLayerNorm on a V100 machine.

**************************************
Shape = (128, 2097152)
  curr LayerNorm forward: 7.252584544941783ms
  apex LayerNorm forward: 10.366813436849043ms
  curr LayerNorm backward: 15.568048988003284ms
  apex LayerNorm backward: 20.869979876093566ms
**************************************
Shape = (256, 1048576)
  curr LayerNorm forward: 5.185673736967146ms
  apex LayerNorm forward: 6.3868385690730065ms
  curr LayerNorm backward: 13.942008479032665ms
  apex LayerNorm backward: 15.469660016940907ms
**************************************
Shape = (512, 524288)
  curr LayerNorm forward: 4.672068868065253ms
  apex LayerNorm forward: 4.717993081081659ms
  curr LayerNorm backward: 13.46354596503079ms
  apex LayerNorm backward: 14.04774487693794ms
**************************************
Shape = (1024, 262144)
  curr LayerNorm forward: 4.547273400006816ms
  apex LayerNorm forward: 5.378365494078025ms
  curr LayerNorm backward: 13.425063178874552ms
  apex LayerNorm backward: 14.235145597020164ms
**************************************
Shape = (2048, 131072)
  curr LayerNorm forward: 4.526399010093883ms
  apex LayerNorm forward: 4.775081946980208ms
  curr LayerNorm backward: 13.222738380078226ms
  apex LayerNorm backward: 13.59594238596037ms
**************************************
Shape = (4096, 65536)
  curr LayerNorm forward: 4.28789056581445ms
  apex LayerNorm forward: 4.48913648002781ms
  curr LayerNorm backward: 13.026655421825126ms
  apex LayerNorm backward: 13.57052089786157ms
**************************************
Shape = (8192, 32768)
  curr LayerNorm forward: 4.243518367875367ms
  apex LayerNorm forward: 4.34588153520599ms
  curr LayerNorm backward: 13.140627697808668ms
  apex LayerNorm backward: 13.49891544203274ms
**************************************
Shape = (16384, 16384)
  curr LayerNorm forward: 4.181216162163764ms
  apex LayerNorm forward: 4.268723972840235ms
  curr LayerNorm backward: 13.035593512002379ms
  apex LayerNorm backward: 13.463351831072941ms
**************************************
Shape = (32768, 8192)
  curr LayerNorm forward: 4.097899778978899ms
  apex LayerNorm forward: 4.109480210812762ms
  curr LayerNorm backward: 13.041268918896094ms
  apex LayerNorm backward: 13.586135944118723ms

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: houseroad

Differential Revision: D17462420

fbshipit-source-id: d4a67d160bb4eff73ffac64af46c56c3845cf211
2019-10-14 21:26:33 -07:00
30d9316f35 refactor tryMatchSchema (#26499) (#27773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27773

We've changed how these functions are used over time, so I cleaned up
the header file API to match. In particular:

* tryMatchSchemas was added since the overload logic got copy/pasted
into three separate locations.
* With this change, tryMatchSchema is no longer public, as it is not needed
  outside of tryMatchSchemas
* emitBuiltinFunction no longer needs a requires argument (it was always true)

* Argument order for all the schema matching stuff now puts the 'self'
builtin override last. This is only rarely used and was inconsistent with
matchSchema

Test Plan: Imported from OSS

Differential Revision: D17885425

Pulled By: zdevito

fbshipit-source-id: 064bc9fa4bd57b2e5366fff9f3c6ab9b9945e08b
2019-10-14 20:45:25 -07:00
09464a7bf5 cleanup lint scripts a bit (#27805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27805

The expressions syntax for actions is pretty cool! Using it to clean up
some of my convoluted checks from before

Test Plan: Imported from OSS

Differential Revision: D17909353

Pulled By: suo

fbshipit-source-id: 8b9a85476ba19452f48c532a2daed830f074088a
2019-10-14 20:19:48 -07:00
11172c19be codemod at::ArrayRef and torch::IntArrayRef to std::vector in C++ API tests (#27884)
Summary:
`at::ArrayRef` / `torch::IntArrayRef` should be discouraged in user code, because users might not be aware of the fact that it doesn't own the underlying data, which already leads to memory access bugs when they try to write the following:
```cpp
auto expected_sizes = torch::IntArrayRef({2, 16, 6});  // The memory that represents `{2, 16, 6}` is released after this line
ASSERT_EQ(output.sizes(), expected_sizes);  // `expected_sizes` is pointing to invalid memory region
```
This PR changes all usage of `at::ArrayRef` and `torch::IntArrayRef` to the corresponding `std::vector` version, so that users won't pick up the habit of using `ArrayRef` by looking at the test code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27884

Differential Revision: D17921646

Pulled By: yf225

fbshipit-source-id: 461e79fc22b598aac230d36cc028085ce6cbe937
2019-10-14 18:00:30 -07:00
a4a5b6fcaa Revert D17913708: [pytorch][PR] [JIT] throw on custom forward for module containers
Test Plan: revert-hammer

Differential Revision:
D17913708

Original commit changeset: 1cc2a8a4b573

fbshipit-source-id: 19ad68a1b0fd8e0f17e1b7ab92879106517e13d2
2019-10-14 17:48:31 -07:00
0af60a5c06 (#27299)
Summary:
Removing in-place operator for num_batches_tracked increment. The in-place
operator used here turns out to block many optimization opportunities due to
alias assumption for inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27299

Differential Revision: D17909341

Pulled By: ngimel

fbshipit-source-id: 7d635be94dfd2002af435acf6ea71995adaa40f6
2019-10-14 17:48:27 -07:00
937e3f1db4 Enable RRef tests for other RPCAgent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27789

Differential Revision: D5444828

fbshipit-source-id: a2fa5a603e4b2970755bc5d16f6b2c84d65b0811
2019-10-14 17:42:23 -07:00
66f74783c3 Eliminate unnecessary Tensor refcount bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27886

Differential Revision: D17915160

fbshipit-source-id: 3b6a6d89b71cb576f0bd6d330b884926d1ce659f
2019-10-14 16:31:02 -07:00
4b1096c652 Fix predict net issue with LRU hash eviction
Summary:
We are seeing error "[enforce fail at BlackBoxPredictor.cpp:134] ! !parameter_workspace->HasBlob(out). Net REMOTE of type predict_net writes to blob cat/NGRAM_QRT_VERSIONS_x_EVENT_TYPE_AUTO_FIRST_X/Pool_Option_0/Repeat_0/sparse_lookup/w which exists in the parameter workspace" in online testing for calibration models.
I suspect it's due to the op CopyRowsToTensorOp being used in prediction.

Test Plan:
f143080108 offline predict net does not contain CopyRowsToTensorNet, which looks right.
Waiting for Olga to test online behavior
dper2 canary:
https://fburl.com/fblearner/sv3o3yj1

Differential Revision: D17741823

fbshipit-source-id: 19721b632b5ea9ebfa1ef9ae0e99d3a10c926287
2019-10-14 16:08:14 -07:00
aaedf1b38b break out test_recursive_script (#27819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27819

The idea here is to preserve the fact that `test_jit.py` contains all the JIT tests. So we import `JitTestCase`s from `jit/` into `test_jit.py` so that the test loader will find and run them when you do `python test_jit.py`. This also means that things like `-k` will work as expected.

The individual test files in `jit/` will throw if run directly, to prevent cases where the CI accidentally runs multiple versions of the same test.

Differential Revision: D17898105

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 0cd6f8421c86c90a6e1bae33a3fdbe998f570e07
2019-10-14 16:00:35 -07:00
151483e25d move import_class_test files around (#26722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26722

Put them in a directory under jit/ to prep for test splitting

Test Plan: Imported from OSS

Differential Revision: D17550582

Pulled By: suo

fbshipit-source-id: a592b671ffe808f02d0a597d441bd98a18c9109e
2019-10-14 16:00:31 -07:00
382917bbd1 report per iteration execution time (#27923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27923

As title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 --ai_pep_format true

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.026746749877929688"}
...
```

Reviewed By: hl475

Differential Revision: D17911718

fbshipit-source-id: 6fe28f2ab9ce1e0feabb5b822f04ff32dac977a9
2019-10-14 15:44:42 -07:00
7929a4157a Fix TBB builds (#27937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27937

Fix TBB builds.

Test Plan: ATEN_THREADING=TBB USE_TBB=1 python setup.py develop --cmake

Differential Revision: D17916565

Pulled By: ilia-cher

fbshipit-source-id: 292f07bcff63ae611299383d16527e8e24412102
2019-10-14 15:41:30 -07:00
104bb57c43 Run all docker images with --cap-add=SYS_PTRACE --security-opt seccomp=unconfined (#27787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27787

This makes it possible to directly run gdb after 'docker exec'ing into a
Docker image run from CircleCI (useful if you're doing the rerun with
SSH workflow).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17889312

Pulled By: ezyang

fbshipit-source-id: 522a75be18be69ff6ad83d47185ae3068bf725d4
2019-10-14 14:02:28 -07:00
93030f68be Changing the hypothesis dev verbosity to 'normal'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27781

Test Plan: Imported from OSS

Differential Revision: D17887043

Pulled By: zafartahirov

fbshipit-source-id: be22c417cef5a00b702e2e54e065ea0449208fc0
2019-10-14 13:44:34 -07:00
2cae3928b0 Multi-Label Soft Margin loss (#27669)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `MultiLabelSoftMarginLoss` module and `multilabel_soft_margin_loss` functional.

It looks like there isn't a C++ ATen implementation of `multilabel_soft_margin_loss`, so I translated the Python version, which does not rely on a C/C++ backend either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27669

Differential Revision: D17907608

Pulled By: yf225

fbshipit-source-id: ccb02951e009973c2adbe604593ce929f10c39eb
2019-10-14 13:29:45 -07:00
0003771423 C++ API parity: Unfold (#27809)
Summary:
Adds `unfold` functional and module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27809

Differential Revision: D17901792

Pulled By: yf225

fbshipit-source-id: ff58a1866bf240f37ebc589463c60593b8931f51
2019-10-14 13:21:59 -07:00
fdea0cbe40 s/TestEndToEndHybridFrontendModels/TestModels/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27877

Test Plan: Imported from OSS

Differential Revision: D17909137

Pulled By: jamesr66a

fbshipit-source-id: d8d730eed562b0f08caed7be302dd122af61e877
2019-10-14 13:13:30 -07:00
cd6b37afa7 throw on custom forward for module containers (#27763)
Summary:
Custom forwards of containers would silently not be compiled previously. Throw an error now instead.

Fix for https://github.com/pytorch/pytorch/issues/26671
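
A minimal sketch of the pattern that now errors (hedged; the exact error message is per this change):

```python
import torch
import torch.nn as nn

class MySeq(nn.Sequential):
    # A custom forward on a container was previously silently not compiled;
    # scripting it now raises instead.
    def forward(self, x):
        return super().forward(x) * 2

torch.jit.script(MySeq(nn.Linear(2, 2)))  # raises after this change
```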
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27763

Differential Revision: D17913708

Pulled By: eellison

fbshipit-source-id: 1cc2a8a4b57356ba7f007a95ede0a31e5d61aa82
2019-10-14 13:08:10 -07:00
169327f557 Add note that cuda quantization is not supported (#27829)
Summary:
People get confused with partial support otherwise: https://github.com/pytorch/pytorch/issues/27811 #27729

Suggestions on where else to put warnings are welcome (probably in the tutorials; cc SethHWeidman).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27829

Differential Revision: D17910931

Pulled By: dzhulgakov

fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c
2019-10-14 11:25:51 -07:00
4f6b567245 Remove sharding code from tests (#27818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27818

This has been turned off since january. Might as well clean it up. I want to do a bit of refactoring in this area.
ghstack-source-id: 91827750

Test Plan: sandcastle

Differential Revision: D17898077

fbshipit-source-id: e70bf8ee72b4703767d4e38f8c346a7849a866f5
2019-10-14 11:04:44 -07:00
d23d62cb1e Fix unaries to export fp16 instead of fp32 when rest of the model export to int8
Summary: Currently, accelerators do not have a concept of fp32; they only understand fp16 and int8 data inputs. To fix the issue here, we make sure unaries are turned into fp16 when the int8 exporter is turned on.

Reviewed By: kennyhorror

Differential Revision: D17743791

fbshipit-source-id: 7322d23eb12ac3f813b525fc0ddd066f95c8ca85
2019-10-14 10:51:17 -07:00
b5e0fd4c56 add known worker ids to distributed autograd context (#26324)
Summary:
Per https://github.com/pytorch/pytorch/issues/25525 we want to clean up distributed autograd context on all nodes, in addition to the local one. To do this, we want to send async RPCs to the other nodes telling them to clean up the context.

The first step for this is for a node's context to know about the other workers. This PR does two things:

1) Adds the necessary data structures and getter functions to `DistAutogradContext`
2) Refactors calls to `addSendRpcBackward` to take in the `worker_id` as an additional argument
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26324

Differential Revision: D17769411

Pulled By: rohan-varma

fbshipit-source-id: b7327d1209a574e2e88cb197edff3103024d51ad
2019-10-14 10:43:09 -07:00
5321f4553f Remove GCC4 from CI (#26522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26522

Our binaries are already built using GCC5, so there is no reason to test for GCC4 anymore.

This is an important prerequisite for switching to C++14, but even without the C++14 switch, this enables a gcc feature that I need for constexpr type ids.
ghstack-source-id: 91851144

Test Plan: unit tests

Differential Revision: D17494507

fbshipit-source-id: 7c0beb5e532ad9caa5cb02c1af26341c1017ff57
2019-10-14 09:51:50 -07:00
524d9003f3 Kill unused THNN operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26972

Test Plan: Imported from OSS

Differential Revision: D17628457

Pulled By: gchanan

fbshipit-source-id: 009e2847b8ab6724f066a6f5a95b3324eceb3f30
2019-10-14 09:38:03 -07:00
3714ca58d9 Kill more unused THCUNN operators. (#26971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26971

I believe this is currently exhaustive of the unused operators in THCUNN:
LookupTable, SpatialSubSampling, Sqrt, Square, TemporalConvolution, TemporalMaxPooling.

Test Plan: Imported from OSS

Differential Revision: D17628380

Pulled By: gchanan

fbshipit-source-id: a3ebd24765d00073e60212f6f664ec4a6d8c1d1b
2019-10-14 09:37:59 -07:00
7583f87fa6 Kill a number of unused THCUNN operators. (#26970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26970

I believe these were only being (self)-referenced by direct THCUNN bindings, which were killed in the https://github.com/pytorch/pytorch/pull/25358 stack.

This list is NOT exhaustive of what can be removed, and notably doesn't include THNN:
Abs, DistKLDivCriterion, FeatureLPPooling, IndexLinear, L1Cost, LookupTableBag, MarginCriterion, SpatialConvolutionLocal, SpatialCrossMapLRn.

Test Plan: Imported from OSS

Differential Revision: D17628216

Pulled By: gchanan

fbshipit-source-id: 0a0b17b446cf8ec9adef631f6f5c515182b560bb
2019-10-14 09:37:54 -07:00
a23edd6b9c Fix Type Errors in Examples about Named Tensor (#27828)
Summary:
`names` should be a `tuple` of strings, not a single string.
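
For instance, a sketch of the corrected call:

```python
import torch

# names must be a tuple of strings, not a bare string:
x = torch.zeros(2, 3, names=('N', 'C'))
print(x.names)  # ('N', 'C')
```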
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27828

Differential Revision: D17908112

Pulled By: zou3519

fbshipit-source-id: bd1454c5d6e6b690955f49380e34c4b0ddaf879b
2019-10-14 09:24:45 -07:00
82a69a690f Add documentation for torch.lgamma (#27812)
Summary:
Changelog:
- Add doc string in _torch_docs.py, _tensor_docs.py
- Expose in docs/source/torch.rst, docs/source/tensors.rst
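
As a sanity check of the newly documented function (a sketch):

```python
import math
import torch

# lgamma computes the log of the absolute value of the gamma function;
# Gamma(0.5) = sqrt(pi), so lgamma(0.5) = log(sqrt(pi)).
print(torch.lgamma(torch.tensor([0.5])))  # tensor([0.5724])
print(math.log(math.sqrt(math.pi)))       # 0.5723649...
```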
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27812

Test Plan:
- Remove `lgamma`, `lgamma_` from the blacklist

Fixes https://github.com/pytorch/pytorch/issues/27783

Differential Revision: D17907630

Pulled By: ezyang

fbshipit-source-id: 14e662a4e5262126889a437e5c4bfb21936730e8
2019-10-14 08:47:04 -07:00
cc5c34a0d0 Add nn::functional::normalize() to C++ Frontend (#27280)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/27048

PR Summary:

Files Added:

_torch/csrc/api/include/torch/nn/options/normalization.h
torch/csrc/api/include/torch/nn/functional/normalization.h_

Files Modified:

_test/cpp/api/functional.cpp
torch/csrc/api/include/torch/nn/functional.h_

 ---

yf225: I couldn't find a C++ equivalent of gradcheck(); is there such a function, or is it sufficient to call .backward() in the test body? I don't think any solutions are checked for in the Python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27280

Differential Revision: D17902109

Pulled By: yf225

fbshipit-source-id: 1bce1a88103d0f1848633fec90fde95ea8f3d1ed
2019-10-14 08:39:02 -07:00
32c56747f7 Mention C++14 in the README (#26670)
Summary:
Technically, we don't need a C++14 compiler yet, but we will soon stop supporting GCC 4. Requiring a "C++14" compiler excludes GCC 4, so it is a defensive statement. Some time later, we will actually require a C++14 compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26670

Differential Revision: D17907257

Pulled By: smessmer

fbshipit-source-id: 5363d714f8d93597db008135f681b2e14d052fa0
2019-10-14 08:12:42 -07:00
0e8d4836e4 add feature name into module and update position weighted to match dper2
Test Plan:
The notebook showed no diff for id score list
https://our.intern.facebook.com/intern/anp/view/?id=154764

Reviewed By: alyssawangqq

Differential Revision: D17649974

fbshipit-source-id: 84cb4ae372fc215295c2d0b139d65f4eacafae4a
2019-10-14 08:06:19 -07:00
b7b73e43c0 Delete TEST_NAMEDTENSOR; run named tensor tests on all CIs (#27760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27760

There's nothing special about the named tensor tests that requires that
they be run in their own CI job. In this PR we delete the
TEST_NAMEDTENSOR flag that hides named tensor tests from regular jobs.
In the future, we'll delete the named tensor CI job so that we do not
duplicate signals.

Test Plan: - wait for CI

Differential Revision: D17882262

Pulled By: zou3519

fbshipit-source-id: f90c71cb939e53b8ea23f7e2ab95a5c41b8be0e3
2019-10-14 08:01:41 -07:00
73521a0316 Roll more version numbers to 1.4.0 (#27751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27751

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886488

Pulled By: ezyang

fbshipit-source-id: 1c8d98b6f7ee3127ebec9a0b03132c38c97523c3
2019-10-14 07:16:27 -07:00
4bcedb6670 Mark sampler and batch_sampler arguments as optional in the DataLoader interface (#27821)
Summary:
Changelog:

- DataLoader argument `sampler` is now of type `Optional[Sampler[int]]` instead of `Sampler[int]`
- DataLoader argument `batch_sampler` is now of type `Optional[Sampler[Sequence[int]]]` instead of `Sampler[Sequence[int]]`

Fixes https://github.com/pytorch/pytorch/issues/27737
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27821

Differential Revision: D17906623

Pulled By: ezyang

fbshipit-source-id: 088cacbb7e9f7988995f40b71adc3e719815f5ad
2019-10-14 06:57:27 -07:00
19df7e7e84 Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27831

Differential Revision: D17904698

Pulled By: soumith

fbshipit-source-id: 3923dd36bc29f0f6e814d299afd8eef224035ccd
2019-10-14 01:32:23 -07:00
848d1ba13a Fix padding_idx in the new embedding cuda kernel. (#27731)
Summary:
The current embedding backwards CUDA kernel is somewhat broken. It effectively ignores padding_idx and also incorrectly drops an index from the input.

This commit fixes that bug and fixes the unit test so that this behavior won't break in the future.

This fixes https://github.com/pytorch/pytorch/issues/26302.
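
A minimal sketch of the behavior the fixed kernel must preserve (assumes a CUDA device):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 3, padding_idx=0).cuda()
idx = torch.tensor([[0, 2, 0, 5]], device='cuda')
emb(idx).sum().backward()

# The row for padding_idx must receive no gradient:
print(emb.weight.grad[0])  # tensor([0., 0., 0.], device='cuda:0')
```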
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27731

Differential Revision: D17893803

Pulled By: ngimel

fbshipit-source-id: 4ba02a17ec0e29a7016d65480d4ff0c276550616
2019-10-13 21:18:49 -07:00
1c2cb6d523 Edits to ReadMe file (#27808)
Summary:
Grammar edits to the Readme file to make it read better in English
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27808

Differential Revision: D17901414

Pulled By: soumith

fbshipit-source-id: 02e67289dafaf9280cb1c3bb2f37087cd134cc23
2019-10-13 17:09:02 -07:00
07d4374239 C++ API: torch::nn::Softmax2d (#27509)
Summary:
Add torch::nn::Softmax2d module support for the C++ API.
Softmax2d is only available as a module in the Python API (there is no functional form), so this PR adds only module support as well.

This PR is WIP because it uses the function in https://github.com/pytorch/pytorch/issues/27446.
After https://github.com/pytorch/pytorch/issues/27446 is merged, I will remove WIP.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27509

Differential Revision: D17899715

Pulled By: yf225

fbshipit-source-id: bd891bc995f5a92bf4f5405f8bf07d1bd5de2479
2019-10-13 11:00:56 -07:00
52528c041a - TripletMarginLoss (#27713)
Summary:
Hi yf225, I had to create a new branch to resolve a merge conflict, since I am working on a cloud machine due to some limitations on my PC and don't have full control of the environment there.

Also, I have incorporated the changes you previously made here:
https://github.com/pytorch/pytorch/pull/27613

Also, it would be great if you could recommend some resources for working smoothly on GCP. :-D

Thank you
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27713

Differential Revision: D17899695

Pulled By: yf225

fbshipit-source-id: eb6643223148774a5cbbd093bdcc5623872e5bba
2019-10-13 10:57:37 -07:00
23bffc4f14 Fix most documentation warnings (#27782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782

Warnings show up when running `make html` to build documentation. All of
the warnings are very reasonable and point to bugs in our docs. This PR
attempts to fix most of those warnings.

In the future we will add something to the CI that asserts that there
are no warnings in our docs.

Test Plan: - build and view changes locally

Differential Revision: D17887067

Pulled By: zou3519

fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166
2019-10-13 10:34:01 -07:00
446a79b959 C++ API parity: Threshold
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27538

Test Plan: Imported from OSS

Differential Revision: D17835415

Pulled By: pbelevich

fbshipit-source-id: 2a887704655be79ee458081c46a7eea31eca51dc
2019-10-13 09:38:31 -07:00
cbdd55c669 C++ API parity: Tanhshrink
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27537

Test Plan: Imported from OSS

Differential Revision: D17835409

Pulled By: pbelevich

fbshipit-source-id: ad4120cfe01ea2508bf3ce1054022a2da649ac74
2019-10-13 08:12:13 -07:00
2750ea25b2 C++ API parity: Tanh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27536

Test Plan: Imported from OSS

Differential Revision: D17835411

Pulled By: pbelevich

fbshipit-source-id: c8984aec2f4bae48ff901fafc8c53a4122192ac5
2019-10-13 06:34:18 -07:00
27027a4804 Fix torch::nn layers to always subclass from torch::nn::Cloneable (#27770)
Summary:
The impl class of `torch::nn` layers must always subclass from `torch::nn::Cloneable`; otherwise `module->clone()` doesn't work on them. This PR fixes layers that don't conform to this rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27770

Differential Revision: D17893051

Pulled By: yf225

fbshipit-source-id: 37cdf8c09e22f0f164cbd0e8700965a1778ec4c1
2019-10-12 16:23:46 -07:00
aa73701f03 Disable pytorch_short_perf_test_gpu CI job (#27797)
Summary:
The `pytorch_short_perf_test_gpu` CI job hasn't been giving useful signal compared to https://apaszke.github.io/pytorch-perf-hud/ or the FAI-PEP effort. This PR disables it to reduce maintenance workload for CI admins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27797

Differential Revision: D17897180

Pulled By: yf225

fbshipit-source-id: 91a66ebac3d15a44094a669da38c43e3ea9c20d2
2019-10-12 16:19:43 -07:00
f6bda1e07b Removes @default_floating_dtype decorator (#27628)
Summary:
One fewer legacy decorator cluttering the test suite.

Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default.

Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628

Differential Revision: D17896254

Pulled By: mruberry

fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f
2019-10-12 12:39:34 -07:00
341262754f module dedupe (#26666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26666

Changes:
- Introduce a `ConcreteModuleType` concept. This acts both as the key into the type
  cache, and as the source of truth for `ModuleValue::attr` queries. It needs
  to do both jobs because that's how we ensure correctness (if the types are
  different, it's because `ModuleValue::attr` would return different things).
- Now `recursive_script` will first construct a `ConcreteModuleType` and search for a
  pre-existing type before starting compilation.
- All previous paths to creating a `ScriptModule` (including inheriting from
  `ScriptModule`) are now rewritten to go through `create_script_module`, so
  that we have only a single place where construction happens.

Behavioral changes:
- Big change to `torch.jit.ScriptModule` inheritance: all attributes are now
  recursively scripted if possible, matching recursive scripting semantics.
  This makes it hard to keep something from being scripted (for example, a
  Python submodule). Possibly we'll need an `ignore()` type thing for
  attributes. In particular, this adds `self.training` to *every* ScriptModule, since
  it's present on every `nn.Module`.
- I believe this change to be transparent to existing users of the inheritance API, since if you had an attribute that is unscriptable that you never used, there is no error. In some cases, we will create new attributes (even if they are unused), which will increase serialized model size from before.
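
A minimal sketch of the type-cache idea described above (illustrative Python only; `compile_new_type` is a hypothetical stand-in, and the real logic lives in the C++ recursive-scripting code):

```python
# Illustrative sketch only; the real cache lives in the TorchScript C++ code.
_type_cache = {}

def get_or_create_script_type(concrete_type):
    # Equal keys must imply interchangeable compiled types, which is why the
    # key (the ConcreteModuleType) encodes everything ModuleValue::attr sees.
    if concrete_type not in _type_cache:
        _type_cache[concrete_type] = compile_new_type(concrete_type)  # hypothetical
    return _type_cache[concrete_type]
```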

Test Plan: Imported from OSS

Differential Revision: D17551196

Pulled By: suo

fbshipit-source-id: b476d1c9feb3ddfd63406d90989aaf9dfe890591
2019-10-12 09:51:57 -07:00
ffa422a8b3 kill _parameter_list (#27399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27399

This was devised in a time when we didn't have module attributes. They
are essentially just tensor lists, so represent them that way. This has
the additional benefit of making the RNN forward pass faster because we
effectively cache the flattened weights.

The only complicated part is that someone may come along and do:
```
my_rnn_mod.w_ih_l0 = torch.nn.Parameter(...)
```

This means we need to override setattr to keep the flattened weights
cache up to date.
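
A minimal sketch of that `__setattr__` override, assuming a `_flat_weights` cache attribute (names are illustrative, not the actual RNN code):

```python
import torch

class RNNLike(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w_ih_l0 = torch.nn.Parameter(torch.randn(4, 4))
        self._flat_weights = [self.w_ih_l0]  # cached flattened weights

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Reassigning a weight parameter must rebuild the cache.
        if name.startswith("w_") and hasattr(self, "_flat_weights"):
            self._flat_weights = [self.w_ih_l0]
```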

Test Plan: Imported from OSS

Differential Revision: D17785658

Pulled By: suo

fbshipit-source-id: 7789cd1d0d4922bfd5eba1716976442fbf150766
2019-10-12 09:51:53 -07:00
759c99c2e3 [jit] Python None should have its type inferred as NoneType (#26665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26665

This is actually useful. For example: in batchnorm.py, all the tracked
stats are either `nn.Parameter` or `None`. We should register them as
params if they are set, or attributes with type NoneType if they are
not.
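
A small sketch of the behavior this enables (batchnorm.py is the motivating case):

```python
import torch

class M(torch.nn.Module):
    def __init__(self, track_stats: bool):
        super().__init__()
        # With this change, a plain None is registered as an attribute of
        # type NoneType instead of failing to script.
        self.running_mean = torch.zeros(4) if track_stats else None

    def forward(self, x):
        return x

scripted = torch.jit.script(M(track_stats=False))
assert scripted.running_mean is None
```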

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D17551197

Pulled By: suo

fbshipit-source-id: 8d6f6d76d4dab0d524c4ffdfe0c1dd465771cd00
2019-10-12 09:51:49 -07:00
3bccd3fc0d Distributed Autograd - FAST mode backward pass implementation. (#27022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27022

This change implements the "FAST" mode distributed autograd backward
pass as described in https://github.com/pytorch/pytorch/issues/23110.

At a high level the backward pass works as follows:
1. We start by computing dependencies on the node that calls
`torch.distributed.backward`.
2. This node computes the dependencies starting from the root nodes provided in
the backward call and all the 'send' functions present in the current autograd
context. The "FAST" mode assumes all 'send' functions are part of the autograd
computation.
3. Once the dependency computation is done, the distributed autograd engine
calls the local autograd engine to execute the autograd graph. Note that the
autograd graph on a single node is not necessarily connected because of
inter-node communication. As a result, we have special handling to ensure that
the local autograd engine executes the entire graph starting from the
provided roots and all 'send' functions on the node.
4. When the local autograd engine hits a 'recv' function, it performs an async
RPC to send the gradients over to the appropriate node and stores a future in
the autograd context to keep track of this RPC.
5. On the destination node, the appropriate 'send' function is looked up and
enqueued on the local autograd engine. If this is the first time the node is
hearing about this autograd context id on the backward pass, then the node
computes dependencies for the local autograd engine.
6. As part of computing dependencies, the distributed autograd engine discovers
all leaf nodes and ensures those are passed as 'outputs' to the local autograd
engine. This avoids running the 'AccumulateGrad' function.
7. The gradients computed for the leaf nodes are then actually accumulated in
`DistAutogradContext` for the appropriate autograd context id.
8. The distributed autograd engine waits for the local autograd engine
to complete and also waits for all the 'Futures' (stored in 4.) for respective
RPCs to finish.

We have made the following changes to the local autograd engine for this
purpose:

1. Expose GraphTask and NodeTask so that the distributed autograd engine can
use them.
2. Expose an `execute_with_graph_task` API which allows the distributed engine
to build a GraphTask and pass it to the local autograd engine.
3. Expose an `enqueue_on_cpu` API, which allows the distributed engine to build
a `NodeTask` for a 'send' function and enqueue it on the local autograd engine.

In addition to this a few general improvements:
1. Added a `PropagateGradients` RPC call for the 'recv' function to pass
gradients to the appropriate node during the backward pass.
2. Use IValues as much as possible in serialization for RpcWithAutograd.
3. If Future.wait() receives a message of type EXCEPTION, we throw an appropriate
exception instead of just returning the message. This is in line with what most
Future.wait() APIs do.
4. Added a `get_gradients(context_id)` API which allows users to retrieve a map
from Tensor to its gradient for the provided context_id on the local
node (a usage sketch follows).
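
A hedged usage sketch of the API surface described above; the module path, the exact `backward` signature, and `some_rpc_based_forward` are assumptions drawn from this description rather than verified code:

```python
import torch.distributed.autograd as dist_autograd

# Inside one autograd context, the forward pass may hop across workers via
# RPC; the 'send'/'recv' functions are recorded in the context.
with dist_autograd.context() as context_id:
    loss = some_rpc_based_forward().sum()  # hypothetical forward helper
    # Kick off the FAST-mode backward pass from the root.
    dist_autograd.backward([loss])
    # Leaf gradients accumulate per-context, not in .grad fields.
    grads = dist_autograd.get_gradients(context_id)
```
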
ghstack-source-id: 91794926

Test Plan: unit tests.

Differential Revision: D17652615

fbshipit-source-id: 96f65c52adb2706ee29f4b49e1655afaa0a3bec3
2019-10-12 09:47:49 -07:00
96aafc3cdc C++ API parity: Softsign
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27535

Test Plan: Imported from OSS

Differential Revision: D17835408

Pulled By: pbelevich

fbshipit-source-id: 8548deab91f6fe0f7285fdd919c25129ed042181
2019-10-12 08:30:10 -07:00
fcb6dd079e C++ API parity: Softshrink
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27534

Test Plan: Imported from OSS

Differential Revision: D17835404

Pulled By: pbelevich

fbshipit-source-id: 7b9f3d3ea793f82840496912f248b0c48bb7463e
2019-10-12 06:36:20 -07:00
c3c0dcf6e3 Upgrade MKL-DNN to v0.21.1 (#27597)
Summary:
1. Upgrade MKL-DNN to v0.21.1
2. Fix runtime error on legacy hardware with gcc8
3. Remove workaround for issue https://github.com/pytorch/pytorch/issues/21597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27597

Differential Revision: D17891492

Pulled By: bddppq

fbshipit-source-id: ab390a655f7ab7fb7144e2c333f25af85a0f5183
2019-10-12 02:40:43 -07:00
039acbea90 Revert D17757197: Add CI builds
Test Plan: revert-hammer

Differential Revision:
D17757197

Original commit changeset: e0522e159387

fbshipit-source-id: 10c20ff703676635afcb17ea36b0b48cd3688b7c
2019-10-11 23:15:51 -07:00
abaa44122d C++ API: torch::nn::Softmin (#27459)
Summary:
Add torch::nn::Softmin module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27459

Differential Revision: D17892852

Pulled By: yf225

fbshipit-source-id: db15b06e8ad33947e7d65995df700f5e90c3b6a8
2019-10-11 23:03:55 -07:00
86fb63f4a0 add testing code to iOS nightly jobs (#27784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27784

## Summary

Since the nightly jobs are running without any testing code, we don't really have a way to verify the binary before uploading it to AWS. To make the work more solid, I came up with an approach to test our builds.

## How it works

The Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. So the approach is to link our binaries to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261) adds a dummy testing app
- [#26632](https://github.com/pytorch/pytorch/pull/26632) adds a ruby script that does all the XCode configuration.

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode (done in #27591)
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane (done in #27593)
4. Add the testing code to PR jobs and verify the result (done in #27594)
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17893271

Pulled By: xta0

fbshipit-source-id: cb7679224e062a4884615f625a2933cad8bd4c11
2019-10-11 21:49:30 -07:00
907ce80321 Update onnx landing page for 1.3 (#27581)
Summary:
* Update supported operator list.
* Update FAQ on implicit scalar casting. Traced models are now more robust.

cc spandantiwari lara-hdr neginraoof Please feel free to add any missing points. Thank you!

cc houseroad for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27581

Reviewed By: hl475

Differential Revision: D17882147

Pulled By: houseroad

fbshipit-source-id: c1d745ca647fce2daf897bbb6d1ff8c283f18839
2019-10-11 20:53:50 -07:00
130127ca59 Rename BACKEND to RPC_BACKEND to separate it from COMMUNICATION_BACKEND (like gloo, nccl) in rpc_test.py (#27792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27792

Close https://github.com/pytorch/pytorch/issues/27232
ghstack-source-id: 91807741

Differential Revision: D5474297

fbshipit-source-id: 5b230a6857813ec981e5056880abb5859655daa2
2019-10-11 19:49:46 -07:00
ccd460d415 use gloo enum instead of hardcoding string (#27652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27652

Changes "gloo" to dist.backend.GLOO in rpc_test.py.
ghstack-source-id: 91764460

Test Plan: python test/test_rpc_fork.py && python test/test_rpc_spawn.py

Differential Revision: D17845067

fbshipit-source-id: b220d3672d1e0b237da474276663d157230a4fdb
2019-10-11 19:06:23 -07:00
5b88dd6a29 fix checkout for clang-tidy (#27796)
Summary:
whoops, this got left in by accident
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27796

Differential Revision: D17892482

Pulled By: suo

fbshipit-source-id: f92255d78fe70d3c22c4422b6333ac288cb330d6
2019-10-11 18:43:25 -07:00
e8c23c9f85 Add various flags for fakefp16 conversion
Summary: ATT

Test Plan: manually tested

Reviewed By: hyuen

Differential Revision: D17849416

fbshipit-source-id: 85ae8fb9c31a0f0139a3c61d5a164b342851d847
2019-10-11 18:06:18 -07:00
6e3a53e774 Sanitize module names on legacy import
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27764

Test Plan: Imported from OSS

Differential Revision: D17882924

Pulled By: jamesr66a

fbshipit-source-id: 89809798d29b971ffb7898188a94667c08641801
2019-10-11 17:43:06 -07:00
2a23654880 Switch to official releases of katex and update doc for installing katex. (#27758)
Summary:
katex is a deprecated package in Ubuntu and has been removed in recent
releases of Debian. Use npm instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27758

Differential Revision: D17891039

Pulled By: ezyang

fbshipit-source-id: 53de6e14b2638298e5b61996dcd7ba8de02420a3
2019-10-11 17:20:06 -07:00
fab48eb200 Makes some CPU-only tests in test_torch generic (#27688)
Summary:
Per title. Also testing putting test_advancedindex back on the default stream.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27688

Differential Revision: D17888351

Pulled By: mruberry

fbshipit-source-id: af8adeca89f575fc276921b39049b07135ed9776
2019-10-11 17:13:41 -07:00
57d608d1f9 Suppress info messages in qnnpack (#27774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27774

Printing messages with warning and above severity only

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D17886364

fbshipit-source-id: 62a1009f63b049f78b5e13990f758f0fdb0cbc4d
2019-10-11 17:10:01 -07:00
ba20ad999c port the rest of the linters over to github actions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27768

Test Plan: Imported from OSS

Differential Revision: D17888973

Pulled By: suo

fbshipit-source-id: 635bef7854084404d08673d99b1bae502e0dc833
2019-10-11 17:01:59 -07:00
57d4f8e3d7 kill azure pipelines flake8 (#27767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27767

Note that this kills flake8 for py2.7. I think it's reasonable given the
impending removal of py2 support entirely, but someone sanity-check me
on this.

Test Plan: Imported from OSS

Differential Revision: D17888975

Pulled By: suo

fbshipit-source-id: 87559f9e18d39e035e0c781c67025b194a593bc6
2019-10-11 17:01:54 -07:00
640b486339 add clang-tidy to github actions (#27755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27755

This gives us nice annotations. See
https://github.com/suo/pytorch/pull/22/files for an approximation of
what it will look like (ignore the warnings on the lint.yml file).

I deleted the old azure pipelines one since making the code work for
both was annoying, and unlike flake8 this one does not affect master

Test Plan: Imported from OSS

Differential Revision: D17888974

Pulled By: suo

fbshipit-source-id: d8928a1451b6ef500dc1889284cab2845ecdeeea
2019-10-11 17:01:50 -07:00
3d2c90131a opset 11 updates (#27578)
Summary:
Opset 11 updates:
- Enabled ORT tests for updated ops in opset 11
- Updated index_copy and index_fill symbolics for opset 11 to modify onnx::Scatter -> onnx::ScatterElements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27578

Reviewed By: hl475

Differential Revision: D17852462

Pulled By: houseroad

fbshipit-source-id: c88747804054d0f3455f2c58fd1d8725e0b2f803
2019-10-11 16:18:40 -07:00
4da68227e9 Clarify that when the divisor in div is zero and the dividend is integral, the behavior is undefined. (#25968)
Summary:
Currently, when an integral tensor is divided by zero, it emits a
"floating point exception" (which can differ from system to
system). Clarify in the documentation that nothing is guaranteed under
this circumstance.
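
A small illustration of the distinction (floating-point division follows IEEE 754; the integral case is the one documented as undefined):

```python
import torch

torch.tensor([1.0]) / torch.tensor([0.0])  # tensor([inf]): well-defined
# torch.tensor([1]) / torch.tensor([0])    # undefined: may raise SIGFPE,
#                                          # and behavior varies by system
```
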
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25968

Differential Revision: D17888097

Pulled By: ezyang

fbshipit-source-id: 7c3ce3ac4080479d637cc2710b6aa3ae7e42431d
2019-10-11 15:37:09 -07:00
a710a8b758 Makes CUDA tests in test_autograd generic (#27709)
Summary:
Per title.

test_autograd.py no longer needs to import common_cuda as a result of this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27709

Differential Revision: D17881298

Pulled By: mruberry

fbshipit-source-id: 8b0351b65a49a072ce5ed7e7099b712847983877
2019-10-11 14:43:00 -07:00
6eef469074 Enable mgpu unit tests for rocm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27518

Differential Revision: D17880153

Pulled By: bddppq

fbshipit-source-id: 5b6210104ec66747558a08f97dda1e7796f681df
2019-10-11 14:35:36 -07:00
eb5222397e Better hashing for constant pool (#27733)
Summary:
Some models may contain thousands of constants (like lists of ints), and the Constant Pooling and CSE passes move constants around and update the constant pool.

However, our existing hash function only considers the node type + input type + output node (https://bddppq.github.io/codebrowser/pytorch/pytorch/torch/csrc/jit/node_hashing.cpp.html#_ZNK5torch3jit8HashNodeclEPKNS0_4NodeE), which produces many collisions. Profiling shows a single insert can take about 0.2 seconds, and loading such a model can take 200 seconds, which is insane.

So we should fix this performance issue by also considering the constant value in the hash to avoid the collisions.
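
An illustrative Python sketch of the fix (the real change is to the C++ hash in node_hashing.cpp; the field names here are assumptions):

```python
def hash_constant_node(node):
    # Old scheme: structure only, so every int-list constant of the same
    # type hashed to the same bucket, degrading lookups to a linear scan.
    structural = (node.kind, tuple(node.input_types), tuple(node.output_types))
    # Fix: mix the constant's payload into the hash as well.
    return hash((structural, str(node.value)))
```
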
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27733

Reviewed By: bddppq

Differential Revision: D17873733

Pulled By: houseroad

fbshipit-source-id: 2338d7bf67174a8e56caa19a30401199f68b592a
2019-10-11 14:30:13 -07:00
a22e8f90cd Add CI builds (#27357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27357

Add extra CI builds for TBB and native builds

Test Plan: check CI

Differential Revision: D17757197

Pulled By: ilia-cher

fbshipit-source-id: e0522e15938710fbf6404478725620282d1287ec
2019-10-11 14:18:25 -07:00
977445b635 Disable TSAN test for LiteInterpreterConv (#27748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27748

There's TSAN test failure. From stack it's likely related to mkldnn (https://github.com/pytorch/pytorch/issues/27497). Before the issue is resolved, disable TSAN test.
ghstack-source-id: 91761706

Test Plan: buck test mode/dev-tsan caffe2/test/cpp/jit:jit -- 'JitTest\.LiteInterpreterConv' --run-disabled

Reviewed By: bddppq

Differential Revision: D17880082

fbshipit-source-id: 251d9b9577838146231c8e122f755936edd1c281
2019-10-11 14:05:33 -07:00
7135f7c263 Revert D17412856: [JIT] add type refinements for isinstance checks
Test Plan: revert-hammer

Differential Revision:
D17412856

Original commit changeset: ded47eb086c4

fbshipit-source-id: 854a6c8f322435c3f3416dbedcb642cb2d2902b1
2019-10-11 13:02:30 -07:00
f35d7d4614 Pr v130 doc changes oct10 take2 (#27721)
Summary:
resolves issues:
https://github.com/pytorch/pytorch/issues/27703

Updates to index for v1.3.0
* add javasphinx to the required sphinx plugins
* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/

this includes changes to:
docs/source/conf.py
docs/source/index.rst
docs/source/nn.rst
docs/requirements.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27721

Differential Revision: D17881973

Pulled By: jlin27

fbshipit-source-id: ccc1e9e4da17837ad99d25df997772613f76aea8
2019-10-11 11:49:14 -07:00
275dfa3485 Initial commit for L0 norm approx (#27756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27756

Implement approximate L0 norm for use in the dense feature regularizer that will be used for feature importance. The formula is as follows:
{F212246801}

Reviewed By: wx1988

Differential Revision: D17432708

fbshipit-source-id: 57d6c9c3dd1b4e210b9f10264075c57dbc9c8cb6
2019-10-11 11:24:34 -07:00
c5ec0a7ede Don't run dist_autograd_fork on Python 2 (#27612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27612

The file imports from torch.distributed.rpc, which won't be
initialized when running on Python 2.

Test Plan: Imported from OSS

Differential Revision: D17855033

Pulled By: pietern

fbshipit-source-id: 6e6b0ca248d0512dac5a44e10e153c710cefe02c
2019-10-11 11:18:46 -07:00
f36345eb0b improve error message on incorrect inputs into gather (#27439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27439

When users call dist.gather, they have to pass in a `gather_list` to
the function on the destination worker, and this list needs to have the same
size as the number of processes in the group. When the user initializes this
list incorrectly, the current error message is not very helpful.

This changes the error message so that the incorrect gather_list size is
pointed out and the correct one is given.
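
For reference, a minimal sketch of correct usage, where the destination rank's `gather_list` has one slot per process in the group:

```python
import torch
import torch.distributed as dist

def gather_to_rank0(tensor):
    world_size = dist.get_world_size()
    if dist.get_rank() == 0:
        # One slot per process, including rank 0 itself.
        gather_list = [torch.empty_like(tensor) for _ in range(world_size)]
        dist.gather(tensor, gather_list=gather_list, dst=0)
        return gather_list
    dist.gather(tensor, dst=0)  # non-destination ranks pass no gather_list
    return None
```
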
ghstack-source-id: 91413442

Test Plan: Added a unit test and tested with an incorrect gather_list size.

Differential Revision: D17781370

fbshipit-source-id: b49aad1b1197daf77daa10911296664e6340e2fa
2019-10-11 11:00:42 -07:00
726bbfffb9 Add possibility for miniz to use an external crc definition. (#27558)
Summary:
We add an #ifdef check for USE_EXTERNAL_MZCRC, in which case miniz
will look for an external mz_crc32 definition. The default behavior
is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27558

Test Plan: Unchanged default behavior, but buck test caffe2/test/...

Differential Revision: D17814440

Pulled By: jjlilley

fbshipit-source-id: e4ecbe37ee2f9eec176093372f21b3b8e52a5f81
2019-10-11 10:16:01 -07:00
15f9fe1d92 Add missing Optional annotation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27564

Differential Revision: D17816121

Pulled By: ailzhang

fbshipit-source-id: 5a4ac12ed81bf5d900ec3e7ab616082cb98d832d
2019-10-11 09:04:29 -07:00
c79d3a4a98 C++ API parity: Softplus
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27489

Test Plan: Imported from OSS

Differential Revision: D17835410

Pulled By: pbelevich

fbshipit-source-id: 51a8c4ab2ff4b860c96eda1ed8f073017b8cf9ae
2019-10-11 09:00:32 -07:00
9d448099fd C++ API parity: Sigmoid
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27488

Test Plan: Imported from OSS

Differential Revision: D17835405

Pulled By: pbelevich

fbshipit-source-id: 78e13047a2a1f2776c59e778db7ba120716e93d3
2019-10-11 07:45:31 -07:00
795c913636 C++ API parity: CELU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27487

Test Plan: Imported from OSS

Differential Revision: D17835406

Pulled By: pbelevich

fbshipit-source-id: a8282ae65d8996efcc8b8d846cfa637c3f89eda6
2019-10-11 06:23:57 -07:00
cddc147267 Back out "Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup" (#27728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27728

Original commit changeset: 15ad64e49f92

Test Plan: same as previous one.

Reviewed By: dreamingleo

Differential Revision: D17872553

fbshipit-source-id: fd9d180d5e02e2c17285898c79cdd9509ffb8bbf
2019-10-10 23:52:43 -07:00
6294a9a877 C++ API parity: RReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27437

Test Plan: Imported from OSS

Differential Revision: D17835413

Pulled By: pbelevich

fbshipit-source-id: 5d943fdac4fd2633e7f7ca13db1a7fed5636ca50
2019-10-10 19:14:48 -07:00
07fc7d05ce Revert D17488297: [jit] refactor tryMatchSchema
Test Plan: revert-hammer

Differential Revision:
D17488297

Original commit changeset: a32d838ce355

fbshipit-source-id: 2bd319d9554d81d09231bf1e34c8417bff468940
2019-10-10 17:39:48 -07:00
6385a39eec add testing code to PR jobs (#27594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27594

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17850703

Pulled By: xta0

fbshipit-source-id: ab220061c6e2ec75cae23684ad999c4f9c276820
2019-10-10 17:36:12 -07:00
352092ca95 C++ API parity: ReLU6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27436

Test Plan: Imported from OSS

Differential Revision: D17835414

Pulled By: pbelevich

fbshipit-source-id: 77e743d2f6b71fb3ba5643f9d676f2bb8f236cfa
2019-10-10 17:12:17 -07:00
5d495a11cb add unused and is_scripting to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27630

Differential Revision: D17868856

Pulled By: eellison

fbshipit-source-id: 7cf183d5c0d5436fbaa549a02e6b8fd47fa15b67
2019-10-10 17:02:17 -07:00
2488c29129 Revert D17846079: [TSAN unittest] Disable TSAN test in LiteInterpreterConv
Test Plan: revert-hammer

Differential Revision:
D17846079

Original commit changeset: 669d63856902

fbshipit-source-id: 996d64f12efab52d571fc81a7c602d7f18da7255
2019-10-10 16:29:16 -07:00
6711969dd8 C++ API: torch::nn::LogSoftmax (#27462)
Summary:
Add torch::nn::LogSoftmax module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27462

Differential Revision: D17867121

Pulled By: yf225

fbshipit-source-id: dae8ac981c1c6ccdef013cd2d886ad4a043f6243
2019-10-10 16:18:15 -07:00
b3cb072de7 Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup
Test Plan: revert-hammer

Differential Revision:
D17826873

Original commit changeset: 23c4a96d9252

fbshipit-source-id: 15ad64e49f922a859abc574b261ac0f857682ff4
2019-10-10 16:16:06 -07:00
d8df8aa842 Remove deprecated script_rref_proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27697

Test Plan: Imported from OSS

Differential Revision: D17855448

Pulled By: mrshenli

fbshipit-source-id: b3d39e79dfc1f8745ac9617ca618df3ea38b1b86
2019-10-10 16:05:46 -07:00
f7d7c4b72f Fix a bug of C++ L-BFGS optimizer (#27606)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27605: The C++ L-BFGS Optimizer will not work properly if there are one or more registered tensors with no grad in the model:
```
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::view.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CUDATensorId, QuantizedCPUTensorId, VariableTensorId, CPUTensorId, MkldnnCPUTensorId] (lookup_ at /pytorch/aten/src/ATen/core/dispatch/DispatchTable.h:245)
```

Add some `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
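
A Python analogue of the guard, as a sketch of the pattern rather than the actual C++ change:

```python
import torch

def gather_flat_grad(params):
    views = []
    for p in params:
        if p.grad is None:  # mirrors `!parameter.grad().defined()` in C++
            views.append(p.new_zeros(p.numel()))
        else:
            views.append(p.grad.reshape(-1))
    return torch.cat(views, 0)
```
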
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27606

Differential Revision: D17866550

Pulled By: yf225

fbshipit-source-id: bcaf0bf75b93c57304856b03d8984c1617ebbfef
2019-10-10 15:38:05 -07:00
415b17e81c Fix for flaky caffe2 dataio test (test_time_limit_reader_with_short_limit) (#27592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27592

The caffe2 data reader test `test_time_limit_reader_with_short_limit` is flaky as-written because it places an upper bound on how much can be read, but under stress it is possible for fewer records to be read. The fix is to make the assertion check a fuzzy/range check rather than exact equality, since there's not a straightforward way to precisely test a timer-based feature.
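
A sketch of the relaxed assertion (names illustrative):

```python
def check_reader_progress(num_read, upper_bound):
    # A timer-bounded reader may legitimately read fewer records under load,
    # so assert a range instead of exact equality.
    assert 0 < num_read <= upper_bound, (
        "expected between 1 and %d records, read %d" % (upper_bound, num_read))
```
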
ghstack-source-id: 91543898

Test Plan:
`buck test mode/dev-tsan //caffe2/caffe2/python:dataio_test-2.7 -- --stress-runs 20` -> P117156924 (with fix, 100% pass)

P117158750 - without fix, lots of failures in this test

Reviewed By: boryiingsu

Differential Revision: D17816775

fbshipit-source-id: 2ab0d3304fbd9c9806d37a4fe2912c840616db61
2019-10-10 13:53:58 -07:00
8515650c2b C++ API parity: ReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27435

Test Plan: Imported from OSS

Differential Revision: D17835407

Pulled By: pbelevich

fbshipit-source-id: b8ee86c7a76674bc88d8e995424dad22d3caab59
2019-10-10 13:34:38 -07:00
ce6287f675 Adding support to offsets based Fused8BitRowwiseEmbeddingLookup (#27635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27635

PyTorch uses `offsets` instead of `lengths` for embedding table lookup. Add support for that in the fused quantized version.
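
For context, a small sketch of the relationship between the two layouts; offsets are the exclusive prefix sum of lengths:

```python
import torch

lengths = torch.tensor([2, 0, 3])
offsets = torch.cat([torch.zeros(1, dtype=torch.long),
                     lengths.cumsum(0)[:-1]])
# lengths [2, 0, 3] -> offsets [0, 2, 2]
```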

AVX2 version is generated with
```
python caffe2/caffe2/perfkernels/hp_emblookup_codegen.py --fused --use-offsets
```

Test Plan:
```
buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: jianyuh

Differential Revision: D17826873

fbshipit-source-id: 23c4a96d92521deaebc02b688ad735d76a4476df
2019-10-10 10:50:44 -07:00
e8087a3060 Change C++ API test files to only include torch/torch.h (#27067)
Summary:
One of the purposes of the C++ API tests in `test/cpp/api/` should be to check that including `torch/torch.h` is a sufficient prerequisite for using all C++ frontend features. This PR ensures that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27067

Differential Revision: D17856815

Pulled By: yf225

fbshipit-source-id: 49c057bd807b003e4a00f6ba73131d763a0f277a
2019-10-10 09:46:29 -07:00
9bc8fb8dfd Revert D17850696: [pytorch][PR] Updates to quantization related files, index.rst, and javadocs
Test Plan: revert-hammer

Differential Revision:
D17850696

Original commit changeset: 3de146f06522

fbshipit-source-id: 565fef87fcf6021362ec3e540be78641d47ef9a7
2019-10-10 09:23:33 -07:00
829a5c8584 Disable TSAN test in LiteInterpreterConv
Summary: There's a TSAN test failure. From the stack it's likely related to mkldnn (https://github.com/pytorch/pytorch/issues/27497). Disable the TSAN test until the issue is resolved.

Test Plan: buck test mode/dev-tsan caffe2/test/cpp/jit:jit -- 'JitTest\.LiteInterpreterConv' --run-disabled

Reviewed By: bddppq

Differential Revision: D17846079

fbshipit-source-id: 669d6385690223d83996fb14051c39df0c521dfa
2019-10-10 08:50:59 -07:00
38a3eabd3e remove cuda from add_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 29691.940

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 60820.813
```

Reviewed By: hl475

Differential Revision: D17855731

fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66
2019-10-10 08:32:04 -07:00
9d925c1d6f Revert D17851047: [pytorch][PR] Add javasphinx extension
Test Plan: revert-hammer

Differential Revision:
D17851047

Original commit changeset: 8ed7e3c44f20

fbshipit-source-id: 9021436a7c84f7582c3d4d3e29fb5f7b0887e88c
2019-10-10 07:36:42 -07:00
d931c8bf75 substantially restructure all quantized docs to group logically (#27677)
Summary:
Make everything clickable
Organize APIs logically in subsections
Fix many typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27677

Differential Revision: D17850650

Pulled By: dzhulgakov

fbshipit-source-id: 060f6ed988d1c4beecba6bc8daf55626961fac98
2019-10-10 00:50:02 -07:00
91959aa3d3 Add javasphinx extension
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27681

Differential Revision: D17851047

Pulled By: brianjo

fbshipit-source-id: 8ed7e3c44f2055d2b8577686aff1d13548f45688
2019-10-09 23:20:33 -07:00
f3df6b8ede Add C++ torch::nn::functional::affine_grid (#27263)
Summary:
Adds`torch::nn::functional::affine_grid` functional support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883, https://github.com/pytorch/pytorch/issues/27196

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27263

Differential Revision: D17802350

Pulled By: yf225

fbshipit-source-id: e823ee53da4a4cc6a1650d2dfc09b0ef6a74e249
2019-10-09 23:17:49 -07:00
1118ea5866 Updates to quantization related files, index.rst, and javadocs (#27676)
Summary:
- Update torch.rst to remove certain autofunction calls
- Add reference to Quantization Functions section in nn.rst
- Update javadocs for v1.3.0
- Update index.rst:
  - Update "Package Reference" to "Python API"
  - Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
  - Add "Other Languages" section, add in C++ docs, add in Javadocs
  - Add link to XLA docs under Notes: http://pytorch.org/xla/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27676

Differential Revision: D17850696

Pulled By: brianjo

fbshipit-source-id: 3de146f065222d1acd9a33aae3b543927a63532a
2019-10-09 22:52:19 -07:00
51656eefb0 refactor tryMatchSchema (#26499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26499

We've changed how these functions are used over time, so I cleaned up
the header file API to match. In particular:

* tryMatchSchemas was added since the overload logic got copy/pasted
into three separate locations.
* With this change, tryMatchSchema is no longer public, as it is not needed
  outside of tryMatchSchemas
* emitBuiltinFunction no longer needs a requires argument (it was always true)

* Argument order for all the schema matching stuff now puts the 'self'
builtin override last. This is only rarely used and was inconsistent with
matchSchema.

Test Plan: Imported from OSS

Differential Revision: D17488297

Pulled By: zdevito

fbshipit-source-id: a32d838ce35544972fa8767557acc22149081b55
2019-10-09 22:11:24 -07:00
d44b9cd4bb add type refinements for isinstance checks (#26271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26271

This replaces unchecked_unwrap_optional with unchecked_cast. This
enables the generalization of type refinement so that it works for
isinstance checks as well. This also removes unchecked_unwrap_optional from
code we generate, which is good because it is a hard op to serialize well
since it doesn't directly encode the Optional[T] being unwrapped. In contrast,
unchecked_cast always explicitly lists the type.
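
An example of the refinement this generalizes (a sketch of the scripted behavior):

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: Optional[torch.Tensor]) -> torch.Tensor:
    if isinstance(x, torch.Tensor):
        # x is refined from Optional[Tensor] to Tensor here, compiled as an
        # unchecked_cast rather than unchecked_unwrap_optional.
        return x + 1
    return torch.zeros(1)
```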

Test Plan: Imported from OSS

Differential Revision: D17412856

Pulled By: zdevito

fbshipit-source-id: ded47eb086c4610998ad92bb1174225af00220f7
2019-10-09 22:11:19 -07:00
52985a3501 Install developer certificate for code signing (#27593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27593

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17848814

Pulled By: xta0

fbshipit-source-id: 48353f001c38e61eed13a43943253cae30d8831a
2019-10-09 20:07:30 -07:00
e66e00cd17 Fix native ctc_loss gradient indexing bug for large target sizes (#27460)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/27442

Thank you Mohamed Yousef (ASDen) for the report with minimal
reproducing example and detailed analysis!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27460

Differential Revision: D17789378

Pulled By: soumith

fbshipit-source-id: dc01a31b998cced4462e933d4b32e09b331f7e41
2019-10-09 19:26:47 -07:00
17a54e1b3d Revert D17840343: [pytorch][PR] changes to the documentation in support of quantization
Test Plan: revert-hammer

Differential Revision:
D17840343

Original commit changeset: 06bf3da6012b

fbshipit-source-id: 35f96fac299a0f9dd8ad864f475f606317c46823
2019-10-09 19:20:44 -07:00
971f773886 Revert D17750005: [jit] Add doc copy-edits from review
Test Plan: revert-hammer

Differential Revision:
D17750005

Original commit changeset: 230d1d33efb0

fbshipit-source-id: 12d22567b99286a8c4f719c3a384cb3665f7ba54
2019-10-09 19:12:58 -07:00
ba792335fc Export traced aten::unbind (#27247)
Summary:
This PR enables exporting aten::unbind created by the tracer. The traced IR will always have the pattern `aten::unbind -> prim::ListUnpack`.
Another PR supporting scripted aten::unbind will be submitted separately later.
```
// Unbind is being converted to ONNX as Split + Squeeze.
// Example IR
// graph(%0 : Float(3, 4, 5)):
//   %7 : Long() = prim::Constant[value={0}]()
//   %3 : Tensor[] = aten::unbind(%0, %7)
//   %4 : Float(4, 5), %5 : Float(4, 5), %6 : Float(4, 5) = prim::ListUnpack(%3)
//   return (%4, %5, %6)
//
// Translates to ONNX:
// graph(%0 : Float(3, 4, 5)):
//   %1 : Tensor, %2 : Tensor, %3 : Tensor = onnx::Split[axis=0](%0)
//   %4 : Float(4, 5) = onnx::Squeeze[axes=[0]](%3)
//   %5 : Float(4, 5) = onnx::Squeeze[axes=[0]](%2)
//   %6 : Float(4, 5) = onnx::Squeeze[axes=[0]](%1)
//   return (%6, %5, %4)
```
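
A hedged end-to-end sketch of exporting the traced pattern above (standard torch.onnx.export usage):

```python
import torch

class Unbind(torch.nn.Module):
    def forward(self, x):
        # Traces to aten::unbind followed by prim::ListUnpack.
        a, b, c = x.unbind(0)
        return a, b, c

# Tracing happens inside torch.onnx.export; unbind becomes Split + Squeeze.
torch.onnx.export(Unbind(), torch.randn(3, 4, 5), "unbind.onnx")
```
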
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27247

Reviewed By: hl475

Differential Revision: D17791095

Pulled By: houseroad

fbshipit-source-id: 83b724275124dd1dedb272583a2fefbdf7035d4c
2019-10-09 18:20:03 -07:00
9e9713f071 Register operators of CV models in PyTorch mobile (#27609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27609

This is a fix to PR #27379, which failed in Windows CI.

Currently the operators need to be registered manually through c10 registration.

Test Plan:
The operators should be covered by existing operator tests.
A few ops (add, conv) are covered in test_lite_interpreter.cpp for demonstration.
CV models may be too large to include in unit tests, but simple local loaders can be built. Follow a similar pattern as in test_lite_interpreter to:

1. load the TorchScript model
2. run the model to get reference results
3. save and load the mobile module using torch::jit::module._save_for_mobile() and torch::jit::_load_for_mobile()
4. run the mobile module via run_method() and compare the results to the reference results

Tested models:

- Lenet
- XrayMobileV3

Differential Revision: D17832709

fbshipit-source-id: 51e44fa95240b241da85cb67dc2302878742903c
2019-10-09 17:30:10 -07:00
18d5210de9 changes to the documentation in support of quantization (#27603)
Summary:
this includes changes to

docs/source/conf.py
docs/source/index.rst
docs/source/nn.rst
docs/source/torch.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27603

Differential Revision: D17840343

Pulled By: gottbrath

fbshipit-source-id: 06bf3da6012b334e3246a6a2cad42358462e2630
2019-10-09 17:13:34 -07:00
2093fac4ee ONNX Export ConstantOfShape with default dtype (#27577)
Summary:
Exporting a scripted module to ONNX with ops like torch.zeros() fails when the dtype is not specified.
This PR adds support for exporting scripted torch.zeros() ops (and similar ops) without specifying the dtype (the dtype will default to float).
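
A minimal sketch of the newly supported case:

```python
import torch

@torch.jit.script
def make_zeros(n: int):
    # No dtype specified; the ONNX export now defaults this to float
    # instead of failing.
    return torch.zeros(n, n)
```
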
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27577

Reviewed By: hl475

Differential Revision: D17822318

Pulled By: houseroad

fbshipit-source-id: b2d4300b869e782a9b72534fea1263eb83744953
2019-10-09 17:05:35 -07:00
e049e0b027 adding quantization.rst file for quantization feature (#27559)
Summary:
This was written by Raghu, Jessica, Dmytro and myself.

This PR will accumulate additional changes (there are a few more things we need to add to this actual rst file). I'll probably add the related image files to this PR as well.

I'm breaking draft PR https://github.com/pytorch/pytorch/pull/27553 into more easily digestible pieces.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27559

Differential Revision: D17843414

Pulled By: gottbrath

fbshipit-source-id: 434689f255ac1449884acf81f10e0148d0d8d302
2019-10-09 16:45:09 -07:00
0eccd05ab4 Add javadoc rst files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27646

Differential Revision: D17844860

Pulled By: brianjo

fbshipit-source-id: 9b3ddf8dab2f63345b73436aeb245eea1686c350
2019-10-09 16:40:02 -07:00
85f33a4738 Fix install location for ATen_CORE_HEADERS by avoiding relative paths (#27449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20046

While installing, `aten/src/ATen` is shortened to just `ATen` so these relative paths become `/usr/local/include/ATen/core/../../../../torch` or simply `/usr/torch`.
Note that in cmake, `Caffe2` is the name for the root `pytorch` project so `Caffe2_SOURCE_DIR` gives the `pytorch` directory; *not* the `caffe2` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27449

Differential Revision: D17844763

Pulled By: ezyang

fbshipit-source-id: fcd964ef1b891972f18155eb72732e90f0d50b8b
2019-10-09 16:37:42 -07:00
1fec1441a1 C++ API parity: PReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27429

Test Plan: Imported from OSS

Differential Revision: D17835412

Pulled By: pbelevich

fbshipit-source-id: e678d5920dad1293bb0ba3de28e2da3087d19bde
2019-10-09 16:31:54 -07:00
0fbbc7acb4 Allow align_to to take in partially named tensors (#27308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27308

Currently, `tensor.align_to(*names)` has the restriction that the
`tensor` must be fully named. This doesn't need to be the case, when
using Ellipsis, we "expand the ellipsis to all unmentioned dimensions,
in the order which they appear in the original tensor".

For example, consider `tensor: Tensor[None, None, C]`.

`tensor.align_to(C, None, None)` is ambiguous because the user might
have wanted to switch the order of the None dimensions and there is no
way to specify that using this API. However, `tensor.align_to('C', ...)`
isn't ambiguous: we can select the two unnamed dimensions in the order
in which they appear.

To actually implement this, we write a brand-new `align_to(names,
ellipsis_idx)` function in c++ that is separate from the regular
`align_to(names)` implementation. Ideally we would support "..." as a
special name in c++ and combine the two implementations; we'll need to
support "..." in c++ in the future but that requires a bit of extra work.
In this PR, Python processees the ellipsis and then calls the correct
overload.
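
A usage sketch of the new behavior (named-tensor API; shapes are illustrative):

```python
import torch

t = torch.zeros(2, 3, 4, names=(None, None, 'C'))
# Fully-named ordering would be ambiguous here, but the ellipsis expands to
# the unmentioned dims in their original order:
u = t.align_to('C', ...)  # names ('C', None, None), shape (4, 2, 3)
```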

Test Plan: - run tests

Differential Revision: D17745179

Pulled By: zou3519

fbshipit-source-id: 9fed06d224215cfb7efecd8c002604baab3c45e6
2019-10-09 16:28:45 -07:00
7591010077 Disable automatically code signing for TestApp (#27591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27591

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17844036

Pulled By: xta0

fbshipit-source-id: 741f0442a718c9bda706107a2c4c3baed4c37137
2019-10-09 16:23:15 -07:00
b6fea4f77f Removes floating_dtype decorator from test_torch and test_cuda (#27599)
Summary:
Per title. Also makes a few test_torch tests generic.

This PR removes ~half the floating_dtype decorators. Follow-up will remove the rest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27599

Differential Revision: D17840056

Pulled By: mruberry

fbshipit-source-id: 428bb5498c452083e3608325e0b548b1d75baf2d
2019-10-09 16:10:26 -07:00
aeae5d6020 add dim to the cat benchmark (#27620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27620

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:cat_test -- --iterations 3

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0
# Input: M: 256, N: 512, K: 1, dim: 0
Forward Execution Time (us) : 775.348

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim1
# Input: M: 256, N: 512, K: 1, dim: 1
Forward Execution Time (us) : 3612.599

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim2
# Input: M: 256, N: 512, K: 1, dim: 2
Forward Execution Time (us) : 91416.224
...
```

Reviewed By: hl475

Differential Revision: D17835348

fbshipit-source-id: 94e02e328c4ea61b2e210d860ccdd377ef2b97f8
2019-10-09 16:03:07 -07:00
abcd221f19 add as_strided operator to the benchmark (#27632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27632

Support as_strided operator in the benchmark suite.

Test Plan:
buck run caffe2/benchmarks/operator_benchmark/pt:as_strided_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0
Forward Execution Time (us) : 92.008

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset1
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 1
Forward Execution Time (us) : 91.029
...
```

Reviewed By: hl475

Differential Revision: D17840076

fbshipit-source-id: 6585feb51ebfaca40032ffa0a61d5f76c25a2599
2019-10-09 15:42:05 -07:00
283f4814d3 Modify PyTorch's integration of NNPACK to use a unified underlying thread pool implementation. (#27341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27341

Multi-threaded:

```
Pixel 2:
Before:    362.716
PR-27402:  185.799
PR-27341:  142.011

Pixel 3:
Before:    246.755
PR-27402:  160.045
PR-27341:  115.437

```

Single-threaded:

```
Pixel 2:
Before:    308.084
PR-27340:  303.539
PR-27341:  313.558

Pixel 3:
Before:    234.272
PR-27340:  227.158
PR-27341:  232.787

```

Test Plan: Imported from OSS

Differential Revision: D17835333

Pulled By: AshkanAliabadi

fbshipit-source-id: 9502c230d8567b141ae93f611ac524d855ed9bdf
2019-10-09 15:00:29 -07:00
3246fddfd6 Implement C++ API torch::nn::MultiMarginLoss. (#27424)
Summary:
Hi yf225, here is the C++ frontend API MultiMarginLoss implementation and tests for https://github.com/pytorch/pytorch/issues/27198. Could you review it and tell me if it is okay?

I am not entirely sure I used `c10::optional` correctly, but `options.weight()` resulted in a compilation error, so I went with `options.weight().value()` instead of `value_or()` to follow the logic in `torch.nn._WeightedLoss.register_buffer` (where one can pass a `None` value).

Oh, and are the tests supposed to be skipped or did I do something wrong? I ran `pytest test/test_cpp_api_parity.py -k Loss -v` , and the `L1Loss` test passed but the others were skipped...

Thank you for the review in any case!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27424

Differential Revision: D17839963

Pulled By: yf225

fbshipit-source-id: f4b6012590cf22d56d42751c214df80cce717cb8
2019-10-09 14:44:41 -07:00
0fed4756d0 C++ API parity: SELU (#27434)
Summary:
Adds `SELU` functional and module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27434

Differential Revision: D17782762

Pulled By: yf225

fbshipit-source-id: 96c7ce84b9baf9e219a63e631929b8997ba6f3f0
2019-10-09 14:39:28 -07:00
28a1806cbc C++ API: torch::nn::Softmax (#27446)
Summary:
Add torch::nn::Softmax module support for the C++ API

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27446

Differential Revision: D17839546

Pulled By: yf225

fbshipit-source-id: 7c7fb55111b261614de7c3a75fa1019fbde93c67
2019-10-09 14:19:47 -07:00
e7c9c8098a Add doc copy-edits from review (#26322)
Summary:
Add edits from doc review
](https://our.intern.facebook.com/intern/diff/17750005/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26322

Pulled By: driazati

Differential Revision: D17750005

fbshipit-source-id: 230d1d33efb015e40327373a05a1d3eced7c5c00
2019-10-09 14:16:48 -07:00
9084fcba46 test_equal in test_quantized.py (#27616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616

Fix a problem in reference implementation of equal

Test Plan:
pytho test/test_quantized.py

Imported from OSS

Differential Revision: D17837055

fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
2019-10-09 14:13:56 -07:00
fbba4edd1d C++ API parity: ELU, Hardshrink, Hardtanh, LeakyReLU, LogSigmoid minor fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27565

Test Plan: Imported from OSS

Differential Revision: D17835416

Pulled By: pbelevich

fbshipit-source-id: 9e83bdb4bf44cbc2ef09e2088df4bf0694c235f0
2019-10-09 13:23:49 -07:00
7c472ec597 Vectorized complex unary and binary op support. (#26500)
Summary:
Added Complex support with AVX to unary ops and binary ops.

I need to add nan propagation to minimum() and maximum() in the future.
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: pytorch-cpu-strided-complex extension

Preliminary Benchmarks are here.

I tried rrii and riri and found that riri is better in most situations.
Divide is very slow because you can't reduce 1/(x+y).
Sqrt is also very slow.
Reciprocal could be sped up after I add conj().
Everything else is typically within 20% of the real-number performance.
Questions:

Why does macOS not support VML (`#if AT_MKL_ENABLED() && !defined(__APPLE__)` in vml.h)? MKL does support some complex operations like Abs, so I was curious about trying it.
Is MKL just calling AVX?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26500

Differential Revision: D17835431

Pulled By: ezyang

fbshipit-source-id: 6746209168fbeb567af340c22bf34af28286bd54
2019-10-09 12:49:21 -07:00
d70f8dd964 Tests for fallback boxed dispatch (including TLS mode) (#26719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26719

This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both implement the most trivial possible implementations of their type: they "wrap" a real tensor and simply forward along to the real implementation. This PR also adds the necessary feature support for the toggleable mode, which is in the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.

General structure of the PR:

* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They might find other use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. Introduces a new thread local variable accessible by `tls_local_tensor_type_set()` which is considered as part of dispatch.
* The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.

The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:

* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not setup correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.

One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D17549624

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
2019-10-09 12:20:29 -07:00
eb9000be4e always use the closure to resolve variable names (#27515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27515

Resolving variable names using the local activation frames does not work
when using recursive scripting, but our current code tries to do it
(incorrectly) anyway. It only works today because the script
call is in the same local frame as the definition. This will not be
true in practice and makes it seem like the API works in more cases
than it really does. This change forces us to always use closure-based
annotations, documents that behavior, and fixes the tests so that they still pass.
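
An example of the rule (a sketch): names used in annotations must be resolvable from the function's closure/globals, not from the frame that happens to call `torch.jit.script`:

```python
import torch
from typing import List

MyInts = List[int]  # module-level, so visible via f's closure/globals

def f(x: MyInts) -> int:
    return len(x)

scripted = torch.jit.script(f)  # MyInts is resolved from f's closure,
                                # not from the caller's local frame
```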

Test Plan: Imported from OSS

Differential Revision: D17803403

Pulled By: zdevito

fbshipit-source-id: e172559c655b05f0acf96c34f5bdc849f4e09ce2
2019-10-09 12:16:15 -07:00
1b385e7e5f Add std::variant backport (mpark) as c10::variant, with gcc 7.3.1 fix (#27575)
Summary:
This is the same as https://github.com/pytorch/pytorch/pull/26836 with workarounds for a gcc 7.3.1 bug, in light of https://github.com/pytorch/pytorch/pull/27277#issue-324044466. The workaround also limits the use cases of `c10::variant`, but it is sufficient for our (simple) use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27575

Differential Revision: D17834410

Pulled By: yf225

fbshipit-source-id: e8f3c0be2904ec3d2975cbb80af237a5c9d0cb92
2019-10-09 12:10:39 -07:00
013ca32730 Devirtualize numel() (#27294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27294

Fixes #27291

I'm a little annoyed that I have to reintroduce manual binding code.  But it's
probably not a good idea to teach the codegen how to do fastpath functions
(is it?)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17763486

Pulled By: ezyang

fbshipit-source-id: 5793b53e2db80b044e57faae325a95c649d9d459
2019-10-09 11:43:50 -07:00
ab15584dce add random sample function to generate list of inputs (#23174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23174

This diff introduces a new function to randomly generate inputs based on the weights.
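
A rough sketch of the underlying idea, weighted random sampling over candidate input shapes (the actual op_bench API may differ):

```python
import random

# Hypothetical candidate shapes and their sampling weights.
shapes = [(1, 5, 7), (1, 6, 8), (2, 6, 7)]
weights = [0.5, 0.3, 0.2]

def sample_inputs(n):
    # Draw n input configurations, biased by the given weights.
    return random.choices(shapes, weights=weights, k=n)

print(sample_inputs(3))
```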

Test Plan:
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark/common/tests:random_sample_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N5_K7
# Input: M: 1, N: 5, K: 7
Forward Execution Time (us) : 82.923

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K8
# Input: M: 1, N: 6, K: 8
Forward Execution Time (us) : 79.535

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M2_N6_K7
# Input: M: 2, N: 6, K: 7
Forward Execution Time (us) : 83.471

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N4_K7
# Input: M: 1, N: 4, K: 7
Forward Execution Time (us) : 84.410

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K7
# Input: M: 1, N: 6, K: 7
Forward Execution Time (us) : 82.399
```

Reviewed By: zheng-xq

Differential Revision: D15791723

fbshipit-source-id: 730e34d455e962ddf594a491d7c81c3f99fafa86
2019-10-09 11:24:14 -07:00
c1ed0150c5 canonical example of torch.add benchmark (#23402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402

This diff makes torch.add a canonical example for op benchmarks. Once it lands, we will also modify all other op benchmarks to be uniform with this example. With that, when people add new ops, they can copy-paste any existing code.
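
A sketch of what such a canonical benchmark looks like; the names follow the operator_benchmark conventions but may not match the landed file exactly:

```python
import operator_benchmark as op_bench
import torch

add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[[8, 16, 32], [16, 16, 64], [64, 64, 128]],
    tags=["short"],
)

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K):
        # Inputs are created once per config; forward() is what gets timed.
        self.input_one = torch.rand(M, N, K)
        self.input_two = torch.rand(M, N, K)
        self.set_module_name("add")

    def forward(self):
        return torch.add(self.input_one, self.input_two)

op_bench.generate_pt_test(add_short_configs, AddBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```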

Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu
# Input: M: 8, N: 16, K: 32, device: cpu
Forward Execution Time (us) : 146.586

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda
# Input: M: 8, N: 16, K: 32, device: cuda
Forward Execution Time (us) : 92.151

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu
# Input: M: 16, N: 16, K: 64, device: cpu
Forward Execution Time (us) : 428.421

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecuda
# Input: M: 16, N: 16, K: 64, device: cuda
Forward Execution Time (us) : 89.811

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 11857.012

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecuda
# Input: M: 64, N: 64, K: 128, device: cuda
Forward Execution Time (us) : 93.918

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwdall
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 990.125

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd1
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 781.217

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd2
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 777.307
```

Reviewed By: zheng-xq

Differential Revision: D16501974

fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d
2019-10-09 11:24:10 -07:00
a750a1a2b4 modify config_list to support cross product of attributes (#23399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23399

This diff enables the config_list function to support a cross product of inputs beyond the shapes.

The following is an example using the updated interface. The same input shapes can run on different devices and dtypes.
```
add_short_configs = op_bench.config_list(
    attr_names=['M', 'N', 'K'],
    attrs=[
        [8, 16, 32],
        [16, 16, 64],
        [64, 64, 128],
    ],
    cross_product_configs={
        'device': ['cpu', 'cuda'],
        'dtype': [torch.float, torch.float64],
    },
    tags=['short'],
)
```

Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/common/tests:pt_configs_list_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 164.489

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float64
Forward Execution Time (us) : 158.677

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float32
Forward Execution Time (us) : 103.866

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float64
Forward Execution Time (us) : 106.027

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu_dtypetorch.float32
# Input: M: 16, N: 16, K: 64, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 451.016
...
```

buck test caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test

```
Building: finished in 2.4 sec (100%) 6882/6882 jobs, 2 updated
  Total time: 2.8 sec
Trace available for this run at /tmp/testpilot.20190730-160519.3952794.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 203f0104fbfcec4128be2c482c64736309ae39c9 fbpkg a4b2a9897a0c45069bd07d83e5981052 at Sun Jul 28 01:22:13 2019 by twsvcscm from /data/fbprojects/packages/testinfra.testpilot/667/t.par
Discovering tests
Running 3 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_config_list_impl (operator_benchmark_test.TestConsumeOp) 0.011 1/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_list_of_ops (operator_benchmark_test.TestConsumeOp) 19.920 2/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_single_op (operator_benchmark_test.TestConsumeOp) 23.418 3/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - main 0.000 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
Summary (total time 29.90s):
  PASS: 4
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: zheng-xq

Differential Revision: D16501272

fbshipit-source-id: d92b5cf50b0f37d5b3a79d423acb521366b4e8db
2019-10-09 11:24:06 -07:00
b9b9fd4fad Fix the arithmetic overflow issue for MSVC (#27596)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27568.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27596

Differential Revision: D17831612

Pulled By: ezyang

fbshipit-source-id: eff18095a74b6b82f70ed3f11d201483097205c5
2019-10-09 09:31:23 -07:00
987e37b9c2 Enable EXE001 flake8 check. (#27560)
Summary:
According to https://github.com/pytorch/pytorch/issues/27285, it seems we do not intend to use the shebang as an indication of Python version, thus
we enable the EXE001 flake8 check.
For violations, we either remove the shebang from non-executable Python scripts or grant them executable permission.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27560

Differential Revision: D17831782

Pulled By: ezyang

fbshipit-source-id: 6282fd3617b25676a6d959af0d318faf05c09b26
2019-10-09 09:15:29 -07:00
65cdc8db5d Remove GEN_TO_SOURCE from CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27570

Differential Revision: D17831571

Pulled By: ezyang

fbshipit-source-id: d4f98dab64c892886cc4fd4128428a677edfd7a8
2019-10-09 08:58:42 -07:00
eb8fe883d8 Revert D17599915: [pytorch][PR] Support 0-batch size for nn.Linear.
Test Plan: revert-hammer

Differential Revision:
D17599915

Original commit changeset: 78894ce602d9

fbshipit-source-id: 3afd3621e85e5aa8b186d3542f71cef441f3d1bb
2019-10-09 08:58:38 -07:00
47e6d40b9c Revert D17810912: Register operators of CV models in PyTorch mobile
Test Plan: revert-hammer

Differential Revision:
D17810912

Original commit changeset: 2cc25dbe81a3

fbshipit-source-id: 3b020f8eee2064f8f5df939b689332c9cab320d5
2019-10-09 08:56:08 -07:00
15bec0970c Add instructions for setting up ccache from conda (#27481)
Summary:
I was unable to use the existing instructions since I don't have sudo privileges on my GPU development machine and couldn't easily install `ccache` or the build dependencies for `ccache`.

However, I was able to get it working by installing `ccache` with `conda` and then creating symlinks to shadow my compilers as in the build-from-source installation instructions. I figure this might be generally useful as others might not have sudo privileges on their pytorch development machine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27481

Differential Revision: D17831556

Pulled By: ezyang

fbshipit-source-id: c5373d8739ad910015e677e7ad48bd91b770f842
2019-10-09 08:49:51 -07:00
59b14a7620 Documentation for named tensors (#27173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173

`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.

`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.

Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.
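
For context, a tiny example of the API these docs cover:

```python
import torch

t = torch.zeros(2, 3, names=("N", "C"))  # dimensions addressed by name
print(t.names)           # ('N', 'C')
print(t.sum("C").shape)  # torch.Size([2])
```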

Test Plan: - built and reviewed locally with `cd docs/ && make html`.

Differential Revision: D17763046

Pulled By: zou3519

fbshipit-source-id: c7872184fc4b189d405b18dad77cad6899ae1522
2019-10-08 22:22:30 -07:00
a37be201c1 Implement torch.nn.Embedding / EmbeddingBag in PyTorch C++ API (#26358)
Summary:
Added more variables to EmbeddingOptions and updated the EmbeddingImpl reset and forward functions. Also added EmbeddingBag.

-----

This PR is BC-breaking in the following way:

Previously, `EmbeddingOptions` supports `count` and `dimension` as options arguments. After this PR, they are renamed to `num_embeddings` and `embedding_dim` respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26358

Differential Revision: D17714337

Pulled By: yf225

fbshipit-source-id: f9f969c68e4bece106b92f8e2e02ac39c8455fb7
2019-10-08 22:13:39 -07:00
b96f49885f caffe2 python ideep conv_op test_int8_convolution skip for python 3
Summary: This test was failing in 3.7; it turns out it was omitted by the test director in 3.6, so I added a skip for both versions.

Test Plan: The unit test is skipped in 3.7 and 3.6; all other tests pass.

Reviewed By: tomdz

Differential Revision: D17820967

fbshipit-source-id: 571f0ec7fe1b0cb50ead4e0d18c00151a701f36a
2019-10-08 21:31:11 -07:00
1f158adeee Add support for attention weight in SparseLookup (#26748)
Summary:
Support attention weights input to SparseLookup. In attention sum pooling, if attention weights can be pre-calculated before embedding lookup,  they can be passed to SparseLookup and processed by SparseLengthsWeightedSum op. One example is id_score attention sum pooling.

Essentially the net is converted from:
  LengthsSum(Mul(Gather(keys, w), att_weight))
to:
  SparseLengthsWeightedSum(keys, w, att_weight)

It unblocks potential efficiency gain with distributed training.
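
To illustrate the semantics with the PyTorch analogue (not the Caffe2 ops themselves), a small check that the fused weighted sum matches the unfused gather/multiply/sum:

```python
import torch
import torch.nn.functional as F

w = torch.rand(10, 4)                  # embedding table
ids = torch.tensor([[1, 4, 7]])        # one bag of three ids
att = torch.tensor([[0.2, 0.5, 0.3]])  # per-id attention weights

# Unfused: LengthsSum(Mul(Gather(w, ids), att))
unfused = (w[ids[0]] * att[0].unsqueeze(1)).sum(dim=0)

# Fused analogue of SparseLengthsWeightedSum
fused = F.embedding_bag(ids, w, per_sample_weights=att, mode="sum")

assert torch.allclose(unfused, fused.squeeze(0))
```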

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26748

Test Plan: unit test

Reviewed By: chocjy

Differential Revision: D17553345

Pulled By: wheatkit

fbshipit-source-id: 60cc3c4b0bc1eade5459ac598e85286f3849a412
2019-10-08 20:22:25 -07:00
a891e92f89 Support 0-batch size for nn.Linear. (#27211)
Summary:
Currently, nn.Linear (and its internal functional code) will
fail in THBlas:

RuntimeError: invalid argument 8: lda should be at least max(1, 0), but have 0 at caffe2/aten/src/TH/generic/THBlas.cpp:363

This diff is trying to fix this bug.

As of now I was able to identify 2 possible places where changes need to be made based on the current dispatcher logic:
1. The file touched in this diff
2. caffe2/aten/src/THC/generic/THCTensorMathBlas.cu

At the moment I didn't find better places than injecting logic into those files:
the only non-generated function for the forward pass, plus the mm_mat2_backward function family on the backward pass.
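
The failing case, sketched in Python (after this fix the forward pass is expected to succeed):

```python
import torch

linear = torch.nn.Linear(5, 3)
x = torch.empty(0, 5)  # batch size 0
y = linear(x)
print(y.shape)         # torch.Size([0, 3])
```
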
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27211

Test Plan: New unit-tests are passing. Code that was failing earlier works. Need to test other backends.

Differential Revision: D17599915

Pulled By: kennyhorror

fbshipit-source-id: 78894ce602d96aac2d6bf8c16a3fab43973e2d53
2019-10-08 16:43:21 -07:00
c27853fbba Expose torch::jit::script::Module::dump_to_str to python as module._c.dump_to_str.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27556

Test Plan: Imported from OSS

Differential Revision: D17814331

Pulled By: ZolotukhinM

fbshipit-source-id: a25fc853897d37c6a703373838b522c64ad3aa78
2019-10-08 16:32:23 -07:00
6cf189512c Remove underscore from pybind of module._c.dump (#27555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27555

It is already under '_c' anyway.

Test Plan: Imported from OSS

Differential Revision: D17814333

Pulled By: ZolotukhinM

fbshipit-source-id: ca21649d553f6601be12828958a8077867d0e30e
2019-10-08 16:32:19 -07:00
1610ea8ef8 Comprehensive-ish instrumentation for CUDA memory allocator (#27361)
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:
  - Allocation requests (`allocation`)
  - Allocated memory (`allocated_bytes`)
  - Reserved segments from cudaMalloc (`segment`)
  - Reserved memory (`reserved_bytes`)
  - Active memory blocks (`active`)
  - Active memory (`active_bytes`)
  - Inactive, non-releasable blocks (`inactive_split`)
  - Inactive, non-releasable memory (`inactive_split_bytes`)
  - Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
  - Number of OOMs (`num_ooms`)

Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.

# Snapshots

Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.

# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq

# Implementation: minor changes

- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)

# Testing

- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)

# Performance

Running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
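
For reference, a minimal sketch of querying the new counters (key names follow the stat/pool/field scheme described above, but may differ slightly from what landed):

```python
import torch

x = torch.empty(1024, 1024, device="cuda")  # trigger an allocation

stats = torch.cuda.memory_stats()
print(stats["allocated_bytes.all.current"])
print(stats["num_ooms"])

print(torch.cuda.memory_summary())  # human-readable dump of all stats
```
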
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
2019-10-08 15:42:48 -07:00
04cd777ed4 Create BUCK build for lite-interpreter (#27546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27546

Add files in the csrc/jit/mobile folder to torch_core, as a first step toward having the lite interpreter built in BUCK. Next the files will be made independent of torch_core (T54912812)
ghstack-source-id: 91523987

Test Plan:
buck build -c pytorch.enable_rtti=1 -c project.ignore= -c ndk.app_platform=android-23 -c user.libcxx_cflags=-DFOLLY_USE_LIBCPP=1 -c user.libcxx_cxxflags=-DFOLLY_USE_LIBCPP=1 -c ndk.cxx_runtime=libcxx -c user.ndk_cxxflags=-g0 //xplat/experimental/pytorch/mobile:lite_predictorAndroid#android-armv7 && adb push buck-out/gen/xplat/experimental/pytorch/mobile/lite_predictorAndroid#android-armv7 /data/local/tmp/
In adb shell:
data/local/tmp/lite_predictorAndroid\#android-armv7 add_it.bc

buck build -c project.ignore= @//fbcode/mode/dev-asan //xplat/experimental/pytorch/mobile:lite_predictor

Reviewed By: ljk53

Differential Revision: D17717547

fbshipit-source-id: 4c00a35eb231968d05d0d7b56bcfd5dc0258d4bb
2019-10-08 15:20:30 -07:00
ff03f9bc94 Remove CPU_tensor_apply* from Normalization.cpp (#27327)
Summary:
https://github.com/pytorch/pytorch/issues/24486
https://github.com/pytorch/pytorch/issues/24485
https://github.com/pytorch/pytorch/issues/24484
https://github.com/pytorch/pytorch/issues/24483
https://github.com/pytorch/pytorch/issues/24482
https://github.com/pytorch/pytorch/issues/24481
https://github.com/pytorch/pytorch/issues/24480
https://github.com/pytorch/pytorch/issues/24479
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27327

Differential Revision: D17811268

Pulled By: ifedan

fbshipit-source-id: 7ce54d8e87752e9ea34b12b1415e1398017070cd
2019-10-08 14:49:59 -07:00
e16868ab29 Register operators of CV models in PyTorch mobile (#27379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27379

Currently the operators need to be registered manually through c10 registration.

Test Plan:
The operators should be covered by tests on operators.
A few ops (add, conv) are covered in test_lite_interpreter.cpp for demonstration.
CV models may be too large to include in unittests.
Simple local loaders can be built. Follow similar pattern as in test_lite_interpreter to
1. load the torch script model
2. run the model to get reference results
3. save and load the mobile module using torch::jit::module._save_for_mobile() and torch::jit::_load_for_mobile().
4. run the mobile module by run_method() and compare the results to reference results.

Tested models:
* Lenet
* XrayMobileV3

Differential Revision: D17810912

fbshipit-source-id: 2cc25dbe81a3c9a85108b3efe6a8e957028fc622
2019-10-08 14:05:26 -07:00
3f660cdf0f Remove CUDA_tensor_apply1 (#27313)
Summary:
CUDA_tensor_apply1 is unused, so it will be removed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27313

Differential Revision: D17746076

Pulled By: ifedan

fbshipit-source-id: 99120a5f1f0f716b4dc19b6ffe931071cbcdaea2
2019-10-08 13:23:00 -07:00
e7b6ea5535 Move the CUDA implementation of atan2 (which was partially implemented in ATen) to ATen. (#26178)
Summary:
std::atan2 is not used because it does not work with HIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26178

Differential Revision: D17747897

Pulled By: VitalyFedyunin

fbshipit-source-id: b300f0573c431e1425644c9c1899d0b024c6a57c
2019-10-08 13:15:51 -07:00
c1c176d91b record_stream() for shifted view tensors (#27371)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/27366

The address of a view tensor might be shifted from the head of the storage.

```python
>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'
```

Currently, `Tensor.record_stream()` silently ignores shifted view tensors, because `CUDACachingAllocator` cannot find the block from the shifted address.

```c++
void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}
```

So we cannot protect a shifted view tensor that is used for compute or copy in an arbitrary stream against unexpected reallocation. Once we call `record_stream()` on a tensor, our intention is to protect the storage behind the tensor against reallocation until all work in the stream finishes. This rule should be consistent regardless of the type of tensor, including views.

We can retrieve the head of the address from any type of tensor via `tensor.storage().data_ptr()`. Hence, I think it's better to pass that to `recordStream()` rather than `tensor.data_ptr()` for consistent behavior.
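
A minimal Python sketch of the situation this fixes:

```python
import torch

s = torch.cuda.Stream()
x = torch.rand(10, 10, device="cuda")
y = x[2:]  # y.data_ptr() is shifted from the head of x's storage

with torch.cuda.stream(s):
    z = y * 2  # y is consumed on the side stream s

# Protect the storage behind y until s finishes. With this fix the lookup
# uses the storage base address, so the shifted view is no longer ignored.
y.record_stream(s)
```
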
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27371

Reviewed By: ezyang

Differential Revision: D17768558

Pulled By: albanD

fbshipit-source-id: 7705f52b0177625168edb6f71c07a029df471bc5
2019-10-08 12:31:26 -07:00
6e59fb6a97 .gitignore for the docs folder
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27491

Test Plan: Imported from OSS

Differential Revision: D17796152

Pulled By: zafartahirov

fbshipit-source-id: d1aaf27b4ea1fb683cd889e5a935b4ca275de3ad
2019-10-08 12:18:30 -07:00
eb93200321 Fix DDP incompatibility issue with nn.MultiheadAttention. (#26826)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/26698.

With different query/key/value dimensions, `nn.MultiheadAttention` has a DDP incompatibility issue because in that case the `in_proj_weight` attribute is created but not used. Fix it and add a distributed unit test.
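
The configuration that used to break under DDP, sketched in Python:

```python
import torch
import torch.nn as nn

# query/key/value dims differ, so separate projection weights are used and
# the unused in_proj_weight parameter previously confused DDP
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, kdim=16, vdim=32)
q = torch.rand(4, 2, 8)   # (seq, batch, embed_dim)
k = torch.rand(5, 2, 16)  # (seq, batch, kdim)
v = torch.rand(5, 2, 32)  # (seq, batch, vdim)
out, attn = mha(q, k, v)
print(out.shape)          # torch.Size([4, 2, 8])
```
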
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26826

Differential Revision: D17583807

Pulled By: zhangguanheng66

fbshipit-source-id: c393584c331ed4f57ebaf2d4015ef04589c973f6
2019-10-08 12:13:34 -07:00
f522bde121 Replace references to _DataLoaderIter with _BaseDataLoaderIter (#27105)
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. However, at about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Probably because these changes happened in parallel at roughly the same time, the type stubs and several other references in the codebase were never updated to match this refactoring.

I've gone ahead and made the updates to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228, which fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that pytorch doesn't have a test to make sure that the `.pyi` type stub files match the real API defined in `.py` files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105

Differential Revision: D17813641

Pulled By: ezyang

fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
2019-10-08 12:09:02 -07:00
d57124823b Regenerate aten_op.h when native_functions.yaml changes. (#27253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10127.

This ensures that aten_op.h is regenerated whenever a native kernel
is removed. Previously it was only being regenerated when new native
kernels were added, because that generates new source files, which this
cmake target depended on. However, if a native kernel is removed, then
there is no dependent target and the header is never regenerated.

Explicitly depending on native_functions.yaml ensures that the header
is regenerated even if a kernel is removed.

I'm no cmake expert so alternative approaches or reasons why this is
obviously incorrect are very appreciated!

EDIT: reflecting comments below we now depend on `Dependencies.yaml` instead of `native_functions.yaml`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27253

Differential Revision: D17813659

Pulled By: ezyang

fbshipit-source-id: 2c754a88ba62495c14de8a9649f6675d2dad0b7d
2019-10-08 11:54:51 -07:00
31a6ff46c1 change input shape to reduce variation (#27548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548

as title

Test Plan: i_dont_want_it

Reviewed By: hl475

Differential Revision: D17811295

fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c
2019-10-08 11:45:06 -07:00
b4ce922b58 Move RPC API to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27290

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D17808212

Pulled By: pietern

fbshipit-source-id: c79907940fe4888b2ceaaa1cda0078e39c89b454
2019-10-08 11:31:25 -07:00
a6d26ce135 Move internal functions to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27289

Test Plan: Imported from OSS

Differential Revision: D17808214

Pulled By: pietern

fbshipit-source-id: 4c453028e431c3e951d439784017ef07037ba1a9
2019-10-08 11:31:20 -07:00
14f1629c4d Move RPC backend registry to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27288

Test Plan: Imported from OSS

Differential Revision: D17808215

Pulled By: pietern

fbshipit-source-id: 489c031e02cd3141a861cf7ec2273aaa4c55b7d7
2019-10-08 11:31:16 -07:00
1fd14c5822 Remove torch.distributed.rpc function (#27287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27287

This is replaced by calls to `dist.rpc_sync` and `dist.rpc_async`.

Test Plan: Imported from OSS

Differential Revision: D17808210

Pulled By: pietern

fbshipit-source-id: 3103a615fa8b08224780387a3ea4ac6b1c73badb
2019-10-08 11:31:12 -07:00
48a571b29c Rename variables and add comments (#27286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27286

The name `runUDFFunction` stutters because the F in UDF also stands
for function. Renamed these variables to be identical to their Python
equivalents. Renamed those to share a prefix and drop `internal`,
because internal functions can use an underscore prefix.

Test Plan: Imported from OSS

Differential Revision: D17808208

Pulled By: pietern

fbshipit-source-id: 7619f07fc8215203dfb1da1eb281845edcd2bb99
2019-10-08 11:31:08 -07:00
f597926fe0 Remove shebang from non-executable files in torch.distributed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27285

Test Plan: Imported from OSS

Differential Revision: D17808207

Pulled By: pietern

fbshipit-source-id: 6141c1783e3a6f448a298275120db1f254b42b2a
2019-10-08 11:31:03 -07:00
c742918854 Fix pybind11 warnings in python_rpc_handler.cpp (#27284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27284

The warnings relate to usage of the deprecated != operator. Instead
of checking the member field on every function call, we can check it
once, on construction of PythonRpcHandler.

Test Plan: Imported from OSS

Differential Revision: D17808213

Pulled By: pietern

fbshipit-source-id: 022c8f77f266942c49c55b1729e62dbb06262d77
2019-10-08 11:30:59 -07:00
0d22f3b170 Emergency split CUDA libtorch build/test into separate job (#26859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26859

CUDA builds are intermittently taking greater than five hours,
hitting CircleCI's timeout limit, and also all around making
developers unhappy.  Part of the reason for this is because
they build PyTorch twice: once as normal, and once as libtorch.
This diff splits libtorch into a new job to parallelize this
and get us below the patch.  It's an emergency diff because
I did the minimum possible work to make this work, including
grody hacks to make sure macos libtorch builds still work
(without adding a separate job there).

- Add a new libtorch config, to cuda9 (same as before).  Disable
  generation of the other test variants.
- Adjust common.sh to NO LONGER set BUILD_TEST_LIBTORCH for
  pytorch-linux-trusty-py3.6-gcc7; we will test for *libtorch*
  in the job name for this case.  (I noticed a bug while
  looking at this.)
- Adjust build.sh and test.sh.  The eventual logic is that if you are a
  *libtorch* build, ONLY build libtorch; otherwise do the same
  thing you used to do (including respecting BUILD_TEST_LIBTORCH)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17810592

Pulled By: ezyang

fbshipit-source-id: 8dcdb8f7424ddda293500d9fc90097a54dca28b9
2019-10-08 11:24:21 -07:00
660264e173 fix documentation for add_hparams (#27521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27521

adding new lines to add_hparams description

Test Plan: sphinx-autobuild

Reviewed By: orionr

Differential Revision: D17800387

fbshipit-source-id: 4a09a86a9d35c6c2d3a7e2857027f9d053851585
2019-10-08 10:56:44 -07:00
3b5d40c339 Add C++ torch::nn::CosineEmbeddingLoss (#27345)
Summary:
Adds `torch::nn::CosineEmbeddingLoss`  module and functional support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27345

Differential Revision: D17801402

Pulled By: yf225

fbshipit-source-id: 0eabe80d7d36397e6667b331c3fa2f56d7a15962
2019-10-08 10:52:05 -07:00
e63bfb7877 Use orig source range in Node::print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27524

Test Plan: Imported from OSS

Differential Revision: D17806454

Pulled By: jamesr66a

fbshipit-source-id: 5e3edb87fc79ad8dd1aed0b7d4a2153e7e0429ab
2019-10-08 10:30:56 -07:00
e2143fdeb8 Updating submodules
Summary:
GitHub commits:

fdc5edee63
266b453eb0

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: bd5b2e9b7bd31d8995e75124b61e29423b624265
2019-10-08 10:20:06 -07:00
725810f42c Set existing attributes under recursive script (#27514)
Summary:
This is related to #27109: `training` was being skipped since modules
have it as an attribute by default, but it should be copied anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27514

Pulled By: driazati

Differential Revision: D17802544

fbshipit-source-id: 9e8f068903b67073c509c2c598b27622fcada2d7
2019-10-08 10:12:04 -07:00
7f183a978f Stops common_utils.py from setting the default tensor type (to torch.DoubleTensor) (#27444)
Summary:
This PR stops common_utils.py from setting the default tensor type when it is imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.

Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:

- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py

This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved aways from relying on this global setting.

Notable technical changes in this PR are:

- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
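
For reference, a minimal sketch of the explicit opt-in that replaces the old global:

```python
import torch

# What the removed global used to do implicitly; test files that still
# rely on double-precision defaults now set it themselves.
torch.set_default_dtype(torch.double)
print(torch.tensor([1.0]).dtype)  # torch.float64
```
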
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444

Differential Revision: D17795235

Pulled By: mruberry

fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
2019-10-08 09:52:44 -07:00
16ece1c9da Fixed typos and grammatical errors (#27465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27443
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27465

Differential Revision: D17810732

Pulled By: pietern

fbshipit-source-id: b8a62dd086a4f4a61c9aa6acfa495cf822995604
2019-10-08 09:31:45 -07:00
6e0312a9c5 Revert "Make static dispatch turn off variable before entering the kernel. (#26908)" (#27283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27283

This reverts commit 9159a601ca4953ecf0d3dc568cd0b966de2d4686.

Test Plan: Imported from OSS

Differential Revision: D17738167

Pulled By: ezyang

fbshipit-source-id: cc4048d553017409279603590833d1529f59048c
2019-10-08 09:21:07 -07:00
a96b003b39 docstring only formatting changes: quantize.py, fake_quantize.py, observer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
2019-10-08 09:21:03 -07:00
e63addfff6 Exponential decay of the weight of task loss (#27508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508

Implemented a simple exponential decay of the weight of the task loss function, with a lower bound.
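
A minimal sketch of the scheme, with hypothetical parameter names:

```python
import math

def task_loss_weight(step, init_weight, decay_rate, lower_bound):
    # Exponentially decay the task-loss weight, clamped at a floor.
    return max(lower_bound, init_weight * math.exp(-decay_rate * step))

print(task_loss_weight(0, 1.0, 0.01, 0.1))    # 1.0
print(task_loss_weight(500, 1.0, 0.01, 0.1))  # hits the floor: 0.1
```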

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308

canary: f140103452

Reviewed By: chenshouyuan

Differential Revision: D17524101

fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
2019-10-08 09:15:41 -07:00
2c51e0659b Roll master to 1.4.0 (#27374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27374

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17809770

Pulled By: ezyang

fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8
2019-10-08 08:58:53 -07:00
34662f77c6 Revert D17159707: [pytorch][PR] [ONNX] Fixed Select symbolic to export slice when index = negative one
Test Plan: revert-hammer

Differential Revision:
D17159707

Original commit changeset: 2c3b27542108

fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01
2019-10-08 01:59:10 -07:00
1b5df37441 Updating submodules
Summary:
GitHub commits:

e80ecd1d63
6c7a36b1b3
8750462043
442d7def67
c138dc3d2c
3833f10989
6fc473d530
82d259dade

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7834a4a8620d0ab9b60060e0abadfba457fb2890
2019-10-08 01:08:45 -07:00
84e2dc692a Fix broken name mangling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27511

Test Plan: Imported from OSS

Differential Revision: D17801185

Pulled By: jamesr66a

fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a
2019-10-07 20:05:32 -07:00
23f2fb0aec #include <stdexcept> into flat_hash_map.h (#27478)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/27266

In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27478

Differential Revision: D17799522

Pulled By: pbelevich

fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0
2019-10-07 19:24:07 -07:00
24242e86fa Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (#27124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27124

ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL
2.4 to detect errors in NCCL communicators. These were used as part of
ProcessGroupNCCL, and we also enforced that only NCCL versions 2.4+ were
supported. However, there is still legitimate use for older NCCL versions, and
hence we should still support those.

For that purpose, in this change I've ensured we disable NCCL error checking
for versions < 2.4.
ghstack-source-id: 91452959

Test Plan:
1) Test with 2.4.8
2) Test with 2.2.13
3) unit tests.

Differential Revision: D17178988

fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698
2019-10-07 17:39:32 -07:00
4bd8ae13c6 Move hipify to torch/utils to bundle them into torch package (#27425)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/27418 but try to put it under "torch" namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27425

Differential Revision: D17779490

Pulled By: bddppq

fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b
2019-10-07 17:25:45 -07:00
ce16d689b3 FunctionEventAvg implements __iadd__ interface (#27498)
Summary:
Resolving issue https://github.com/pytorch/pytorch/issues/26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to.
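
For context, a generic sketch of the `__iadd__` protocol being restored (the fields here are hypothetical, not FunctionEventAvg's actual attributes):

```python
class Avg:
    def __init__(self, count=0, total=0.0):
        self.count, self.total = count, total

    def __iadd__(self, other):
        # `a += b` mutates `a` in place and must return self
        self.count += other.count
        self.total += other.total
        return self

a, b = Avg(1, 2.0), Avg(3, 4.0)
a += b
print(a.count, a.total)  # 4 6.0
```
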
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27498

Differential Revision: D17801918

Pulled By: ezyang

fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06
2019-10-07 17:14:27 -07:00
4a28ab95d0 Clean up JavaDoc comments in pytorch_android
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27455

Test Plan: Imported from OSS

Differential Revision: D17800658

Pulled By: dreiss

fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
2019-10-07 17:01:30 -07:00
1ffa81d772 Various cleanups to pytorch_android API (#27454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27454

See detailed discussion at
https://github.com/pytorch/pytorch/issues/27350

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800480

Pulled By: dreiss

fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
2019-10-07 17:01:26 -07:00
b66df47a11 Refactor python_android test to separate Android-specific components (#27453)
Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java.  (Some changes to the build script and dependencies are
required before the host test can actually run.)

ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800410

fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
2019-10-07 17:01:22 -07:00
aab9673e8d Avoid variable shadowing in `::at::philox_engine::single_round()` (#27486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27486

Rename `key` argument of `single_round` method to `in_key`

Test Plan: CI

Reviewed By: stepancheg, soumith

Differential Revision: D17782904

fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603
2019-10-07 16:34:22 -07:00
16454095e0 Fixed Select symbolic to export slice when index = negative one (#25273)
Summary:
Exporting torch.select when index = negative one (x[:,-1]) was broken. This PR has the fix in the symbolic function for select.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25273

Reviewed By: hl475

Differential Revision: D17159707

Pulled By: houseroad

fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76
2019-10-07 14:24:34 -07:00
8cc9d27647 Automatic update of fbcode/onnx to 2891e1459745933f4bba9a8cb3371cf3c9eb1d16 (#27474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27474

Previous import was 034921bd574cc84906b7996c07873454b7dd4135

Included changes:
- **[2891e145](https://github.com/onnx/onnx/commit/2891e145)**: Fix Unique unit test (#2381) <Scott McKay>
- **[25cf73e5](https://github.com/onnx/onnx/commit/25cf73e5)**: update shapeInference h file link (#2369) <prcvih>
- **[e3074bc0](https://github.com/onnx/onnx/commit/e3074bc0)**: modify file path (#2378) <prcvih>
- **[9058d3a4](https://github.com/onnx/onnx/commit/9058d3a4)**: Incrementing version number to 1.6.0 (#2353) (#2385) <Kevin Chen>
- **[c963586d](https://github.com/onnx/onnx/commit/c963586d)**: Remove typing packages from test requirements (#2375) <Aiken Cairncross>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17791527

fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc
2019-10-07 14:16:29 -07:00
a4cba50d62 Put metrics back to torch.utils.tensorboard similar we have in TensorboardX
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27252

Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj

Reviewed By: sanekmelnikov

Differential Revision: D17723414

fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065
2019-10-07 14:10:38 -07:00
0046092178 Reduce special casing around 'training' (#27109)
Summary:
Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling.

Fixes #26884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27109

Pulled By: driazati

Differential Revision: D17728129

fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
2019-10-07 13:52:59 -07:00
a24291a554 Unfold export (#24970)
Summary:
ONNX export for Unfold in symbolic opset9 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24970

Reviewed By: hl475

Differential Revision: D17495106

Pulled By: houseroad

fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50
2019-10-07 13:06:37 -07:00
1250acef90 Disable tsan for test_multiprocessing. (#27410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27410

Similar to https://github.com/pytorch/pytorch/pull/25005, TSAN is not
safe to use in a multi-threaded program with fork and can cause deadlocks. As a
result, disabling this test for TSAN.
ghstack-source-id: 91393545

Test Plan: buildbot

Differential Revision: D17775141

fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a
2019-10-07 11:29:04 -07:00
0222eceaaa Remove outdated note in cholesky_solve and triangular_solve doc strings (#26989)
Summary:
We do support inputs with dim > 2 in _out variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26989

Differential Revision: D17785632

Pulled By: soumith

fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001
2019-10-06 23:28:48 -07:00
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
2cc1e69cc9 C++ API parity: LogSigmoid
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27060

Test Plan: Imported from OSS

Differential Revision: D17682404

Pulled By: pbelevich

fbshipit-source-id: d60d64cd4caf1f56a2e05c516f91321d46ec9624
2019-10-05 06:18:25 -07:00
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
04436f6c60 Upgrade to ROCm 2.9 (#27417)
Summary:
New docker images built with tag 325: https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/325

Related ossci-job-dsl commits:
a00a76f927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27417

Differential Revision: D17777517

Pulled By: bddppq

fbshipit-source-id: a6b8cb86b37f537d402f6d2c7d28ad28a6a5a317
2019-10-05 00:36:34 -07:00
bac11d1002 Tweak docs on building docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27364

Differential Revision: D17777402

Pulled By: dzhulgakov

fbshipit-source-id: 304c678e5c80d7f8c779d65c11f9bf1b0facdb52
2019-10-04 22:14:37 -07:00
e0ae3ce5e4 Docstring fix (#27225)
Summary:
Correcting docstring for `add_image_with_boxes` method. Fixed spelling mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27225

Differential Revision: D17776604

Pulled By: jerryzh168

fbshipit-source-id: 45f69643ec3b58c46b9fb67411c42a6d09b7290e
2019-10-04 21:29:36 -07:00
7a2e61c28e Remove dependency on six from dist_autograd_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27369

Test Plan: Imported from OSS

Differential Revision: D17763104

Pulled By: mrshenli

fbshipit-source-id: dd146809686e7720f2b77012eebb6aed72851556
2019-10-04 21:24:25 -07:00
1741adfd3e Use deepcopy inputs for ONNX ort test cases (#27186)
Summary:
Running models with inplace operators will change values of input tensors.
Deepcopy input tensors each time to keep the original input tensors intact.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27186

Differential Revision: D17776598

Pulled By: jerryzh168

fbshipit-source-id: d4808a11185a9ab0d782a62d7d708dfe7e94559c
2019-10-04 19:01:59 -07:00
1f0328c6d4 Add randomFill to test_utils.h
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.

Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```

Reviewed By: yinghai

Differential Revision: D17759193

fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
2019-10-04 18:29:22 -07:00
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
7b3881f68c Adding docstrings for nnq.functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27363

Test Plan: Imported from OSS

Differential Revision: D17758907

Pulled By: zafartahirov

fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
2019-10-04 17:19:47 -07:00
b05ec828ad Add interface/object serialization as module attribute (#26770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26770

This PR added interface/object serialization as a module attribute, to
allow initializing an object as an interface type during Python
initialization. Because an interface type can be backed by any class object
that implements that interface, if we declare it in
python/module.__init__, we will need to collect the runtime types of the
value and serialize them to ensure complete code information.

Test Plan: Imported from OSS

Differential Revision: D17742707

fbshipit-source-id: 7f614ad4f982996d320a0e2dd3515bf47370e730
2019-10-04 17:12:08 -07:00
381cf2bd24 add warning to dnnlowp fc if quantization kind is not min_max
Summary:
Print a warning when using DNNLOWP dynamic int8 quant for FC and activation_quantization_kind != min_max.

The warning will display in the console but not in Bento; we would have to use CAFFE_ENFORCE to alert in Bento.

Test Plan: Ran the unit test forcing DNNLOWP FC with activation_quantization_kind = "l2" and saw the warning printed in the console.

Reviewed By: csummersea

Differential Revision: D17770921

fbshipit-source-id: b6532e4c9a86d74e3db4cb432735505d378a366e
2019-10-04 17:03:19 -07:00
afbbe16f49 Add methods to write image tensor content to buffer (#27359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27359

Adding methods  to TensorImageUtils:
```
bitmapToFloatBuffer(..., FloatBuffer outBuffer, int outBufferOffset)
imageYUV420CenterCropToFloat32Tensor(..., FloatBuffer outBuffer, int outBufferOffset)
```
To be able to
 - reuse FloatBuffer for inference
 - to create batch-Tensor (contains several images/bitmaps)

When we reuse the FloatBuffer, for example in the demo app (image classification), the profiler shows fewer memory allocations (previously, every run created a new input tensor with a newly allocated FloatBuffer) and about -20ms on my Pixel XL.

Known open question:
At the moment every tensor element is written separately by calling `outBuffer.put()`, which is a native call crossing language boundaries.
As an alternative, we could allocate a `float[]` on the Java side, fill it, and put it into `outBuffer` with one call, reducing native calls but increasing memory allocations on the Java side.
Tested locally, just eyeballing durations; I have not noticed a big difference, so I decided to go with fewer memory allocations.

It would be good to merge this into 1.3.0, but if not, the demo app can use snapshot dependencies with this change.

PR with integration to demo app:
https://github.com/pytorch/android-demo-app/pull/6

Test Plan: Imported from OSS

Differential Revision: D17758621

Pulled By: IvanKobzarev

fbshipit-source-id: b4f1a068789279002d7ecc0bc680111f781bf980
2019-10-04 16:33:50 -07:00
ac0f18437f MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the min and max values per batch. It is better suited for quantization-aware training than MinMax observers, which track extremal values across batches.
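
A minimal sketch of the update rule such an observer applies, assuming an averaging constant `c` (parameter names are hypothetical):

```python
import torch

def update(running_min, running_max, batch, c=0.01):
    # Exponential moving average of the per-batch extremal values.
    batch_min, batch_max = batch.min(), batch.max()
    if running_min is None:
        return batch_min, batch_max
    return (running_min + c * (batch_min - running_min),
            running_max + c * (batch_max - running_max))

mn = mx = None
for _ in range(10):
    mn, mx = update(mn, mx, torch.randn(64))
print(mn, mx)
```
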
ghstack-source-id: 91369018

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-04 16:28:59 -07:00
92a2caa028 Pickup proxy parameters for publishing (#27389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27389

Pick up Gradle proxy parameters (handy for publishing from a devserver) in the Maven publishing Gradle plugin.

Test Plan: Imported from OSS

Differential Revision: D17773548

Pulled By: IvanKobzarev

fbshipit-source-id: 662c0b2835e6cf1e4009da79e27268d4a19c2ceb
2019-10-04 16:21:31 -07:00
18215337f4 Change nightly builds version to 1.4.0-SNAPSHOT (#27381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27381

Changing Android nightly builds from master to version 1.4.0-SNAPSHOT, as we also have 1.3.0-SNAPSHOT from the v1.3.0 branch.

Test Plan: Imported from OSS

Differential Revision: D17773620

Pulled By: IvanKobzarev

fbshipit-source-id: c39a1dbf5e06f79c25367c3bc602cc8ce42cd939
2019-10-04 16:14:24 -07:00
32d009a37f Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388

Differential Revision: D17767772

Pulled By: bddppq

fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
2019-10-04 14:49:33 -07:00
6db0cc472c add some support for the occupancy API on ROCm (#27390)
Summary:
Unfortunately, the HIP function takes uint32_t* instead of int*, so we still need to ifdef for the time being.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27390

Differential Revision: D17768832

Pulled By: bddppq

fbshipit-source-id: c65176660cb0783a04f0a4a064f686818d759589
2019-10-04 14:45:53 -07:00
3c2cd8cc10 Some hipify script cleanups (#27375)
Summary:
continue https://github.com/pytorch/pytorch/issues/26363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27375

Differential Revision: D17764992

Pulled By: bddppq

fbshipit-source-id: ecc06521179677efcedb1d58ceda63df7d63627e
2019-10-04 14:43:22 -07:00
8b61a220c0 C++ API parity: LeakyReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27059

Test Plan: Imported from OSS

Differential Revision: D17682407

Pulled By: pbelevich

fbshipit-source-id: 2a4f42e9438799ba8de7282ac7a6fd3ff97ee048
2019-10-04 14:18:03 -07:00
badb08d577 Add clip_grad_norm_ to c++ api (#26140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26140

Per https://github.com/pytorch/pytorch/issues/25883, we want to work
towards C++/Python API parity. This diff adds clip_grad_norm_ to the c++ API to
improve parity.

ghstack-source-id: 91334333

Test Plan: Added a unit test

Differential Revision: D17312367

fbshipit-source-id: 753ba3a4d084d01f3cc8919da3108e67c809ad65
2019-10-04 13:50:36 -07:00
646e214706 ProcessGroupNCCL should respect timeout passed in to init_process_group. (#27224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27224

As part of adding error handling to NCCL, we are now able to specify a
timeout for operations using ProcessGroupNCCL. However, this timeout had a
default of 10 seconds and didn't respect the timeout specified in
init_process_group.

In this change, I've ensured we pass the appropriate timeout to
ProcessGroupNCCL.
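
A sketch of how the timeout is now picked up (the rendezvous address below is hypothetical):

```python
from datetime import timedelta
import torch.distributed as dist

# After this change, the timeout below is forwarded to ProcessGroupNCCL
# instead of the hard-coded 10-second default.
dist.init_process_group(
    backend="nccl",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
    timeout=timedelta(seconds=60),
)
```
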
ghstack-source-id: 91283548

Test Plan:
Added unit test to verify timeout passed in to init_process_group is
respected.

Differential Revision: D17717992

fbshipit-source-id: c73320187f1f3b2693ba1e177d80646e282d01a2
2019-10-04 13:28:57 -07:00
f4c37e6b32 fix OSX CI build (#27373)
Summary:
fix OSX caffe2 CI build, attempt 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27373

Differential Revision: D17768461

Pulled By: soumith

fbshipit-source-id: b0a076c07382327730b5d86b8a00f5388c368b5e
2019-10-04 13:06:58 -07:00
192ca9730f C++ API parity: Hardtanh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27038

Test Plan: Imported from OSS

Differential Revision: D17682405

Pulled By: pbelevich

fbshipit-source-id: f65e76696e0041c3518f56da94f2e3b800305234
2019-10-04 12:53:33 -07:00
0be6641fbf add function to get nccl version for error messages (#27068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27068

Adds a function that uses ncclGetVersion from the NCCL API to retrieve the NCCL version, converts it into a readable string, and uses it in NCCL-related error messages to log the NCCL version. Hopefully this will help with debugging NCCL errors.

Test Plan:
Modify C10D_NCCL_CHECK in NCCLUtils.hpp to always error by setting ncclResult_t error = ncclSystemError
force an NCCL error with script test/simulate_nccl_errors.py:
Start master node: python test/simulate_nccl_errors.py localhost 9124 0 2
Start other node: python test/simulate_nccl_errors.py localhost 9124 1 2
On the master node, should see the following error message w/NCCL version:

```
Traceback (most recent call last):
  File "simulate_nccl_errors.py", line 29, in <module>
    process_group.allreduce(torch.rand(10).cuda(rank)).wait()
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:375, unhandled system error, NCCL version 2.4.8
```

Differential Revision: D17639476

fbshipit-source-id: a2f558ad9e883b6be173cfe758ec56cf140bc1ee
2019-10-04 12:49:45 -07:00
a33dbccf60 Fix some return std::move warnings (#27384)
Summary:
clang-tidy was complaining about these
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27384

Pulled By: driazati

Differential Revision: D17767412

fbshipit-source-id: 03e2630790edf3f6bbf9064e754156613032b464
2019-10-04 12:30:13 -07:00
a6bb8b52d4 Reduce error context from 10 -> 3 (#26765)
Summary:
10 lines of error context (on both sides) is overkill, especially now
that we have line numbers. With a compilation stack of a couple
functions, it becomes a pain to scroll to the top of the stack to see
the real error every time.

This also fixes class names in the compilation stack to use the format
`ClassName.method_name` instead of the fully qualified name.
Old output
```
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
        batch_idx = torch.arange(num_images, device=device)[:, None]
        objectness = objectness[batch_idx, top_n_idx]
        levels = levels[batch_idx, top_n_idx]
        proposals = proposals[batch_idx, top_n_idx]

        final_boxes = []
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
            # non-maximum suppression, independently done per level
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.post_nms_top_n]
            boxes, scores = boxes[keep], scores[keep]
            final_boxes.append(boxes)
            final_scores.append(scores)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
        num_images = len(anchors)
        num_anchors_per_level = [o[0].numel() for o in objectness]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        original_image_sizes = [(img.shape[-2], img.shape[-3])  for img in images]

        images, targets = self.transform(images, targets)
        features = self.backbone(images.tensors)
        if isinstance(features, torch.Tensor):
            features = OrderedDict([(0, features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        # TODO: multiple return types??
        # if self.training:
```

New output

```
RuntimeError:

clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
        if isinstance(features, torch.Tensor):
            features = OrderedDict([(0, features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26765

Pulled By: driazati

Differential Revision: D17560963

fbshipit-source-id: e463548744b505ca17f0158079b80e08fda47d49
2019-10-04 11:24:52 -07:00
9f9c6c0999 From docs of scatter_add_() removed erroneous comment on uniqueness of indices. (#27132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27080
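
For illustration, duplicate indices are fine precisely because `scatter_add_` accumulates:

```python
import torch

t = torch.zeros(3)
# Both source values target index 0 and are summed, not overwritten.
t.scatter_add_(0, torch.tensor([0, 0]), torch.tensor([1.0, 2.0]))
# t is now tensor([3., 0., 0.])
```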
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27132

Differential Revision: D17765307

Pulled By: soumith

fbshipit-source-id: b0892ff442f3b49f8e3cdf029e2a08b51fa88f28
2019-10-04 11:02:19 -07:00
50b3f9d815 Allow use cpu_serial_kernel with void-lambda (#27370)
Summary:
https://github.com/pytorch/pytorch/pull/27271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27370

Differential Revision: D17763265

Pulled By: ifedan

fbshipit-source-id: d670560dfc555db529b18c01aa42f0ccb2127889
2019-10-04 10:04:44 -07:00
19ab5381c3 Add OPN instruction and vararg operator table (#27104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104

* The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in mobile lite interpreter.
* (OPN, X, N): X is the index into the vararg operator-name and operator tables; N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overloaded name is the output type ("int", "float", "bool", "tensor" or "generic").
* A vararg operator table is built with void(int input_size, Stack& stack) functions.
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.

Test Plan: Imported from OSS

Differential Revision: D17762853

fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
2019-10-04 09:35:53 -07:00
e166bcbbde Make RpcTest re-usable by other RPC backends by using init_method to initialize a RPC backend (#27320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27320

https://github.com/pytorch/pytorch/pull/27208/

# Problem

Other RPC backends take init_method.

# Solution

Set up init_method in rpc tests.
ghstack-source-id: 91335127

Differential Revision: D17709219

fbshipit-source-id: 3184c6e9b922a6ff9f4d1cb9abfa118b23f43eeb
2019-10-04 09:20:05 -07:00
28b1f586f6 Change schedulers to chainable form (#26423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form, with a deprecation warning, whenever epoch is not None (until the next release)
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Issuing a deprecation warning when the undocumented get_lr function is invoked (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)), referring users to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler with a mechanism relying on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax (see the sketch after this list).
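
A short sketch of the `MultiplicativeLR` usage described above (the `lr_lambda`
keyword is an assumption, mirroring `LambdaLR`):

```python
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3, 3, 3)
optimizer = optim.Adam(conv.parameters())

# Each step multiplies the current learning rate by the factor the
# function returns for that epoch.
lr_scheduler = optim.lr_scheduler.MultiplicativeLR(optimizer,
                                                   lr_lambda=lambda epoch: 0.95)

for epoch in range(5):
    lr_scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])
```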

# #20527

### Before

The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

# ghstack

This contains the changes from #24352. Opening again since they were reverted.

This reverts commit 1c477b7e1f378e9c1f8efed296241f68a8a4372b.

Test Plan: Imported from OSS

Differential Revision: D17460427

Pulled By: vincentqb

fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
2019-10-04 08:53:14 -07:00
da669c25ee autograd: double backwards function for binary_cross_entropy loss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26983

Reviewed By: albanD

Differential Revision: D17714357

Pulled By: anjali411

fbshipit-source-id: cebfe09a9048c4be457b7f2718bc396c06ecabee
2019-10-04 08:29:22 -07:00
c389156fc4 move new_zeros to core from THP (#26511)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/25831

ezyang can you please have a look?
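
For context, a minimal sketch of the op being moved:

```python
import torch

x = torch.ones(2, 3, dtype=torch.float64)
# new_zeros creates a zero tensor that inherits x's dtype and device
# unless they are overridden explicitly.
y = x.new_zeros((4,))                  # dtype=torch.float64
z = x.new_zeros(4, dtype=torch.int64)  # dtype overridden
```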
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26511

Differential Revision: D17763037

Pulled By: ezyang

fbshipit-source-id: 3596c01c4ab421e7785d6055cc813806f840a5c7
2019-10-04 08:23:35 -07:00
b7fb2b8862 Implement pickle support for sparse tensors and torch.layout instances (#27062)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/16667 and https://github.com/OpenMined/PySyft/issues/2326
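
A minimal sketch of what this enables:

```python
import pickle
import torch

indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([3.0, 4.0])
sparse = torch.sparse_coo_tensor(indices, values, (2, 2))

# Both the sparse tensor and the torch.layout instance now round-trip.
blob = pickle.dumps((sparse, torch.sparse_coo))
sparse2, layout = pickle.loads(blob)
assert layout is torch.sparse_coo
```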
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27062

Differential Revision: D17762932

Pulled By: ezyang

fbshipit-source-id: dd99c1f4ac8eb2286eb55aa20ce973f60ce7b7e1
2019-10-04 08:09:32 -07:00
76fc028533 Revert D17743310: [pytorch][PR] Allow use cpu_serial_kernel with void-lambda
Test Plan: revert-hammer

Differential Revision:
D17743310

Original commit changeset: a149751f2d67

fbshipit-source-id: 043240201d67966dd08b7b1bc2f9bf4897923e00
2019-10-04 08:00:49 -07:00
081069e8ca Remove CUDA_VERSION from Python script (which has already been detected in CMake) (#27316)
Summary:
(Intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27316

Differential Revision: D17762715

Pulled By: ezyang

fbshipit-source-id: 044c0ea6e8c2d12912c946a9a50b934b5253d8c8
2019-10-04 07:49:57 -07:00
e29baaca3d Make align_to method-only. (#27304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304

The ellipsis version of `align_to` only works if it is called as a
method. To prevent any confusion, this PR disables `torch.align_to` (but
keeps `Tensor.align_to`).
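
A short sketch of the resulting surface (named-tensor API):

```python
import torch

x = torch.randn(2, 3, names=("N", "C"))
y = x.align_to("C", "N")       # method form: supported
# torch.align_to(x, "C", "N")  # function form: disabled by this PR
```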

Test Plan: - [namedtensor ci]

Differential Revision: D17743809

Pulled By: zou3519

fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241
2019-10-04 07:18:52 -07:00
13c39c8ecc Remove six dependency (#27282)
Summary:
https://github.com/pytorch/pytorch/pull/27136 added a dependency on `six`, which is not available by default and is not marked as a dependency of PyTorch binaries, causing torchvision CI to break; see https://circleci.com/gh/pytorch/vision/20778?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link for example.

This PR uses `torch._six` instead of `six` as a replacement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27282

Reviewed By: lerks

Differential Revision: D17737561

Pulled By: fmassa

fbshipit-source-id: 7dcd0cc2c8bab27b8f4535f664f60388818d3497
2019-10-04 04:56:25 -07:00
a7de545c63 Makes test_cuda.py's generated tensor op tests generic (#27210)
Summary:
- The tensor op tests generated in test_cuda.py are now generic and appear in test_torch.py
- Data previously held in auxiliary data structures and files, like test_cuda_ignores.txt, is inlined

Previously the tensor op tests used several auxiliary data structures, a file, and exception handling to filter the test suite. If a function wasn't implemented, for example, that exception would be caught. This let functions like trigamma, which isn't callable, appear to be tested. See https://github.com/pytorch/pytorch/issues/27230. Filtering from additional data stores is error prone, too. It requires developers understand what data stores are used and how they're used. The existing sources are also sometimes incorrect. The txt file claims that dist_ doesn't work on half tensors, for example, but the updated tests verify it does.

In addition to making these tests generic, this PR removes those auxiliary data structures and does not catch any exceptions. Exceptions are errors. (This also means that if something implemented breaks it will now report as an error. Previously the test suite would have reported a pass.) The test infrastructure was also simplified to not perform computations with CPU half tensors since they do not support many operations. This introduces a float<->half conversion quirk but eliminates awkward functions that would first convert cpu tensors to float, perform an operation, and convert them back.

With this change test_cuda.py is almost entirely CUDA-specific.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27210

Differential Revision: D17757907

Pulled By: mruberry

fbshipit-source-id: b3c191c379667b1a7d5361087bdf82f397f77f65
2019-10-04 02:40:59 -07:00
527b10c2d1 Fixes PackedSequence.to (and unifies PackedSequence conversions) (#27245)
Summary:
PackedSequence.to(device) incorrectly places one of three tensors on the device and leaves the other two tensors where they are. If these devices are distinct then further operations on PackedSequence will fail. This behavior is inconsistent with Tensor.to and PackedSequence's behavior when .cuda() is called.

Additionally, PackedSequence defines multiple other conversion functions that were independently and inconsistently implemented.

This PR unifies all implementations and makes the PackedSequence.to behavior more consistent with Tensor.to. It is not completely consistent per comments. test_device_mask in test_nn.py is updated to validate the new functionality.
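
For example (a sketch; assumes a CUDA device is available):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

packed = pack_sequence([torch.randn(3, 5), torch.randn(2, 5)])
# After this fix, .to() moves all constituent tensors consistently,
# mirroring Tensor.to.
packed_cuda = packed.to("cuda:0")
```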
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27245

Differential Revision: D17757850

Pulled By: mruberry

fbshipit-source-id: 58f0bd40f1aa300fb0a91ee743483d645f977dc5
2019-10-04 02:22:41 -07:00
76f847546b Enable Python3.6 PyTorch ROCm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27353

Differential Revision: D17758495

Pulled By: bddppq

fbshipit-source-id: 95e329bc30f092e4093a33c408f1647b803d9983
2019-10-04 00:23:37 -07:00
d0a4b2f586 Choose num_threads in parallel_for based on GRAIN_SIZE (#26963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080, Continuation of https://github.com/pytorch/pytorch/issues/26886

What soumith said in https://github.com/pytorch/pytorch/pull/26886#issuecomment-535760635 seems plausible
> I wonder if it has to do with `#pragma omp parallel num_threads(num_threads)` which has unintended consequences, where even if `num_threads=1`, entering an omp block inside an omp block results in bad behavior.

I know for a fact that gcc's openmp doesn't start the thread pool when given `num_threads(1)` but it seems clang behaves differently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26963

Differential Revision: D17626981

Pulled By: soumith

fbshipit-source-id: 484ffe6cc172382bb5ff49ce1fceda7eba20a512
2019-10-03 23:31:39 -07:00
42e7eb0426 Minor readability fixes to C++ documentation (#27338)
Summary:
Changed `yieldings` to `yielding`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27338

Differential Revision: D17758406

Pulled By: yf225

fbshipit-source-id: 1633834a6ad80449c061ebc330ac24f3e42f5506
2019-10-03 21:45:35 -07:00
2ea1d3d01f refactor extra sugared values (#26270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26270

We've accumulated a lot of sugared values whose only purpose is
to be instance-checked against in emitApplyExpr. I need to add
another one to insert an unchecked_cast, and do not want to continue
the pattern. This creates an abstraction for this concept (SpecialFormValue),
and removes all the unneeded sugared values. There is no functionality
change here, just a bunch of code movement in compiler.cpp.

Test Plan: Imported from OSS

Differential Revision: D17412854

Pulled By: zdevito

fbshipit-source-id: 15877c91decaea5a00d1fe737ed2d0f0f8a79a28
2019-10-03 21:25:05 -07:00
9ade1e6944 improve error messages when a method or attribute is missing (#27110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27110

Previously, errors about missing methods on some types (like tensors) would
talk about 'builtins', which are only a thing inside of the compiler.
Furthermore, the error would only occur when the builtin was applied and it
was discovered that no builtin existed. This changes the error message so
that a missing method on our builtin types is discovered at attribute lookup.

Test Plan: Imported from OSS

Differential Revision: D17677616

Pulled By: zdevito

fbshipit-source-id: 2f7cf6c6093a9c832569c44f4b1044a2e56fe205
2019-10-03 21:25:01 -07:00
ef97841147 Show a warning that not all dir members of quantized work. (#27339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27339

This PR just shows a warning message.
Eventually we will show a correct __dir__.

Test Plan: Imported from OSS

Differential Revision: D17751333

Pulled By: zafartahirov

fbshipit-source-id: e9bc62fd8dd0147979291d0aac3f1afe5b8c7a9f
2019-10-03 20:48:04 -07:00
6bb7433ad5 Replacing the skip_list with white_list in the qconfig propagation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183

Test Plan: Imported from OSS

Differential Revision: D17700548

Pulled By: zafartahirov

fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
2019-10-03 20:40:17 -07:00
c874dd91a7 export remainder (#24410)
Summary:
Added ONNX export support for torch.remainder and torch.fmod
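
A sketch of exporting the newly supported ops (the file name is illustrative;
opset requirements may apply):

```python
import torch

class Mod(torch.nn.Module):
    def forward(self, x, y):
        return torch.remainder(x, y), torch.fmod(x, y)

dummy = (torch.randn(3), torch.randn(3))
torch.onnx.export(Mod(), dummy, "mod.onnx")
```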
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24410

Reviewed By: hl475

Differential Revision: D17466791

Pulled By: houseroad

fbshipit-source-id: afe6519e5f370824e3b4a45b69036a7260fb72cf
2019-10-03 20:15:20 -07:00
736c754739 add sdk support for xcodebuild script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27358

Test Plan: Imported from OSS

Differential Revision: D17757389

Pulled By: xta0

fbshipit-source-id: ed8e470b9c6329b96297ee7c65ba08759251baad
2019-10-03 20:11:08 -07:00
c3d97c2638 Update to ROCm 2.8 (#27337)
Summary:
New docker images built with tag 324.

Related jenkins changes:
83ec813357
aa235a14c8

Triggered CI runs:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/48682/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/55638/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27337

Differential Revision: D17753827

Pulled By: bddppq

fbshipit-source-id: 2c3f77b0b7c680013c7cc6d7953fe0da4922fe48
2019-10-03 20:03:28 -07:00
86a8971ebb Add a test case to RpcTest, check src/dst (#27322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27322

# Problem

Existing test cases are too symmetric, so they didn't detect this error: a request sent to the wrong worker.

Because of a wrong `worker_names` setup, worker0 sent a request to itself, while it should have sent it to worker1.

# Solution

Add a test case letting the dst side check whether the request came from the expected src.
ghstack-source-id: 91299312

Reviewed By: satgera

Differential Revision: D17069062

fbshipit-source-id: ef7a532dd497bfc0f0ee8446fcd5d29656aaf175
2019-10-03 18:59:59 -07:00
f5df46ce39 Set MINIZ_NO_TIME to avoid computing localtime on each pickle/unpickle (#27268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27268

For small pickle/unpickle, we spend a disproportionate amount of time in
time functions - roughly 23% in __tzset() for the unpickle case.

We're not currently using .m_time, though we can add this feature
back if it's ever needed.

An alternative would be to -DMINIZ_NO_TIME in compiler_flags, but we would
need to also consistently # define MINIZ_NO_TIME in any .cpp including this .h,
since this # define modifies the struct length in an unfortunate manner.

Test Plan:
buck test mode/dev-nosan caffe2/test/...
Run benchmark:
 buck-out/opt/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ThriftRpcAgentBench

Differential Revision: D17724198

fbshipit-source-id: b44a0217b1d9f8ce6c0f24297f59045c7cadf4b1
2019-10-03 17:59:33 -07:00
2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRefs as Python UDF arguments
and return values. RRefs can now be shared from owner to user, from user to
owner, or from user to user.
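
A rough sketch of the user-facing flow (API names follow the RPC package;
the exact spelling at the time of this commit may differ):

```python
import torch
import torch.distributed.rpc as rpc

# On an already-initialized RPC worker: create an RRef owned by "worker1"
# and fetch its value to the caller with to_here().
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
result = rref.to_here()
```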

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages will be sent during RRef pickling/unpickling procedure.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00
8fe5dcf699 Skip tests that use numpy if it's not present
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27165
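
The general pattern is roughly the following (an illustrative sketch, not
necessarily the exact helper used in the test suite):

```python
import unittest

try:
    import numpy  # noqa: F401
    HAS_NUMPY = True
except ImportError:
    HAS_NUMPY = False

class TestFoo(unittest.TestCase):
    @unittest.skipIf(not HAS_NUMPY, "numpy is not installed")
    def test_uses_numpy(self):
        ...
```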

Pulled By: driazati

Differential Revision: D17695078

fbshipit-source-id: d25c920f4c43285028537f88761d47a2c9db7b8f
2019-10-03 17:18:41 -07:00
827a00cf63 Support interface python assignment as an attribute (#26734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26734

This PR adds Python assignment for an interface as an attribute in the
module; it enables any object that implicitly implements the specific
interface to be assigned to the interface type in Python.

Serialization support for interface/class assignment will be done in a
follow-up PR.
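
A rough sketch of the pattern this enables (the `torch.jit.interface`
decorator spelling is an assumption for this era):

```python
import torch

@torch.jit.interface
class OneInput(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Impl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class Container(torch.nn.Module):
    impl: OneInput  # interface-typed attribute

    def __init__(self):
        super().__init__()
        # Impl conforms to OneInput implicitly, so plain Python
        # assignment works.
        self.impl = Impl()

    def forward(self, x):
        return self.impl.forward(x)

scripted = torch.jit.script(Container())
```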

Test Plan: Imported from OSS

Differential Revision: D17742708

Pulled By: wanchaol

fbshipit-source-id: a0a2d8c74b60ed3fa6c05e1b0d49b7ad1abc670b
2019-10-03 17:18:37 -07:00
cc964765a5 Add method add_hparams to API doc (#27344)
Summary:
Adds the method `add_hparams` to the `torch.utils.tensorboard` API docs. We will want to have this in the PyTorch 1.3 release.
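
For example (a minimal sketch; assumes tensorboard is installed):

```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    # Logs a set of hyperparameters together with the metrics they produced.
    writer.add_hparams({"lr": 0.1, "bsize": 32},
                       {"hparam/accuracy": 0.9, "hparam/loss": 0.1})
```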

cc sanekmelnikov lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27344

Differential Revision: D17753689

Pulled By: orionr

fbshipit-source-id: cc8636e0bdcf3f434444cd29471c62105491039d
2019-10-03 17:07:45 -07:00
99c32d97fa Migrate the cpu and gpu implementations of resize nearest 3D from vision to caffe2
Summary: As title. Fix the build failures in unicorn-build-restrictions as discussed in D17330625

Test Plan:
buck test mode/opt caffe2/caffe2/quantization/server:resize_nearest_3d_dnnlowp_op_test

In vision libs, there is no need to explicitly add a dep on the resize 3d op, as the caffe2_cpu dep is added by default.

Reviewed By: stephenyan1231

Differential Revision: D17676082

fbshipit-source-id: c034ab67a9078f72077b396991ffb9e54e6ab40b
2019-10-03 16:14:00 -07:00
74572fc985 Relax restrictions on set_num_threads (#27190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27190

Allow set_num_threads to be called multiple times in the case of the TBB
parallel backend
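
A sketch of what is now allowed (assuming a TBB-enabled build, per the test
plan below):

```python
import torch

# Previously a second call could be rejected with the TBB backend;
# both calls now take effect.
torch.set_num_threads(4)
torch.set_num_threads(8)
print(torch.get_num_threads())
```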

Test Plan:
BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop
install  --cmake
./build/bin/test_parallel
./build/bin/thread_init_test

Reviewed By: kostmo

Differential Revision: D17704236

Pulled By: ilia-cher

fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
2019-10-03 15:51:03 -07:00
a444054d4b Fix build (#27318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27318

Fix TBB build
USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake

Test Plan: Imported from OSS

Differential Revision: D17747449

Pulled By: ilia-cher

fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
2019-10-03 15:43:06 -07:00
05df6b67c6 C++ API parity: TensorTest.BackwardNonScalarOutputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27314

Test Plan: Imported from OSS

Differential Revision: D17746371

Pulled By: pbelevich

fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
2019-10-03 15:36:35 -07:00
0c4bc27539 Mention magma-cuda101 package in install instructions (#27325)
Summary:
There is a magma package for the newest CUDA version (10.1); mention it here lest someone mistakenly try to use the version for CUDA 10.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27325

Differential Revision: D17749535

Pulled By: soumith

fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
2019-10-03 15:21:53 -07:00
1769 changed files with 117202 additions and 57098 deletions


@ -340,12 +340,12 @@ Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* pytorch/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
@ -411,7 +411,7 @@ You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
@ -426,7 +426,7 @@ docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it soumith/conda-cuda /bin/bash
-it pytorch/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh


@ -1,5 +1,3 @@
#!/usr/bin/env python3
"""
This module models the tree of configuration variants
for "smoketest" builds.


@ -1,11 +1,8 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.binary_build_data as binary_build_data
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
class Conf(object):
@ -27,7 +24,7 @@ class Conf(object):
def gen_docker_image(self):
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
return miniutils.quote("soumith/conda-cuda-cxx11-ubuntu1604:latest")
return miniutils.quote("pytorch/conda-cuda-cxx11-ubuntu1604:latest")
docker_word_substitution = {
"manywheel": "manylinux",
@ -36,18 +33,23 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
# The cpu nightlies are built on the pytorch/manylinux-cuda100 docker image
alt_docker_suffix = self.cuda_version or "100"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
if self.cuda_version == "101":
return "soumith/manylinux-cuda101@sha256:5d62be90d5b7777121180e6137c7eed73d37aaf9f669c51b783611e37e0b4916"
return miniutils.quote("pytorch/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
def gen_build_name(self, build_or_test):
def gen_build_name(self, build_or_test, nightly):
parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
if nightly:
parts.append("nightly")
if self.libtorch_variant:
parts.append(self.libtorch_variant)
@ -57,17 +59,22 @@ class Conf(object):
joined = "_".join(parts)
return joined.replace(".", "_")
def gen_workflow_job(self, phase, upload_phase_dependency=None):
def gen_workflow_job(self, phase, upload_phase_dependency=None, nightly=False):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
job_def["name"] = self.gen_build_name(phase, nightly)
job_def["build_environment"] = miniutils.quote(" ".join(self.gen_build_env_parms()))
job_def["requires"] = ["setup"]
job_def["filters"] = {"branches": {"only": "nightly"}}
if self.smoke:
job_def["requires"].append("update_s3_htmls_for_nightlies")
job_def["requires"].append("update_s3_htmls_for_nightlies_devtoolset7")
job_def["filters"] = {"branches": {"only": "postnightly"}}
else:
job_def["filters"] = {"branches": {"only": "nightly"}}
if self.libtorch_variant:
job_def["libtorch_variant"] = miniutils.quote(self.libtorch_variant)
if phase == "test":
if not self.smoke:
job_def["requires"].append(self.gen_build_name("build"))
job_def["requires"].append(self.gen_build_name("build", nightly))
if not (self.smoke and self.os == "macos"):
job_def["docker_image"] = self.gen_docker_image()
@ -82,7 +89,7 @@ class Conf(object):
job_def["resource_class"] = "gpu.medium"
if phase == "upload":
job_def["context"] = "org-member"
job_def["requires"] = ["setup", self.gen_build_name(upload_phase_dependency)]
job_def["requires"] = ["setup", self.gen_build_name(upload_phase_dependency, nightly)]
os_name = miniutils.override(self.os, {"macos": "mac"})
job_name = "_".join([self.get_name_prefix(), os_name, phase])
@ -127,7 +134,7 @@ def get_nightly_uploads():
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
mylist.append(conf.gen_workflow_job("upload", phase_dependency))
mylist.append(conf.gen_workflow_job("upload", phase_dependency, nightly=True))
return mylist
@ -138,32 +145,25 @@ def get_nightly_tests():
tests = []
for conf_options in filtered_configs:
yaml_item = conf_options.gen_workflow_job("test")
yaml_item = conf_options.gen_workflow_job("test", nightly=True)
tests.append(yaml_item)
return tests
def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):
jobs_list = ["setup"]
def get_jobs(toplevel_key, smoke):
jobs_list = []
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
jobs_list.append(build_config.gen_workflow_job(phase))
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
jobs_dict[toplevel_key] = OrderedDict(
jobs=jobs_list,
)
graph = visualization.generate_graph(get_root(smoke, toplevel_key))
graph.draw(toplevel_key + "-config-dimensions.png", prog="twopi")
return jobs_list
def add_binary_build_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarybuilds", False, "5 5 * * *")
def get_binary_build_jobs():
return get_jobs("binarybuilds", False)
def add_binary_smoke_test_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarysmoketests", True, "15 16 * * *")
def get_binary_smoke_test_jobs():
return get_jobs("binarysmoketests", True)


@ -1,38 +1,11 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import ConfigNode, XImportant
from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "14.04"), [
(Ver("gcc", "4.8"), [X("py2")]),
(Ver("gcc", "4.9"), [X("py2")]),
]),
(Ver("ubuntu", "16.04"), [
(Ver("cuda", "9.0"), [
# TODO make explicit that this is a "secret TensorRT build"
# (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
# TODO Uh oh, were we supposed to make this one important?!
X("py2"),
XImportant("cmake"),
]),
(Ver("cuda", "10.1"), [XImportant("py3.5")]), # TensorRT 6 build
(Ver("mkl"), [XImportant("py2")]),
(Ver("gcc", "5"), [XImportant("onnx_py2")]),
(Ver("clang", "3.8"), [X("py2")]),
(Ver("clang", "3.9"), [X("py2")]),
(Ver("clang", "7"), [XImportant("py2"), XImportant("onnx_py3.6")]),
(Ver("android"), [XImportant("py2")]),
]),
(Ver("centos", "7"), [
(Ver("cuda", "9.0"), [X("py2")]),
]),
(Ver("macos", "10.13"), [
# TODO ios and system aren't related. system qualifies where the python comes
# from (use the system python instead of homebrew or anaconda)
(Ver("ios"), [X("py2")]),
(Ver("system"), [XImportant("py2")]),
([Ver("gcc", "5")], [XImportant("onnx_py2")]),
([Ver("clang", "7")], [XImportant("onnx_py3.6")]),
]),
]
@ -56,13 +29,12 @@ class TreeConfigNode(ConfigNode):
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_py3.6":
return False
return str(self.find_prop("compiler_version")) in [
"gcc4.9",
return set(str(c) for c in self.find_prop("compiler_version")).intersection({
"clang3.8",
"clang3.9",
"clang7",
"android",
] or self.find_prop("distro_version").name == "macos"
}) or self.find_prop("distro_version").name == "macos"
class TopLevelNode(TreeConfigNode):


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
@ -14,23 +12,29 @@ from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = 315
DOCKER_IMAGE_VERSION = 345
@dataclass
class Conf:
language: str
distro: Ver
compiler: Ver
# There could be multiple compiler versions configured (e.g. nvcc
# for gpu files and host compiler (gcc/clang) for cpu files)
compilers: [Ver]
build_only: bool
is_important: bool
@property
def compiler_names(self):
return [c.name for c in self.compilers]
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_py3.6" \
or self.compiler.name in ["android", "mkl", "clang"] \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
return [] if omit else ["cudnn7"]
@ -42,7 +46,7 @@ class Conf:
] + self.get_build_name_middle_parts()
def get_build_name_middle_parts(self):
return [str(self.compiler)] + self.get_cudnn_insertion() + [str(self.distro)]
return [str(c) for c in self.compilers] + self.get_cudnn_insertion() + [str(self.distro)]
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
@ -82,11 +86,11 @@ class Conf:
build_env_name = "-".join(parts)
parameters["build_environment"] = miniutils.quote(build_env_name)
if self.compiler.name == "ios":
if "ios" in self.compiler_names:
parameters["build_ios"] = miniutils.quote("1")
if phase == "test":
# TODO cuda should not be considered a compiler
if self.compiler.name == "cuda":
if "cuda" in self.compiler_names:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if self.distro.name != "macos":
@ -94,7 +98,7 @@ class Conf:
if self.build_only:
parameters["build_only"] = miniutils.quote("1")
if phase == "test":
resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
resource_class = "large" if "cuda" not in self.compiler_names else "gpu.medium"
parameters["resource_class"] = resource_class
return parameters
@ -127,11 +131,10 @@ def instantiate_configs():
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
c = Conf(
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compiler=fc.find_prop("compiler_version"),
compilers=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
is_important=fc.find_prop("important"),
)
@ -145,12 +148,8 @@ def get_workflow_jobs():
configs = instantiate_configs()
# TODO Why don't we build this config?
# See https://github.com/pytorch/pytorch/pull/17323#discussion_r259450540
filtered_configs = filter(lambda x: not (str(x.distro) == "ubuntu14.04" and str(x.compiler) == "gcc4.9"), configs)
x = []
for conf_options in filtered_configs:
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:


@ -1,6 +1,3 @@
#!/usr/bin/env python3
PHASES = ["build", "test"]
CUDA_VERSIONS = [


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
@ -12,27 +10,25 @@ CONFIG_TREE_DATA = [
X("nightly"),
]),
("gcc", [
("4.8", [X("3.6")]),
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("namedtensor", [XImportant(True)]),
("parallel_tbb", [XImportant(True)]),
("parallel_native", [XImportant(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("namedtensor", [XImportant(True)]), # ASAN
]),
]),
("7", [
("3.6", [
("xla", [XImportant(True)]),
]),
]),
# ("7", [
# ("3.6", [
# ("xla", [XImportant(True)]),
# ]),
# ]),
]),
("cuda", [
("9", [
@ -43,10 +39,9 @@ CONFIG_TREE_DATA = [
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
X("2.7"),
XImportant("3.6"),
("2.7", [
("namedtensor", [XImportant(True)]),
("3.6", [
("libtorch", [XImportant(True)])
]),
]),
("9.2", [X("3.6")]),
@ -129,7 +124,9 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
next_nodes = {
"xla": XlaConfigNode,
"namedtensor": NamedTensorConfigNode,
"parallel_tbb": ParallelTBBConfigNode,
"parallel_native": ParallelNativeConfigNode,
"libtorch": LibTorchConfigNode,
"important": ImportantConfigNode,
"android_abi": AndroidAbiConfigNode,
}
@ -146,13 +143,32 @@ class XlaConfigNode(TreeConfigNode):
def child_constructor(self):
return ImportantConfigNode
class NamedTensorConfigNode(TreeConfigNode):
class ParallelTBBConfigNode(TreeConfigNode):
def modify_label(self, label):
return "NAMEDTENSOR=" + str(label)
return "PARALLELTBB=" + str(label)
def init2(self, node_name):
self.props["is_namedtensor"] = node_name
self.props["parallel_backend"] = "paralleltbb"
def child_constructor(self):
return ImportantConfigNode
class ParallelNativeConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PARALLELNATIVE=" + str(label)
def init2(self, node_name):
self.props["parallel_backend"] = "parallelnative"
def child_constructor(self):
return ImportantConfigNode
class LibTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "BUILD_TEST_LIBTORCH=" + str(label)
def init2(self, node_name):
self.props["is_libtorch"] = node_name
def child_constructor(self):
return ImportantConfigNode


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict
from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
@ -15,7 +13,7 @@ DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_VERSION = 347
DOCKER_IMAGE_VERSION = 405
@dataclass
@ -33,8 +31,9 @@ class Conf:
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
is_namedtensor: bool = False
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
@ -47,8 +46,10 @@ class Conf:
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_namedtensor and not for_docker:
leading.append("namedtensor")
if self.is_libtorch and not for_docker:
leading.append("libtorch")
if self.parallel_backend is not None and not for_docker:
leading.append(self.parallel_backend)
cuda_parms = []
if self.cuda_version:
@ -159,7 +160,7 @@ def gen_dependent_configs(xenial_parent_config):
configs.append(c)
for x in ["pytorch_short_perf_test_gpu", "pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
for x in ["pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
return configs
@ -209,6 +210,7 @@ def instantiate_configs():
android_abi = fc.find_prop("android_abi")
parms_list_ignored_for_docker_image.append(android_abi)
restrict_phases = ["build"]
fc.props["is_important"] = True
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
@ -224,8 +226,9 @@ def instantiate_configs():
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
is_namedtensor = fc.find_prop("is_namedtensor") or False
is_libtorch = fc.find_prop("is_libtorch") or False
is_important = fc.find_prop("is_important") or False
parallel_backend = fc.find_prop("parallel_backend") or None
gpu_resource = None
if cuda_version and cuda_version != "10":
@ -240,22 +243,24 @@ def instantiate_configs():
is_xla,
restrict_phases,
gpu_resource,
is_namedtensor=is_namedtensor,
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
)
if cuda_version == "9" and python_version == "3.6":
if cuda_version == "9" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
and compiler_version == "5.4"
and not is_namedtensor):
and not is_libtorch
and parallel_backend is None):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
is_namedtensor=False,
is_libtorch=False,
is_important=True,
parent_build=c,
)


@ -1,6 +1,3 @@
#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Optional, Dict


@ -1,6 +1,3 @@
#!/usr/bin/env python3
def quote(s):
return sandwich('"', s)


@ -1,6 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict


@ -1,5 +1,3 @@
#!/usr/bin/env python3
"""
This module encapsulates dependencies on pygraphviz
"""

(File diff suppressed because it is too large.)


@ -0,0 +1,19 @@
# Docker images for Jenkins
This directory contains everything needed to build the Docker images
that are used in our CI.
The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
`docker build`. This lets us use only a few Dockerfiles for many
images. The different configurations are identified by a freeform
string that we call a _build environment_. This string is persisted in
each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker


@ -0,0 +1 @@
<manifest package="org.pytorch.deps" />


@ -0,0 +1,68 @@
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:1.8.0"
classpath "com.github.dcendents:android-maven-gradle-plugin:2.1"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
repositories {
google()
jcenter()
}
apply plugin: 'com.android.library'
android {
compileSdkVersion rootProject.compileSdkVersion
buildToolsVersion rootProject.buildToolsVersion
defaultConfig {
minSdkVersion minSdkVersion
targetSdkVersion targetSdkVersion
}
sourceSets {
main {
manifest.srcFile 'AndroidManifest.xml'
}
}
}
dependencies {
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion
implementation 'androidx.test.ext:junit:' + rootProject.extJUnitVersion
implementation 'androidx.test:rules:' + rootProject.rulesVersion
implementation 'androidx.test:runner:' + rootProject.runnerVersion
}

.circleci/docker/build.sh (new executable file, 275 lines)

@ -0,0 +1,275 @@
#!/bin/bash
set -ex
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
# TODO: Generalize
OS="ubuntu"
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *-cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
fi
if [[ "$image" == *-trusty* ]]; then
UBUNTU_VERSION=14.04
elif [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
fi
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-bionic-clang9-thrift-llvmdev)
CLANG_VERSION=9
THRIFT=yes
LLVMDEV=yes
PROTOBUF=yes
;;
pytorch-linux-xenial-py2.7.9)
TRAVIS_PYTHON_VERSION=2.7.9
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py2.7)
TRAVIS_PYTHON_VERSION=2.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.5)
TRAVIS_PYTHON_VERSION=3.5
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-pynightly)
TRAVIS_PYTHON_VERSION=nightly
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py2)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=2.7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py3)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9-cudnn7-py2)
CUDA_VERSION=9.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=2.7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9-cudnn7-py3)
CUDA_VERSION=9.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7)
CUDA_VERSION=9.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=4.10.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
esac
# Set Jenkins UID and GID if running Jenkins
if [ -n "${JENKINS:-}" ]; then
JENKINS_UID=$(id -u jenkins)
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | fold -w 32 | head -n 1)"
# Build image
docker build \
--no-cache \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "THRIFT=${THRIFT:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "EC2=${EC2:-}" \
--build-arg "JENKINS=${JENKINS:-}" \
--build-arg "JENKINS_UID=${JENKINS_UID:-}" \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "TRAVIS_PYTHON_VERSION=${TRAVIS_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
.
function drun() {
docker run --rm "$tmp_tag" $*
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a
exit 1
fi
if !(drun lsb_release -a 2>&1 | grep -qF "$UBUNTU_VERSION"); then
echo "UBUNTU_VERSION=$UBUNTU_VERSION, but:"
drun lsb_release -a
exit 1
fi
fi
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]]; then
if !(drun python --version 2>&1 | grep -qF "Python $TRAVIS_PYTHON_VERSION"); then
echo "TRAVIS_PYTHON_VERSION=$TRAVIS_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
else
echo "Please manually check nightly is OK:"
drun python --version
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
fi
if [ -n "$GCC_VERSION" ]; then
if !(drun gcc --version 2>&1 | grep -q " $GCC_VERSION\\W"); then
echo "GCC_VERSION=$GCC_VERSION, but:"
drun gcc --version
exit 1
fi
fi
if [ -n "$CLANG_VERSION" ]; then
if !(drun clang --version 2>&1 | grep -qF "clang version $CLANG_VERSION"); then
echo "CLANG_VERSION=$CLANG_VERSION, but:"
drun clang --version
exit 1
fi
fi
if [ -n "$KATEX" ]; then
if !(drun katex --version); then
echo "KATEX=$KATEX, but:"
drun katex --version
exit 1
fi
fi


@ -0,0 +1,49 @@
#!/bin/bash
set -ex
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
# If UPSTREAM_BUILD_ID is set (see trigger job), then we can
# use it to tag this build with the same ID used to tag all other
# base image builds. Also, we can try and pull the previous
# image first, to avoid rebuilding layers that haven't changed.
#until we find a way to reliably reuse previous build, this last_tag is not in use
# last_tag="$(( CIRCLE_BUILD_NUM - 1 ))"
tag="${CIRCLE_WORKFLOW_ID}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
image="${registry}/pytorch/${IMAGE_NAME}"
login() {
aws ecr get-authorization-token --region us-east-1 --output text --query 'authorizationData[].authorizationToken' |
base64 -d |
cut -d: -f2 |
docker login -u AWS --password-stdin "$1"
}
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
# export EC2=1
# export JENKINS=1
# Try to pull the previous image (perhaps we can reuse some layers)
# if [ -n "${last_tag}" ]; then
# docker pull "${image}:${last_tag}" || true
# fi
# Build new image
./build.sh ${IMAGE_NAME} -t "${image}:${tag}"
docker push "${image}:${tag}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read


@ -0,0 +1,129 @@
#!/bin/bash
set -ex
[ -n "${ANDROID_NDK}" ]
apt-get update
apt-get install -y --no-install-recommends autotools-dev autoconf unzip
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
pushd /tmp
curl -Os https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
popd
_ndk_dir=/opt/ndk
mkdir -p "$_ndk_dir"
unzip -qo /tmp/android*.zip -d "$_ndk_dir"
_versioned_dir=$(find "$_ndk_dir/" -mindepth 1 -maxdepth 1 -type d)
mv "$_versioned_dir"/* "$_ndk_dir"/
rmdir "$_versioned_dir"
rm -rf /tmp/*
# Install OpenJDK
# https://hub.docker.com/r/picoded/ubuntu-openjdk-8-jdk/dockerfile/
sudo apt-get update && \
apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer;
# Fix certificate issues, found as of
# https://bugs.launchpad.net/ubuntu/+source/ca-certificates-java/+bug/983302
sudo apt-get update && \
apt-get install -y ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer;
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
# Installing android sdk
# https://github.com/circleci/circleci-images/blob/staging/android/Dockerfile.m4
_sdk_version=sdk-tools-linux-3859397.zip
_android_home=/opt/android/sdk
rm -rf $_android_home
sudo mkdir -p $_android_home
curl --silent --show-error --location --fail --retry 3 --output /tmp/$_sdk_version https://dl.google.com/android/repository/$_sdk_version
sudo unzip -q /tmp/$_sdk_version -d $_android_home
rm /tmp/$_sdk_version
sudo chmod -R 777 $_android_home
export ANDROID_HOME=$_android_home
export ADB_INSTALL_TIMEOUT=120
export PATH="${ANDROID_HOME}/emulator:${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
echo "PATH:${PATH}"
alias sdkmanager="$ANDROID_HOME/tools/bin/sdkmanager"
sudo mkdir -p ~/.android && echo '### User Sources for Android SDK Manager' | sudo tee ~/.android/repositories.cfg
sudo chmod -R 777 ~/.android
yes | sdkmanager --licenses
yes | sdkmanager --update
sdkmanager \
"tools" \
"platform-tools" \
"emulator"
sdkmanager \
"build-tools;28.0.3" \
"build-tools;29.0.2"
sdkmanager \
"platforms;android-28" \
"platforms;android-29"
sdkmanager --list
# Installing Gradle
echo "GRADLE_VERSION:${GRADLE_VERSION}"
_gradle_home=/opt/gradle
sudo rm -rf $_gradle_home
sudo mkdir -p $_gradle_home
wget --no-verbose --output-document=/tmp/gradle.zip \
"https://services.gradle.org/distributions/gradle-${GRADLE_VERSION}-bin.zip"
sudo unzip -q /tmp/gradle.zip -d $_gradle_home
rm /tmp/gradle.zip
sudo chmod -R 777 $_gradle_home
export GRADLE_HOME=$_gradle_home/gradle-$GRADLE_VERSION
alias gradle="${GRADLE_HOME}/bin/gradle"
export PATH="${GRADLE_HOME}/bin/:${PATH}"
echo "PATH:${PATH}"
gradle --version
mkdir /var/lib/jenkins/gradledeps
cp build.gradle /var/lib/jenkins/gradledeps
cp AndroidManifest.xml /var/lib/jenkins/gradledeps
pushd /var/lib/jenkins
export GRADLE_LOCAL_PROPERTIES=gradledeps/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
chown -R jenkins /var/lib/jenkins/gradledeps
chgrp -R jenkins /var/lib/jenkins/gradledeps
sudo -H -u jenkins $GRADLE_HOME/bin/gradle -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble
chown -R jenkins /var/lib/jenkins/.gradle
chgrp -R jenkins /var/lib/jenkins/.gradle
popd
rm -rf /var/lib/jenkins/.gradle/daemon

View File

@ -0,0 +1,75 @@
#!/bin/bash
set -ex
if [[ "$UBUNTU_VERSION" == "14.04" ]]; then
# cmake 2 is too old
cmake3=cmake3
else
cmake3=cmake
fi
if [[ "$UBUNTU_VERSION" == "18.04" ]]; then
cmake3="cmake=3.10*"
else
cmake3="${cmake3}=3.5*"
fi
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
# TODO: libiomp also gets installed by conda, aka there's a conflict
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
python \
python-dev \
python-setuptools \
python-wheel \
software-properties-common \
sudo \
wget \
vim
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
if ! wget http://valgrind.org/downloads/valgrind-3.14.0.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-3.14.0.tar.bz2
fi
tar -xjf valgrind-3.14.0.tar.bz2
cd valgrind-3.14.0
./configure --prefix=/usr/local
make
sudo make install
cd ../../
rm -rf valgrind_build
alias valgrind="/usr/local/bin/valgrind"
# TODO: THIS IS A HACK!!!
# distributed nccl(2) tests are a bit busted, see https://github.com/pytorch/pytorch/issues/5877
if dpkg -s libnccl-dev; then
apt-get remove -y libnccl-dev libnccl2 --allow-change-held-packages
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -0,0 +1,35 @@
#!/bin/bash
set -ex
mkdir -p /opt/cache/bin
mkdir -p /opt/cache/lib
sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
export PATH="/opt/cache/bin:$PATH"
# Setup compiler cache
curl https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
printf "#!/bin/sh\nexec sccache $(which $1) \$*" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
write_sccache_stub clang
write_sccache_stub clang++
if [ -n "$CUDA_VERSION" ]; then
# TODO: This is a workaround for the fact that PyTorch's FindCUDA
# implementation cannot find nvcc if it is setup this way, because it
# appears to search for the nvcc in PATH, and use its path to infer
# where CUDA is installed. Instead, we install an nvcc symlink outside
# of the PATH, and set CUDA_NVCC_EXECUTABLE so that we make use of it.
printf "#!/bin/sh\nexec sccache $(which nvcc) \"\$@\"" > /opt/cache/lib/nvcc
chmod a+x /opt/cache/lib/nvcc
fi
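The stubs work because /opt/cache/bin is prepended to PATH, while each stub captures the real compiler's absolute path at generation time via $(which $1). A quick hedged sanity check one could run after this script (expected outputs shown as comments; they depend on where the base image keeps its compilers):

command -v gcc                 # expected: /opt/cache/bin/gcc
head -2 /opt/cache/bin/gcc     # expected: exec sccache /usr/bin/gcc ...
gcc --version                  # should still report the underlying compiler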

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -ex
if [ -n "$CLANG_VERSION" ]; then
if [[ $CLANG_VERSION == 7 && $UBUNTU_VERSION == 16.04 ]]; then
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main"
elif [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
fi
sudo apt-get update
apt-get install -y --no-install-recommends clang-"$CLANG_VERSION"
apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"
# Install dev version of LLVM.
if [ -n "$LLVMDEV" ]; then
sudo apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"-dev
fi
# Use update-alternatives to make this version the default
# TODO: Decide if overriding gcc as well is a good idea
# update-alternatives --install /usr/bin/gcc gcc /usr/bin/clang-"$CLANG_VERSION" 50
# update-alternatives --install /usr/bin/g++ g++ /usr/bin/clang++-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-"$CLANG_VERSION" 50
# clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help
clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux")
echo "$clang_lib" > /etc/ld.so.conf.d/clang.conf
ldconfig
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,16 @@
#!/bin/bash
set -ex
[ -n "$CMAKE_VERSION" ]
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd
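The sed expression keeps only the major.minor prefix of CMAKE_VERSION and prepends v, matching the directory layout under cmake.org/files. Two quick checks of the transform:

echo "3.6.3"  | sed -e 's/\([0-9].[0-9]\+\).*/v\1/'    # -> v3.6
echo "3.10.2" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/'    # -> v3.10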

View File

@ -0,0 +1,94 @@
#!/bin/bash
set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.continuum.io/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
case "$MAJOR_PYTHON_VERSION" in
2)
CONDA_FILE="Miniconda2-latest-Linux-x86_64.sh"
;;
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
exit 1
;;
esac
mkdir /opt/conda
chown jenkins:jenkins /opt/conda
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
sudo -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
pushd /tmp
wget -q "${BASE_URL}/${CONDA_FILE}"
chmod +x "${CONDA_FILE}"
as_jenkins ./"${CONDA_FILE}" -b -f -p "/opt/conda"
popd
# NB: Don't do this, rely on the rpath to get it right
#echo "/opt/conda/lib" > /etc/ld.so.conf.d/conda-python.conf
#ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/conda/bin:\1"|g' -i /etc/environment
export PATH="/opt/conda/bin:$PATH"
# Ensure we run conda in a directory that jenkins has write access to
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -n base conda
# Install correct Python version
as_jenkins conda install python="$ANACONDA_PYTHON_VERSION"
conda_install() {
# Ensure that the install command doesn't upgrade/downgrade Python
# This should be called as
# conda_install pkg1 pkg2 ... [-c channel]
as_jenkins conda install -q -y python="$ANACONDA_PYTHON_VERSION" $*
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
conda_install numpy pyyaml mkl mkl-include setuptools cffi typing future six
if [[ "$CUDA_VERSION" == 8.0* ]]; then
conda_install magma-cuda80 -c pytorch
elif [[ "$CUDA_VERSION" == 9.0* ]]; then
conda_install magma-cuda90 -c pytorch
elif [[ "$CUDA_VERSION" == 9.1* ]]; then
conda_install magma-cuda91 -c pytorch
elif [[ "$CUDA_VERSION" == 9.2* ]]; then
conda_install magma-cuda92 -c pytorch
elif [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages
# TODO: Why is scipy pinned
# numba & llvmlite is pinned because of https://github.com/numba/numba/issues/4368
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.43.1 llvmlite==0.28.0
popd
fi
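conda_install re-pins python on every invocation because conda's solver is otherwise free to bump the interpreter while resolving new packages; repeating the pin turns it into a hard constraint. Since extra arguments ride along in $*, channel flags work too, e.g.:

conda_install numpy pyyaml mkl           # plain packages
conda_install magma-cuda101 -c pytorch   # expands to: conda install -q -y python=$ANACONDA_PYTHON_VERSION magma-cuda101 -c pytorch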

View File

@ -0,0 +1,61 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libhiredis-dev \
libleveldb-dev \
liblmdb-dev \
libsnappy-dev
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
yum install -y \
hiredis-devel \
leveldb-devel \
lmdb-devel \
snappy-devel
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
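The OS dispatch above relies on ordering: /etc/lsb-release exists on Ubuntu but not on the CentOS base images, while /etc/os-release exists on both, so the Ubuntu branch must be checked first. A sketch of a more explicit variant using the ID field of /etc/os-release (os_id is an illustrative helper, not used by these scripts):

os_id () {
  # The command-substitution subshell keeps the sourced variables
  # out of the caller's environment
  . /etc/os-release && echo "$ID"
}
case "$(os_id)" in
  ubuntu) install_ubuntu ;;
  centos) install_centos ;;
  *) echo "Unable to determine OS..."; exit 1 ;;
esac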

View File

@ -0,0 +1,19 @@
#!/bin/bash
set -ex
if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,6 @@
#!/bin/bash
set -ex
mkdir -p /usr/local/include
cp jni.h /usr/local/include

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -ex
if [ -n "$KATEX" ]; then
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
apt-get update
apt-get install -y --no-install-recommends yarn
yarn global add katex --prefix /usr/local
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,13 @@
#!/bin/bash
set -ex
[ -n "$NINJA_VERSION" ]
url="https://github.com/ninja-build/ninja/releases/download/v${NINJA_VERSION}/ninja-linux.zip"
pushd /tmp
wget --no-verbose --output-document=ninja-linux.zip "$url"
unzip ninja-linux.zip -d /usr/local/bin
rm -f ninja-linux.zip
popd

View File

@ -0,0 +1,56 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install it here
install_protobuf_26
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

View File

@ -0,0 +1,14 @@
apt-get update
apt-get install -y sudo wget libboost-dev libboost-test-dev libboost-program-options-dev libboost-filesystem-dev libboost-thread-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
wget https://www-us.apache.org/dist/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -xvf thrift-0.12.0.tar.gz
cd thrift-0.12.0
for file in ./compiler/cpp/Makefile*; do
sed -i 's/\-Werror//' $file
done
./bootstrap.sh
./configure --without-php --without-java --without-python --without-nodejs --without-go --without-ruby
sudo make
sudo make install
cd ..
rm thrift-0.12.0.tar.gz

View File

@ -0,0 +1,94 @@
#!/bin/bash
set -ex
as_jenkins() {
# NB: Preserve PATH and LD_LIBRARY_PATH changes
sudo -H -u jenkins env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
mkdir -p /opt/python
chown jenkins:jenkins /opt/python
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
# user-writable "lol")
# NB: Relative ordering of opt/python and flags matters
as_jenkins tar xjf python-$TRAVIS_PYTHON_VERSION.tar.bz2 --strip-components=2 --directory /opt/python opt/python
popd
echo "/opt/python/$TRAVIS_PYTHON_VERSION/lib" > /etc/ld.so.conf.d/travis-python.conf
ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/python/'"$TRAVIS_PYTHON_VERSION"'/bin:\1"|g' -i /etc/environment
export PATH="/opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH"
python --version
pip --version
# Install pip from source.
# The python-pip package on Ubuntu Trusty is old
# and upon install numpy doesn't use the binary
# distribution, and fails to compile it from source.
pushd tmp
as_jenkins curl -L -O https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz
as_jenkins tar zxf pip-9.0.1.tar.gz
pushd pip-9.0.1
as_jenkins python setup.py install
popd
rm -rf pip-9.0.1*
popd
# Install pip packages
as_jenkins pip install --upgrade pip
pip --version
if [[ "$TRAVIS_PYTHON_VERSION" == nightly ]]; then
# These two packages have broken Cythonizations uploaded
# to PyPi, see:
#
# - https://github.com/numpy/numpy/issues/10500
# - https://github.com/yaml/pyyaml/issues/117
#
# Furthermore, the released version of Cython does not
# have these issues fixed.
#
# While we are waiting on fixes for these, we build
# from Git for now. Feel free to delete this conditional
# branch if things start working again (you may need
# to do this if these packages regress on Git HEAD.)
as_jenkins pip install git+https://github.com/cython/cython.git
as_jenkins pip install git+https://github.com/numpy/numpy.git
as_jenkins pip install git+https://github.com/yaml/pyyaml.git
else
as_jenkins pip install numpy pyyaml
fi
as_jenkins pip install \
future \
hypothesis \
protobuf \
pytest \
pillow \
typing
as_jenkins pip install mkl mkl-devel
# SciPy does not support Python 3.7 or Python 2.7.9
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]] && [[ "$TRAVIS_PYTHON_VERSION" != "2.7.9" ]]; then
as_jenkins pip install scipy==1.1.0 scikit-image librosa>=0.6.2
fi
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi
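This script, like the conda and cache installers, persists its PATH prepend by rewriting the PATH="..." line of /etc/environment in place and then exporting the same prefix for the current shell. The effect of the sed, with an illustrative Python prefix:

echo 'PATH="/usr/bin:/bin"' | sed -e 's|PATH="\(.*\)"|PATH="/opt/python/3.7/bin:\1"|g'
# -> PATH="/opt/python/3.7/bin:/usr/bin:/bin"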

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -ex
# Mirror jenkins user in container
echo "jenkins:x:1014:1014::/var/lib/jenkins:" >> /etc/passwd
echo "jenkins:x:1014:" >> /etc/group
# Create $HOME
mkdir -p /var/lib/jenkins
chown jenkins:jenkins /var/lib/jenkins
mkdir -p /var/lib/jenkins/.ccache
chown jenkins:jenkins /var/lib/jenkins/.ccache
# Allow writing to /usr/local (for make install)
chown jenkins:jenkins /usr/local
# Allow sudo
# TODO: Maybe we shouldn't
echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins

View File

@ -0,0 +1,57 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libopencv-dev \
libavcodec-dev
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
yum install -y \
opencv-devel \
ffmpeg-devel
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

.circleci/docker/java/jni.h (new file, 1143 lines)

File diff suppressed because it is too large

View File

@ -0,0 +1,85 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
USER jenkins
CMD ["bash"]

View File

@ -0,0 +1,114 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install thrift.
ARG THRIFT
ADD ./common/install_thrift.sh install_thrift.sh
RUN if [ -n "${THRIFT}" ]; then bash ./install_thrift.sh; fi
RUN rm install_thrift.sh
ENV INSTALLED_THRIFT ${THRIFT}
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install Android NDK
ARG ANDROID
ARG ANDROID_NDK
ARG GRADLE_VERSION
ADD ./common/install_android.sh install_android.sh
ADD ./android/AndroidManifest.xml AndroidManifest.xml
ADD ./android/build.gradle build.gradle
RUN if [ -n "${ANDROID}" ]; then bash ./install_android.sh; fi
RUN rm install_android.sh
RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -88,17 +88,18 @@ YAML_SOURCES = [
File("job-specs-custom.yml"),
File("binary_update_htmls.yml"),
File("binary-build-tests.yml"),
File("docker_build_job.yml"),
File("workflows.yml"),
Listgen(pytorch_build_definitions.get_workflow_jobs, 3),
File("workflows-pytorch-macos-builds.yml"),
File("workflows-pytorch-android-gradle-build.yml"),
File("workflows-pytorch-ios-builds.yml"),
File("workflows-pytorch-mobile-builds.yml"),
File("workflows-pytorch-ge-config-tests.yml"),
Listgen(caffe2_build_definitions.get_workflow_jobs, 3),
File("workflows-binary-builds-smoke-subset.yml"),
Header("Daily smoke test trigger"),
Treegen(binary_build_definitions.add_binary_smoke_test_jobs, 1),
Header("Daily binary build trigger"),
Treegen(binary_build_definitions.add_binary_build_jobs, 1),
Listgen(binary_build_definitions.get_binary_smoke_test_jobs, 3),
Listgen(binary_build_definitions.get_binary_build_jobs, 3),
File("workflows-nightly-ios-binary-builds.yml"),
File("workflows-nightly-android-binary-builds.yml"),
Header("Nightly tests"),
@ -106,6 +107,7 @@ YAML_SOURCES = [
File("workflows-nightly-uploads-header.yml"),
Listgen(binary_build_definitions.get_nightly_uploads, 3),
File("workflows-s3-html.yml"),
File("workflows-docker-builder.yml")
]

View File

@ -1,8 +1,8 @@
#!/bin/bash
set -eux -o pipefail
set -ex -o pipefail
echo ""
echo "PWD: ${PWD}"
echo "DIR: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"

View File

@ -0,0 +1,29 @@
#!/bin/bash
set -ex -o pipefail
echo ""
echo "DIR: $(pwd)"
PROJ_ROOT=/Users/distiller/project
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=TestApp_CI
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}

View File

@ -1,8 +1,8 @@
#!/bin/bash
set -eux -o pipefail
set -ex -o pipefail
echo ""
echo "PWD: $(pwd)"
echo "DIR: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
ARTIFACTS_DIR=${WORKSPACE}/ios

View File

@ -11,6 +11,8 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
elif [[ "$DESIRED_PYTHON" == 3.8m ]]; then
export PATH="/opt/python/cp38-cp38/bin:\$PATH"
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"

View File

@ -5,26 +5,30 @@ source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pkg="$workdir/final_pkgs/$(ls $workdir/final_pkgs)"
# Don't test libtorch
# TODO we should test libtorch
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
exit 0
fi
# Create a new test env
# TODO cut all this out into a separate test job and have an entirely different
# miniconda
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
if [[ "$PACKAGE_TYPE" != libtorch ]]; then
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
fi
# Install the package
if [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="$(ls $workdir/final_pkgs/*-latest.zip)"
unzip "$pkg" -d /tmp
cd /tmp/libtorch
elif [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg" --offline
else
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
$workdir/builder/check_binary.sh
else
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
fi

View File

@ -32,11 +32,11 @@ fi
export DOCKER_IMAGE=${DOCKER_IMAGE:-}
if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda100"
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi
fi
@ -55,13 +55,34 @@ fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu100" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE"
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE"
else
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE+$DESIRED_CUDA"
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
JAVA_HOME=
BUILD_JNI=OFF
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES=()
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
echo "Found jni.h under $JH"
JAVA_HOME="$JH"
BUILD_JNI=ON
break
fi
done
if [ -z "$JAVA_HOME" ]; then
echo "Did not find jni.h"
fi
fi
cat >>"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
@ -75,7 +96,7 @@ export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.3.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.4.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -85,6 +106,8 @@ export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export USE_FBGEMM=1
export JAVA_HOME=$JAVA_HOME
export BUILD_JNI=$BUILD_JNI
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
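The JAVA_HOME probe above tests each candidate directory for include/jni.h rather than for a java binary, since BUILD_JNI only needs the headers. A standalone sketch of the same probe (find_java_home is an illustrative name, not part of these scripts):

find_java_home () {
  local jh
  for jh in /usr/local /usr/lib/jvm/java-8-openjdk-amd64 \
            /Library/Java/JavaVirtualMachines/*.jdk/Contents/Home; do
    if [[ -e "$jh/include/jni.h" ]]; then
      echo "$jh"
      return 0
    fi
  done
  return 1
}
if JAVA_HOME="$(find_java_home)"; then BUILD_JNI=ON; fi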

View File

@ -18,9 +18,9 @@ chmod +x /home/circleci/project/ci_test_script.sh
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --runtime=nvidia -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d "${DOCKER_IMAGE}")
fi
# Copy the envfile and script with all the code to run into the docker.

View File

@ -4,6 +4,8 @@ set -eux -o pipefail
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
# Must be in sync with GRADLE_VERSION in docker image for android
# https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh#L155
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
@ -45,15 +47,39 @@ fi
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
GRADLE_PARAMS="-p android assembleRelease --debug --stacktrace"
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
GRADLE_PARAMS+=" -PABI_FILTERS=x86"
fi
if [ -n "{GRADLE_OFFLINE:-}" ]; then
GRADLE_PARAMS+=" --offline"
fi
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
env
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
$GRADLE_PATH -PABI_FILTERS=x86 -p ~/workspace/android/ assembleRelease
else
$GRADLE_PATH -p ~/workspace/android/ assembleRelease
fi
$GRADLE_PATH $GRADLE_PARAMS
find . -type f -name "*.a" -exec ls -lh {} \;
while IFS= read -r -d '' file
do
echo
echo "$file"
ls -lah "$file"
zipinfo -l "$file"
done < <(find . -type f -name '*.aar' -print0)
find . -type f -name '*.aar' -print | xargs tar cfvz ~/workspace/android/artifacts.tgz
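The while IFS= read -r -d '' ... < <(find ... -print0) loops above are the safe way to iterate file names that may contain spaces or newlines: -print0 emits NUL-delimited names and -d '' reads them back the same way, unlike the word-splitting for f in $(find ...) form. A minimal sketch of the cache-refresh loop:

while IFS= read -r -d '' f; do
  touch "$f" || true   # refresh mtimes so gradle's cache entries are not expired
done < <(find /var/lib/jenkins/.gradle -type f -print0)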

View File

@ -53,7 +53,7 @@ sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time GEN_TO_SOURCE=1 python aten/src/ATen/gen.py \
time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \

View File

@ -18,6 +18,8 @@ default_set = set([
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-xenial-py3.6-gcc5.4',
# LibTorch
'pytorch-libtorch-linux-xenial-cuda9-cudnn7-py3',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
@ -30,14 +32,17 @@ default_set = set([
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Caffe2 CentOS
'caffe2-py3.6-devtoolset7-cuda9.0-cudnn7-centos7',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
'libtorch 2.7m cpu gcc5.4_cxx11-abi',
'libtorch-ios-10.2.1-nightly-x86_64-build',
'libtorch-ios-10.2.1-nightly-arm64-build',
'libtorch-ios-10.2.1-nightly-binary-build-upload',
'libtorch 2.7 cpu',
'libtorch-ios-11.2.1-nightly-x86_64-build',
'libtorch-ios-11.2.1-nightly-arm64-build',
'libtorch-ios-11.2.1-nightly-binary-build-upload',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
@ -48,11 +53,15 @@ default_set = set([
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build',
'pytorch-linux-xenial-py3-clang5-android-ndk-r19',
# PyTorch Android gradle
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32',
# Pytorch iOS builds
'pytorch-ios-10.2.1-x86_64_build',
'pytorch-ios-10.2.1-arm64_build',
'pytorch-ios-11.2.1-x86_64_build',
'pytorch-ios-11.2.1-arm64_build',
# PyTorch Mobile builds
'pytorch-linux-xenial-py3-clang5-mobile-build',
# Pytorch backward compatibility check
'pytorch-linux-backward-compatibility-check-test',
@ -60,10 +69,9 @@ default_set = set([
# XLA
'pytorch-xla-linux-xenial-py3.6-clang7',
# Named tensor
"pytorch-namedtensor-linux-xenial-py3.6-gcc5.4",
"pytorch-namedtensor-linux-xenial-py3-clang5-asan",
"pytorch-namedtensor-linux-xenial-cuda9-cudnn7-py2",
# GraphExecutor config jobs
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test',
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test',
# Other checks
'pytorch-short-perf-test-gpu',

View File

@ -223,7 +223,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
- attach_workspace:
at: ~/workspace
@ -232,12 +232,18 @@
- run_brew_for_ios_build
- run:
name: Build
context: org-member
no_output_timeout: "1h"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "30m"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/workspace/
paths: ios
@ -245,7 +251,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
- attach_workspace:
at: ~/workspace

View File

@ -41,7 +41,7 @@
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
@ -112,9 +112,9 @@
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
@ -146,12 +146,7 @@
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip -q install numpy
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
@ -164,6 +159,8 @@
source ${TMPDIR}/anaconda/bin/activate
fi
pip -q install numpy
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
@ -201,4 +198,3 @@
if which sccache > /dev/null; then
sccache --show-stats
fi

View File

@ -0,0 +1,21 @@
docker_build_job:
parameters:
image_name:
type: string
default: ""
machine:
image: ubuntu-1604:201903-01
resource_class: large
environment:
IMAGE_NAME: << parameters.image_name >>
steps:
- checkout
- run:
name: build_docker_image_<< parameters.image_name >>
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .circleci/docker && ./build_docker.sh

View File

@ -1,43 +1,8 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Perf Test
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
# This IAM user allows write access to S3 bucket for perf test numbers
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
set -x
docker cp /home/circleci/project/env $id:/var/lib/jenkins/workspace/env
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_python_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
resource_class: large
machine:
image: ubuntu-1604:201903-01
@ -54,7 +19,7 @@
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
@ -82,7 +47,7 @@
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
resource_class: large
machine:
image: ubuntu-1604:201903-01
@ -99,7 +64,7 @@
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
@ -186,6 +151,8 @@
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
- store_test_results:
path: test/test-reports
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
@ -238,7 +205,7 @@
pytorch_android_gradle_build:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -268,14 +235,14 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# arm-v7a
time docker pull ${docker_image_libtorch_android_arm_v7a} >/dev/null
export id_arm_v7a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v7a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -285,7 +252,7 @@
# x86_64
time docker pull ${docker_image_libtorch_android_x86_64} >/dev/null
export id_x86_64=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_64" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -295,7 +262,7 @@
# arm-v8a
time docker pull ${docker_image_libtorch_android_arm_v8a} >/dev/null
export id_arm_v8a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -308,7 +275,7 @@
docker cp ~/workspace/build_android_install_arm_v8a $id_x86_32:/var/lib/jenkins/workspace/build_android_install_arm_v8a
# run gradle buildRelease
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_artifacts
@ -324,7 +291,7 @@
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -348,7 +315,7 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32_gradle} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" && echo "export SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" && echo "export ANDROID_SIGN_KEY=${ANDROID_SIGN_KEY}" && echo "export ANDROID_SIGN_PASS=${ANDROID_SIGN_PASS}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/publish_android_snapshot.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -360,7 +327,7 @@
pytorch_android_gradle_build-x86_32:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -388,9 +355,9 @@
# x86
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_x86_32_artifacts
@ -406,12 +373,34 @@
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run_brew_for_ios_build
- run_brew_for_ios_build
- run:
name: Run Fastlane
no_output_timeout: "1h"
command: |
set -e
PROJ_ROOT=/Users/distiller/project
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
name: Build
no_output_timeout: "1h"
@ -421,7 +410,6 @@
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
@ -444,3 +432,43 @@
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
- run:
name: Run Build Tests
no_output_timeout: "30m"
command: |
set -e
PROJ_ROOT=/Users/distiller/project
PROFILE=TestApp_CI
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
echo ${IOS_DEV_TEAM_ID}
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
if ! [ "$?" -eq "0" ]; then
echo 'xcodebuild failed!'
exit 1
fi
- run:
name: Run Simulator Tests
no_output_timeout: "2h"
command: |
set -e
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
#install the latest version of PyTorch and TorchVision
pip install torch torchvision
#run unit test
cd ${PROJ_ROOT}/ios/TestApp/benchmark
python trace_model.py
ruby setup.rb
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan

View File

@ -4,10 +4,6 @@
- image: circleci/python:3.7.3
steps:
- checkout
- run:
name: Ensure config is up to date
command: ./ensure-consistency.py
working_directory: .circleci
- run:
name: Save commit message
command: git log --format='%B' -n 1 HEAD > .circleci/scripts/COMMIT_MSG

View File

@ -17,52 +17,57 @@ jobs:
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# TODO We may want to move the rebase logic to a separate step after checkout
# Rebase to master only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=50 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
# NB: Temporarily disable the rebase logic in v1.4.0, don't merge this change into master
# # TODO We may want to move the rebase logic to a separate step after checkout
# # Rebase to master only if in xenial_py3_6_gcc5_4 case
# if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
# echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
# set -x
# git config --global user.email "circleci.ossci@gmail.com"
# git config --global user.name "CircleCI"
# git config remote.origin.url https://github.com/pytorch/pytorch.git
# git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
# git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
# export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
# echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
# export GIT_COMMIT=${CIRCLE_SHA1}
# echo "GIT_COMMIT: " ${GIT_COMMIT}
# git checkout -f ${GIT_COMMIT}
# git reset --hard ${GIT_COMMIT}
# git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
# set +x
# else
# echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
# fi
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$PARALLEL_FLAGS"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The namedtensor and xla builds use the same docker image as
# The xla build uses the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then
@ -94,24 +99,43 @@ jobs:
set -e
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
export NAMED_FLAG="export BUILD_NAMEDTENSOR=1 && export TEST_NAMEDTENSOR=1"
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
fi
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo "Parallel backend flags: "${PARALLEL_FLAGS}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
retrieve_test_reports() {
echo "retrieving test reports"
docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
}
trap "retrieve_test_reports" ERR
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
retrieve_test_reports
- store_test_results:
path: test-reports
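
The `COMMAND` lines above generate a throwaway script with `echo` and pipe it into a shell inside the already running container. A standalone sketch of the same pattern (image and commands are illustrative):

```bash
# Start a detached container and remember its id.
id=$(docker run -t -d ubuntu:18.04 bash)
# Generate a script on the fly and feed it to a shell inside the container;
# 2>&1 folds stderr into stdout so the CI log captures everything.
(echo "export FOO=bar" && echo 'echo "FOO is $FOO"') | docker exec -i "$id" bash 2>&1
docker rm -f "$id" >/dev/null
```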

View File

@ -10,19 +10,19 @@
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_build
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
docker_image: "soumith/conda-cuda"
docker_image: "pytorch/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_build
- binary_linux_build:
@ -31,14 +31,14 @@
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2_7m_cu90_devtoolset7_static-without-deps_build
- binary_mac_build:
@ -63,14 +63,14 @@
requires:
- setup
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_test
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
- binary_linux_manywheel_3_7m_cu100_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- binary_linux_test:
@ -79,7 +79,7 @@
requires:
- setup
- binary_linux_conda_2_7_cpu_devtoolset7_build
docker_image: "soumith/conda-cuda"
docker_image: "pytorch/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_test:
- binary_linux_test:
@ -89,7 +89,7 @@
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
@ -97,5 +97,5 @@
- setup
- binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"

View File

@ -0,0 +1,66 @@
docker_build:
triggers:
- schedule:
cron: "0 15 * * 0"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
image_name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
- docker_build_job:
name: "pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7.9"
image_name: "pytorch-linux-xenial-py2.7.9"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7"
image_name: "pytorch-linux-xenial-py2.7"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-asan"
image_name: "pytorch-linux-xenial-py3-clang5-asan"
- docker_build_job:
name: "pytorch-linux-xenial-py3.5"
image_name: "pytorch-linux-xenial-py3.5"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-clang7"
image_name: "pytorch-linux-xenial-py3.6-clang7"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc4.8"
image_name: "pytorch-linux-xenial-py3.6-gcc4.8"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc5.4"
image_name: "pytorch-linux-xenial-py3.6-gcc5.4"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc7.2"
image_name: "pytorch-linux-xenial-py3.6-gcc7.2"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc7"
image_name: "pytorch-linux-xenial-py3.6-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-pynightly"
image_name: "pytorch-linux-xenial-pynightly"

View File

@ -3,7 +3,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -12,7 +12,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -21,7 +21,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -30,7 +30,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly

View File

@ -1,7 +1,8 @@
# Pytorch iOS binary builds
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_x86_64_build
build_environment: "libtorch-ios-10.2.1-nightly-x86_64-build"
name: pytorch_ios_11_2_1_nightly_x86_64_build
build_environment: "libtorch-ios-11.2.1-nightly-x86_64-build"
context: org-member
ios_platform: "SIMULATOR"
ios_arch: "x86_64"
requires:
@ -10,8 +11,9 @@
branches:
only: nightly
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_arm64_build
build_environment: "libtorch-ios-10.2.1-nightly-arm64-build"
name: pytorch_ios_11_2_1_nightly_arm64_build
build_environment: "libtorch-ios-11.2.1-nightly-arm64-build"
context: org-member
ios_arch: "arm64"
ios_platform: "OS"
requires:
@ -20,12 +22,12 @@
branches:
only: nightly
- binary_ios_upload:
build_environment: "libtorch-ios-10.2.1-nightly-binary-build-upload"
build_environment: "libtorch-ios-11.2.1-nightly-binary-build-upload"
context: org-member
requires:
- setup
- pytorch_ios_10_2_1_nightly_x86_64_build
- pytorch_ios_10_2_1_nightly_arm64_build
- pytorch_ios_11_2_1_nightly_x86_64_build
- pytorch_ios_11_2_1_nightly_arm64_build
filters:
branches:
only: nightly

View File

@ -0,0 +1,16 @@
- pytorch_linux_test:
name: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test
requires:
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
resource_class: large
- pytorch_linux_test:
name: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test
requires:
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
resource_class: large

View File

@ -1,13 +1,17 @@
# Pytorch iOS PR builds
- pytorch_ios_build:
name: pytorch_ios_10_2_1_x86_64_build
build_environment: "pytorch-ios-10.2.1-x86_64_build"
name: pytorch_ios_11_2_1_x86_64_build
context: org-member
build_environment: "pytorch-ios-11.2.1-x86_64_build"
ios_arch: "x86_64"
ios_platform: "SIMULATOR"
requires:
- setup
- pytorch_ios_build:
name: pytorch_ios_10_2_1_arm64_build
build_environment: "pytorch-ios-10.2.1-arm64_build"
name: pytorch_ios_11_2_1_arm64_build
context: org-member
build_environment: "pytorch-ios-11.2.1-arm64_build"
ios_arch: "arm64"
ios_platform: "OS"
requires:
- setup

View File

@ -0,0 +1,7 @@
# PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
- pytorch_linux_build:
name: pytorch_linux_xenial_py3_clang5_mobile_build
requires:
- setup
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:405"

View File

@ -1,16 +1,3 @@
# Scheduled to run 4 hours after the binary jobs start
# These jobs need to run after all the binary jobs run, regardless of if the
# jobs failed or not. There's no way to do this in CircleCI right now, so we
# just schedule this to run after all the binary jobs should've finished.
# These jobs are all idempotent and very lightweight; they just upload html
# files that track what binaries are available and what their sizes are.
update_s3_htmls:
jobs:
- setup:
filters:
branches:
only: postnightly
- update_s3_htmls_for_nightlies:
context: org-member
requires:

View File

@ -1,34 +1,32 @@
---
# NOTE there must be no spaces before the '-', so put the comma first.
Checks: '
-*
,bugprone-*
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,-bugprone-lambda-function-name
,cppcoreguidelines-*
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-return-braced-init-list
,-modernize-use-auto
,-modernize-use-default-member-init
,-modernize-use-using
,performance-*
,-performance-noexcept-move-constructor
# NOTE there must be no spaces before the '-', so put the comma last.
Checks: '-*,
bugprone-*,
-bugprone-forward-declaration-namespace,
-bugprone-macro-parentheses,
-bugprone-lambda-function-name,
cppcoreguidelines-*,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-owning-memory,
-cppcoreguidelines-pro-bounds-array-to-pointer-decay,
-cppcoreguidelines-pro-bounds-constant-array-index,
-cppcoreguidelines-pro-bounds-pointer-arithmetic,
-cppcoreguidelines-pro-type-cstyle-cast,
-cppcoreguidelines-pro-type-reinterpret-cast,
-cppcoreguidelines-pro-type-static-cast-downcast,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
hicpp-exception-baseclass,
hicpp-avoid-goto,
modernize-*,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
-modernize-use-using,
performance-*,
-performance-noexcept-move-constructor,
'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:
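
Both layouts parse identically; the rewrite only moves the commas so a check can be added or removed by editing a single line. To sanity-check the resulting list locally, one can ask clang-tidy to print the checks it would enable (the source path is illustrative; clang-tidy picks up the nearest `.clang-tidy` above it):

```bash
# List the enabled checks; the trailing -- avoids needing a compilation database.
clang-tidy --list-checks torch/csrc/jit/ir.cpp --
```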

View File

@ -5,10 +5,9 @@ max-line-length = 120
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
# EXE001 is skipped for now because some files use shebang to determine Python version.
EXE001,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,tools/amd_build/pyHIPIFY,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi
per-file-ignores = __init__.py: F401
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi,.git

View File

@ -7,29 +7,60 @@ on:
pull_request:
jobs:
quick-checks:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.x
architecture: x64
- name: Checkout PyTorch
uses: actions/checkout@v1
- name: Ensure consistent CircleCI YAML config
run: |
pip install -r requirements.txt
cd .circleci && ./ensure-consistency.py
- name: Ensure Docker version is correctly deployed
run: .circleci/validate-docker-version.py
- name: Shellcheck Jenkins scripts
run: |
sudo apt-get install -y shellcheck
.jenkins/run-shellcheck.sh
- name: Ensure no tabs
run: |
(! git grep -I -l $'\t' -- . ':(exclude)*.svg' ':(exclude)**Makefile' ':(exclude)**/contrib/**' ':(exclude)third_party' ':(exclude).gitattributes' ':(exclude).gitmodules' || (echo "The above files have tabs; please convert them to spaces"; false))
- name: Ensure C++ source files are not executable
run: |
(! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
- name: MyPy typecheck
run: |
pip install mypy mypy-extensions
mypy @mypy-files.txt
- name: C++ docs check
run: |
sudo apt-get install -y doxygen && pip install -r requirements.txt
cd docs/cpp/source && ./check-doxygen.sh
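
The "Ensure no tabs" step above works because `git grep -l` exits zero when it finds a match; negating that fails the build. A trimmed standalone sketch of the same pattern (exclusions shortened for readability):

```bash
# Fail if any tracked text file contains a literal tab.
# -I skips binary files; $'\t' expands to a real tab character.
if git grep -I -l $'\t' -- . ':(exclude)third_party'; then
  echo "The above files have tabs; please convert them to spaces"
  exit 1
fi
```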
flake8-py3:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.7.4
python-version: 3.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@master
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [ -z "${GITHUB_HEAD_REF}" ]; then
# We are on master, just set the SHA from our current location
echo ::set-output name=commit_sha::${GITHUB_SHA}
else
# We are on a PR, so actions/checkout leaves us on merge commit.
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
PR_TIP=$(git rev-parse HEAD^2)
git checkout ${PR_TIP}
echo ::set-output name=commit_sha::${PR_TIP}
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
@ -46,3 +77,134 @@ jobs:
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
flake8-py2:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 2.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
set -eux
pip install flake8
rm -rf .circleci
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py2'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
clang-tidy:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.x
architecture: x64
- name: Checkout PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Install dependencies
run: |
set -eux
# Install CUDA
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get --no-install-recommends -y install cuda
# Install dependencies
pip install pyyaml
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-8 main"
sudo apt-get update
sudo apt-get install -y clang-tidy-8
sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-8 1000
- name: Run clang-tidy
run: |
set -eux
git remote add upstream https://github.com/pytorch/pytorch
git fetch upstream "$GITHUB_BASE_REF"
BASE_SHA=${{ github.event.pull_request.base.sha }}
HEAD_SHA=${{ github.event.pull_request.head.sha }}
MERGE_BASE=$(git merge-base $BASE_SHA $HEAD_SHA)
if [[ ! -d build ]]; then
git submodule update --init --recursive
export USE_NCCL=0
# We really only need compile_commands.json, so no need to build!
time python setup.py --cmake-only build
# Generate ATen files.
time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Generate PyTorch files.
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src
fi
# Run Clang-Tidy
# The negative filters below are to exclude files that include onnx_pb.h or
# caffe2_pb.h, otherwise we'd have to build protos as part of this CI job.
python tools/clang_tidy.py \
--verbose \
--paths torch/csrc/ \
--diff "$MERGE_BASE" \
-g"-torch/csrc/jit/export.cpp" \
-g"-torch/csrc/jit/import.cpp" \
-g"-torch/csrc/jit/netdef_converter.cpp" \
"$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt
- name: Add annotations
uses: suo/add-annotations-github-action@master
with:
check_name: 'clang-tidy'
linter_output_path: 'clang-tidy-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorDesc>.*?) \[(?<errorCode>.*)\]'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
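
The clang-tidy job lints only the lines a PR touches: it computes the merge base between the PR head and its target branch and hands that to `tools/clang_tidy.py --diff`. The core git steps in isolation (remote and branch names are illustrative):

```bash
git remote add upstream https://github.com/pytorch/pytorch 2>/dev/null || true
git fetch upstream master
# The merge base is where the PR branched off; diffing against it yields
# exactly the changes the PR introduces, so pre-existing warnings are skipped.
MERGE_BASE=$(git merge-base upstream/master HEAD)
git diff --name-only "$MERGE_BASE" HEAD
```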

.gitignore vendored
View File

@ -42,6 +42,7 @@ dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
test/cpp_extensions/install/
test/test-reports/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/

.gitmodules vendored
View File

@ -117,4 +117,4 @@
[submodule "android/libs/fbjni"]
ignore = dirty
path = android/libs/fbjni
url = https://github.com/IvanKobzarev/fbjni.git
url = https://github.com/facebookincubator/fbjni.git

View File

@ -4,14 +4,6 @@ set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# TODO: Migrate all centos jobs to use proper devtoolset
if [[ "$BUILD_ENVIRONMENT" == *py2-cuda9.0-cudnn7-centos7* ]]; then
# There is a bug in the pango package on CentOS 7 that causes undefined
# symbols; upgrading glib2 to >=2.56.1 solves the issue. See
# https://bugs.centos.org/view.php?id=15495
sudo yum install -y -q glib2-2.56.1
fi
# CMAKE_ARGS are only passed to 'cmake' and the -Dfoo=bar does not work with
# setup.py, so we build a list of foo=bars and then either convert it to
# -Dfoo=bars or export them before running setup.py
@ -169,7 +161,6 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
build_args+=("USE_ROCM=ON")
# This is needed to enable ImageInput operator in resnet50_trainer
build_args+=("USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip

View File

@ -64,7 +64,7 @@ if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on
# https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
@ -74,9 +74,9 @@ if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
sudo pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
else
pip install --user --no-cache-dir hypothesis==3.59.0
fi
# else
# pip install --user --no-cache-dir hypothesis==3.59.0
# fi
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
@ -93,6 +93,10 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
# This test has been flaky in ROCm CI (but note the tests are
# cpu-only so should be unrelated to ROCm)
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/blobs_queue_db_test.py")
fi
# NB: Warnings are disabled because they make it harder to see what
@ -100,8 +104,13 @@ fi
echo "Running Python tests.."
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# locale setting is required by click package with py3
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
for loc in "en_US.utf8" "C.UTF-8"; do
if locale -a | grep "$loc" >/dev/null 2>&1; then
export LC_ALL="$loc"
export LANG="$loc"
break;
fi
done
fi
pip install --user pytest-sugar
@ -115,6 +124,7 @@ pip install --user pytest-sugar
--ignore "$caffe2_pypath/python/operator_test/matmul_op_test.py" \
--ignore "$caffe2_pypath/python/operator_test/pack_ops_test.py" \
--ignore "$caffe2_pypath/python/mkl/mkl_sbn_speed_test.py" \
--ignore "$caffe2_pypath/python/trt/test_pt_onnx_trt.py" \
${rocm_ignore_test[@]} \
"$caffe2_pypath/python" \
"${EXTRA_TESTS[@]}"
@ -123,7 +133,7 @@ pip install --user pytest-sugar
# torchvision tests #
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install -q --user git+https://github.com/pytorch/vision.git
pip install -q --user git+https://github.com/pytorch/vision.git@v0.5.0
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
@ -131,8 +141,7 @@ if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# default pip version is too old(9.0.2), unable to support tag `manylinux2010`.
# Fix the pip error: Couldn't find a version that satisfies the requirement
sudo pip install --upgrade pip
pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==0.5.0.dev905
pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.1.0.dev1228
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -36,6 +36,12 @@ if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-mobile* ]]; then
# Use linux host toolchain + mobile build options in order to build & test
# mobile libtorch without having to setup Android/iOS toolchain/simulator.
exec ./scripts/build_mobile.sh -DBUILD_BINARY=ON "$@"
fi
echo "Python version:"
python --version
@ -61,6 +67,24 @@ if ! which conda; then
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
POSSIBLE_JAVA_HOMES=()
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
echo "Found jni.h under $JH"
export JAVA_HOME="$JH"
export BUILD_JNI=ON
break
fi
done
if [ -z "$JAVA_HOME" ]; then
echo "Did not find jni.h"
fi
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
@ -112,9 +136,7 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
fi
python tools/amd_build/build_amd.py
# OPENCV is needed to enable ImageInput operator in caffe2 resnet5_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
python setup.py install --user
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
bash tools/amd_build/unwrap_clang.sh
@ -137,10 +159,6 @@ if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *xenial-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
@ -159,47 +177,60 @@ echo "The next three invocations are expected to fail with invalid command error
( ! get_exit_code python setup.py clean] )
( ! get_exit_code python setup.py clean bad_argument )
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
WERROR=1 python setup.py install
if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
WERROR=1 python setup.py install
else
python setup.py install
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
if which sccache > /dev/null; then
echo 'PyTorch Build Statistics'
sccache --show-stats
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)"
make VERBOSE=1
popd
assert_git_not_dirty
else
python setup.py install
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
if which sccache > /dev/null; then
echo 'PyTorch Build Statistics'
sccache --show-stats
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# Test no-Python build
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
# Test no-Python build
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
@ -210,18 +241,6 @@ if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
popd
fi
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)"
make VERBOSE=1
popd
assert_git_not_dirty
# Test XLA build
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.

View File

@ -158,7 +158,9 @@ fi
function pip_install() {
# retry 3 times
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@"
# old versions of pip don't have the "--progress-bar" flag
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@" ||\
pip install "$@" || pip install "$@" || pip install "$@"
}
function pip_uninstall() {
@ -166,6 +168,10 @@ function pip_uninstall() {
pip uninstall -y "$@" || pip uninstall -y "$@"
}
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
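
One caveat with `retry` as written: unquoted `$*` re-splits any argument that contains spaces. A quoting-safe variant, shown only as a sketch (not what the script uses):

```bash
retry () {
  # "$@" preserves each original argument, spaces and all.
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Example usage:
retry curl -fsSL -o miniconda3.sh https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
```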
function get_exit_code() {
set +e
"$@"

View File

@ -13,12 +13,12 @@ mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
retry curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
retry bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source ${WORKSPACE_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
retry conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# The torch.hub tests make requests to GitHub.
#
@ -29,13 +29,13 @@ conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# > certificate verify failed: unable to get local issuer certificate
# > (_ssl.c:1056)
#
conda install -y -c conda-forge certifi
retry conda install -y -c conda-forge certifi
# Needed by torchvision, which is imported from TestHub in test_utils.py.
conda install -y pillow
retry conda install -y pillow
# Building with USE_DISTRIBUTED=1 requires libuv (for Gloo).
conda install -y libuv pkg-config
retry conda install -y libuv pkg-config
# Image commit tag is used to persist the build from the build job
# and to retrieve the build from the test job.

View File

@ -6,6 +6,9 @@ source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
conda install -y six
pip install -q hypothesis "librosa>=0.6.2" psutil
# TODO move this to docker
pip install unittest-xml-reporting
# faulthandler has been built in since Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler

View File

@ -10,8 +10,10 @@ COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
@ -28,7 +30,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$PWD/../cpp-build"/caffe2/build/bin/test_api
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn

View File

@ -12,6 +12,9 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
@ -32,6 +35,12 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# TODO: Move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --no-install-recommends libsndfile1
fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
@ -40,7 +49,7 @@ if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip_install --user hypothesis
pip_install --user "hypothesis==4.53.2"
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
@ -103,8 +112,18 @@ test_python_nn() {
assert_git_not_dirty
}
test_python_ge_config_simple() {
time python test/run_test.py --include jit_simple --verbose
assert_git_not_dirty
}
test_python_ge_config_legacy() {
time python test/run_test.py --include jit_legacy jit_fuser_legacy --verbose
assert_git_not_dirty
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
time python test/run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods
assert_git_not_dirty
}
@ -134,7 +153,7 @@ test_aten() {
}
test_torchvision() {
pip_install --user git+https://github.com/pytorch/vision.git@2b73a4846773a670632b29fb2fc2ac57df7bce5d
pip_install --user git+https://github.com/pytorch/vision.git@44a5bae933655ed7ff798669a43452b833f9ce01
}
test_libtorch() {
@ -195,15 +214,17 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip_install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x
assert_git_not_dirty
}
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
fi
if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
test_backward_compatibility
@ -211,6 +232,13 @@ if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *xla* || "${JOB_BASE_NAME}" == *xla* ]]; then
test_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_legacy* || "${JOB_BASE_NAME}" == *ge_config_legacy* ]]; then
test_python_ge_config_legacy
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_simple* || "${JOB_BASE_NAME}" == *ge_config_simple* ]]; then
test_python_ge_config_simple
elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
# TODO: run some C++ tests
echo "no-op at the moment"
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn

View File

@ -22,7 +22,7 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
:: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil pillow
pip install -q ninja future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3

View File

@ -1,3 +1,3 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn --verbose && cd ..
cd test && python run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose && cd ..
if ERRORLEVEL 1 exit /b 1

View File

@ -15,6 +15,10 @@ if(NOT CMAKE_VERSION VERSION_LESS 3.15.0)
cmake_policy(SET CMP0092 NEW)
endif()
if(NOT CMAKE_VERSION VERSION_LESS 3.10)
set(FIND_CUDA_MODULE_DEPRECATED ON)
endif()
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
@ -103,7 +107,8 @@ option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
cmake_dependent_option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON
"NOT MSVC" OFF)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" ON)
option(BUILD_NAMEDTENSOR "Experimental: compile with namedtensor support" OFF)
@ -115,6 +120,7 @@ cmake_dependent_option(
CAFFE2_USE_MSVC_STATIC_RUNTIME "Using MSVC static runtime libraries" ON
"NOT BUILD_SHARED_LIBS" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
option(BUILD_JNI "Build JNI bindings" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" ON
"BUILD_TEST" OFF)
@ -140,7 +146,7 @@ option(USE_METAL "Use Metal for iOS build" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
cmake_dependent_option(
USE_NCCL "Use NCCL" ON
"USE_CUDA;UNIX;NOT APPLE" OFF)
"USE_CUDA OR USE_ROCM;UNIX;NOT APPLE" OFF)
cmake_dependent_option(
USE_STATIC_NCCL "Use static NCCL" OFF
"USE_NCCL" OFF)
@ -199,6 +205,8 @@ cmake_dependent_option(
"MSVC" OFF)
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
set(SELECTED_OP_LIST "" CACHE STRING
"Path to the yaml file that contains the list of operators to include for custom build. Include all operators by default.")
# This is a fix for a rare build issue on Ubuntu:
# symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk
@ -312,7 +320,6 @@ if (INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE)
set(USE_FBGEMM OFF)
set(USE_PYTORCH_QNNPACK ON)
set(USE_QNNPACK OFF)
set(USE_STATIC_DISPATCH ON)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_DISABLE_AUTOGRAD ON)
set(INTERN_USE_EIGEN_BLAS ON)
@ -490,7 +497,7 @@ if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
if(ANDROID)
if(ANDROID AND (NOT ANDROID_DEBUG_SYMBOLS))
if(CMAKE_COMPILER_IS_GNUCXX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s")
else()
@ -642,5 +649,12 @@ if (BUILD_BINARY)
add_subdirectory(binaries)
endif()
# ---[ JNI
if (BUILD_JNI)
set(BUILD_LIBTORCH_WITH_JNI 1)
set(FBJNI_SKIP_TESTS 1)
add_subdirectory(android/pytorch_android)
endif()
include(cmake/Summary.cmake)
caffe2_print_configuration_summary()
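
With `cmake_dependent_option`, `BUILD_CAFFE2_OPS` now defaults to ON only when the generator is not MSVC and is forced OFF (and hidden) otherwise; on other toolchains it remains an ordinary cache option. An illustrative configure line exercising the new options (flags are examples, not required settings):

```bash
# Opt out of Caffe2 operators, opt in to the new JNI bindings.
cmake -DBUILD_CAFFE2_OPS=OFF -DBUILD_JNI=ON ..
```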

View File

@ -4,10 +4,10 @@
/docs/cpp @goldsborough @ebetica @yf225
/torch/csrc/api/ @ebetica @goldsborough @yf225
/test/cpp/api/ @ebetica @goldsborough @yf225
/torch/lib/c10d/ @pietern @mrshenli
/torch/csrc/distributed/ @pietern @mrshenli
/torch/distributed/ @apaszke @pietern @mrshenli
/test/test_c10d.py @pietern @mrshenli
/torch/lib/c10d/ @pietern @mrshenli @zhaojuanmao
/torch/csrc/distributed/ @pietern @mrshenli @zhaojuanmao
/torch/distributed/ @apaszke @pietern @mrshenli @zhaojuanmao
/test/test_c10d.py @pietern @mrshenli @zhaojuanmao
/torch/utils/cpp_extension.py @goldsborough @fmassa @soumith @ezyang
# Not there to stricly require the approval, but to be tagged as a reviewer
@ -19,3 +19,9 @@
/torch/autograd/ @apaszke
/torch/jit/ @apaszke
/torch/utils/data/ @apaszke
# Distributed RPC Framework.
/torch/csrc/distributed/rpc @mrshenli @pritamdamania87 @zhaojuanmao
/torch/csrc/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/rpc @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao

View File

@ -216,8 +216,10 @@ To build the documentation:
cd docs
pip install -r requirements.txt
# `katex` must also be available in your PATH.
# If you are using Ubuntu or Debian, you can install it with:
# sudo apt install katex
# You can either install katex globally if you have properly configured npm:
# npm install -g katex
# Or, if you prefer to keep your global executable environment clean or do not want to configure npm:
# npm install katex && export PATH="$PATH:$(pwd)/node_modules/.bin"
```
3. Generate the documentation HTML files. The generated files will be in `docs/build/html`.
@ -284,6 +286,57 @@ cd docs
make doctest
```
## Profiling with `py-spy`
Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
[`py-spy`](https://github.com/benfred/py-spy), a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.
`py-spy` can be installed via `pip`:
```bash
$ pip install py-spy
```
To use `py-spy`, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles
`torch.add`:
```python
import torch
t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])
for _ in range(1000000):
torch.add(t1, t2)
```
Since the `torch.add` operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
`py-spy` with such a script is to generate a [flame
graph](http://www.brendangregg.com/flamegraphs.html):
```bash
$ py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
```
This will output a file named `profile.svg` containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The `--native` command-line option tells
`py-spy` to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code it may be necessary to compile PyTorch in debug mode by prepending
`DEBUG=1` to your `setup.py develop` call. Depending on your operating system it
may also be necessary to run `py-spy` with root privileges.
`py-spy` can also work in an `htop`-like "live profiling" mode and can be
tweaked to adjust the stack sampling rate; see the `py-spy` readme for more
details.
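
For example, a sketch of the live mode (flag names follow the `py-spy` readme; `--native` may require root):

```bash
# htop-like live view of the hottest stacks, including C++ frames.
sudo py-spy top --native -- python test_tensor_tensor_add.py
```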
## Managing Multiple Build Trees
One downside to using `python setup.py develop` is that your development
@ -438,6 +491,38 @@ ccache -F 0
# deploy (and add to ~/.bashrc for later)
export PATH="/usr/lib/ccache:$PATH"
```
It is also possible to install `ccache` via `conda` by installing it from the
community-maintained `conda-forge` channel. Here is how to set up `ccache` this
way:
```bash
# install ccache
conda install -c conda-forge ccache
# set up ccache compiler symlinks
mkdir ~/ccache
mkdir ~/ccache/lib
mkdir ~/ccache/cuda
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/cc
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/c++
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/gcc
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/g++
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/cuda/nvcc
# update PATH to reflect symlink locations, consider
# adding this to your .bashrc
export PATH=~/ccache/lib:$PATH
export CUDA_NVCC_EXECUTABLE=~/ccache/cuda/nvcc
# increase ccache cache size to 25 GiB
ccache -M 25Gi
```
To check that this is working, do two clean builds of PyTorch in a row. The
second build should be substantially faster than the first.
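
A quick way to verify, sketched under the assumption that `ccache -z`/`-s` (zero and show statistics) are available in your ccache version:

```bash
ccache -z                                          # reset hit/miss counters
python setup.py develop                            # first clean build fills the cache
python setup.py clean && python setup.py develop   # second build should mostly hit
ccache -s                                          # inspect the hit rate
```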
#### Use a faster linker
If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions

View File

@ -44,7 +44,7 @@ At a granular level, PyTorch is a library that consists of the following compone
| [**torch.multiprocessing**](https://pytorch.org/docs/stable/multiprocessing.html) | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| [**torch.utils**](https://pytorch.org/docs/stable/data.html) | DataLoader and other utility functions for convenience |
Usually one uses PyTorch either as:
Usually PyTorch is used either as:
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed.
@ -88,7 +88,7 @@ You get the best of speed and flexibility for your crazy research.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use [NumPy](http://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](http://scikit-learn.org) etc.
You can use it naturally like you would use [NumPy](https://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](https://scikit-learn.org) etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
@ -124,7 +124,7 @@ You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
No wrapper code needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
## Installation
@ -145,12 +145,12 @@ Python wheels for NVIDIA's Jetson Nano, Jetson TX2, and Jetson AGX Xavier are av
- Python 2.7: https://nvidia.box.com/v/torch-weekly-cp27-jetson-jp42
- Python 3.6: https://nvidia.box.com/v/torch-weekly-cp36-jetson-jp42
They requires JetPack 4.2 and above and are maintained by @dusty-nv
They require JetPack 4.2 and above, and @dusty-nv maintains them.
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.anaconda.com/distribution/#download-section) environment.
If you are installing from source, you will need a C++14 compiler. Also, we highly recommend installing an [Anaconda](https://www.anaconda.com/distribution/#download-section) environment.
You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.
Once you have [Anaconda](https://www.anaconda.com/distribution/#download-section) installed, here are the instructions.
@ -167,7 +167,7 @@ If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xa
#### Install Dependencies
Common
Common (only install `typing` for Python <3.5)
```
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
```
@ -175,7 +175,7 @@ conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
On Linux
```bash
# Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda90 # or [magma-cuda92 | magma-cuda100 ] depending on your cuda version
conda install -c pytorch magma-cuda90 # or [magma-cuda92 | magma-cuda100 | magma-cuda101 ] depending on your cuda version
```
#### Get the PyTorch Source
@ -209,13 +209,13 @@ If the version of Visual Studio 2017 is higher than 15.4.5, installing of "VC++
<br/> There is no guarantee of the correct building with VC++ 2017 toolsets, others than version 15.4 v14.11.
<br/> "VC++ 2017 version 15.4 v14.11 toolset" might be installed onto already installed Visual Studio 2017 by running its installation once again and checking the corresponding checkbox under "Individual components"/"Compilers, build tools, and runtimes".
NVTX is a part of CUDA distributive, where it is called "Nsight Compute". For installing it onto already installed CUDA run CUDA installation once again and check the corresponding checkbox.
NVTX is a part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox.
Be sure that CUDA with Nsight Compute is installed after Visual Studio 2017.
Currently VS 2017, VS 2019 and Ninja are supported as the generator of CMake. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator, otherwise it will use VS 2017.
<br/> If Ninja is selected as the generator, the latest MSVC newer than VS 2015 (14.0) will get selected as the underlying toolchain if you have Python > 3.5; otherwise VS 2015 will be selected, so you'll have to activate the environment. If you use CMake <= 3.14.2 and have VS 2019 installed, then even if you specify VS 2017 as the generator, VS 2019 will get selected as the generator.
CUDA and MSVC has strong version dependencies, so even if you use VS 2017 / 2019, you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below and then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the cuda host compiler (not recommended if there are big version differences).
CUDA and MSVC have strong version dependencies, so even if you use VS 2017 / 2019, you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below and then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the cuda host compiler (not recommended if there are big version differences).
| CUDA version | Newest supported VS version |
| ------------ | ------------------------------------------------------- |
@ -234,7 +234,7 @@ set FORCE_PY27_BUILD=1
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 15 2017
:: Read the content in the previous section carefully before you preceed.
:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2017 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
@@ -331,7 +331,7 @@ Sending a PR without discussion might end up resulting in a rejected PR, because
PyTorch is a community driven project with several skillful engineers and researchers contributing to it.
PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [Sam Gross](https://github.com/colesbury), [Soumith Chintala](http://soumith.ch) and [Gregory Chanan](https://github.com/gchanan) with major contributions coming from hundreds of talented individuals in various forms and means.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.

android/.gitignore vendored

@@ -6,11 +6,5 @@ gradle/wrapper
.idea/*
.externalNativeBuild
build
pytorch_android/src/main/cpp/libtorch_include/x86/**
pytorch_android/src/main/cpp/libtorch_include/x86_64/**
pytorch_android/src/main/cpp/libtorch_include/armeabi-v7a/**
pytorch_android/src/main/cpp/libtorch_include/arm64-v8a/**
pytorch_android/src/main/jniLibs/x86/**
pytorch_android/src/main/jniLibs/x86_64/**
pytorch_android/src/main/jniLibs/armeabi-v7a/**
pytorch_android/src/main/jniLibs/arm64-v8a/**
pytorch_android/src/main/cpp/libtorch_include/**
pytorch_android/src/main/jniLibs/**

android/README.md Normal file

@@ -0,0 +1,118 @@
# Android
## Demo applications and tutorials
Demo applications with code walk-throughs can be found in [this GitHub repo](https://github.com/pytorch/android-demo-app).
## Publishing
##### Release
Release artifacts are published to jcenter:
```
repositories {
jcenter()
}
dependencies {
implementation 'org.pytorch:pytorch_android:1.3.0'
implementation 'org.pytorch:pytorch_android_torchvision:1.3.0'
}
```
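Once the dependency is added, running a model from Java needs only the `Module`, `IValue` and `Tensor` classes, the same API exercised by the tests further down this diff. A minimal sketch; the model path and input shape here are illustrative assumptions, not part of this change:
```
import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.Tensor;

public class InferenceSketch {
  public static void main(String[] args) {
    // Load a serialized TorchScript model; the path is an assumed example.
    Module module = Module.load("/data/local/tmp/model.pt");
    // Build a 1x3x224x224 float input tensor (shape chosen for illustration).
    long[] shape = new long[] {1, 3, 224, 224};
    float[] data = new float[(int) Tensor.numel(shape)];
    Tensor input = Tensor.fromBlob(data, shape);
    // forward() takes and returns IValue wrappers around tensors and scalars.
    IValue output = module.forward(IValue.from(input));
    float[] scores = output.toTensor().getDataAsFloatArray();
    System.out.println("output elements: " + scores.length);
  }
}
```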
##### Nightly
Nightly (snapshot) builds are published every night from the `master` branch to the [nexus sonatype snapshots repository](https://oss.sonatype.org/#nexus-search;quick~pytorch_android)
To use them, the repository must be specified explicitly:
```
repositories {
maven {
url "https://oss.sonatype.org/content/repositories/snapshots"
}
}
dependencies {
...
implementation 'org.pytorch:pytorch_android:1.4.0-SNAPSHOT'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0-SNAPSHOT'
...
}
```
The current nightly (snapshot) version is the value of `VERSION_NAME` in `gradle.properties` in the current folder; at the moment it is `1.4.0-SNAPSHOT`.
## Building PyTorch Android from Source
In some cases you might want to use a local build of PyTorch Android, for example to build a custom libtorch binary with another set of operators or to test local changes.
For this you can use the `./scripts/build_pytorch_android.sh` script.
```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
sh ./scripts/build_pytorch_android.sh
```
The workflow contains several steps:
1. Build libtorch for android for all 4 android ABIs (armeabi-v7a, arm64-v8a, x86, x86_64).
2. Create symbolic links to the results of those builds:
`android/pytorch_android/src/main/jniLibs/${abi}` to the directory with the output libraries,
`android/pytorch_android/src/main/cpp/libtorch_include/${abi}` to the directory with the headers. These directories are used to build the `libpytorch.so` library that will be loaded on the android device.
3. Finally, run `gradle` in the `android/pytorch_android` directory with the task `assembleRelease`.
The script requires that the Android SDK, Android NDK and Gradle are installed.
They are specified as environment variables:
`ANDROID_HOME` - path to [Android SDK](https://developer.android.com/studio/command-line/sdkmanager.html)
`ANDROID_NDK` - path to [Android NDK](https://developer.android.com/studio/projects/install-ndk)
`GRADLE_HOME` - path to [gradle](https://gradle.org/releases/)
After a successful build you should see the resulting aar files:
```
$ find pytorch_android/build/ -type f -name "*aar"
pytorch_android/build/outputs/aar/pytorch_android.aar
pytorch_android_torchvision/build/outputs/aar/pytorch_android.aar
libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni.aar
```
It can be used directly in android projects, as a gradle dependency:
```
allprojects {
repositories {
flatDir {
dirs 'libs'
}
}
}
android {
...
packagingOptions {
pickFirst "**/libfbjni.so"
}
...
}
dependencies {
implementation(name:'pytorch_android', ext:'aar')
implementation(name:'pytorch_android_torchvision', ext:'aar')
implementation(name:'pytorch_android_fbjni', ext:'aar')
}
```
At the moment, using the aar files directly requires additional configuration, due to a packaging specific: `libfbjni.so` is packaged in both `pytorch_android_fbjni.aar` and `pytorch_android.aar`.
```
packagingOptions {
pickFirst "**/libfbjni.so"
}
```
## More Details
You can find more details about the PyTorch Android API in the [Javadoc](https://pytorch.org/docs/stable/packages.html).
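Besides `forward`, named methods of a scripted module can be called through `Module.runMethod`, which is how the tests in this diff drive helpers such as `eqStr`. A hedged sketch, assuming a module that defines an `eqStr` method and an illustrative on-device path:
```
import org.pytorch.IValue;
import org.pytorch.Module;

public class RunMethodSketch {
  public static void main(String[] args) {
    // Assumes test.pt defines an eqStr method, as the test assets in this
    // diff do; the path is an illustrative assumption.
    Module module = Module.load("/data/local/tmp/test.pt");
    IValue result = module.runMethod("eqStr", IValue.from("smoketest"));
    if (result.isString()) {
      System.out.println(result.toStr()); // prints "smoketest"
    }
  }
}
```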

View File

@@ -1,33 +1,33 @@
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:${GRADLE_BINTRAY_PLUGIN_VERSION}"
classpath "com.github.dcendents:android-maven-gradle-plugin:${ANDROID_MAVEN_GRADLE_PLUGIN_VERSION}"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
allprojects {
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:${GRADLE_BINTRAY_PLUGIN_VERSION}"
classpath "com.github.dcendents:android-maven-gradle-plugin:${ANDROID_MAVEN_GRADLE_PLUGIN_VERSION}"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
repositories {
google()
jcenter()

android/build_test_app.sh Executable file

@@ -0,0 +1,100 @@
#!/bin/bash
set -eux
PYTORCH_DIR="$(cd $(dirname $0)/..; pwd -P)"
PYTORCH_ANDROID_DIR=$PYTORCH_DIR/android
WORK_DIR=$PYTORCH_DIR
echo "PYTORCH_DIR:$PYTORCH_DIR"
echo "WORK_DIR:$WORK_DIR"
echo "ANDROID_HOME:$ANDROID_HOME"
if [ -z "$ANDROID_HOME" ]; then
echo "ANDROID_HOME not set; please set it to Android sdk directory"
exit 1
fi
if [ ! -d $ANDROID_HOME ]; then
echo "ANDROID_HOME not a directory; did you install it under $ANDROID_HOME?"
exit 1
fi
GRADLE_PATH=gradle
GRADLE_NOT_FOUND_MSG="Unable to find gradle, please add it to PATH or set GRADLE_HOME"
if [ ! -x "$(command -v gradle)" ]; then
if [ -z "$GRADLE_HOME" ]; then
echo "$GRADLE_NOT_FOUND_MSG"
exit 1
fi
GRADLE_PATH=$GRADLE_HOME/bin/gradle
if [ ! -f "$GRADLE_PATH" ]; then
echo "$GRADLE_NOT_FOUND_MSG"
exit 1
fi
fi
echo "GRADLE_PATH:$GRADLE_PATH"
ABIS_LIST="armeabi-v7a,arm64-v8a,x86,x86_64"
CUSTOM_ABIS_LIST=false
if [ $# -gt 0 ]; then
ABIS_LIST=$1
CUSTOM_ABIS_LIST=true
fi
echo "ABIS_LIST:$ABIS_LIST"
LIB_DIR=$PYTORCH_ANDROID_DIR/pytorch_android/src/main/jniLibs
INCLUDE_DIR=$PYTORCH_ANDROID_DIR/pytorch_android/src/main/cpp/libtorch_include
mkdir -p $LIB_DIR
rm -f $LIB_DIR/*
mkdir -p $INCLUDE_DIR
for abi in $(echo $ABIS_LIST | tr ',' '\n')
do
echo "abi:$abi"
OUT_DIR=$WORK_DIR/build_android_$abi
rm -rf $OUT_DIR
mkdir -p $OUT_DIR
pushd $PYTORCH_DIR
python $PYTORCH_DIR/setup.py clean
ANDROID_ABI=$abi BUILD_PYTORCH_MOBILE=1 VERBOSE=1 ANDROID_DEBUG_SYMBOLS=1 $PYTORCH_DIR/scripts/build_android.sh -DANDROID_CCACHE=$(which ccache)
cp -R $PYTORCH_DIR/build_android/install/lib $OUT_DIR/
cp -R $PYTORCH_DIR/build_android/install/include $OUT_DIR/
echo "$abi build output lib,include copied to $OUT_DIR"
LIB_LINK_PATH=$LIB_DIR/$abi
INCLUDE_LINK_PATH=$INCLUDE_DIR/$abi
rm -f $LIB_LINK_PATH
rm -f $INCLUDE_LINK_PATH
ln -s $OUT_DIR/lib $LIB_LINK_PATH
ln -s $OUT_DIR/include $INCLUDE_LINK_PATH
done
# To set proxy for gradle add following lines to ./gradle/gradle.properties:
# systemProp.http.proxyHost=...
# systemProp.http.proxyPort=8080
# systemProp.https.proxyHost=...
# systemProp.https.proxyPort=8080
if [ "$CUSTOM_ABIS_LIST" = true ]; then
NDK_DEBUG=1 $GRADLE_PATH -PnativeLibsDoNotStrip=true -PABI_FILTERS=$ABIS_LIST -p $PYTORCH_ANDROID_DIR clean test_app:assembleDebug
else
NDK_DEBUG=1 $GRADLE_PATH -PnativeLibsDoNotStrip=true -p $PYTORCH_ANDROID_DIR clean test_app:assembleDebug
fi
find $PYTORCH_ANDROID_DIR -type f -name "*apk"
find $PYTORCH_ANDROID_DIR -type f -name "*apk" | xargs echo "To install apk run: $ANDROID_HOME/platform-tools/adb install -r "
popd

View File

@@ -1,6 +1,6 @@
ABI_FILTERS=armeabi-v7a,arm64-v8a,x86,x86_64
VERSION_NAME=0.0.7-SNAPSHOT
VERSION_NAME=1.4.0-SNAPSHOT
GROUP=org.pytorch
MAVEN_GROUP=org.pytorch
POM_URL=https://github.com/pytorch/pytorch/tree/master/android
@@ -22,3 +22,8 @@ ANDROID_MAVEN_GRADLE_PLUGIN_VERSION=2.1
# Gradle internals
org.gradle.internal.repository.max.retries=1
org.gradle.jvmargs=-XX:MaxMetaspaceSize=1024m
android.useAndroidX=true
android.enableJetifier=true
nativeLibsDoNotStrip=false

View File

@@ -25,6 +25,18 @@ def getRepositoryPassword() {
return hasProperty('SONATYPE_NEXUS_PASSWORD') ? SONATYPE_NEXUS_PASSWORD : ""
}
def getHttpProxyHost() {
return project.properties['systemProp.http.proxyHost']
}
def getHttpProxyPort() {
return project.properties['systemProp.http.proxyPort']
}
def needProxy() {
return (getHttpProxyHost() != null) && (getHttpProxyPort() != null)
}
afterEvaluate { project ->
uploadArchives {
repositories {
@@ -37,9 +49,15 @@ afterEvaluate { project ->
repository(url: getReleaseRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
if (needProxy()) {
proxy(host: getHttpProxyHost(), port: getHttpProxyPort() as Integer, type: 'http')
}
}
snapshotRepository(url: getSnapshotRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
if (needProxy()) {
proxy(host: getHttpProxyHost(), port: getHttpProxyPort() as Integer, type: 'http')
}
}
pom.project {

View File

@@ -11,12 +11,15 @@ android {
sourceSets {
main {
manifest.srcFile '../fbjni/ApplicationManifest.xml'
manifest.srcFile '../fbjni/java/com/facebook/jni/AndroidManifest.xml'
java {
srcDir '../fbjni/java'
}
}
}
ndk {
abiFilters ABI_FILTERS.split(",")
}
}
buildTypes {
debug {
@@ -35,6 +38,7 @@ android {
dependencies {
compileOnly 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
}
apply from: rootProject.file('gradle/release.gradle')

View File

@@ -1,63 +1,110 @@
cmake_minimum_required(VERSION 3.4.1)
project(pytorch CXX)
project(pytorch_jni CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_VERBOSE_MAKEFILE ON)
set(TRACE_ENABLED OFF)
if(DEFINED ENV{TRACE_ENABLED})
if($ENV{TRACE_ENABLED} STREQUAL "1")
message(STATUS "TRACE_ENABLED ON")
set(TRACE_ENABLED ON)
endif()
endif()
if(NOT TRACE_ENABLED)
message(STATUS "TRACE_ENABLED OFF")
endif()
set(pytorch_android_DIR ${CMAKE_CURRENT_LIST_DIR}/src/main/cpp)
set(libtorch_include_DIR ${pytorch_android_DIR}/libtorch_include/${ANDROID_ABI})
if (ANDROID_ABI)
set(libtorch_include_DIR ${pytorch_android_DIR}/libtorch_include/${ANDROID_ABI})
set(BUILD_SUBDIR ${ANDROID_ABI})
elseif(BUILD_LIBTORCH_WITH_JNI)
# Don't need LIBTORCH_HOME if we're building from within PyTorch.
else()
# Building against a pre-built libtorch.
if (NOT LIBTORCH_HOME)
message(FATAL_ERROR
"pytorch_android requires LIBTORCH_HOME to be defined for non-Android builds.")
endif()
set(libtorch_include_DIR ${LIBTORCH_HOME}/include)
link_directories(${LIBTORCH_HOME}/lib)
set(BUILD_SUBDIR host)
endif()
message(STATUS "libtorch dir:${libtorch_DIR}")
configure_file(
${pytorch_android_DIR}/cmake_macros.h.in
${pytorch_android_DIR}/cmake_macros.h)
file(GLOB pytorch_android_SOURCES
${pytorch_android_DIR}/*.cpp
${pytorch_android_DIR}/pytorch_jni_jit.cpp
${pytorch_android_DIR}/pytorch_jni_common.cpp
${pytorch_android_DIR}/pytorch_jni_common.h
)
add_library(pytorch SHARED
add_library(pytorch_jni SHARED
${pytorch_android_SOURCES}
)
target_compile_options(pytorch PRIVATE
target_compile_options(pytorch_jni PRIVATE
-fexceptions
)
target_include_directories(pytorch PUBLIC
target_include_directories(pytorch_jni PUBLIC
${libtorch_include_DIR}
)
set(BUILD_DIR ${CMAKE_SOURCE_DIR}/build)
file(MAKE_DIRECTORY ${BUILD_DIR})
set(fbjni_DIR ${CMAKE_CURRENT_LIST_DIR}/../libs/fbjni/)
set(fbjni_BUILD_DIR ${BUILD_DIR}/fbjni/${ANDROID_ABI})
set(fbjni_BUILD_DIR ${CMAKE_BINARY_DIR}/fbjni/${BUILD_SUBDIR})
add_subdirectory(${fbjni_DIR} ${fbjni_BUILD_DIR})
function(import_static_lib name)
add_library(${name} STATIC IMPORTED)
set_property(
TARGET ${name}
PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/src/main/jniLibs/${ANDROID_ABI}/${name}.a)
endfunction(import_static_lib)
if (ANDROID_ABI)
import_static_lib(libtorch)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
import_static_lib(libclog)
function(import_static_lib name)
add_library(${name} STATIC IMPORTED)
set_property(
TARGET ${name}
PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/src/main/jniLibs/${ANDROID_ABI}/${name}.a)
endfunction(import_static_lib)
target_link_libraries(pytorch
fbjni
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
-Wl,--no-whole-archive
libc10
libnnpack
libpytorch_qnnpack
libeigen_blas
libcpuinfo
libclog
)
import_static_lib(libtorch)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
import_static_lib(libclog)
# Link most things statically on Android.
target_link_libraries(pytorch_jni
fbjni
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
-Wl,--no-whole-archive
libc10
libnnpack
libpytorch_qnnpack
libeigen_blas
libcpuinfo
libclog
)
else()
# Prefer dynamic linking on the host
target_link_libraries(pytorch_jni
fbjni
torch
c10
nnpack
pytorch_qnnpack
cpuinfo
clog
)
endif()

View File

@@ -27,6 +27,10 @@ android {
}
sourceSets {
main {
java {
exclude 'org/pytorch/LiteModuleLoader.java'
exclude 'org/pytorch/LiteNativePeer.java'
}
jniLibs.srcDirs = ['src/main/jniLibs']
}
}
@@ -42,6 +46,10 @@ android {
} else {
pickFirst '**/libfbjni.so'
}
if (nativeLibsDoNotStrip.toBoolean()) {
doNotStrip "**/*.so"
logger.warn('WARNING: nativeLibsDoNotStrip==true; debug symbols included')
}
}
useLibrary 'android.test.runner'
@@ -53,6 +61,7 @@ dependencies {
api project(':fbjni')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
testImplementation 'junit:junit:' + rootProject.junitVersion
testImplementation 'androidx.test:core:' + rootProject.coreVersion
@@ -72,4 +81,3 @@ task sourcesJar(type: Jar) {
}
artifacts.add('archives', sourcesJar)

View File

@@ -0,0 +1,36 @@
// Copyright (c) Facebook, Inc. and its affiliates.
//
// This source code is licensed under the Apache-2 license found in the
// LICENSE file in the root directory of this source tree.
plugins {
id 'java-library'
}
repositories {
mavenLocal()
jcenter()
}
sourceSets {
main {
java.srcDir '../src/main/java'
}
test {
java {
srcDir '../src/androidTest/java'
exclude '**/PytorchInstrumented*'
}
resources.srcDirs = ["../src/androidTest/assets"]
}
}
dependencies {
compileOnly 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
testImplementation 'junit:junit:4.12'
}
apply from: rootProject.file('gradle/release.gradle')

View File

@@ -0,0 +1,4 @@
POM_NAME=pytorch_java_only pytorch java api
POM_DESCRIPTION=pytorch_java_only pytorch java api
POM_ARTIFACT_ID=pytorch_java_only
POM_PACKAGING=jar

View File

@@ -0,0 +1,22 @@
package org.pytorch;
import org.junit.BeforeClass;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Objects;
public class PytorchHostTests extends PytorchTestBase {
@Override
protected String assetFilePath(String assetName) throws IOException {
Path tempFile = Files.createTempFile("test", ".pt");
try (InputStream resource = Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("test.pt"))) {
Files.copy(resource, tempFile, StandardCopyOption.REPLACE_EXISTING);
}
return tempFile.toAbsolutePath().toString();
}
}

View File

@@ -2,8 +2,6 @@ package org.pytorch;
import android.content.Context;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import java.io.File;
@@ -11,294 +9,15 @@ import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import androidx.test.ext.junit.runners.AndroidJUnit4;
import androidx.test.platform.app.InstrumentationRegistry;
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
@RunWith(AndroidJUnit4.class)
public class PytorchInstrumentedTests {
public class PytorchInstrumentedTests extends PytorchTestBase {
private static final String TEST_MODULE_ASSET_NAME = "test.pt";
@Before
public void setUp() {
System.loadLibrary("pytorch");
}
@Test
public void testForwardNull() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input =
IValue.tensor(Tensor.newInt8Tensor(new long[] {1}, Tensor.allocateByteBuffer(1)));
assertTrue(input.isTensor());
final IValue output = module.forward(input);
assertTrue(output.isNull());
}
@Test
public void testEqBool() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (boolean value : new boolean[] {false, true}) {
final IValue input = IValue.bool(value);
assertTrue(input.isBool());
assertTrue(value == input.getBool());
final IValue output = module.runMethod("eqBool", input);
assertTrue(output.isBool());
assertTrue(value == output.getBool());
}
}
@Test
public void testEqInt() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (long value : new long[] {Long.MIN_VALUE, -1024, -1, 0, 1, 1024, Long.MAX_VALUE}) {
final IValue input = IValue.long64(value);
assertTrue(input.isLong());
assertTrue(value == input.getLong());
final IValue output = module.runMethod("eqInt", input);
assertTrue(output.isLong());
assertTrue(value == output.getLong());
}
}
@Test
public void testEqFloat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
double[] values =
new double[] {
-Double.MAX_VALUE,
Double.MAX_VALUE,
-Double.MIN_VALUE,
Double.MIN_VALUE,
-Math.exp(1.d),
-Math.sqrt(2.d),
-3.1415f,
3.1415f,
-1,
0,
1,
};
for (double value : values) {
final IValue input = IValue.double64(value);
assertTrue(input.isDouble());
assertTrue(value == input.getDouble());
final IValue output = module.runMethod("eqFloat", input);
assertTrue(output.isDouble());
assertTrue(value == output.getDouble());
}
}
@Test
public void testEqTensor() throws IOException {
final long[] inputTensorShape = new long[] {1, 3, 224, 224};
final long numElements = Tensor.numel(inputTensorShape);
final float[] inputTensorData = new float[(int) numElements];
for (int i = 0; i < numElements; ++i) {
inputTensorData[i] = i;
}
final Tensor inputTensor = Tensor.newFloat32Tensor(inputTensorShape, inputTensorData);
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input = IValue.tensor(inputTensor);
assertTrue(input.isTensor());
assertTrue(inputTensor == input.getTensor());
final IValue output = module.runMethod("eqTensor", input);
assertTrue(output.isTensor());
final Tensor outputTensor = output.getTensor();
assertNotNull(outputTensor);
assertArrayEquals(inputTensorShape, outputTensor.shape);
float[] outputData = outputTensor.getDataAsFloatArray();
for (int i = 0; i < numElements; i++) {
assertTrue(inputTensorData[i] == outputData[i]);
}
}
@Test
public void testEqDictIntKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<Long, IValue> inputMap = new HashMap<>();
inputMap.put(Long.MIN_VALUE, IValue.long64(-Long.MIN_VALUE));
inputMap.put(Long.MAX_VALUE, IValue.long64(-Long.MAX_VALUE));
inputMap.put(0l, IValue.long64(0l));
inputMap.put(1l, IValue.long64(-1l));
inputMap.put(-1l, IValue.long64(1l));
final IValue input = IValue.dictLongKey(inputMap);
assertTrue(input.isDictLongKey());
final IValue output = module.runMethod("eqDictIntKeyIntValue", input);
assertTrue(output.isDictLongKey());
final Map<Long, IValue> outputMap = output.getDictLongKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<Long, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).getLong() == entry.getValue().getLong());
}
}
@Test
public void testEqDictStrKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<String, IValue> inputMap = new HashMap<>();
inputMap.put("long_min_value", IValue.long64(Long.MIN_VALUE));
inputMap.put("long_max_value", IValue.long64(Long.MAX_VALUE));
inputMap.put("long_0", IValue.long64(0l));
inputMap.put("long_1", IValue.long64(1l));
inputMap.put("long_-1", IValue.long64(-1l));
final IValue input = IValue.dictStringKey(inputMap);
assertTrue(input.isDictStringKey());
final IValue output = module.runMethod("eqDictStrKeyIntValue", input);
assertTrue(output.isDictStringKey());
final Map<String, IValue> outputMap = output.getDictStringKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<String, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).getLong() == entry.getValue().getLong());
}
}
@Test
public void testListIntSumReturnTuple() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (int n : new int[] {0, 1, 128}) {
long[] a = new long[n];
long sum = 0;
for (int i = 0; i < n; i++) {
a[i] = i;
sum += a[i];
}
final IValue input = IValue.longList(a);
assertTrue(input.isLongList());
final IValue output = module.runMethod("listIntSumReturnTuple", input);
assertTrue(output.isTuple());
assertTrue(2 == output.getTuple().length);
IValue output0 = output.getTuple()[0];
IValue output1 = output.getTuple()[1];
assertArrayEquals(a, output0.getLongList());
assertTrue(sum == output1.getLong());
}
}
@Test
public void testOptionalIntIsNone() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertFalse(module.runMethod("optionalIntIsNone", IValue.long64(1l)).getBool());
assertTrue(module.runMethod("optionalIntIsNone", IValue.optionalNull()).getBool());
}
@Test
public void testIntEq0None() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertTrue(module.runMethod("intEq0None", IValue.long64(0l)).isNull());
assertTrue(module.runMethod("intEq0None", IValue.long64(1l)).getLong() == 1l);
}
@Test(expected = IllegalArgumentException.class)
public void testRunUndefinedMethod() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
module.runMethod("test_undefined_method_throws_exception");
}
@Test
public void testTensorMethods() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
int[] ints = new int[numel];
float[] floats = new float[numel];
byte[] bytes = new byte[numel];
for (int i = 0; i < numel; i++) {
bytes[i] = (byte) ((i % 255) - 128);
ints[i] = i;
floats[i] = i / 1000.f;
}
Tensor tensorBytes = Tensor.newInt8Tensor(shape, bytes);
assertTrue(tensorBytes.dtype() == Tensor.DTYPE_INT8);
assertArrayEquals(bytes, tensorBytes.getDataAsByteArray());
Tensor tensorInts = Tensor.newInt32Tensor(shape, ints);
assertTrue(tensorInts.dtype() == Tensor.DTYPE_INT32);
assertArrayEquals(ints, tensorInts.getDataAsIntArray());
Tensor tensorFloats = Tensor.newFloat32Tensor(shape, floats);
assertTrue(tensorFloats.dtype() == Tensor.DTYPE_FLOAT32);
float[] floatsOut = tensorFloats.getDataAsFloatArray();
assertTrue(floatsOut.length == numel);
for (int i = 0; i < numel; i++) {
assertTrue(floats[i] == floatsOut[i]);
}
}
@Test(expected = IllegalStateException.class)
public void testTensorIllegalStateOnWrongType() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
float[] floats = new float[numel];
Tensor tensorFloats = Tensor.newFloat32Tensor(shape, floats);
assertTrue(tensorFloats.dtype() == Tensor.DTYPE_FLOAT32);
tensorFloats.getDataAsByteArray();
}
@Test
public void testEqString() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.string(value);
assertTrue(input.isString());
assertTrue(value.equals(input.getString()));
final IValue output = module.runMethod("eqStr", input);
assertTrue(output.isString());
assertTrue(value.equals(output.getString()));
}
}
@Test
public void testStr3Concat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.string(value);
assertTrue(input.isString());
assertTrue(value.equals(input.getString()));
final IValue output = module.runMethod("str3Concat", input);
assertTrue(output.isString());
String expectedOutput = new StringBuilder().append(value).append(value).append(value).toString();
assertTrue(expectedOutput.equals(output.getString()));
}
}
private static String assetFilePath(String assetName) throws IOException {
@Override
protected String assetFilePath(String assetName) throws IOException {
final Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
File file = new File(appContext.getFilesDir(), assetName);
if (file.exists() && file.length() > 0) {

View File

@@ -0,0 +1,285 @@
package org.pytorch;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
public abstract class PytorchTestBase {
private static final String TEST_MODULE_ASSET_NAME = "test.pt";
@Test
public void testForwardNull() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input =
IValue.from(Tensor.fromBlob(Tensor.allocateByteBuffer(1), new long[] {1}));
assertTrue(input.isTensor());
final IValue output = module.forward(input);
assertTrue(output.isNull());
}
@Test
public void testEqBool() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (boolean value : new boolean[] {false, true}) {
final IValue input = IValue.from(value);
assertTrue(input.isBool());
assertTrue(value == input.toBool());
final IValue output = module.runMethod("eqBool", input);
assertTrue(output.isBool());
assertTrue(value == output.toBool());
}
}
@Test
public void testEqInt() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (long value : new long[] {Long.MIN_VALUE, -1024, -1, 0, 1, 1024, Long.MAX_VALUE}) {
final IValue input = IValue.from(value);
assertTrue(input.isLong());
assertTrue(value == input.toLong());
final IValue output = module.runMethod("eqInt", input);
assertTrue(output.isLong());
assertTrue(value == output.toLong());
}
}
@Test
public void testEqFloat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
double[] values =
new double[] {
-Double.MAX_VALUE,
Double.MAX_VALUE,
-Double.MIN_VALUE,
Double.MIN_VALUE,
-Math.exp(1.d),
-Math.sqrt(2.d),
-3.1415f,
3.1415f,
-1,
0,
1,
};
for (double value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isDouble());
assertTrue(value == input.toDouble());
final IValue output = module.runMethod("eqFloat", input);
assertTrue(output.isDouble());
assertTrue(value == output.toDouble());
}
}
@Test
public void testEqTensor() throws IOException {
final long[] inputTensorShape = new long[] {1, 3, 224, 224};
final long numElements = Tensor.numel(inputTensorShape);
final float[] inputTensorData = new float[(int) numElements];
for (int i = 0; i < numElements; ++i) {
inputTensorData[i] = i;
}
final Tensor inputTensor = Tensor.fromBlob(inputTensorData, inputTensorShape);
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input = IValue.from(inputTensor);
assertTrue(input.isTensor());
assertTrue(inputTensor == input.toTensor());
final IValue output = module.runMethod("eqTensor", input);
assertTrue(output.isTensor());
final Tensor outputTensor = output.toTensor();
assertNotNull(outputTensor);
assertArrayEquals(inputTensorShape, outputTensor.shape());
float[] outputData = outputTensor.getDataAsFloatArray();
for (int i = 0; i < numElements; i++) {
assertTrue(inputTensorData[i] == outputData[i]);
}
}
@Test
public void testEqDictIntKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<Long, IValue> inputMap = new HashMap<>();
inputMap.put(Long.MIN_VALUE, IValue.from(-Long.MIN_VALUE));
inputMap.put(Long.MAX_VALUE, IValue.from(-Long.MAX_VALUE));
inputMap.put(0l, IValue.from(0l));
inputMap.put(1l, IValue.from(-1l));
inputMap.put(-1l, IValue.from(1l));
final IValue input = IValue.dictLongKeyFrom(inputMap);
assertTrue(input.isDictLongKey());
final IValue output = module.runMethod("eqDictIntKeyIntValue", input);
assertTrue(output.isDictLongKey());
final Map<Long, IValue> outputMap = output.toDictLongKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<Long, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).toLong() == entry.getValue().toLong());
}
}
@Test
public void testEqDictStrKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<String, IValue> inputMap = new HashMap<>();
inputMap.put("long_min_value", IValue.from(Long.MIN_VALUE));
inputMap.put("long_max_value", IValue.from(Long.MAX_VALUE));
inputMap.put("long_0", IValue.from(0l));
inputMap.put("long_1", IValue.from(1l));
inputMap.put("long_-1", IValue.from(-1l));
final IValue input = IValue.dictStringKeyFrom(inputMap);
assertTrue(input.isDictStringKey());
final IValue output = module.runMethod("eqDictStrKeyIntValue", input);
assertTrue(output.isDictStringKey());
final Map<String, IValue> outputMap = output.toDictStringKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<String, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).toLong() == entry.getValue().toLong());
}
}
@Test
public void testListIntSumReturnTuple() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (int n : new int[] {0, 1, 128}) {
long[] a = new long[n];
long sum = 0;
for (int i = 0; i < n; i++) {
a[i] = i;
sum += a[i];
}
final IValue input = IValue.listFrom(a);
assertTrue(input.isLongList());
final IValue output = module.runMethod("listIntSumReturnTuple", input);
assertTrue(output.isTuple());
assertTrue(2 == output.toTuple().length);
IValue output0 = output.toTuple()[0];
IValue output1 = output.toTuple()[1];
assertArrayEquals(a, output0.toLongList());
assertTrue(sum == output1.toLong());
}
}
@Test
public void testOptionalIntIsNone() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertFalse(module.runMethod("optionalIntIsNone", IValue.from(1l)).toBool());
assertTrue(module.runMethod("optionalIntIsNone", IValue.optionalNull()).toBool());
}
@Test
public void testIntEq0None() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertTrue(module.runMethod("intEq0None", IValue.from(0l)).isNull());
assertTrue(module.runMethod("intEq0None", IValue.from(1l)).toLong() == 1l);
}
@Test(expected = IllegalArgumentException.class)
public void testRunUndefinedMethod() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
module.runMethod("test_undefined_method_throws_exception");
}
@Test
public void testTensorMethods() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
int[] ints = new int[numel];
float[] floats = new float[numel];
byte[] bytes = new byte[numel];
for (int i = 0; i < numel; i++) {
bytes[i] = (byte) ((i % 255) - 128);
ints[i] = i;
floats[i] = i / 1000.f;
}
Tensor tensorBytes = Tensor.fromBlob(bytes, shape);
assertTrue(tensorBytes.dtype() == DType.INT8);
assertArrayEquals(bytes, tensorBytes.getDataAsByteArray());
Tensor tensorInts = Tensor.fromBlob(ints, shape);
assertTrue(tensorInts.dtype() == DType.INT32);
assertArrayEquals(ints, tensorInts.getDataAsIntArray());
Tensor tensorFloats = Tensor.fromBlob(floats, shape);
assertTrue(tensorFloats.dtype() == DType.FLOAT32);
float[] floatsOut = tensorFloats.getDataAsFloatArray();
assertTrue(floatsOut.length == numel);
for (int i = 0; i < numel; i++) {
assertTrue(floats[i] == floatsOut[i]);
}
}
@Test(expected = IllegalStateException.class)
public void testTensorIllegalStateOnWrongType() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
float[] floats = new float[numel];
Tensor tensorFloats = Tensor.fromBlob(floats, shape);
assertTrue(tensorFloats.dtype() == DType.FLOAT32);
tensorFloats.getDataAsByteArray();
}
@Test
public void testEqString() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isString());
assertTrue(value.equals(input.toStr()));
final IValue output = module.runMethod("eqStr", input);
assertTrue(output.isString());
assertTrue(value.equals(output.toStr()));
}
}
@Test
public void testStr3Concat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isString());
assertTrue(value.equals(input.toStr()));
final IValue output = module.runMethod("str3Concat", input);
assertTrue(output.isString());
String expectedOutput = new StringBuilder().append(value).append(value).append(value).toString();
assertTrue(expectedOutput.equals(output.toStr()));
}
}
protected abstract String assetFilePath(String assetName) throws IOException;
}
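`PytorchTestBase` exercises the renamed Java API: factory methods move from type-specific names (`IValue.tensor`, `IValue.long64`, `Tensor.newFloat32Tensor`) to `IValue.from` overloads and `Tensor.fromBlob`, accessors move from `getX` to `toX`, and dtype constants move to the `DType` enum. A small sketch of the mapping, with the old names (deleted from `PytorchInstrumentedTests` above) noted in comments:
```
import org.pytorch.DType;
import org.pytorch.IValue;
import org.pytorch.Tensor;

public class ApiRenameSketch {
  public static void main(String[] args) {
    long[] shape = new long[] {2, 2};
    float[] data = new float[] {1f, 2f, 3f, 4f};
    // Was Tensor.newFloat32Tensor(shape, data); note the flipped argument order.
    Tensor t = Tensor.fromBlob(data, shape);
    // Was Tensor.DTYPE_FLOAT32; dtype codes now live on the DType enum.
    System.out.println(t.dtype() == DType.FLOAT32); // true
    // Was IValue.tensor(t) / IValue.long64(42L); factories are now from() overloads.
    IValue iv = IValue.from(t);
    // Was getTensor(); accessors are renamed getX -> toX, and shape is a method.
    System.out.println(iv.toTensor().shape()[0]); // 2
  }
}
```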

View File

@@ -0,0 +1,3 @@
#pragma once
/* #undef TRACE_ENABLED */

View File

@@ -0,0 +1,3 @@
#pragma once
#cmakedefine TRACE_ENABLED

View File

@@ -1,657 +0,0 @@
#include <cassert>
#include <iostream>
#include <memory>
#include <string>
#include <fbjni/ByteBuffer.h>
#include <fbjni/fbjni.h>
#include <torch/script.h>
namespace pytorch_jni {
constexpr static int kTensorDTypeUInt8 = 1;
constexpr static int kTensorDTypeInt8 = 2;
constexpr static int kTensorDTypeInt32 = 3;
constexpr static int kTensorDTypeFloat32 = 4;
constexpr static int kTensorDTypeInt64 = 5;
constexpr static int kTensorDTypeFloat64 = 6;
template <typename K = jobject, typename V = jobject>
struct JHashMap
: facebook::jni::JavaClass<JHashMap<K, V>, facebook::jni::JMap<K, V>> {
constexpr static auto kJavaDescriptor = "Ljava/util/HashMap;";
using Super =
facebook::jni::JavaClass<JHashMap<K, V>, facebook::jni::JMap<K, V>>;
static facebook::jni::local_ref<JHashMap<K, V>> create() {
return Super::newInstance();
}
void put(
facebook::jni::alias_ref<facebook::jni::JObject::javaobject> key,
facebook::jni::alias_ref<facebook::jni::JObject::javaobject> value) {
static auto putMethod =
Super::javaClassStatic()
->template getMethod<facebook::jni::alias_ref<
facebook::jni::JObject::javaobject>(
facebook::jni::alias_ref<facebook::jni::JObject::javaobject>,
facebook::jni::alias_ref<facebook::jni::JObject::javaobject>)>(
"put");
putMethod(Super::self(), key, value);
}
};
static at::Tensor newAtTensor(
facebook::jni::alias_ref<facebook::jni::JBuffer> jbuffer,
facebook::jni::alias_ref<jlongArray> jshape,
jint jdtype) {
const auto rank = jshape->size();
const auto shapeArr = jshape->getRegion(0, rank);
std::vector<int64_t> shapeVec{};
shapeVec.reserve(rank);
auto numel = 1;
for (auto i = 0; i < rank; ++i) {
shapeVec.push_back(shapeArr[i]);
numel *= shapeArr[i];
}
JNIEnv* jni = facebook::jni::Environment::current();
caffe2::TypeMeta typeMeta{};
int dataElementSizeBytes = 0;
if (kTensorDTypeFloat32 == jdtype) {
dataElementSizeBytes = 4;
typeMeta = caffe2::TypeMeta::Make<float>();
} else if (kTensorDTypeInt32 == jdtype) {
dataElementSizeBytes = 4;
typeMeta = caffe2::TypeMeta::Make<int32_t>();
} else if (kTensorDTypeInt8 == jdtype) {
dataElementSizeBytes = 1;
typeMeta = caffe2::TypeMeta::Make<int8_t>();
} else if (kTensorDTypeUInt8 == jdtype) {
dataElementSizeBytes = 1;
typeMeta = caffe2::TypeMeta::Make<uint8_t>();
} else if (kTensorDTypeFloat64 == jdtype) {
dataElementSizeBytes = 8;
typeMeta = caffe2::TypeMeta::Make<double>();
} else if (kTensorDTypeInt64 == jdtype) {
dataElementSizeBytes = 8;
typeMeta = caffe2::TypeMeta::Make<int64_t>();
} else {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown Tensor jdtype %d",
jdtype);
}
const auto dataCapacity = jni->GetDirectBufferCapacity(jbuffer.get());
if (dataCapacity != numel) {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Tensor dimensions(elements number:%d, element byte size:%d, total "
"bytes:%d) inconsistent with buffer capacity(%d)",
numel,
dataElementSizeBytes,
numel * dataElementSizeBytes,
dataCapacity);
}
return torch::from_blob(
jni->GetDirectBufferAddress(jbuffer.get()),
torch::IntArrayRef(shapeVec),
at::TensorOptions(typeMeta));
}
class JTensor : public facebook::jni::JavaClass<JTensor> {
public:
constexpr static const char* kJavaDescriptor = "Lorg/pytorch/Tensor;";
static facebook::jni::local_ref<JTensor> newJTensor(
facebook::jni::alias_ref<facebook::jni::JByteBuffer> jBuffer,
facebook::jni::alias_ref<jlongArray> jShape,
jint jdtype) {
static auto jMethodNewTensor =
JTensor::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JTensor>(
facebook::jni::alias_ref<facebook::jni::JByteBuffer>,
facebook::jni::alias_ref<jlongArray>,
jint)>("nativeNewTensor");
return jMethodNewTensor(
JTensor::javaClassStatic(), jBuffer, jShape, jdtype);
}
static facebook::jni::local_ref<JTensor> newJTensorFromAtTensor(
const at::Tensor& tensor) {
const auto scalarType = tensor.scalar_type();
int jdtype = 0;
if (at::kFloat == scalarType) {
jdtype = kTensorDTypeFloat32;
} else if (at::kInt == scalarType) {
jdtype = kTensorDTypeInt32;
} else if (at::kByte == scalarType) {
jdtype = kTensorDTypeUInt8;
} else if (at::kChar == scalarType) {
jdtype = kTensorDTypeInt8;
} else if (at::kLong == scalarType) {
jdtype = kTensorDTypeInt64;
} else if (at::kDouble == scalarType) {
jdtype = kTensorDTypeFloat64;
} else {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"at::Tensor scalar type is not supported on java side");
}
const auto& tensorShape = tensor.sizes();
std::vector<int64_t> tensorShapeVec;
for (const auto& s : tensorShape) {
tensorShapeVec.push_back(s);
}
facebook::jni::local_ref<jlongArray> jTensorShape =
facebook::jni::make_long_array(tensorShapeVec.size());
jTensorShape->setRegion(0, tensorShapeVec.size(), tensorShapeVec.data());
facebook::jni::local_ref<facebook::jni::JByteBuffer> jTensorBuffer =
facebook::jni::JByteBuffer::allocateDirect(tensor.nbytes());
jTensorBuffer->order(facebook::jni::JByteOrder::nativeOrder());
std::memcpy(
jTensorBuffer->getDirectBytes(),
tensor.storage().data(),
tensor.nbytes());
return JTensor::newJTensor(jTensorBuffer, jTensorShape, jdtype);
}
static at::Tensor newAtTensorFromJTensor(
facebook::jni::alias_ref<JTensor> jtensor) {
static const auto dtypeMethod =
JTensor::javaClassStatic()->getMethod<jint()>("dtype");
jint jdtype = dtypeMethod(jtensor);
static const auto shapeField =
JTensor::javaClassStatic()->getField<jlongArray>("shape");
auto jshape = jtensor->getFieldValue(shapeField);
static auto dataBufferMethod =
JTensor::javaClassStatic()
->getMethod<
facebook::jni::local_ref<facebook::jni::JBuffer::javaobject>()>(
"getRawDataBuffer");
facebook::jni::local_ref<facebook::jni::JBuffer> jbuffer =
dataBufferMethod(jtensor);
return newAtTensor(jbuffer, jshape, jdtype);
}
};
class JIValue : public facebook::jni::JavaClass<JIValue> {
public:
constexpr static const char* kJavaDescriptor = "Lorg/pytorch/IValue;";
constexpr static int kTypeCodeNull = 1;
constexpr static int kTypeCodeTensor = 2;
constexpr static int kTypeCodeBool = 3;
constexpr static int kTypeCodeLong = 4;
constexpr static int kTypeCodeDouble = 5;
constexpr static int kTypeCodeString = 6;
constexpr static int kTypeCodeTuple = 7;
constexpr static int kTypeCodeBoolList = 8;
constexpr static int kTypeCodeLongList = 9;
constexpr static int kTypeCodeDoubleList = 10;
constexpr static int kTypeCodeTensorList = 11;
constexpr static int kTypeCodeList = 12;
constexpr static int kTypeCodeDictStringKey = 13;
constexpr static int kTypeCodeDictLongKey = 14;
static facebook::jni::local_ref<JIValue> newJIValueFromAtIValue(
const at::IValue& ivalue) {
if (ivalue.isNone()) {
static auto jMethodOptionalNull =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>()>(
"optionalNull");
return jMethodOptionalNull(JIValue::javaClassStatic());
} else if (ivalue.isTensor()) {
static auto jMethodTensor =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::local_ref<JTensor>)>("tensor");
return jMethodTensor(
JIValue::javaClassStatic(),
JTensor::newJTensorFromAtTensor(ivalue.toTensor()));
} else if (ivalue.isBool()) {
static auto jMethodBool =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jboolean)>(
"bool");
return jMethodBool(JIValue::javaClassStatic(), ivalue.toBool());
} else if (ivalue.isInt()) {
static auto jMethodInt =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jlong)>(
"long64");
return jMethodInt(JIValue::javaClassStatic(), ivalue.toInt());
} else if (ivalue.isDouble()) {
static auto jMethodDouble =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jdouble)>(
"double64");
return jMethodDouble(JIValue::javaClassStatic(), ivalue.toDouble());
} else if (ivalue.isString()) {
static auto jMethodString =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<
facebook::jni::JString::javaobject>)>("string");
return jMethodString(
JIValue::javaClassStatic(),
facebook::jni::make_jstring(ivalue.toStringRef()));
} else if (ivalue.isTuple()) {
auto elementsVec = ivalue.toTuple()->elements();
static auto jMethodTupleArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject>)>("tuple");
auto jElementsArray =
facebook::jni::JArrayClass<JIValue::javaobject>::newArray(
elementsVec.size());
auto index = 0;
for (const auto& e : elementsVec) {
(*jElementsArray)[index++] = JIValue::newJIValueFromAtIValue(e);
}
return jMethodTupleArr(JIValue::javaClassStatic(), jElementsArray);
} else if (ivalue.isBoolList()) {
auto list = ivalue.toBoolList();
static auto jMethodBoolListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jbooleanArray>)>("boolList");
size_t n = list.size();
auto jArray = facebook::jni::make_boolean_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethodBoolListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isIntList()) {
auto list = ivalue.toIntList();
static auto jMethodLongListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jlongArray>)>("longList");
size_t n = list.size();
auto jArray = facebook::jni::make_long_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethodLongListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isDoubleList()) {
auto list = ivalue.toDoubleList();
static auto jMethoDoubleListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jdoubleArray>)>("doubleList");
size_t n = list.size();
auto jArray = facebook::jni::make_double_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethoDoubleListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isTensorList()) {
auto list = ivalue.toTensorList();
static auto jMethodTensorListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JTensor::javaobject>::javaobject>)>("tensorList");
auto jArray = facebook::jni::JArrayClass<JTensor::javaobject>::newArray(
list.size());
auto index = 0;
for (const auto& e : list) {
(*jArray)[index++] = JTensor::newJTensorFromAtTensor(e);
}
return jMethodTensorListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isGenericList()) {
auto list = ivalue.toGenericList();
static auto jMethodListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject>)>("list");
auto jArray = facebook::jni::JArrayClass<JIValue::javaobject>::newArray(
list.size());
auto index = 0;
for (const auto& e : list) {
(*jArray)[index++] = JIValue::newJIValueFromAtIValue(e);
}
return jMethodListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isGenericDict()) {
auto dict = ivalue.toGenericDict();
const auto keyType = dict.keyType();
if (!keyType) {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown IValue-Dict key type");
}
const auto keyTypeKind = keyType->kind();
if (c10::TypeKind::StringType == keyTypeKind) {
static auto jMethodDictStringKey =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JMap<
facebook::jni::alias_ref<
facebook::jni::JString::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>>)>(
"dictStringKey");
auto jmap = JHashMap<
facebook::jni::alias_ref<facebook::jni::JString::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>::create();
for (auto& pair : dict) {
jmap->put(
facebook::jni::make_jstring(pair.key().toString()->string()),
JIValue::newJIValueFromAtIValue(pair.value()));
}
return jMethodDictStringKey(JIValue::javaClassStatic(), jmap);
} else if (c10::TypeKind::IntType == keyTypeKind) {
static auto jMethodDictLongKey =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JMap<
facebook::jni::alias_ref<
facebook::jni::JLong::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>>)>(
"dictLongKey");
auto jmap = JHashMap<
facebook::jni::alias_ref<facebook::jni::JLong::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>::create();
for (auto& pair : dict) {
jmap->put(
facebook::jni::JLong::valueOf(pair.key().toInt()),
JIValue::newJIValueFromAtIValue(pair.value()));
}
return jMethodDictLongKey(JIValue::javaClassStatic(), jmap);
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unsupported IValue-Dict key type");
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unsupported IValue type %s",
ivalue.tagKind().c_str());
}
static at::IValue JIValueToAtIValue(
facebook::jni::alias_ref<JIValue> jivalue) {
static const auto typeCodeField =
JIValue::javaClassStatic()->getField<jint>("mTypeCode");
const auto typeCode = jivalue->getFieldValue(typeCodeField);
if (JIValue::kTypeCodeNull == typeCode) {
return at::IValue{};
} else if (JIValue::kTypeCodeTensor == typeCode) {
static const auto jMethodGetTensor =
JIValue::javaClassStatic()
->getMethod<facebook::jni::alias_ref<JTensor::javaobject>()>(
"getTensor");
return JTensor::newAtTensorFromJTensor(jMethodGetTensor(jivalue));
} else if (JIValue::kTypeCodeBool == typeCode) {
static const auto jMethodGetBool =
JIValue::javaClassStatic()->getMethod<jboolean()>("getBool");
// explicit cast to bool as jboolean is defined as uint8_t, IValue ctor
// for int will be called for jboolean
bool b = jMethodGetBool(jivalue);
return at::IValue{b};
} else if (JIValue::kTypeCodeLong == typeCode) {
static const auto jMethodGetLong =
JIValue::javaClassStatic()->getMethod<jlong()>("getLong");
return at::IValue{jMethodGetLong(jivalue)};
} else if (JIValue::kTypeCodeDouble == typeCode) {
static const auto jMethodGetDouble =
JIValue::javaClassStatic()->getMethod<jdouble()>("getDouble");
return at::IValue{jMethodGetDouble(jivalue)};
} else if (JIValue::kTypeCodeString == typeCode) {
static const auto jMethodGetString =
JIValue::javaClassStatic()->getMethod<jstring()>("getString");
return at::IValue{jMethodGetString(jivalue)->toStdString()};
} else if (JIValue::kTypeCodeTuple == typeCode) {
static const auto jMethodGetTuple =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject()>("getTuple");
auto jarray = jMethodGetTuple(jivalue);
size_t n = jarray->size();
std::vector<at::IValue> elements;
elements.reserve(n);
for (auto i = 0; i < n; ++i) {
auto jivalue_element = jarray->getElement(i);
auto element = JIValue::JIValueToAtIValue(jivalue_element);
elements.push_back(std::move(element));
}
return c10::ivalue::Tuple::create(std::move(elements));
} else if (JIValue::kTypeCodeBoolList == typeCode) {
static const auto jMethodGetBoolList =
JIValue::javaClassStatic()->getMethod<jbooleanArray()>("getBoolList");
auto jArray = jMethodGetBoolList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<bool> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeLongList == typeCode) {
static const auto jMethodGetLongList =
JIValue::javaClassStatic()->getMethod<jlongArray()>("getLongList");
auto jArray = jMethodGetLongList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<int64_t> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeDoubleList == typeCode) {
static const auto jMethodGetDoubleList =
JIValue::javaClassStatic()->getMethod<jdoubleArray()>(
"getDoubleList");
auto jArray = jMethodGetDoubleList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<double> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeTensorList == typeCode) {
static const auto jMethodGetTensorList =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JTensor::javaobject>::javaobject()>("getTensorList");
auto jArray = jMethodGetTensorList(jivalue);
size_t n = jArray->size();
c10::List<at::Tensor> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(JTensor::newAtTensorFromJTensor(jArray->getElement(i)));
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeList == typeCode) {
static const auto jMethodGetList =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject()>("getList");
auto jarray = jMethodGetList(jivalue);
size_t n = jarray->size();
if (n == 0) {
return at::IValue{c10::impl::GenericList(c10::TensorType::get())};
}
auto jivalue_first_element = jarray->getElement(0);
auto first_element = JIValue::JIValueToAtIValue(jivalue_first_element);
c10::TypePtr typePtr = c10::attemptToRecoverType(first_element);
c10::impl::GenericList list{typePtr};
list.reserve(n);
list.push_back(first_element);
for (auto i = 1; i < n; ++i) {
auto jivalue_element = jarray->getElement(i);
auto element = JIValue::JIValueToAtIValue(jivalue_element);
list.push_back(element);
}
return at::IValue{list};
} else if (JIValue::kTypeCodeDictStringKey == typeCode) {
static const auto jMethodGetDictStringKey =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JMap<jstring, JIValue::javaobject>::
javaobject()>("getDictStringKey");
auto jmap = jMethodGetDictStringKey(jivalue);
auto it = jmap->begin();
if (it == jmap->end()) {
return at::IValue{c10::impl::GenericDict(
c10::StringType::get(), c10::TensorType::get())};
}
auto firstEntryValue = JIValue::JIValueToAtIValue(it->second);
c10::TypePtr typePtr = c10::attemptToRecoverType(firstEntryValue);
c10::impl::GenericDict dict{c10::StringType::get(), typePtr};
dict.insert(it->first->toStdString(), firstEntryValue);
it++;
for (; it != jmap->end(); it++) {
dict.insert(
it->first->toStdString(), JIValue::JIValueToAtIValue(it->second));
}
return at::IValue{dict};
} else if (JIValue::kTypeCodeDictLongKey == typeCode) {
static const auto jMethodGetDictLongKey =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JMap<
facebook::jni::JLong::javaobject,
JIValue::javaobject>::javaobject()>("getDictLongKey");
auto jmap = jMethodGetDictLongKey(jivalue);
auto it = jmap->begin();
if (it == jmap->end()) {
return at::IValue{c10::impl::GenericDict(
c10::IntType::get(), c10::TensorType::get())};
}
auto firstEntryValue = JIValue::JIValueToAtIValue(it->second);
c10::TypePtr typePtr = c10::attemptToRecoverType(firstEntryValue);
c10::impl::GenericDict dict{c10::IntType::get(), typePtr};
dict.insert(it->first->longValue(), firstEntryValue);
it++;
for (; it != jmap->end(); it++) {
dict.insert(
it->first->longValue(), JIValue::JIValueToAtIValue(it->second));
}
return at::IValue{dict};
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown IValue typeCode %d",
typeCode);
}
};
class PytorchJni : public facebook::jni::HybridClass<PytorchJni> {
private:
friend HybridBase;
torch::jit::script::Module module_;
public:
constexpr static auto kJavaDescriptor = "Lorg/pytorch/Module$NativePeer;";
static facebook::jni::local_ref<jhybriddata> initHybrid(
facebook::jni::alias_ref<jclass>,
facebook::jni::alias_ref<jstring> modelPath) {
return makeCxxInstance(modelPath);
}
PytorchJni(facebook::jni::alias_ref<jstring> modelPath) {
auto qengines = at::globalContext().supportedQEngines();
if (std::find(qengines.begin(), qengines.end(), at::QEngine::QNNPACK) !=
qengines.end()) {
at::globalContext().setQEngine(at::QEngine::QNNPACK);
}
module_ = torch::jit::load(std::move(modelPath->toStdString()));
module_.eval();
}
static void registerNatives() {
registerHybrid({
makeNativeMethod("initHybrid", PytorchJni::initHybrid),
makeNativeMethod("forward", PytorchJni::forward),
makeNativeMethod("runMethod", PytorchJni::runMethod),
});
}
facebook::jni::local_ref<JIValue> forward(
facebook::jni::alias_ref<
facebook::jni::JArrayClass<JIValue::javaobject>::javaobject>
jinputs) {
std::vector<at::IValue> inputs{};
size_t n = jinputs->size();
inputs.reserve(n);
for (size_t i = 0; i < n; i++) {
at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i));
inputs.push_back(std::move(atIValue));
}
auto output = [&]() {
torch::autograd::AutoGradMode guard(false);
at::AutoNonVariableTypeMode non_var_type_mode(true);
return module_.forward(std::move(inputs));
}();
return JIValue::newJIValueFromAtIValue(output);
}
facebook::jni::local_ref<JIValue> runMethod(
facebook::jni::alias_ref<facebook::jni::JString::javaobject> jmethodName,
facebook::jni::alias_ref<
facebook::jni::JArrayClass<JIValue::javaobject>::javaobject>
jinputs) {
std::string methodName = jmethodName->toStdString();
std::vector<at::IValue> inputs{};
size_t n = jinputs->size();
inputs.reserve(n);
for (size_t i = 0; i < n; i++) {
at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i));
inputs.push_back(std::move(atIValue));
}
if (auto method = module_.find_method(methodName)) {
auto output = [&]() {
torch::autograd::AutoGradMode guard(false);
at::AutoNonVariableTypeMode non_var_type_mode(true);
return (*method)(std::move(inputs));
}();
return JIValue::newJIValueFromAtIValue(output);
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Undefined method %s",
methodName.c_str());
}
};
} // namespace pytorch_jni
JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM* vm, void*) {
return facebook::jni::initialize(
vm, [] { pytorch_jni::PytorchJni::registerNatives(); });
}
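The dict branches in the deleted file above (`kTypeCodeDictStringKey`, `kTypeCodeDictLongKey`) show how the JNI layer converts Java maps to `c10::Dict` values; per the CMake changes earlier in this diff, that logic now lives in `pytorch_jni_common.cpp` / `pytorch_jni_jit.cpp`. A hedged Java sketch of the corresponding round trip, mirroring `testEqDictStrKeyIntValue` and assuming an illustrative model path:
```
import java.util.HashMap;
import java.util.Map;
import org.pytorch.IValue;
import org.pytorch.Module;

public class DictRoundTripSketch {
  public static void main(String[] args) {
    // Assumes a module defining eqDictStrKeyIntValue, as the test assets do.
    Module module = Module.load("/data/local/tmp/test.pt"); // illustrative path
    Map<String, IValue> in = new HashMap<>();
    in.put("answer", IValue.from(42L));
    // The JNI layer converts this map through its string-key dict branch
    // and hands the TorchScript method a Dict[str, int].
    IValue out = module.runMethod("eqDictStrKeyIntValue", IValue.dictStringKeyFrom(in));
    Map<String, IValue> outMap = out.toDictStringKey();
    System.out.println(outMap.get("answer").toLong()); // 42
  }
}
```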

Some files were not shown because too many files have changed in this diff.