Compare commits

...

1494 Commits

Author SHA1 Message Date
7f73f1d591 add python 3.8 workaround 2020-01-14 09:05:04 -08:00
ac15471de4 clarify when to use as_tuple in torch.nonzero (#32051)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31798

Differential Revision: D19272332

Pulled By: zou3519

fbshipit-source-id: 954d086a7b9f1a719e0dac303a4253bf7ec8e9f4
2020-01-14 11:07:33 -05:00
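A minimal sketch (illustrative, not code from the PR) of the distinction the clarified docs draw; as_tuple=True returns one index tensor per dimension, ready for advanced indexing:

```python
import torch

x = torch.tensor([[0, 1], [2, 0]])

# Default: a single 2-D tensor with one (row, col) pair per nonzero element.
print(torch.nonzero(x))                  # tensor([[0, 1], [1, 0]])

# as_tuple=True: one 1-D index tensor per dimension, usable for indexing.
rows, cols = torch.nonzero(x, as_tuple=True)
print(x[rows, cols])                     # tensor([1, 2])
```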
49364eb426 Fix typographical error in torch.triu docstring (#32067) (#32122)
Summary:
below --> above

Fixes https://github.com/pytorch/pytorch/issues/32032
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32067

Differential Revision: D19355788

Pulled By: zou3519

fbshipit-source-id: dc7a2538a78cd11e72d47ad923ef50599a5a87e2
2020-01-14 10:02:37 -05:00
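For reference, a small example of the behavior the corrected docstring describes: torch.triu zeroes elements below (not above) the main diagonal.

```python
import torch

a = torch.ones(3, 3)
print(torch.triu(a))
# tensor([[1., 1., 1.],
#         [0., 1., 1.],
#         [0., 0., 1.]])
```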
bcf2d65446 disable two more tests 2020-01-13 21:57:12 -08:00
f7a33f1eef disable a few more tests because of OSX failures similar to #30604 2020-01-13 13:21:49 -08:00
bd584d52df Disable test_backward_per_tensor in test_fake_quant (#30594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30594

This test case started breaking; disable it to keep the build clean.
ghstack-source-id: 94736837

Test Plan: Unittest disabling change

Differential Revision: D18758635

fbshipit-source-id: 05df1158ff0ccd75e401f352da529fb663b1cae0
2020-01-13 13:15:20 -08:00
c697af4667 Temporarily disable test_numerical_consistency_per_tensor (#30600)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30600

test_numerical_consistency_per_tensor in test_fake_quant is failing on Windows.
ghstack-source-id: 94742124

Test Plan: CircleCI tests

Differential Revision: D18760287

fbshipit-source-id: 7f59355eab74e811bb370ad2836ed2f1def1f621
2020-01-13 13:15:14 -08:00
0f3f4ec64c Kill hypothesis deadline testing (#30890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890

We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway, so let's just kill deadline testing.

Test Plan: Imported from OSS

Differential Revision: D18857597

Pulled By: jamesr66a

fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
2020-01-13 13:12:14 -08:00
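A minimal sketch (assuming the standard hypothesis API; not code from this PR) of the knob being killed here; deadline=None disables the per-example time limit that was making tests flaky:

```python
from hypothesis import given, settings, strategies as st

# With deadline=None, a slow-but-correct example no longer fails the test.
@settings(deadline=None)
@given(st.integers())
def test_int_str_roundtrip(x):
    assert int(str(x)) == x
```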
509df600bb Revert "Remove javasphinx extension (#31955)" (#32059)
This reverts commit 8ada95e95092f93780bd56bad568e2491880e9fd.
2020-01-10 14:31:35 -05:00
187101a88e [v1.4.0] Minimal changes in interpolate to support Keypointrcnn (#32010)
* Fix interpolate

* add keypointrcnn test

* update ort version for test

* pin tv version

* Update test.sh

* Get rid of onnxruntime test changes.

* [v1.4.0] Added torchvision tests as part of ORT tests (#31835)

Summary:
Added torchvision tests as part of ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31835

Reviewed By: hl475

Differential Revision: D19278607

Pulled By: houseroad

fbshipit-source-id: 18a6a85ce3019bcc9aee9517af1378964b585afd

* Remove faster_rcnn and mask_rcnn tests.

Co-authored-by: Lara Haidar <haidar.lara@gmail.com>
Co-authored-by: Negin Raoof <neginmr@utexas.edu>
2020-01-10 12:04:29 -05:00
e011d4a16e Restore CUDA half linspace+logspace and add coverage tests (#31959)
This PR restores the implementation of CUDA half linspace+logspace.

I added tests for the following:
- linspace+logspace have the same support for integral types on CPU/CUDA
- Precision tests for CUDA half, float, and double.

The precision for CUDA half seems bad, but I checked the numbers against
previous versions of pytorch. The output of CUDA Half linspace+logspace
are exactly the same when compared with 1.2.0.

Equivalent-ish PR on master:
https://github.com/pytorch/pytorch/pull/31962
2020-01-09 10:42:36 -05:00
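A rough sketch of the restored functionality (requires a CUDA device; the precision comparison is illustrative, not taken from the PR's tests):

```python
import torch

if torch.cuda.is_available():
    a = torch.linspace(0, 1, steps=5, dtype=torch.half, device="cuda")
    b = torch.logspace(0, 2, steps=3, dtype=torch.half, device="cuda")
    # Gauge the (coarse) half precision against float64 on CPU.
    ref = torch.linspace(0, 1, steps=5, dtype=torch.double)
    print((a.double().cpu() - ref).abs().max())
```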
8ada95e950 Remove javasphinx extension (#31955)
See PR [31581](https://github.com/pytorch/pytorch/pull/31581) for more details.
2020-01-08 14:09:19 -08:00
21c2481dfe Fix nvcc math functions for MSVC 2019 (#31704) (#31816)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31108.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31704

Differential Revision: D19256110

Pulled By: mingbowan

fbshipit-source-id: a4aba2830aba002497f70a75ef995e5e7de08393
(cherry picked from commit 7a3ed36309f48cb833f1690991c7b0f59da6ce11)
2020-01-08 16:30:07 -05:00
398e8ba182 Include two caffe2 ops in v1.4.0 (#31716)
* move AliasWithNameOp to caffe2/operators

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31281

Reviewed By: houseroad

Differential Revision: D19053453

fbshipit-source-id: 350bfd5c001db9c17916dcae7ade8f56db1e9841

* move BatchPermutationOp to caffe2/operators

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31350

Reviewed By: houseroad

Differential Revision: D19053527

fbshipit-source-id: 50d11f137d0f5c07e8ad899a3a84d56a042bbc32

Co-authored-by: wat3rBro <wangyanghan6@gmail.com>
2020-01-08 13:28:13 -05:00
074b30cdcb Restructure docs organization and naming and add Javadoc (#31581)
* Restructure docs organization and naming and add Javadoc

- Rename “Other Languages” → “Language Bindings”
- Move the Community section to the bottom
- Move "Language Bindings" above "Python API"
- Add Javadoc url in index.rst

* Delete no longer needed java rst files. Remove javasphinx extension.
2020-01-08 10:22:35 -08:00
319bd5d431 Disable flaky TestMomentumSGD.test_fp16momentum_sgd (#31369) (#31637)
Summary:
Related to https://github.com/pytorch/pytorch/issues/31368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31369

Co-authored-by: Vitaly Fedyunin <vitalyf@fb.com>
2019-12-26 13:20:37 -08:00
5a20bbd377 [v1.4.0] Support optional float parameters (float?, optional<double>) (#31530)
This is going to be used by upsample (which currently uses magic values to represent optionals).

For now, we just introduce a fake function for testing (torch._test_optional_float(x)).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517
2019-12-26 10:50:33 -08:00
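A hypothetical sketch of what a `float?` parameter looks like from TorchScript (the PR itself only adds the internal plumbing plus the `torch._test_optional_float` test hook):

```python
import torch
from typing import Optional

@torch.jit.script
def scale(x: torch.Tensor, factor: Optional[float] = None) -> torch.Tensor:
    # None replaces the "magic value" sentinel that upsample currently uses.
    if factor is None:
        return x
    return x * factor
```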
fa59a9e190 Dump operator names of a script module, v1.4.0 pick request (#30747)
* Dump operator names of a script module

Summary:

Introduce the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use case is mobile custom builds, which can link only the operators in the returned list to reduce binary size.

Example:
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))

The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']
2019-12-26 10:49:49 -08:00
143868c3df cherry pick 30320 (#31573) 2019-12-23 22:49:26 -08:00
964929fcc2 hacky way to fix android-ndk build (#31529)
* hacky way to fix android build

* should run!!!

* again!!
2019-12-20 18:01:32 -08:00
cd20ecb472 no xla build/test for v1.4.0 (#31518) 2019-12-20 10:43:36 -08:00
19d4fd4910 Specify ordering on singular values and eigenvalues output from torch.svd/symeig respectively (#30389) (#30575)
Summary:

Changelog:
- Adds a note to docstrings of the both functions specifying the ordering

Fixes https://github.com/pytorch/pytorch/issues/30301
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30389

Differential Revision: D18707608

Pulled By: zou3519

fbshipit-source-id: b0f73631578f39a24fae9af4997c6491de8be9a8
2019-12-19 16:10:07 -08:00
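A small illustration of the ordering the new note documents (torch.svd returns singular values in descending order; torch.symeig returns eigenvalues in ascending order):

```python
import torch

a = torch.randn(4, 4)
u, s, v = torch.svd(a)
assert torch.equal(s, s.sort(descending=True).values)

sym = a + a.t()
e, _ = torch.symeig(sym, eigenvectors=True)
assert torch.equal(e, e.sort().values)
```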
a7d187baa4 [v1.4.0] Fix reading __cuda_array_interface__ inferred strides, add test. (#31450)
This is a simpler fix than https://github.com/pytorch/pytorch/pull/24947, which both fixed the bug and updated the protocol version.
This also adds a test (which the previous PR did not).

So the plan is that master (1.5) will have the new protocol version (and a test), 1.4 will have the old protocol version and the test.
2019-12-19 16:09:37 -08:00
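A hedged sketch of the protocol detail in question (requires a CUDA device): per the `__cuda_array_interface__` spec, `strides` may be `None` for contiguous data, in which case consumers must infer strides from `shape`; that inference path is what this change fixes and tests.

```python
import torch

if torch.cuda.is_available():
    t = torch.arange(6, device="cuda").reshape(2, 3)
    iface = t.__cuda_array_interface__
    # "strides" may be None here (contiguous); a consumer must then derive
    # C-contiguous strides from "shape" and the item size.
    print(iface["shape"], iface["strides"])
```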
0541546ac5 Fix unflatten when dim is a negative integer (#31208) (#31432)
Summary:
Changelog:
- Wrap dim to be a positive integer when dim is negative
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31208

Test Plan:
- Updated tests in test_namedtensor.py

Fixes https://github.com/pytorch/pytorch/issues/31184

Differential Revision: D19036569

Pulled By: zou3519

fbshipit-source-id: 86e01e20988dee7c4b6c73232f66282d687f9a2c
2019-12-19 16:09:28 -08:00
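A hedged example using the named-tensor API exercised by test_namedtensor.py; with this fix a negative dim is wrapped to the corresponding positive index:

```python
import torch

x = torch.randn(2, 6, names=("N", "D"))
# dim=-1 now behaves the same as dim="D" (or dim=1).
y = x.unflatten(-1, (("C", 2), ("E", 3)))
print(y.names, y.shape)  # ('N', 'C', 'E') torch.Size([2, 2, 3])
```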
369ab73efd Fix copy kernel speed regression introduced in #29631 (#31279) (#31322)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31271

This fixes copy kernel speed regression introduced in https://github.com/pytorch/pytorch/issues/29631.

The previous implementation forces the compiler to instantiate `static_cast_with_inter_type` because it is passed as an argument to a function. This behavior makes it impossible for compilers to perform optimizations like automatic vectorization, and the function call itself is expensive compared to a single cast instruction.

To check the change, run
```
readelf -Ws /home/xgao/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so | grep static_cast_with_inter_type
```

On nightly build, we have output
```
168217: 0000000001852bf0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsdE5applyEd
168816: 0000000001852d30    33 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEaE5applyEa
168843: 00000000018531f0     7 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIblE5applyEl
168930: 0000000001852c20     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIslE5applyEl
168935: 00000000018528d0   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_4HalfEE5applyES1_
169023: 0000000001852f30    17 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEhE5applyEh
169713: 00000000018525c0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIahE5applyEh
170033: 0000000001852c10     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsiE5applyEi
170105: 0000000001852bd0     5 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIshE5applyEh
170980: 0000000001852fc0    27 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdES1_IfEE5applyES3_
171398: 0000000001852810    13 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdbE5applyEb
171574: 00000000018532e0    35 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbNS_8BFloat16EE5applyES1_
171734: 0000000001852b20     6 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlSt7complexIdEE5applyES2_
172422: 0000000001853350    54 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EaE5applyEa
172704: 00000000018533c0    38 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EfE5applyEf
172976: 0000000001852890    10 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIflE5applyEl
173038: 0000000001852f80     9 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEfE5applyEf
173329: 00000000018531c0    20 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIbfE5applyEf
173779: 00000000018524d0     3 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIhiE5applyEi
174032: 0000000001852960    14 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIfNS_8BFloat16EE5applyES1_
174334: 0000000001852d60    29 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEdE5applyEd
174470: 0000000001852c60   124 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIsNS_4HalfEE5applyES1_
174770: 0000000001852bc0    15 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIlNS_8BFloat16EE5applyES1_
176408: 0000000001853980   144 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeINS_4HalfEbE5applyEb
176475: 0000000001852790   128 FUNC    LOCAL  DEFAULT    9 _ZN3c1027static_cast_with_inter_typeIdNS_4HalfEE5applyES1_
....
```

And after this PR, we get empty output
```
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31279

Differential Revision: D19075587

Pulled By: ngimel

fbshipit-source-id: c20088241f39fa40c1d055f0a46eb5b9ece52e71
2019-12-19 16:09:11 -08:00
9f558e1ee6 turn off profiling graph exec (#30750) 2019-12-19 16:08:59 -08:00
f0ddfff200 Fix exception message in Java Tensor (#30776)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30205

Test Plan: Imported from OSS

Reviewed By: linbinyu

Differential Revision: D18653568

Pulled By: dreiss

fbshipit-source-id: a5fcb809eba641a7fbd0e99e835eceeb248e680c
2019-12-19 16:08:49 -08:00
2de184b5a9 Update persons_of_interest.rst (#30648) 2019-12-19 16:08:39 -08:00
e0eeddfc78 torch.where changes made on 1.3.1 but not on master (#30729)
* Make zeros argument of torch.where same dtype as other argument (#30661)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30661

Cherry-picked from https://github.com/pytorch/pytorch/pull/29080

Test Plan: Imported from OSS

Differential Revision: D18781870

Pulled By: nairbv

fbshipit-source-id: 9de85aa91bf7e0856f35c7c6238a8923315ed27f

Co-authored-by: ifedan

* Added check for torch.where on CPU that both arguments have same dtype (#30662)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30662

Cherry picked from: https://github.com/pytorch/pytorch/pull/29081

Test Plan: Imported from OSS

Differential Revision: D18782295

Pulled By: nairbv

fbshipit-source-id: 897ab25ddf8819ca34f5e86c5d3f41debb56cb04

Co-authored-by: ifedan
2019-12-19 16:01:51 -08:00
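A short sketch of the behavior these cherry-picks pin down (illustrative, not code from the PRs): on CPU, both value arguments to torch.where must share a dtype, so build the zeros from the other argument:

```python
import torch

x = torch.randn(4, dtype=torch.float64)
# zeros_like keeps the dtype aligned with x, satisfying the new check;
# mixing dtypes (e.g. float64 x with float32 zeros) now raises on CPU
# instead of silently producing a surprising result.
y = torch.where(x > 0, x, torch.zeros_like(x))
```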
7727b57d08 [v1.4.0 cherrypick] Fix BC for quantized linear (#30629) 2019-12-19 16:01:26 -08:00
9e7dc37f90 Updates to Quantization documentation (#30372)
* added entries to quantization.rst per issue #27938

* more minor tweaks to quantization.rst to reflect the quantization support list (#27938)

* added discussion about setting backend engine to QNNPACK to quantization.rst (#29735)

* added docstrings to the fused functions in torch/nn/intrinsic/modules/fused.py (#26899)

* fixed the docstring for  torch.nn.intrinsic.quantized.ConvReLU3d  (#27451)

* fixed the formatting on fuse_modules() (#26305)

* fixed rendering issue on QConfig (#30283)

* resolved feedback on PR #30288. Thanks Raghu
2019-12-19 16:01:09 -08:00
227017059f Fix BC test for v1.4.0 (#31442)
* Fix BC test for v1.4.0

* Print out all the broken ops

* White list the broken ones
2019-12-19 14:16:24 -08:00
aeeccc1486 Disable the rebase logic to make the CI pass (#31399) 2019-12-18 12:21:13 -08:00
0b91246cbd [v1.4.0] Fix coverage and hypothesis conflict (#31429)
Summary:
Temporarily enforcing versions for all envs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31320

Differential Revision: D19122781

Pulled By: VitalyFedyunin

fbshipit-source-id: fe6473b177367371387d4b3b873131e7ecfbc0f8
2019-12-18 12:16:05 -08:00
0856d6f53c use earlier docker image to make sure generated binary size is small (#31142)
* use earlier docker image to make sure generated binary size is small

* fix hypothesis version
2019-12-17 15:03:29 -08:00
336e0d2874 our setup requires actions/checkout@v1 to work correctly (#31371)
* checkout correct branch for linting

* try #2

* try #3

* try #4
2019-12-17 10:56:50 -08:00
3b36f2068d Revert "Merge branch 'v1.4.0' of https://github.com/pytorch/pytorch into lahaidar/cherry_pick_28324"
This reverts commit 6207945564b317f4300264e80d125b9a7225b81e, reversing
changes made to 27a2ecb0a5da9507a2b0a0315a7dfeab4b9f85f9.
2019-12-13 16:20:28 -08:00
6207945564 Merge branch 'v1.4.0' of https://github.com/pytorch/pytorch into lahaidar/cherry_pick_28324 2019-12-13 15:48:08 -08:00
aecae514ab Merge branch 'cherry_pick_28324' of https://github.com/houseroad/pytorch into lahaidar/cherry_pick_28324 2019-12-13 15:45:32 -08:00
27a2ecb0a5 Revert "[v1.4.0 cherrypick] ONNX Interpolate Add Scales Params (#31170)" (#31272)
This reverts commit e36fd7b0bae7b350886bf090f7ce222a0c6218df.
2019-12-13 15:14:42 -08:00
e36fd7b0ba [v1.4.0 cherrypick] ONNX Interpolate Add Scales Params (#31170)
The original PR is #28324

We hope to cover torchvision models in the PyTorch ONNX exporter with release 1.4. This PR is part of that effort.
2019-12-13 15:08:35 -08:00
799cb646a6 update expect files 2019-12-13 11:36:06 -08:00
f60c63155a ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for : https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-13 11:36:06 -08:00
954d9ea466 fix test ci by pinning hypothesis and correcting the import (#31201)
* fix test ci by pinning hypothesis and correcting the import, from https://github.com/pytorch/pytorch/pull/31137

* also update for windows build
2019-12-13 11:30:57 -08:00
71185fb2a0 update expect files 2019-12-12 10:54:17 -08:00
a06f26560c Make Conv{1,2,3}dOptions and ConvTranspose{1,2,3}dOptions different classes (#31005)
Summary:
Currently, both `Conv{1,2,3}dOptions` and `ConvTranspose{1,2,3}dOptions` are aliases of the `ConvOptions<{1,2,3}>` class, which causes confusion because the `ConvOptions` class has parameters such as `transposed` that shouldn't be exposed to the end user. (This has caused issues such as https://github.com/pytorch/pytorch/issues/30931.) This PR makes the following improvements:
1. Rename the original `torch::nn::ConvOptions<N>` class to `torch::nn::detail::ConvNdOptions<N>` class, to signify that it's an implementation detail and should not be used publicly.
2. Create new classes `torch::nn::ConvOptions<N>` and `torch::nn::ConvTransposeOptions<N>`, which have parameters that exactly match the constructor of `torch.nn.Conv{1,2,3}d` and `torch.nn.ConvTranspose{1,2,3}d` in Python API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31005

Differential Revision: D18898048

Pulled By: yf225

fbshipit-source-id: 7663d646304c8cb004ca7f4aa4e70d3612c7bc75
2019-12-12 11:46:33 -05:00
e4cec279c6 ONNX Interpolate Add Scales Params (#28324)
Summary:
Fix for : https://github.com/pytorch/pytorch/issues/27176
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28324

Reviewed By: hl475

Differential Revision: D18309133

Pulled By: houseroad

fbshipit-source-id: 348bb41393442c6b107d88fc2cd3224e0afa3ccf
2019-12-11 22:05:47 -08:00
b8b50aa909 Fix missing virtual destructor (#30927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927

Classes that are used polymorphically (e.g. have virtual methods) must have a virtual destructor; otherwise deleting them through a base pointer is undefined behavior
ghstack-source-id: 95144736

Test Plan: waitforsandcastle

Differential Revision: D18870351

fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
2019-12-09 12:47:01 -08:00
db686de13f [1.4.0] Enable len(dataloader) for iterable dataset (#30828)
* enable len(dl) for iterable dataset

* warn if len was called
2019-12-06 18:25:14 -05:00
288e463693 Fix a clang 7 compiler bug for c++14 mode (#30891)
This is already fixed in master as part of bc2e6d10fa.

Before this fix, compiling PyTorch with `-std=c++14` failed on clang 7 due to a compiler bug in the optimizer. With this fix, it works and people can compile PyTorch (or PyTorch extensions) with `-std=c++14`.
2019-12-06 14:11:12 -05:00
73783d1048 Update persons_of_interest.rst 2019-12-05 21:27:01 -08:00
8891d4eeb1 fix AvgPool2d for 2^31-1 sized inputs, and get test_cuda_kernel_loop_overflow_large to working state (#30793) 2019-12-04 23:13:17 -05:00
2085a6f329 Add local shutdown to process group agent (#30330)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330

This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.

ghstack-source-id: 94673884
ghstack-source-id: 94673884

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18661775

fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
2019-12-04 19:23:58 -08:00
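A hedged sketch of the revised lifecycle (run one such process per worker, assuming the usual init_method/environment setup): `wait_all_workers` no longer destroys the agent; teardown is now explicit via `shutdown`.

```python
import torch.distributed.rpc as rpc

rpc.init_rpc("worker0", rank=0, world_size=2)
# ... issue rpc.rpc_sync(...) / rpc.remote(...) calls ...
rpc.shutdown()  # waits for all workers, then tears down the local agent
```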
3eda9e7da2 By default ignore RRef leaks during shutdown (#30217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217

Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure that they have freed all references to RRefs in application
code, which can make for a bad debugging experience in large
applications. Besides, this also relies on Python GC to free things
up in time, which might not always be true. After this commit,
RRefContext ignores leaking RRefs during shutdown, as shutdown
is called when the application has finished training and no longer
cares about local states. Hence, it should be OK to just ignore
those leaks and destroy OwnerRRefs. If an application would like to
enforce no leaks, it can set torch.distributed.rpc.api._ignore_rref_leak
to False.

Test Plan: Imported from OSS

Differential Revision: D18632546

Pulled By: mrshenli

fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
2019-12-04 13:33:31 -05:00
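Per the commit message, strict leak checking remains available through a private flag; a one-line sketch:

```python
import torch.distributed.rpc as rpc

# Opt back into erroring on leaked RRefs at shutdown (private knob).
rpc.api._ignore_rref_leak = False
```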
fb8aa0e98c Remove namespace F = torch::nn::functional from torch/nn/modules/batchnorm.h (#30684)
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.

Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684

Differential Revision: D18795717

Pulled By: yf225

fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
2019-12-04 11:35:19 -05:00
c79b79dadd add default arg for init_method (#30208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208

Adds default arg for init_method so users don't have to pass this in,
and moves it to `RpcBackendOptions` struct. Removes `init_method` arg from rpc.init_rpc. Also fixes some docs.
ghstack-source-id: 94500475

Test Plan: Unit tests pass.

Reviewed By: mrshenli

Differential Revision: D18630074

fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
2019-12-03 15:26:51 -08:00
21acca4528 Exclude undefined tensors in the result of Module::parameters() / named_paramters() / buffers() / named_buffers() (#30626)
Summary:
PR https://github.com/pytorch/pytorch/pull/30523 attempted to fix https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462, but the fix wasn't complete. This PR makes the following improvements:
1. Fixes https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462 properly by excluding undefined tensors in the result of `Module::parameters()` / `named_parameters()` / `buffers()` / `named_buffers()`, which mirrors the Python API behavior.
2. Audits all use sites of `Module::parameters_` / `buffers_` and change them to `Module::named_parameters(/*recurse=*/false)` / `named_buffers(/*recurse=*/false)` when appropriate, so that use sites of module parameters / buffers never need to worry about undefined tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30626

Differential Revision: D18777507

Pulled By: yf225

fbshipit-source-id: 55b64b69779e1186342efd3c44857f416334ed6b
2019-12-03 15:57:32 -05:00
f710757557 Skip undefined tensors when moving torch::nn module to a different device (#30523)
Summary:
This fixes high-pri issues such as https://github.com/pytorch/pytorch/issues/30508 and https://github.com/pytorch/pytorch/issues/30462.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30523

Differential Revision: D18732904

Pulled By: yf225

fbshipit-source-id: fe5a7a43838000f5803bd9c01ecfba0c3f02df5d
2019-12-03 15:57:32 -05:00
a5272cb643 Error instead of assertion failure for div by sparse (#30260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30260

fixes: https://github.com/pytorch/pytorch/issues/30044

Without this PR,

```
>>> torch.tensor(1.) / torch.tensor(1.).to_sparse()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: r.is_sparse() INTERNAL ASSERT FAILED at /Users/distiller/project/conda/conda-bld/pytorch_1570710797334/work/aten/src/ATen/native/sparse/SparseTensorMath.cpp:168, please report a bug to PyTorch.
```

Test Plan:
Ran the same code with this change:

```
In [1]: import torch
In [2]: torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-2-7177f54f30bb> in <module>
----> 1 torch.tensor(1).to_sparse() / torch.tensor(1).to_sparse()

RuntimeError: Unsupported tensor layout
```

Differential Revision: D18657387

Pulled By: nairbv

fbshipit-source-id: cd23570d46f5b26fd84049e5e63b61b19835603d
2019-11-22 11:31:26 -08:00
638f4c1fb3 Update Cocoapods to 1.4.0 (#30326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30326

Note that this PR won't trigger the cocoapods build. We'll push the binary and release the cocoapods after the branch cut.

Test Plan: Imported from OSS

Differential Revision: D18660308

Pulled By: xta0

fbshipit-source-id: 95dd97b7b67e70ecee3a65d8bbc125791872b7ca
2019-11-22 11:31:21 -08:00
97fae401f0 Use LinearPackedParams everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30198

Test Plan: Imported from OSS

Differential Revision: D18628003

Pulled By: jamesr66a

fbshipit-source-id: 76ff0248fd859e805a15cde555d26dd2138636fa
2019-11-22 11:31:17 -08:00
1cc321deed Memoize parseIR calls in graph mode quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30188

Test Plan: Imported from OSS

Differential Revision: D18625743

Pulled By: jamesr66a

fbshipit-source-id: 88f9da8e79324ba91e3550a8fc1a05e85bb83a86
2019-11-22 11:31:13 -08:00
65f465050b Dont use SubgraphRewriter in FoldQuantizeCallIntoBuffer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30264

Test Plan: Imported from OSS

Differential Revision: D18645531

Pulled By: jamesr66a

fbshipit-source-id: 44fc0f0a3c8cabe62924baae0d556e43bbf637ec
2019-11-22 11:31:08 -08:00
a9f3f48f88 Revert D5578006: Add local shutdown to process group agent
Test Plan: revert-hammer

Differential Revision:
D5578006

Original commit changeset: 6258879fb44c

fbshipit-source-id: 11b893b3a280a8383eeb20a0548626811616dca1
2019-11-22 11:31:04 -08:00
fa242246ee add unit tests to iOS CI jobs (#30133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30133

### Summary

Recently we've found that the master branch was constantly broken due to some unwanted change being landed on mobile. The problem is that our CI was not able to detect the runtime errors.

### Previous work

- Add a unit test target to the iOS TestApp (#29962)
- Update Fastlane to run tests (#29963)

### What's been changed in CI

1. XCode version has been updated to 11.2.1
2. For iOS simulator build, we'll run some unit tests( currently only one) after the build test.

Test Plan: Imported from OSS

Differential Revision: D18641413

Pulled By: xta0

fbshipit-source-id: 12942206f1dee045b2addba3ae618760e992752c
2019-11-22 10:52:11 -08:00
7903fb118f Move qkv_same, kv_same into branch (#30142)
Summary:
Perf improvements to multi_head_attention_forward

- qkv_same and kv_same were not used outside of that branch. Further, kv_same was calculated even though it is not used when qkv_same is true
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30142

Differential Revision: D18610938

Pulled By: cpuhrsch

fbshipit-source-id: 19b7456f20aef90032b0f42d7da8c8a2d5563ee3
2019-11-22 10:40:02 -08:00
5d7b2089e8 Draft version: Make AliasAnalysisKind optional in Op Registration API (#30187)
Summary:
Don't look too deeply into the diff's implementation. The reason to send out this diff is to help sync on the design first. Once we agree on the design, I will update the implementation accordingly.

**Here is the basic design for achieving this functionality:**

**Q1: Do we need to tell apart case between the following:**
case 1:  registry 1: PURE -> registry 2: CONSERVATIVE
case 2:  registry 1: PURE -> registry 2: <not set>

A: Yes, although right now both cases have the same value (due to defaulting to CONSERVATIVE) in operators_ and operatorLookupTable_.
Case 1 should be rejected, while case 2 should be a legal case where registry 1 ends up as PURE.

**How to tell apart both cases:**

Right now, AliasAnalysisKind::CONSERVATIVE is by default (code pointer: https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/core/dispatch/OperatorOptions.h?lines=22%2C52)

Current approach: introduce a boolean flag in OperatorOptions called isDefault, defaulting to true. When setAliasAnalysis(AliasAnalysisKind) is called manually, it is set to false.
Then, when findSchema() runs in Dispatcher.cpp, we check the response option's isDefault value.
If isDefault is true, then, provided some sanity checks pass, we can update the option info in both operators_ and operatorLookupTable_.

Other approaches:
1. Introduce a new AliasAnalysisKind, perhaps called NOT_SPECIFIED. (I am not doing it this way since it would require updating other callsites related to AliasAnalysisKind::CONSERVATIVE.) Also, we would need additional logic to align NOT_SPECIFIED with CONSERVATIVE.

**What data to be updated:**
corresponding entry in std::list<OperatorDef> operators_ and LeftRight<ska::flat_hash_map<OperatorName, OperatorHandle>> operatorLookupTable_

(More things to be discussed here.)

**Do we need to trigger listeners if an entry get updated:**
I think no.
callOnOperatorRegistered(op) currently seems to use only OperatorHandle.schema, from its sole callsite in register_c10_ops.cpp
(code pointers: https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/aten/src/ATen/core/dispatch/Dispatcher.cpp?commit=b4cefeaa98dca5b1ec5f7a0bca6028e368960244&lines=87-90
and https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/caffe2/torch/csrc/jit/register_c10_ops.cpp?lines=178&link_ref=biggrep)

However, things could become much more complicated if future extensions use options, e.g. if some listeners want to use option values to register operators.

**Future reading list + remaining questions:**
1. How options get consumed on the other side.
2. Usages for fields in OperatorEntry besides schema/options/kernels
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30187

Test Plan:
[xintchen@devvm6308.prn2 ~/fbsource/fbcode] buck test mode/dev //caffe2:ATen-core-test

All tests passed

Differential Revision: D18530964

Pulled By: charliechen0401

fbshipit-source-id: 60c0560a63a36e54f09f397667bb7122b61d6a8e
2019-11-22 10:20:41 -08:00
c478a92b93 Add local shutdown to process group agent (#30020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30020
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The destructor calls this same `localShutdown` method, but we ensure this is not called multiple times.

ghstack-source-id: 94415336

Test Plan: Unit tests pass.

Differential Revision: D5578006

fbshipit-source-id: 6258879fb44c9fca97fdfad64468c1488c16ac02
2019-11-22 10:03:00 -08:00
559b3b5a7a Use unboxed registration for most of operators used in lite interpreter. (#30239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30239

Use unboxed registration per smessmer's request. For some ops with an optional argument or a tensor list, where unboxed registration is not supported, we still use boxed registration.

Test Plan: Imported from OSS

Differential Revision: D18653846

Pulled By: iseeyuan

fbshipit-source-id: c22ce8111dfff0ba63316a9bcfe2b712b2d31fc1
2019-11-22 10:00:30 -08:00
f41422121e default construct rpc agent options based on the backend type (#30201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30201

Provide a default constructor so that users don't have to construct
RPC agent options. Also rename this to RpcBackendOptions as suggested.
ghstack-source-id: 94411768

Test Plan: Unit tests pass.

Differential Revision: D18628698

fbshipit-source-id: 81fb45f124ad1006e628f6045162308093c9d446
2019-11-22 08:18:06 -08:00
3455231e9c Expose configuration of Numa directories to setup.py (#30104)
Summary:
https://github.com/pytorch/pytorch/issues/29968
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30104

Differential Revision: D18656882

Pulled By: ezyang

fbshipit-source-id: f932a98674033f1a3184dc1c22faa6f8c2b50134
2019-11-22 07:07:39 -08:00
faacbfa8bf Migrate index_add cpu from TH to ATen (#28421)
Summary:
Migrate index_add cpu from TH to ATen.

I couldn't find replacements for get1d and set1d, so I do the pointer arithmetic in place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28421

Test Plan: existing tests

Differential Revision: D18060971

Pulled By: ggoossen

fbshipit-source-id: 413719990cdb2fe578964cde14e93577e48a4342
2019-11-22 06:25:13 -08:00
183aa1534f Add --no_python flag (#29144)
Summary:
Allows you to use a bash script wrapper in-between launch and your
training script. e.g.
```
python -m torch.distributed.launch --nproc_per_node=8 --no_python --use_env \
    bash -c 'exec numactl --cpunodebind=$(( LOCAL_RANK / 4 )) "$@"' -- \
    python train.py ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29144

Differential Revision: D18345647

Pulled By: pietern

fbshipit-source-id: f05849c38c82de782988d07d300e00cf9f37253a
2019-11-22 06:05:41 -08:00
29887f813a Remove unused forward declaration (#30154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30154

This doesn't seem to be used in thread_pool.cpp.
ghstack-source-id: 94264158

Test Plan: Let's see if this compiles.

Differential Revision: D18614141

fbshipit-source-id: c6ff3db56b55fcee7d8123d909ee275690163ece
2019-11-22 05:24:53 -08:00
a074080d57 Mark c10d::~NCCLUtils as noexcept (#29118)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29118

It's never a good idea to throw from a destructor and per #28288 we
can't use `std::make_shared` on a class with a `noexcept(false)`
destructor.

To fix this, we `abort` instead of throw from the `NCCLComm` destructor.

Closes #28288.
ghstack-source-id: 93182910

Test Plan: ProcessGroupNCCLErrorsTest runs successfully.

Reviewed By: pritamdamania87

Differential Revision: D18298271

fbshipit-source-id: ccac37753fef64fb63cb304433f4f97dc5621379
2019-11-22 04:06:12 -08:00
95b451d386 fixing test_tensorboard for py2 (#30298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30298

This diff fixes test_tensorboard for python2:
- proto serialization is different in py2 vs py3 (e.g. for bytes) -> a simple string comparison would fail for test_pytorch_graph. Modified the test to compare graphs field by field

Reviewed By: J0Nreynolds

Differential Revision: D18654691

fbshipit-source-id: fdbca32e9a7fc2ea70a040bb825eab8a48d0dfe4
2019-11-22 01:02:07 -08:00
f5ef3a6fb6 disable JIT optimizer in Android wrapper for mobile custom build (#30285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30285

PR #30144 introduced custom build script to tailor build to specific
models. It requires a list of all potentially used ops at build time.

Some JIT optimization passes can transform the IR by replacing
operators, e.g. decompose pass can replace aten::addmm with aten::mm if
coefficients are 1s.

Disabling optimization pass can ensure that the list of ops we dump from
the model is the list of ops that are needed.

Test Plan: - rerun the test on PR #30144 to verify the raw list without aten::mm works.

Differential Revision: D18652777

Pulled By: ljk53

fbshipit-source-id: 084751cb9a9ee16d8df7e743e9e5782ffd8bc4e3
2019-11-22 00:25:04 -08:00
1690feba9f add mobile build CI with host toolchain (#30292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30292

We already have CI jobs to build Android/iOS libraries, but there are
two issues:
- It's not easy for people who are not regularly working on mobile to debug
these CI errors, as they need to set up an Android/iOS build environment;
- It's hard to run cross-compiled mobile libraries, as that requires an
emulator. It has happened a couple of times recently that a mobile build
compiled fine but failed to load and run a model.

To address these problems, create this new CI job to build the mobile
library with the Linux host toolchain so that we can build & test without
involving an Android/iOS environment or simulator. Will add tests on top of it next.

Test Plan: - check the new CI job

Differential Revision: D18654074

Pulled By: ljk53

fbshipit-source-id: eb1baee97a7b52c44979dbf1719c3357e08f895e
2019-11-22 00:02:27 -08:00
48b943960e Add bfloat16 support in linear algebra on ROCm (#27719)
Summary:
This adds backend support (i.e., bgemm) for gemm-style matrix multiplications with data and output in bf16 to PyTorch on ROCm.

Enables operators that depend on bgemm.

With this change, bf16 matrices on ROCm can be multiplied on the GPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27719

Differential Revision: D18653514

Pulled By: bddppq

fbshipit-source-id: 805db923579bec6fc8fd1c51eeb5b1ef85a96758
2019-11-21 23:54:03 -08:00
23650671a8 add_hparams() NoneType error (#30286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30286

add_hparams() in torch.utils.tensorboard.writer produced the following error:

```
python3.7/site-packages/torch/utils/tensorboard/writer.py", line 294, in add_hparams
    with SummaryWriter(log_dir=os.path.join(self.file_writer.get_logdir(), str(time.time()))) as w_hp:
AttributeError: 'NoneType' object has no attribute 'get_logdir'
```

Other methods such as add_scalar() and add_histogram() use self._get_file_writer() instead of self.file_writer directly.

Test Plan:
```
writer = summary_writer()
writer.add_hparams({"a": 0, "b": 0}, {"hparam/test_accuracy": 0.5})
writer.flush()
writer.close()
```

Reviewed By: J0Nreynolds, sanekmelnikov

Differential Revision: D18650610

fbshipit-source-id: 1039dd2067d37913a8a131c8b372491a63154899
2019-11-21 23:25:26 -08:00
5e19460ced cache tensor scalar_type in OperandInfo (#30065)
Summary:
Caches result of `scalar_type` call in TensorIterator and TensorOptions, because the call is expensive.
This removes 120 - 150 ns of overhead (from 1.25 us to 1.12 us for out-of-place case, from 0.86 us to 0.73 us for inplace case)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30065

Test Plan: Covered by existing tests

Differential Revision: D18576236

Pulled By: ngimel

fbshipit-source-id: 17f63851a911fc572c2146f8a520b7f0dadfd14a
2019-11-21 23:25:22 -08:00
73c9e6e6b6 Rename function parameters to avoid [-Werror,-Wshadow] (#30276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30276

### Summary

When building PyTorch for iOS in BUCK, the compiler complains about the ivar shadowing

```
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h:184:144: error: declaration shadows a field of 'c10::Dispatcher' [-Werror,-Wshadow]
inline Return Dispatcher::doCallUnboxed(const DispatchTable& dispatchTable, const LeftRight<ska::flat_hash_map<TensorTypeId, KernelFunction>>& backendFallbackKernels_, Args... args) const {
                                                                                                                                               ^
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h:134:63: note: previous declaration is here
  LeftRight<ska::flat_hash_map<TensorTypeId, KernelFunction>> backendFallbackKernels_;
```
This happens because the internal iOS compiler enforces `[-Werror, -Wshadow]` on every source file when compiling. Say in `benchmark.mm` we import `<torch/script.h>`; that leads all the way to `Dispatcher.h`

```
 In file included from Apps/Internal/PyTorchPlayground/PyTorchPlayground/Application/Benchmark/Benchmark.mm:6:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/ATen.h:5:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/Context.h:4:
In file included from /Users/taox/fbsource/xplat/caffe2/aten/src/ATen/Tensor.h:12:
In file included from buck-out/cells/fbsource/gen/xplat/caffe2/TensorMethods.h/TensorMethods.h:10:
/Users/taox/fbsource/xplat/caffe2/aten/src/ATen/core/dispatch/Dispatcher.h
```
It'd be better to have a separate name for function parameters.

cc shoumikhin

Test Plan: Imported from OSS

Differential Revision: D18649116

Pulled By: xta0

fbshipit-source-id: 19f50b7a23c11dedcafc2ac2d85b45ae4999be2f
2019-11-21 21:59:41 -08:00
a822a1d2a8 Avoid overwriting output type in onnx graph (#25906)
Summary:
When creating the onnx graph, we overwrite the output type with the output type of the PT graph.
In some special cases, when using scripting, the PT graph does not have type information. We want to avoid overwriting the output type in these cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25906

Reviewed By: hl475

Differential Revision: D18645903

Pulled By: houseroad

fbshipit-source-id: 56acc43e0c15c74ac8ebd689e04f7371054e362e
2019-11-21 21:30:12 -08:00
30874b31a9 Enable JNI build on Mac host (#30207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30207

This should work now that we're not using gold-specific linker flags.

Test Plan: CI

Differential Revision: D18653521

Pulled By: dreiss

fbshipit-source-id: 31c3cdbefc37b87bfb4140ffbac781131fe72ab3
2019-11-21 20:10:10 -08:00
e5fc86130a Remove unnecessary linker flags from JNI host build (#30206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30206

- --whole-archive isn't needed because we link libtorch as a dynamic
  dependency, rather than static.
- --gc-sections isn't necessary because most (all?) of the code in our
  JNI library is used (and we're not staticly linking libtorch).
  Removing this one is useful because it's not supported by lld.

Test Plan:
Built on Linux.  Library size was unchanged.
Upcoming diff enables Mac JNI build.

Differential Revision: D18653500

Pulled By: dreiss

fbshipit-source-id: 49ce46fb86a775186f803ada50445b4b2acb54a8
2019-11-21 20:10:06 -08:00
4609c626c5 Enable test_call_method_on_rref in rpc_test (#30261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30261

With #29827, the flakiness should disappear for test_call_method_on_rref

Test Plan: Imported from OSS

Differential Revision: D18645036

Pulled By: mrshenli

fbshipit-source-id: 44d759062fc78b1a797266096dbb4ddd104f07eb
2019-11-21 19:38:19 -08:00
aa1e99e983 Fix two links in RPC API doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30259

Test Plan: Imported from OSS

Differential Revision: D18644749

Pulled By: mrshenli

fbshipit-source-id: ff515d2588cd59e0d87f020a01885156a6644450
2019-11-21 19:32:22 -08:00
168570b0da move module_save.cpp to non-mobile build section in cmake (#30221)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30221

PR #29881 moved Module::save() methods to a separate source file
and removed C10_MOBILE gating logic. Seems it should stay with
export_module.cpp (which is in "NOT INTERN_BUILD_MOBILE" section).
Otherwise it causes link error with build_mobile.sh.

Test:
- build locally
- check CI

Test Plan: Imported from OSS

Differential Revision: D18649234

Pulled By: ljk53

fbshipit-source-id: b6c90a532d191c41ce10c1047a869d8f73854c4d
2019-11-21 18:56:34 -08:00
0c04763d59 Changes to get inlined graph and proper names after JIT updates (#30244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30244

This makes several small changes to the tensorboard graph parsing methods to address the recent changes to the PyTorch JIT trace/graph.
- Inline graph to get information for all nodes
- Assign and propagate scope names to GetAttr nodes
- Prune all useless GetAttr nodes (any with a ClassType output type - tensors and primitives are kept)
- Create output nodes so output tensor shape can be examined

Reviewed By: sanekmelnikov

Differential Revision: D18556323

fbshipit-source-id: b73a809bacfa554c3fe9c4ae3563525f57539874
2019-11-21 16:59:28 -08:00
983728489a Add ONNX Tests for Torchvision Models (#30121)
Summary:
Adding tests for exporting Torchvision models to ONNX and testing them against ORT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30121

Reviewed By: hl475

Differential Revision: D18619563

Pulled By: houseroad

fbshipit-source-id: 4f78f6876337b941d62efbf5c753c52f6c877d3c
2019-11-21 16:53:59 -08:00
fea963d3ae Fix BackendType repr in doc (#30243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30243

Before this commit, rpc docs shows init_rpc as the following:

```
torch.distributed.rpc.init_rpc(
   name,
   backend=<BackendType.PROCESS_GROUP: BackendValue(
     construct_rpc_agent_options_handler=<function _process_group_construct_rpc_agent_options_handler>,
     init_backend_handler=<function _process_group_init_backend_handler>)>,
   init_method=None,
   rank=-1,
   world_size=None,
   rpc_agent_options=None
)
```

It unnecessarily leaks implementation details. This commit adds a
__repr__ function to BackendType Enum class to address this problem.

closes #29905

Test Plan: Imported from OSS

Differential Revision: D18641559

Pulled By: mrshenli

fbshipit-source-id: 19bf8a2d21c8207f026d097d8e3f077578d53106
2019-11-21 16:22:43 -08:00
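A minimal, self-contained sketch (not the actual class) of the fix: give the Enum a __repr__ that hides the BackendValue payload.

```python
from enum import Enum

class BackendType(Enum):
    PROCESS_GROUP = object()  # stands in for the real BackendValue(...)

    def __repr__(self):
        return f"BackendType.{self.name}"

print(repr(BackendType.PROCESS_GROUP))  # BackendType.PROCESS_GROUP
```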
063e22b7c2 Fix RRef design doc warning (#30240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30240

Get rid of the following warning when build docs:

```
/Users/shenli/Project/pytorch/docs/source/notes/rref.rst:184: WARNING: Error in "code" directive:
maximum 1 argument(s) allowed, 6 supplied.

.. code::
  import torch
  import torch.distributed.rpc as rpc

  # on worker A
  rref = rpc.remote('B', torch.add, args=(torch.ones(2), 1))
  # say the rref has RRefId 100 and ForkId 1
  rref.to_here()
```

Test Plan: Imported from OSS

Differential Revision: D18640016

Pulled By: mrshenli

fbshipit-source-id: d527827f01183411d4b4c73e0a976bdd7fccbf49
2019-11-21 16:22:39 -08:00
e0325011e4 Add link to RRef protocol in RPC doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30218

Test Plan: Imported from OSS

Differential Revision: D18638881

Pulled By: mrshenli

fbshipit-source-id: ca6fae6f8cea8cdcc33d275dd71a347fbb5dd45c
2019-11-21 16:22:35 -08:00
f2f285c240 Add arguments to benchmark to run pytext models. Output results in ms. (#30273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30273

Pytext models expect input of the form `1xlength` and another input specifying the length.
Add the `pytext_len` argument to specify this.
ghstack-source-id: 94383501

Test Plan: ./speed_benchmark_torch --model model.pt --input_dims "1,4" --input_type int64 --warmup 10 --iter 10 --report_pep=true --pytext_len=4

Reviewed By: iseeyuan

Differential Revision: D18646028

fbshipit-source-id: 7d5fe0c36da6e5f7b0261619ce4784a46b70f3d8
2019-11-21 16:03:00 -08:00
b2b1601b30 Docker image build on CircleCI (#29932)
Summary:
the source are copied from https://github.com/pytorch/pytorch-ci-dockerfiles, added  .circleci/docker/build_docker.sh to start building job with circleci specific variables.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29932

Differential Revision: D18645740

Pulled By: mingbowan

fbshipit-source-id: 15fdec85ce59f72daa418ac59792535fed1d136b
2019-11-21 15:31:51 -08:00
352731bd6e Revert D18632773: Split libtorch.so back into libtorch_{cpu,cuda,hip}
Test Plan: revert-hammer

Differential Revision:
D18632773

Original commit changeset: ea717c81e0d7

fbshipit-source-id: 18601439f9f81c9f389020e5a0e4e04adb21772d
2019-11-21 15:01:09 -08:00
eff4c4d7c1 Revert D18301806: Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL
Test Plan: revert-hammer

Differential Revision:
D18301806

Original commit changeset: 03da6a26c41e

fbshipit-source-id: c1324ee8d154e7e16f5dd4f1cf3625aaa566cd39
2019-11-21 14:50:07 -08:00
cbe0a996f0 Change dimType for shapeInfo (#30183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30183

Resubmit for D18579363 with fix

Test Plan: see D18579363

Reviewed By: ipiszy

Differential Revision: D18623090

fbshipit-source-id: 23c9176a22d9a5547e6b298f0d51717399d10751
2019-11-21 14:35:19 -08:00
188d0a9add Skips flaky UtilsNMSTest.GPUEqualsCPURotatedCorrectnessTest (#30053)
Summary:
See https://github.com/pytorch/pytorch/issues/26811.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30053

Differential Revision: D18597070

Pulled By: mruberry

fbshipit-source-id: a3ab8abda8e019fb9978ad8d41ef44451129868c
2019-11-21 13:44:44 -08:00
f4b9690f2d Use pybind11::gil_scoped_* functions instead of AutoGIL/AutoNoGIL (#29095)
Summary:
Given that pybind11 implements these GIL functions, I don't think it makes sense for PyTorch to have its own bespoke versions.

Fixes https://github.com/pytorch/pytorch/issues/29065
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29095

Differential Revision: D18301806

Pulled By: ezyang

fbshipit-source-id: 03da6a26c41ee65aaadf7b67b9f0b14d2def2a5a
2019-11-21 13:44:40 -08:00
0fdbb762d1 Warn user when resizing out Tensor after arange() (#29195)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28347

gchanan , I am generating a warning as follows:
```
(torch_new) prasun@prasun-xps:~/dev/explore-array-computing$ python arange_test.py
Trying 45...
  Before arange shape is torch.Size([1, 45])
  After arange shape is torch.Size([1, 45])
Trying 46...
  Before arange shape is torch.Size([1, 46])
  After arange shape is torch.Size([1, 46])
Trying 47...
  Before arange shape is torch.Size([1, 47])
  After arange shape is torch.Size([1, 47])
Trying 48...
  Before arange shape is torch.Size([1, 48])
  After arange shape is torch.Size([1, 48])
Trying 49...
  Before arange shape is torch.Size([1, 49])
../aten/src/ATen/native/RangeFactories.cpp:163: UserWarning: Size of out Tensor does not match the result Tensor. The output Tensor will be resized!
  After arange shape is torch.Size([50])
Traceback (most recent call last):
  File "arange_test.py", line 10, in <module>
    assert len(line.shape) == 2
AssertionError
```

Is this alright ?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29195

Differential Revision: D18638184

Pulled By: ezyang

fbshipit-source-id: a93e4ce615b5a315570f9951021ef74fc1d895a6
2019-11-21 13:06:14 -08:00
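The shortest reproduction of the new warning, assuming a wrongly-shaped out tensor as in the linked issue:

```python
import torch

out = torch.empty(1, 49)
torch.arange(50, out=out)  # UserWarning: ... output Tensor will be resized!
print(out.shape)           # torch.Size([50])
```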
1bba0eb35b Add clone_instance for Module (#30168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30168

The previous implementation of `clone` in `script::Module` copies both the module instance and the
class type. After we enabled type sharing in https://github.com/pytorch/pytorch/pull/26666, we also
need a function that clones only the instance and shares the underlying class type.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18631324

fbshipit-source-id: dbadcf19695faee0f755f45093b24618c047b9d1
2019-11-21 13:00:34 -08:00
2c1c6de122 Represent the original python name the same way in traced and scripted modules.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29912

Test Plan: Imported from OSS

Differential Revision: D18533135

Pulled By: ZolotukhinM

fbshipit-source-id: 080dbafa5dcd8c1fb12fec0c956e52fceec430e7
2019-11-21 11:55:40 -08:00
ec30d9028a Split libtorch.so back into libtorch_{cpu,cuda,hip} (#29731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29731

The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.

Some subtleties about the patch:
- There were a few functions that crossed CPU-CUDA boundary without API macros. I just added them, easy enough. An inverse situation was aten/src/THC/THCTensorRandom.cu where we weren't supposed to put API macros directly in a cpp file.
- DispatchStub wasn't getting all of its symbols related to static members on DispatchStub exported properly. I tried a few fixes but in the end I just moved everyone off using DispatchStub to dispatch CUDA/HIP (so they just use normal dispatch for those cases.) Additionally, there were some mistakes where people incorrectly were failing to actually import the declaration of the dispatch stub, so added includes for those cases.
- torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
- The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
- In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
- A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly. This situation also happens with custom C++ extensions.
- There's a ROCm compiler bug where extern "C" on functions is not respected. There's a little workaround to handle this.
- Because I was too lazy to check if HIPify was converting TORCH_CUDA_API into TORCH_HIP_API, I just made it so HIP build also triggers the TORCH_CUDA_API macro. Eventually, we should translate and keep the nature of TORCH_CUDA_API constant in all cases.

Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18632773

Pulled By: ezyang

fbshipit-source-id: ea717c81e0d7554ede1dc404108603455a81da82
2019-11-21 11:27:33 -08:00
d934cf484b call find_package(OpenMP) only when USE_OPENMP=ON (#30223)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30223

I ran into a find_package(OpenMP) failure in some Linux environments when
USE_OPENMP=OFF. This workaround unblocks the build; it's not clear how hard
it would be to find and fix the root cause of the find_package() failure.

Test:
- works in my case;
- will check CI;

Test Plan: Imported from OSS

Differential Revision: D18640309

Pulled By: ljk53

fbshipit-source-id: b5b30623f5da4edbe59574a8b35286b74c3225d3
2019-11-21 10:35:15 -08:00
7d3afc4186 enable the per channel dynamic quantization (#30122)
Summary:
This PR enables per-channel (row-wise) dynamic quantization for the linear operator. Given that we have seen some accuracy drop with per-tensor quantization, we expect per-channel quantization to help improve accuracy.
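As a hedged illustration (not the fbgemm kernel itself; sizes and the symmetric scale formula are assumptions), the sketch below shows why one scale per output row can lose less information than a single per-tensor scale when rows differ widely in magnitude.
```
import torch

# Rows of w span four orders of magnitude (illustrative).
w = torch.randn(4, 8) * torch.tensor([0.01, 0.1, 1.0, 10.0]).unsqueeze(1)

scale_t = w.abs().max() / 127                            # per-tensor: one scale
q_t = (w / scale_t).round().clamp(-128, 127) * scale_t

scale_c = w.abs().max(dim=1, keepdim=True).values / 127  # per-channel: one scale per row
q_c = (w / scale_c).round().clamp(-128, 127) * scale_c

# The per-channel reconstruction error is smaller.
print((w - q_t).abs().mean().item(), (w - q_c).abs().mean().item())
```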
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30122

Differential Revision: D18630541

Pulled By: lly-zero-one

fbshipit-source-id: d52685deec5e7de46cd686ae649a8c8765b9cacf
2019-11-21 10:12:05 -08:00
3ba1456aee Fix clip_grad_norm_ / clip_grad_value_ to take input by value instead of by non-const ref (#30216)
Summary:
The original design of `torch::nn::utils::clip_grad_norm_` / `clip_grad_value_` takes input by non-const reference, which prevents users from passing rvalue reference as input into the functions. This PR changes the functions to take input by value, which matches the Python version's semantics, and also adheres to the C++ API convention that if a function modifies its input in-place, it should take that input by value.
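For context, a minimal sketch of the Python-side semantics these C++ functions mirror: gradients are clipped in place and the pre-clip total norm is returned.
```
import torch
from torch.nn.utils import clip_grad_norm_

p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10) * 100           # an oversized gradient
total_norm = clip_grad_norm_([p], max_norm=1.0)
print(total_norm, p.grad.norm())         # grad is modified in place; norm <= 1.0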
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30216

Differential Revision: D18632543

Pulled By: yf225

fbshipit-source-id: 97a09d6467f982fe9c8120f483a9c07fcf13699e
2019-11-21 10:07:00 -08:00
6e4c23b02f Add RPC internal helper that overrides the default pickler. (#30185)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30185

To enable share_memory over RPC, add an internal helper that overrides the default RPC pickler.
Replace D18598974
ghstack-source-id: 94299660

Test Plan:
`python test/test_rpc_spawn RpcTestWithSpawn.test_use_rpc_pickler`

`buck test mode/dev-nosan //caffe2/test:rpc_spawn -- test_use_rpc_pickler`

Reviewed By: mrshenli

Differential Revision: D18621372

fbshipit-source-id: c680ef711b2c42524c47a5266e911fa8e0cd45ae
2019-11-21 10:01:02 -08:00
e3334723b2 fix a crash due in nested bailouts (#30097)
Summary:
A prim::BailOut also needs to capture max trip counts: for some graphs they aren't constants, and they are used in continuation graphs to figure out the remaining number of iterations to run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30097

Differential Revision: D18624446

Pulled By: Krovatkin

fbshipit-source-id: 085d25981c6669f65848996cd2d50066cc252048
2019-11-21 09:53:12 -08:00
9e81616343 Merge Tensor and Variable types. (#28287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28287

This PR eliminates the static distinction between
Tensor and Variable.  Every Variable is a Tensor, no need to static_cast
or call the Variable constructor.

To do this, I need Tensor to have API parity with Variable. I have already
moved most of the methods I don't want in Tensor off Variable.
These implementations are all placed in Tensor.cpp.

One API difference is that all Variable methods now have const, so we no longer
have faux const-correctness (see https://github.com/zdevito/ATen/issues/27 for
back story)

This diff is BC breaking in a few ways:
- Because torch::autograd::Variable is now just an alias of at::Tensor, ADL for
  `torch::autograd` functions no longer works, you have to explicitly qualify
  them with `torch::autograd` (examples: `torch/nn/parallel/data_parallel.h`)
- Because Variable and Tensor are now the same type, code which assumes that
  they are different types (e.g., for the purposes of templating, or enable_if checks)
  will not work until you delete the (now) redundant overload/specialization.
  (examples: `torch/nn/modules/container/any.h`, `torch/csrc/utils/pybind.h`)

Some other notes:
- I'm not sure what was going on with the old template implementation of `extract_vars`,
  but I couldn't get the SFINAE version to work. Replacing it with an overloading-based version
  made it work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571426

Pulled By: ezyang

fbshipit-source-id: 2ea8151e5f1d8512cdebf1345399642e68b707b8
2019-11-21 09:26:39 -08:00
a78e7eadbd Fix typo in extending doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30159

Differential Revision: D18619060

Pulled By: albanD

fbshipit-source-id: 1109c8da6242dffd6315b0c9de0f8ca34df0b276
2019-11-21 08:12:32 -08:00
5d80f30f70 add missing space to mask index error msg
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30196

Differential Revision: D18632801

Pulled By: ezyang

fbshipit-source-id: 73f0ba169813cf65f9815307129743ef6fcebcb3
2019-11-21 07:46:08 -08:00
e05e90c62e TensorTypeId-based non-RAII setter/getter for LocalTensorTypeSet (#30113)
Summary:
As discussed in https://github.com/pytorch/pytorch/pull/29592, https://github.com/pytorch/pytorch/pull/29592#issuecomment-553043596.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30113

Differential Revision: D18620080

Pulled By: ezyang

fbshipit-source-id: 0b10a703e68aca6a991d500fb478bd320006d31b
2019-11-21 07:13:03 -08:00
f7b12a9858 fix aten::grad to return optional list (#29577)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29577

`torch.autograd.grad` can return None if one of the inputs is not in the
autograd graph or does not require grad; this change makes it return a
list of optional tensors instead of a list of tensors.

This might unfortunately be a BC issue, but I think it's rare both
internally and externally (only training uses it, and most training
uses backward instead of autograd.grad), so whitelist it.
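A minimal sketch of the eager-mode behavior motivating the schema change: `grad()` can return None for an input that doesn't participate in the graph.
```
import torch

a = torch.randn(2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
out = (a * 2).sum()                      # b is not used
ga, gb = torch.autograd.grad(out, [a, b], allow_unused=True)
print(ga, gb)                            # gb is None
```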

Test Plan: Imported from OSS

Differential Revision: D18491642

fbshipit-source-id: d32b2b3446cf9e8b9a98f6d203a21a75643d8991
2019-11-20 22:19:10 -08:00
38ca3552d9 Unit Test for the Legacy Dynamic Quantized Linear operator (#23139)
Summary: Add a unit test for the Dynamic Quantized Linear operator (```torch.fbgemm_linear_quantize_weight```, ```torch.fbgemm_pack_quantized_matrix```, and ```torch.fbgemm_linear_int8_weight```) in ```test_quantized.py```.

Test Plan:
buck test mode/dev caffe2/test:quantized -- 'test_qlinear_legacy \(test_quantized\.TestDynamicQuantizedLinear\)'  --print-passing-details

  [jianyuhuang@devvm29567.prn1.facebook.com: ~/fbsource/fbcode/caffe2/test] $ buck test mode/dev caffe2/test:quantized -- 'test_dynamic_qlinear \(test_quantized\.TestQuantizedLinear\)'  --print-passing-details
  Parsing buck files: finished in 1.8 sec
  Building: finished in 3.4 sec (100%) 6772/6772 jobs, 2 updated
    Total time: 5.2 sec
  Trace available for this run at /tmp/testpilot.20190714-220130.2698168.log
  TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
  Testpilot build revision 4f180136f799ab45ec2bf5d7644cb14955d4dd7a fbpkg
  6c6253f255644ca3b8ce1bc5955b0f25 at Mon Jul  8 14:13:38 2019 by twsvcscm from /
   usr/local/fbprojects/packages/testinfra.testpilot/651/t.par
  Discovering tests
  Running 1 tests
  Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
        ✓ caffe2/test:quantized - test_dynamic_qlinear (test_quantized.TestQuantizedLinear) 0.023 1/1
  (passed)
  Test output:
  > test_dynamic_qlinear (test_quantized.TestQuantizedLinear) ... ok
  >
  > ----------------------------------------------------------------------
  > Ran 1 test in 0.024s
  >
  > OK
  Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/1125900044862617
  Summary (total time 9.03s):
    PASS: 1
    FAIL: 0
    SKIP: 0
    FATAL: 0
    TIMEOUT: 0
    OMIT: 0

Differential Revision: D16404027

fbshipit-source-id: 4c85dd255637fd8b1eb4830e0464f48c22706f41
2019-11-20 20:59:35 -08:00
1eb9f49cc6 Fix test_jit under pytest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30212

Test Plan: Imported from OSS

Differential Revision: D18632004

Pulled By: jamesr66a

fbshipit-source-id: d5cfd351890140c604535744598d0f6ad8989450
2019-11-20 20:44:28 -08:00
b154a8cfc7 Integrating the int64_t GEMM in FBGEMM into PyTorch Linear op (#30143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30143

We would like to integrate the int64 GEMM in FBGEMM into PyTorch. This brings ~4x speedup for the Linear op with LongTensor.

Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals

import time
import torch

torch.set_num_threads(1)

print("M, N, K, GOPS/sec")

for M in range(128, 1025, 128):
    N = M
    K = M

    x = torch.LongTensor(M, K)
    w = torch.LongTensor(K, N)

    NITER = 20

    # Test torch.nn.functional.linear
    s = time.time()
    for _ in range(NITER):
        torch.nn.functional.linear(x, w)
        # Z = x @ w
    elapsed_per_iter_linear = (time.time() - s) / NITER

    print(
        "{}, {}, {}, {:0.2f}".format(M, N, K, 2.0 * M * N * K / elapsed_per_iter_linear / 1e9)
    )
```

Before this PR:
```
M, N, K, GOPS/sec
128, 128, 128, 2.31
256, 256, 256, 2.49
384, 384, 384, 2.54
512, 512, 512, 2.57
640, 640, 640, 2.46
768, 768, 768, 2.59
896, 896, 896, 2.59
1024, 1024, 1024, 2.61
```

After this PR:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/int64_gemm]# python torch_linear.py
M, N, K, GOPS/sec
128, 128, 128, 5.35
256, 256, 256, 8.34
384, 384, 384, 9.03
512, 512, 512, 9.22
640, 640, 640, 9.55
768, 768, 768, 9.73
896, 896, 896, 9.82
1024, 1024, 1024, 9.63
```
ghstack-source-id: 94308012

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D18610019

fbshipit-source-id: f830660927b2666db34427d9de51db011f80f766
2019-11-20 20:22:50 -08:00
cc16819028 Add abort API in gloo ProcessGroup Send/Recv Work (#29928)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29928

Original author: Shihao Xu
- Add abort to `c10d::ProcessGroup::Work`.
- Change the return type of `c10d::ProcessGroup::Work::wait()` to boolean to indicate if the work is aborted after waiting.
- Add unit test for the correctness of abort.
ghstack-source-id: 94305515
ghstack-source-id: 94305515

Differential Revision: D5685727

fbshipit-source-id: 6e682bb563c2393a5c303c877331140417d3f607
2019-11-20 20:18:54 -08:00
0a77c090d5 C++ parity, convert_parameters (#29267)
Summary:
yf225 https://github.com/pytorch/pytorch/issues/25883
update parameters_to_vector and vector_to_parameters.
Please check!
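For reference, a minimal sketch of the Python APIs this C++ parity work mirrors:
```
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

model = torch.nn.Linear(3, 2)
vec = parameters_to_vector(model.parameters())  # flatten all params into one 1-D tensor
vector_to_parameters(torch.zeros_like(vec), model.parameters())  # write values back in place
print(all(bool((p == 0).all()) for p in model.parameters()))     # True
```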
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29267

Differential Revision: D18628571

Pulled By: yf225

fbshipit-source-id: 03783e6b0f8183dd97ae48f3da4acb1d07083555
2019-11-20 19:59:11 -08:00
bbb3c415c9 ONNX Hardtanh Opset 11 Support (#30169)
Summary:
Add support for hardtanh, which was blacklisted in opset 11.
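A hedged usage sketch (the output file name is illustrative): exporting a model containing hardtanh at opset 11, which this change enables.
```
import torch

m = torch.nn.Hardtanh(min_val=-1.0, max_val=1.0)
torch.onnx.export(m, torch.randn(1, 4), "hardtanh.onnx", opset_version=11)
```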
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30169

Reviewed By: hl475

Differential Revision: D18619552

Pulled By: houseroad

fbshipit-source-id: 0c1bfb0a53d1dd2327c5db7afd03a90482abb9fe
2019-11-20 18:59:00 -08:00
fd74a19aa4 apply clang format -i (#30180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30180

Just applying `clang-format -i`, to avoid mixing it with other changes.

Test Plan: Imported from OSS

Differential Revision: D18627473

Pulled By: IvanKobzarev

fbshipit-source-id: ed341e356fea31b8515de29d5ea2ede07e8b66a2
2019-11-20 16:46:43 -08:00
1aa80471b8 minor fix to filter (#30200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30200

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --ai_pep_format True --operators None --iterations -1 --warmup_iterations -1 --wipe_cache --forward_only False --device cpu --tag_filter all --use_jit False --operator_range b-z
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.29026457108557224"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.2813781425356865"}
PyTorchObserver {"type": "PyTorch_batchnorm_M1_N256_K3136_cpu_Eager", "metric": "latency", "unit": "ms", "value": "0.28009670320898294"}
...

Reviewed By: hl475

Differential Revision: D18627512

fbshipit-source-id: 23f622b96168f90a8d8648bfd9ff9a5116baafdf
2019-11-20 16:36:04 -08:00
f1a0a27da1 col max hist observer
Summary:
Add InputColumnMaxHistogramNetObserver and InputColumnMaxHistogramObserver to dnnlowp observers.

Sample output histogram at /mnt/public/amyyang/test/col_max_test.log (generated for ctr_web_feed)
```
columns:
        "op_index",
        "input_idx",
        "blob_name",
        "col_idx",
        "min",
        "max",
        "nbins"
```

Test Plan: Tested with ctr_web_feed

Reviewed By: csummersea

Differential Revision: D18194229

fbshipit-source-id: 1402fcdc174a1f52744c850f5e2cc3bdc73c3a45
2019-11-20 16:29:53 -08:00
449828378d Serialize ClassType as its qualname
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30058

Test Plan: Imported from OSS

Differential Revision: D18584269

Pulled By: jamesr66a

fbshipit-source-id: 5f1d0142bd7cd94eecbd2ed9250a0de47639040b
2019-11-20 16:17:26 -08:00
2803261a23 Update API doc for wait_all_workers after rename
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30179

Test Plan: Imported from OSS

Differential Revision: D18623092

Pulled By: mrshenli

fbshipit-source-id: 1bbffc7476f256c156783274f7ef51342820edcd
2019-11-20 16:12:30 -08:00
de05114618 polish examples in docstrings and update docs to reflect correct use of (#30052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30052

Some of the examples provided in `rpc/api.py` were not updated along
with the code changes; this PR updates them. It also removes the
`dist.ProcessGroup` information, since `init_rpc` now initializes a default
process group.
ghstack-source-id: 94273004

Test Plan: Unit tests pass

Differential Revision: D18582596

fbshipit-source-id: a637683f0221f9600f7e50b74e9f7e5a1d331d8f
2019-11-20 15:30:38 -08:00
bebed492cf Make RRefContext singleton leaky, deal with module destruct order race. (#30172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30172

RRefContext is a conventional singleton, used by rref.cpp. At module teardown
time, it's not defined whether rref_context.cpp or rref.cpp will be destroyed first.

We were observing a SIGSEGV because RRefContext is destroyed before a dangling
~UserRRef() call is able to execute. Particularly, the underlying
ctx.agent()->getWorkerInfo(ownerId_) call failed.

This change just avoids the SIGSEGV by forcing an intentional leak, though we still
need to deal with why there's a dangling UserRRef at module destruction time.
ghstack-source-id: 94287441

Test Plan:
Existing test suite;
       test_elastic_averaging in the context of D18511430, where the segfault reproed reliably.

Differential Revision: D18620786

fbshipit-source-id: 17b6ccc0eb1724b579a68615e4afb8e9672b0662
2019-11-20 15:12:51 -08:00
211e39fd1c add docs for profiling PyTorch with py-spy (#30166)
Summary:
This adds developer documentation for profiling PyTorch using py-spy. In my work on `__torch_function__` I found its ability to profile native code and dump flame graphs extremely useful. I'm not aware of another Python sampling profiler with similar functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30166

Differential Revision: D18625133

Pulled By: ezyang

fbshipit-source-id: cf1b851564a07c9f12fcf1338ac4527f4a3c61c0
2019-11-20 15:09:40 -08:00
36aaa299f8 shut up clang-tidy on ir.h/cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30118

Test Plan: Imported from OSS

Differential Revision: D18620239

fbshipit-source-id: 5734d9d1f38a9b38ac4a1fc121fb246b783fa262
2019-11-20 13:19:25 -08:00
43fb0015db custom build script (#30144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30144

Create script to produce libtorch that only contains ops needed by specific
models. Developers can use this workflow to further optimize mobile build size.

We need to keep a dummy stub for unused (stripped) ops because some JIT-side
logic requires certain function schemas to exist in the JIT op
registry.

Test Steps:
1. Build "dump_operator_names" binary and use it to dump root ops needed
by a specific model:
```
build/bin/dump_operator_names --model=mobilenetv2.pk --output=mobilenetv2.yaml
```

2. The MobileNetV2 model should use the following ops:
```
- aten::t
- aten::dropout
- aten::mean.dim
- aten::add.Tensor
- prim::ListConstruct
- aten::addmm
- aten::_convolution
- aten::batch_norm
- aten::hardtanh_
- aten::mm
```
NOTE that for some reason it outputs "aten::addmm" but actually uses "aten::mm".
You need to fix it manually for now.

3. Run custom build script locally (use Android as an example):
```
SELECTED_OP_LIST=mobilenetv2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```

4. Checkout demo app that uses locally built library instead of
downloading from jcenter repo:
```
git clone --single-branch --branch custom_build git@github.com:ljk53/android-demo-app.git
```

5. Copy locally built libraries to demo app folder:
```
find ${HOME}/src/pytorch/android -name '*.aar' -exec cp {} ${HOME}/src/android-demo-app/HelloWorldApp/app/libs/ \;
```

6. Build demo app with locally built libtorch:
```
cd ${HOME}/src/android-demo-app/HelloWorldApp
./gradlew clean && ./gradlew assembleDebug
```

7. Install and run the demo app.

In-APK arm-v7 libpytorch_jni.so build size reduced from 5.5M to 2.9M.

Test Plan: Imported from OSS

Differential Revision: D18612127

Pulled By: ljk53

fbshipit-source-id: fa8d5e1d3259143c7346abd1c862773be8c7e29a
2019-11-20 13:16:02 -08:00
ae6af8d55f Enable multinomial for torch.half (#29266)
Summary:
Changelog:
- Re-enable multinomial sampling when the probability tensor has `dtype == torch.half`.

It seems to have been missed in https://github.com/pytorch/pytorch/issues/28481.

Fixes https://github.com/pytorch/pytorch/issues/29211
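A minimal sketch of the re-enabled path (requires a CUDA device):
```
import torch

if torch.cuda.is_available():
    probs = torch.tensor([0.1, 0.2, 0.7], dtype=torch.half, device="cuda")
    print(torch.multinomial(probs, num_samples=5, replacement=True))
```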
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29266

Differential Revision: D18619105

Pulled By: ezyang

fbshipit-source-id: 1f87e5183e75de5c5e0ffde862fc72d040b32864
2019-11-20 13:06:46 -08:00
51259e5024 Updating submodules
Summary:
GitHub commits:

7cc9d9257b
93d91859c8
ab0a6495f6
3cd75736a7
fb3e6aac5d
4ac5fd6ed9
cf783ae678
6aaaa4754f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 60b6d716aa073eda5fc6dbcbd0daeee536c25314
2019-11-20 13:02:49 -08:00
c4e7f1b232 Revert D18579363: Change dimType for shapeInfo
Test Plan: revert-hammer

Differential Revision:
D18579363

Original commit changeset: 72d5a2a8a20a

fbshipit-source-id: 282c195a160892641728d0fbcc2e704a4b5b2d05
2019-11-20 12:59:02 -08:00
c2b7b2cbf8 Make observed values actually flow through observers (#30140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30140

This seems more semantically correct to me, and makes it so we don't have to iterate over Uses of observed values.

Test Plan: Imported from OSS

Differential Revision: D18610676

Pulled By: jamesr66a

fbshipit-source-id: f835266f148bd8198b05cd9df95276e1112dd250
2019-11-20 12:48:16 -08:00
2d534abb39 Modernize graph mode IR API calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30130

Test Plan: Imported from OSS

Differential Revision: D18608004

Pulled By: jamesr66a

fbshipit-source-id: 42e946ec96b1d26a364abe0a7eb71aa0aecc52ed
2019-11-20 12:48:12 -08:00
73cf4d468f Design doc for Remote Reference (#30066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30066

This commit adds design reasoning and walks through four scenarios
for RRef.

Test Plan: Imported from OSS

Reviewed By: rohan-varma

Differential Revision: D18595094

Pulled By: mrshenli

fbshipit-source-id: 134102901ce515a44a2e7cd013b62143a6158120
2019-11-20 12:42:28 -08:00
5cbdbddc12 Add test for F::max_unpool3d, and update parity table
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30171

Differential Revision: D18620503

Pulled By: yf225

fbshipit-source-id: 52adf9a6c0238b5cdb2e11e03807fb7dd73880bf
2019-11-20 12:42:24 -08:00
f304bd5062 rename join_rpc to wait_all_workers in public api (#30050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30050

Renames this API to wait_all_workers as discussed.
ghstack-source-id: 94273005

Test Plan: Unit tests pass

Differential Revision: D18581466

fbshipit-source-id: 4ff5d5fb2d528f17252d5b5f30c3047d2efb92bf
2019-11-20 12:38:35 -08:00
a460c856dd Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18618971

Pulled By: yf225

fbshipit-source-id: 2af62c1a0ace2cd0c36c2f1071639bf131d8fe61
2019-11-20 12:23:50 -08:00
9cb8fb61c2 update operator_range discription in op bench (#30170)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30170

as title

Test Plan:
```
buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range ef
...
ValueError: The correct format for operator_range is <start>-<end>, or <point>, <start>-<end>

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range a-b
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 60.551

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 67.716
...

buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/benchmark_all_other_test.par --tag_filter all --iterations 1 --operator_range b,d-f
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 296.004
...

Reviewed By: hl475

Differential Revision: D18619975

fbshipit-source-id: 08f27ee2aeda47be431385f4b20ef7fbeb797516
2019-11-20 12:07:14 -08:00
ff7afede92 Stop showing .api as an API path component in RPC docs (#30160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30160

The path torch.distributed.rpc.api is an implementation detail, which
should not be used by applications to import RPC APIs. Instead, all
RPC APIs are exposed directly as torch.distributed.rpc.*. This
commit makes the API doc consistent with the above expectation.
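Concretely, applications should import from the public package:
```
# Preferred: public package path.
from torch.distributed.rpc import rpc_sync, rpc_async, remote

# Avoid: .api is an implementation detail.
# from torch.distributed.rpc.api import rpc_sync
```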

Test Plan: Imported from OSS

Differential Revision: D18616359

Pulled By: mrshenli

fbshipit-source-id: 8207f7d36c24cf55af737c03a27fd1896c231641
2019-11-20 12:04:10 -08:00
0762bbfc9a Eliminate tensor copies from compute_common_type_ in TensorIterator. (#30018)
Summary:
This requires refactoring at::native::result_type to operate as a
state machine, processing the input types one at a time.  There may
be other places in the code base that could benefit from adopting
this approach as well.
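A conceptual Python sketch of the state-machine idea (the real code is C++ inside TensorIterator): fold `result_type` over the input dtypes one at a time, with no tensor copies.
```
import torch

def folded_result_type(tensors):
    # Carry the promotion state as a zero-element tensor of the running dtype.
    state = torch.empty(0, dtype=tensors[0].dtype)
    for t in tensors[1:]:
        state = torch.empty(0, dtype=torch.result_type(state, t))
    return state.dtype

xs = [torch.zeros(1, dtype=torch.int32),
      torch.zeros(1, dtype=torch.float32),
      torch.zeros(1, dtype=torch.float64)]
print(folded_result_type(xs))  # torch.float64
```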
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30018

Differential Revision: D18606427

Pulled By: resistor

fbshipit-source-id: f6b779326bdb746508690cf7ca6de777adc66244
2019-11-20 11:51:28 -08:00
ff94ddda08 Change dimType for shapeInfo (#30047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30047

Previously, we used a single DimType to represent the dimension type of a tensor. Now, change it to vector<DimType> to record a dim type for every dimension of the tensor.

Reviewed By: yinghai, ipiszy

Differential Revision: D18579363

fbshipit-source-id: 72d5a2a8a20a7653e73e64c8eb97f7eed953ea93
2019-11-20 11:43:35 -08:00
7201a2e854 remove consistency check from setup (#30043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30043

This is already checked in the GH Actions linter, so this check is
redundant. And putting it in `setup` has the effect of blocking direct
changes to config.yml when I want to experiment, which is a bit
bothersome.

Test Plan: Imported from OSS

Differential Revision: D18611674

Pulled By: suo

fbshipit-source-id: f81670ae9f264408a3ea72c1ba5fcea208681311
2019-11-20 11:14:47 -08:00
67b77afcdf Fast histogram observer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29790

Test Plan:
```
import torch
import time
import numpy as np
from torch.quantization.observer import HistogramObserver

X = torch.randn(1, 1, 224, 224)

obs = HistogramObserver(2048)
acc_time = 0
for i in range(100):
    X = torch.randn(10, 1, 320, 320)
    start = time.time()
    obs(X)
    # obs.forward_new(X)
    acc_time = acc_time + time.time() - start
print(acc_time)
```

Imported from OSS

Differential Revision: D18508562

fbshipit-source-id: 456e82360ce1b3f9d8b6e1832d23f1339655011a
2019-11-20 11:14:41 -08:00
f03db0cd19 Add torch::nn::functional to C++/Python parity tracker (#29819)
Summary:
This PR adds all `torch::nn::functional` functions and updated their parity status in the C++/Python parity tracker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29819

Differential Revision: D18617762

Pulled By: yf225

fbshipit-source-id: 75a4d770e2da28b626f785cab243465dbc51efd1
2019-11-20 11:14:36 -08:00
f2b851a9e5 Returning axis from calculate_qparams (#29494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29494

`calculate_qparams` for per-channel quantization should return the axis; this
PR adds that, and also adds the corresponding support in graph mode.
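A hedged sketch of the eager-mode observer this touches; exactly how the axis is surfaced by `calculate_qparams` may vary by version, but it is what graph mode now receives.
```
import torch
from torch.quantization.observer import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver(ch_axis=0)   # one (scale, zero_point) per channel
obs(torch.randn(4, 8))
print(obs.calculate_qparams(), obs.ch_axis)
```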

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580905

fbshipit-source-id: f9691c1f043f8bca39f81716a4d0b10f60a65396
2019-11-20 11:06:48 -08:00
64817a43d2 Test for per channel graph mode quantization (#29493)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29493

att

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D18580907

fbshipit-source-id: 05218e012c0322bb88714670d5dbe9332252f2ee
2019-11-20 11:06:44 -08:00
fbcb88e8b3 Split module.cpp and export.cpp to support saving on mobile (#29881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29881

Breaking these into separate files allows us to have three different builds:
- Mobile inference-only.
- Mobile with module saving.
- Server with module saving and other export functions like ONNX.

And this can be accomplished just by selecting which cpp files to compile,
without setting any preprocessor flags.

Test Plan: CI.  Local mobile+saving build.

Reviewed By: smessmer

Differential Revision: D18509296

fbshipit-source-id: 9438273bac4624df5c7f035b2bacb901cce43053
2019-11-20 10:47:21 -08:00
72bc7bf37b Revert D18612158: Fix naming for kl_div and binary_cross_entropy functional options
Test Plan: revert-hammer

Differential Revision:
D18612158

Original commit changeset: 8c403fa1c2a0

fbshipit-source-id: f22d7c4664119d4e7397fc017bacecf3e318af11
2019-11-20 10:26:31 -08:00
d11dfd1a84 only run embeddingbag op on cpu (#30163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30163

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 02:32.5 min (100%) 7358/7358 jobs, 1 updated
  Total time: 02:33.5 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --operators embeddingbag
Parsing buck files: finished in 0.9 sec
Building: finished in 5.3 sec (100%) 5604/5604 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue_cpu
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True, device: cpu
Forward Execution Time (us) : 62.608
...

Reviewed By: hl475

Differential Revision: D18617540

fbshipit-source-id: 062dd73c455db8b67749078603745651b55254b2
2019-11-20 10:02:39 -08:00
e84fcc1fd1 Fix naming for kl_div and binary_cross_entropy functional options (#30146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30146

This PR fixes naming for kl_div and binary_cross_entropy functional options, to be more consistent with the naming scheme of other functional options.

Test Plan: Imported from OSS

Differential Revision: D18612158

Pulled By: yf225

fbshipit-source-id: 8c403fa1c2a0a65734a3ec2387cc0937c46cab24
2019-11-20 09:44:21 -08:00
b0309d1b5b More documentation on caffe2_interface_library (#29903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29903

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18616888

Pulled By: ezyang

fbshipit-source-id: 360760a688dcc8ba117cd79d89db2afb2c35ab27
2019-11-20 08:58:01 -08:00
36a47d71e1 Enabled bfloat16 for cuda (#27259)
Summary:
Enabled basic support for bfloat16 on CUDA.
Tested via unit tests.
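A minimal sketch of the newly enabled dtype (requires a CUDA device; op coverage is basic):
```
import torch

if torch.cuda.is_available():
    x = torch.randn(4, device="cuda").to(torch.bfloat16)
    print((x + x).dtype)  # torch.bfloat16
```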
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27259

Differential Revision: D17728661

Pulled By: izdeby

fbshipit-source-id: 99efb6bc4aec029fe6bbc8a68963dca9c9dc5810
2019-11-20 08:49:56 -08:00
551e387fff Disable flaky test test_graph_for_py_nested_remote_call
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30132

Test Plan: Imported from OSS

Differential Revision: D18609560

Pulled By: mrshenli

fbshipit-source-id: 00fbfc8753e002808f49cf9f09ce0c0966a74485
2019-11-20 07:44:00 -08:00
13283e0cbb Change order of recalculating numel and restriding (#30025)
Summary:
Fix the order of recalculating numel and restriding: recalculating numel should always happen first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30025

Differential Revision: D18576446

Pulled By: VitalyFedyunin

fbshipit-source-id: fe9e18ec2bbb7b43d634e150f8979b8d6b7c5196
2019-11-20 07:36:14 -08:00
c2c835dd95 Port sigmoid backward to Aten(CPU+CUDA) (#29185)
Summary:
VitalyFedyunin, this PR ports sigmoid backward to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
if torch.cuda.is_available():
    device = "cuda"

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(1000):
        output = input.sigmoid().sum()
        output.backward()

#get running time
for n in [100, 10000]:
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    for i in range(10000):
        output = input.sigmoid().sum()
        t1 = _time()
        output.backward()
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d), backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8280, GPU: Tesla P40

**Perfromance**:
Before:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 4.21 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.06 (ms).
input size(128, 10000), backwad avg time is 2.30 (ms).
```
After:
```
GPU:
input size(128, 100), backwad avg time is 0.14 (ms).
input size(128, 10000), backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100), backwad avg time is 0.05 (ms).
input size(128, 10000), backwad avg time is 0.48 (ms).
OMP_NUM_THREADS=1
input size(128, 100), backwad avg time is 0.04 (ms).
input size(128, 10000), backwad avg time is 0.86 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29185

Differential Revision: D18587352

Pulled By: VitalyFedyunin

fbshipit-source-id: 8167ca261960399f795d35a83fa8c4be365bc4da
2019-11-20 07:31:42 -08:00
c0104a1c89 Fix typo in comment in cpp_extension (#30028)
Summary:
From https://github.com/pytorch/pytorch/issues/26614
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30028

Differential Revision: D18597666

Pulled By: albanD

fbshipit-source-id: 93bf0e4ee34a63df4b544d44f630a9c0fc95fd83
2019-11-20 07:16:48 -08:00
f8e7f3fca4 C++ API parity: BCEWithLogitsLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28783

Test Plan: Imported from OSS

Differential Revision: D18202435

Pulled By: pbelevich

fbshipit-source-id: 011b028bbb2a091e98d3548616b99d7b4569c239
2019-11-20 06:46:38 -08:00
93db2b86d1 Fix type sharing on loaded ScriptModules (#29826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29826

After save/load, we lose concrete type information. So if you tried to
script something that contained a loaded ScriptModule as a submodule,
the following sequence happened:
1. During ConcreteType inference, the loaded submodule got a new
inferred type.
2. But it already has a type! So there was a type mismatch.

To fix this, we should generate a ConcreteType directly from the loaded
submodule type (similar to what we do for interfaces). This makes sense
too--the ConcreteModuleType should be empty, since all the "sugaredness"
was stripped out during the save/load process.
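A minimal sketch of the failing scenario (module names and the file path are illustrative):
```
import torch

class Inner(torch.nn.Module):
    def forward(self, x):
        return x + 1

torch.jit.save(torch.jit.script(Inner()), "inner.pt")

class Outer(torch.nn.Module):
    def __init__(self):
        super(Outer, self).__init__()
        self.inner = torch.jit.load("inner.pt")  # a loaded ScriptModule submodule
    def forward(self, x):
        return self.inner(x)

scripted = torch.jit.script(Outer())  # previously hit the inferred-type mismatch
```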

Test Plan: Imported from OSS

Differential Revision: D18575009

Pulled By: suo

fbshipit-source-id: 4d329b7e9b7e7624f459e50092e35ab0ab813791
2019-11-20 01:13:09 -08:00
558a777615 Re-unify module and interface in ConcreteModuleType (#29825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29825

We made `ModuleInfo` a union initially to represent the idea that a
submodule could either be a regular module or a module interface.

This PR represents module interfaces as a ConcreteModuleType with no
info (e.g.  no "sugaredness"), and with the interface type as the
underlying `jitType_`. This has the effect of reducing the special
casing around adding/maintaining module info.

Test Plan: Imported from OSS

Differential Revision: D18575011

Pulled By: suo

fbshipit-source-id: 53e297b39aa1a03bcdadd795ff225aa68fec9d70
2019-11-20 01:13:06 -08:00
63e66fd267 Split ConcreteModuleType into two types (#29824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29824

We have two distinct phases/uses for ConcreteModuleType:
1. We are building it up and using it to check whether we can
reuse JIT types. (RawConcreteModuleType)
2. We are using it to satisfy ModuleValue::attr queries.
(ConcreteModuleType)

These types share an underlying `ConcreteModuleTypeData` which
actually stores the relevant info.

Previously they were the same type because I was lazy, but it's been the
source of a bug. So split them to formalize the differing invariants for
the two phases.

Test Plan: Imported from OSS

Differential Revision: D18575010

Pulled By: suo

fbshipit-source-id: 3e4ebcd36e78b947150d8f0dbb74ecccad23e7c4
2019-11-20 01:13:02 -08:00
7495c25440 Updating submodules
Summary:
GitHub commits:

b21fd47972
950060c67b
d5cfc73665
195d10ad15
22c4b39574
0306e01233
fc0ad8b966
6f87219b24
9c674a1271
69ac8aeb62
672beabd4c

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 96ba9389d7c7faf53c0c5775a425dbea17da217a
2019-11-19 23:21:05 -08:00
c06f9023e5 Polish rpc docstring. (#30069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30069

1) Fix rpc docstrings
2) Fix some links
ghstack-source-id: 94250890

Test Plan: waitforbuildbot

Differential Revision: D18588231

fbshipit-source-id: 33846ace1afa94d25f34b0370437abf6d9408f06
2019-11-19 23:10:14 -08:00
def2985e90 add flag to strip C10 error message (#30111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30111

Add a flag to strip C10 error messages. To ensure there's no size regression, add the same flag to the existing caffe2 and pytorch builds.

Test Plan: size bot check

Reviewed By: dreiss

Differential Revision: D18577969

fbshipit-source-id: 84ac57b11ec5c29e831d619260024a0a4a6fdcd0
2019-11-19 22:53:59 -08:00
88ef402cb5 Add distributed optimizer section to distributed autograd design doc. (#30068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30068

ghstack-source-id: 94228719

Test Plan: waitforbuildbot

Differential Revision: D18556536

fbshipit-source-id: decd6927bfdd1ee3c81fef7430aa7095d7f38d33
2019-11-19 22:43:03 -08:00
b410d864c9 make python remote exception to rethrow when using remote reference to itself (#29930)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29930
Right now, rethrowing of Python remote-call exceptions is coupled with deserialization.
For an owner ref, setValue() and getValue() do not use serialization and deserialization, so when users create a ref to itself and call ownerRef.to_here(), the Python remote-call exception will not be rethrown.

This diff moves remote exception rethrowing out of deserialization, so the exception can be handled for ownerRef.localValue() or ownerRef.to_here().

close #29924
ghstack-source-id: 94210894

Test Plan: unit tests

Differential Revision: D18541916

fbshipit-source-id: 7cda93f623d52c740b3c1b1fa9a442f866984340
2019-11-19 21:33:21 -08:00
1b26e3ff6d fbjni gradle obey ABI_FILTERS parameter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30135

Test Plan: Imported from OSS

Differential Revision: D18610031

Pulled By: IvanKobzarev

fbshipit-source-id: 7dd8240b71e9f6d77f723243991cd1b5c9984df6
2019-11-19 20:09:48 -08:00
cc81769e10 C++ API parity: isfinite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30083

Test Plan: Imported from OSS

Differential Revision: D18594723

Pulled By: pbelevich

fbshipit-source-id: 5970e0aa6ef8994e9c4a741784fd053383aaceb7
2019-11-19 20:00:05 -08:00
b2291d4600 Make PerChannelMinMaxObserver scriptable using torch.jit.ignore (#29416)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29416

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18580906

fbshipit-source-id: 5370300b89e26c2b4662b17e51284e8708cb5843
2019-11-19 19:12:55 -08:00
80e3f17301 Resubmit "Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents" (#30093)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30093

https://github.com/pytorch/pytorch/pull/28226 introduced a `worker_to_id` arg to the `init_rpc` function for other `RpcAgent`s, but it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in https://github.com/pytorch/pytorch/issues/29031.

To accommodate the differences between `RpcAgent`s, add a `RpcAgentOptions` base class, which allows leveraging inheritance to add extra fields.
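A hedged sketch of the resulting API shape (argument values are illustrative): agent-specific settings travel in an options object instead of loose `init_rpc` kwargs.
```
from torch.distributed import rpc

rpc.init_rpc(
    "worker0",
    rank=0,
    world_size=2,
    rpc_backend_options=rpc.ProcessGroupRpcBackendOptions(num_send_recv_threads=8),
)
```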
ghstack-source-id: 94197295

Test Plan:
### OSS RPC + RRef tests

```
buck test mode/dev-nosan //caffe2/test:rpc_fork
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_rpc_fork_test -- test_sync_rpc
```

### Prototype RRef tests

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```

```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc_thrift_rpc_agent
```

### Dist autograd

```
buck test mode/dev-nosan caffe2/test:dist_autograd_fork
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:thrift_dist_autograd_fork_test
```

Differential Revision: D18595578

fbshipit-source-id: 616fca3b844c171ed5277bbc6a2b1693bc3a8065
2019-11-19 18:52:30 -08:00
15bc41a8aa Overwrite __setstate__ func in MultiheadAttention (#29001)
Summary:
Overwrite the `__setstate__` function in nn.MultiheadAttention and add the `self._qkv_same_embed_dim` attribute to the `dict`. Current users should not be affected by the change.

The changes have been tested by loading a MultiheadAttention model trained with PyTorch 1.1. If users have an old MultiheadAttention model, please use `torch.load` to load the old model for inference under v1.4.0 and above.

```
import torch
model = torch.load('old_v1.1.0_MultiheadAttention.pt') # model works for torch 1.4
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29001

Differential Revision: D18257671

Pulled By: zhangguanheng66

fbshipit-source-id: fa41b85f6d53034dc9f445af60f2ad9636e9abf7
2019-11-19 18:32:44 -08:00
07e14c7cd0 DistributedOptimizer: wait for all workers to finish _LocalOptimizer constructor (#30062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30062

This allows catching exceptions during optimizer creation.
ghstack-source-id: 94232436

Test Plan: new unit test.

Differential Revision: D18586108

fbshipit-source-id: 71cfdf337fe803dbea8787b4c68e5a52b70a1f68
2019-11-19 18:30:00 -08:00
2367e71f55 Disable ProfilingGraphExecutorImpl for mobile (#30067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30067

### Summary

The mobile build has been broken since last week due to a runtime error caused by a missing operator in JIT:

```shell
libc++abi.dylib: terminating with uncaught exception of type torch::jit::script::ErrorReport:
Unknown builtin op: aten::_adaptive_avg_pool2d_backward.
Could not find any similar ops to aten::_adaptive_avg_pool2d_backward. This op may not exist or may not be currently supported in TorchScript.
:
at <string>:9:28
                grad_self = grad.expand(self.size()) / (self_size[-1] * self_size[-2])
            else:
                grad_self = torch._adaptive_avg_pool2d_backward(grad, self)
                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

            return grad_self
```
### How this happens

Since we've disabled autograd for the open-source mobile build, the `backward` ops won't get registered by JIT.

When `forward` runs, a `GraphExecutor` is created according to the value of `executor_mode`. In the mobile case, this was set to true, which gives us the `ProfilingGraphExecutorImpl` object. It seems this executor eventually tries to emit IR for autograd schemas, which causes the error.

### Fix

There are two ways to fix it.

1. Add a macro to disable `profiling_mode` as well as `executor_mode` on mobile. Like what `FBCODE_CAFFE2` does [here](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/profiling_graph_executor_impl.cpp#L22).
2. Disable the two modes at runtime, by calling `torch::jit::getExecutorMode() = false;` before calling forward.

(IMO, the second fix is sort of a workaround, as it doesn't make sense from a user's perspective (why do I need to do this?). But the upside is that we don't have to introduce yet another macro.)

Feel free to drop comments, if there is a better way to fix it.

### How this was not detected by our mobile CI

We're working on adding runtime tests to our mobile build to prevent issues like this.

### Test Plan

- The error above disappears
- Don't break CI

cc AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18605998

Pulled By: xta0

fbshipit-source-id: 11fa85c2b44d54bc28a9c45731af0f5d17d5804c
2019-11-19 18:04:57 -08:00
2c8dce915c Show full call stack in TorchScript exception even when calls were inlined.
Summary:
This uses the newly added InlinedCallStack to print the original call stack
even if the real call stack is shallower because of inlining.
This change also makes TorchScript stack traces look like Python ones.

Example:
```
@torch.jit.script
def baz(c, b):
    return c + b

@torch.jit.script
def foo(c, b):
    return baz(c, b)

@torch.jit.script
def bar(c, b):
    return foo(c, b)

bar(torch.rand(10), torch.rand(9))
```

Output before:
```
Traceback (most recent call last):
  File "fail.py", line 25, in <module>
    bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter, with the following stack trace:
at fail.py:15:11
@torch.jit.script
def baz(c, b):
    return c + b
           ~~~~~ <--- HERE
```

Output after:
```
Traceback (most recent call last):
  File "fail.py", line 41, in <module>
    bar(torch.rand(10), torch.rand(9))
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
The above operation failed in interpreter.
Traceback (most recent call last):
  File "fail.py", line 33
@torch.jit.script
def bar(c, b):
    return foo(c, b)
           ~~~ <--- HERE
  File "fail.py", line 29, in foo
@torch.jit.script
def foo(c, b):
    return baz(c, b)
           ~~~ <--- HERE
  File "fail.py", line 25, in baz
@torch.jit.script
def baz(c, b):
    return c + b
           ~~~~~ <--- HERE
```

Output of non-scripted python code:
```
Traceback (most recent call last):
  File "fail.py", line 36, in <module>
    bar(torch.rand(10), torch.rand(9))
  File "fail.py", line 21, in bar
    return foo(c, b)
  File "fail.py", line 18, in foo
    return baz(c, b)
  File "fail.py", line 15, in baz
    return c + b
RuntimeError: The size of tensor a (10) must match the size of tensor b (9) at non-singleton dimension 0
```

Differential Revision: D18532812

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: e7e5ba5e4a8f1c7086406271d0f1685d9db8541a
2019-11-19 17:58:55 -08:00
a9d1465c82 Add logging to inliner. (#27922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27922

gh-metadata: pytorch pytorch 27922 gh/ZolotukhinM/140/head

Differential Revision: D17914135

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: d75bdf1efbfdc877f10017b16046bdbdc97e2dd6
2019-11-19 17:58:51 -08:00
59eb682ce3 Add InlinedCallStack class. (#27921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27921

InlinedCallStack serves a similar purpose to Scope, but instead of storing
the string names of the functions it stores pointers to the Function objects
themselves. Currently, scopes are used in tracing and callstacks are
used in scripting; hopefully I will be able to merge them in the future.

gh-metadata: pytorch pytorch 27921 gh/ZolotukhinM/139/head

Differential Revision: D17914132

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: b1daa6700199ee1a97a7f49a6fced9ac0dc13051
2019-11-19 17:58:46 -08:00
12263cfa98 Make inlineCallTo to take Function instead of Graph as the callee argument. (#27920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27920

gh-metadata: pytorch pytorch 27920 gh/ZolotukhinM/138/head

Differential Revision: D17914133

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 6aec2a71ed5718fecab81a107e37b26088b94c65
2019-11-19 17:58:42 -08:00
0eb8c3dbfb Add a variant of insertGraph that fills values map. (#27919)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27919

gh-metadata: pytorch pytorch 27919 gh/ZolotukhinM/137/head

Differential Revision: D17914134

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: ecc85c97b497eaf82e25e9c6b4477f6b1103bf69
2019-11-19 17:58:37 -08:00
e951f7cf58 Add Python3 ROCm CentOS docker image (#30119)
Summary:
959b068874
d2ee605730
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30119

Differential Revision: D18604645

Pulled By: bddppq

fbshipit-source-id: d9375e44dad9570ef8fc3d1bbd557795543f8bb2
2019-11-19 17:54:05 -08:00
bb1d9b238d torch::nn::FractionalMaxPool{2,3}d module and functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29933

Test Plan: Imported from OSS

Differential Revision: D18548174

Pulled By: yf225

fbshipit-source-id: 070776db6e8b7ad94d9b7cbd82b3d6966f061a46
2019-11-19 17:24:07 -08:00
ec52d911bd InstanceNorm{1,2,3}d (#28790)
Summary:
Hi yf225,

I have a few doubts related to the implementation:
1) What tests do I have to write?
2) What does _load_state_from_dict do?
3) Do I need to override the reset() function? I cannot see its utility.
4) InstanceNormOptions could be replaced with BatchNormOptions, but I find that
`track_running_status` is not defined; instead, `stateful` is defined.
InstanceNorm{1,2,3}d https://github.com/pytorch/pytorch/issues/25883
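For reference, a minimal usage sketch of the Python modules these C++ modules mirror:
```
import torch

m = torch.nn.InstanceNorm2d(3, affine=True, track_running_stats=True)
y = m(torch.randn(2, 3, 8, 8))
print(y.shape)  # torch.Size([2, 3, 8, 8])
```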
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28790

Differential Revision: D18588666

Pulled By: yf225

fbshipit-source-id: bb9b81f01f62c3fc8765fa0ba0716768087ee155
2019-11-19 16:57:01 -08:00
8e3486de81 No debug symbols in release android buidls (#30123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30123

In Groovy, the string `'false'` resolves to boolean `true`.

That's why, even with the following in `gradle.properties`:
```
nativeLibsDoNotStrip=false
```
the branch `if (nativeLibsDoNotStrip)` was always taken.

Test Plan: Imported from OSS

Differential Revision: D18606907

Pulled By: IvanKobzarev

fbshipit-source-id: c10140e775624294c732e78ae3c41e05c7c9ad92
2019-11-19 16:44:56 -08:00
5fa941d4e2 update fastlane to use Scanfile (#29963)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29963

### Summary

To run unit tests via Fastlane, simply run `fastlane scan`. Under the hood, it uses `xcodebuild` to run the unit tests. The Scanfile serves as a config file for Fastlane where you can specify parameters you want to pass to `xcodebuild`. More about Scan - https://docs.fastlane.tools/actions/scan/

### Test Plan

- `fastlane scan` is able to run on CI machines.

Test Plan: Imported from OSS

Differential Revision: D18606098

Pulled By: xta0

fbshipit-source-id: b4727d964fa56076b2ff383b40d1b13607721394
2019-11-19 16:32:26 -08:00
99c59d73a7 Remove input_channels / output_channels / with_bias from ConvOptions (#29838)
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838

Differential Revision: D18597943

Pulled By: yf225

fbshipit-source-id: 59101437f032f042574998eb90eaf0be09352364
2019-11-19 16:28:54 -08:00
868cb05a30 Resubmit "Add RpcAgentTestFixture to extract duplicate code" (#30092)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30092

There is duplicate code across components that rely on RpcAgent. Extract it into a reusable test fixture class.
ghstack-source-id: 94196891

Test Plan:
### RPC + RRef

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck test mode/dev-nosan //caffe2/test:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift
```

### Dist Autograd

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn_thrift
```

### Dist Optimizer

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn_thrift
```

Differential Revision: D18595408

fbshipit-source-id: 8360759c63e838fb19d4eb1aeacca0bf8eb4b55f
2019-11-19 16:24:51 -08:00
877c96cddf explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30008

Test Plan: Imported from OSS

Differential Revision: D18575981

Pulled By: VitalyFedyunin

fbshipit-source-id: ec3418257089ad57913932be1a8608cd20ce054c
2019-11-19 16:19:29 -08:00
e46babb637 explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30007

Test Plan: Imported from OSS

Differential Revision: D18575982

Pulled By: VitalyFedyunin

fbshipit-source-id: 83be0857fe1080216cd09547a2b3d34455a0cce4
2019-11-19 16:19:24 -08:00
04018ba865 explicitly provide memory format when calling *_like operators (Redo of 81bf7364)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30006

Test Plan: Imported from OSS

Differential Revision: D18575984

Pulled By: VitalyFedyunin

fbshipit-source-id: b72ea0404f0363001c94f39567c0aeae71cb1f67
2019-11-19 16:19:20 -08:00
66913fe5c1 explicitly provide memory format when calling *_like operators (Redo of cc1c01)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30005

Test Plan: Imported from OSS

Differential Revision: D18575976

Pulled By: VitalyFedyunin

fbshipit-source-id: 94cc213f42f9bd50eaa096872f38c4563e5c9ba1
2019-11-19 16:19:16 -08:00
dc9e7b73e1 explicitly provide memory format when calling *_like operators (Redo of e3e06549)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30004

Test Plan: Imported from OSS

Differential Revision: D18575977

Pulled By: VitalyFedyunin

fbshipit-source-id: 344e9a11c93c7e4a822f424c94fa2255592d118e
2019-11-19 16:19:11 -08:00
66cb93c762 explicitly provide memory format when calling *_like operators (Redo of 4b4aa)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30003

Test Plan: Imported from OSS

Differential Revision: D18575975

Pulled By: VitalyFedyunin

fbshipit-source-id: ce767d116bd821c8e16a7fc7d1be3fca957dcada
2019-11-19 16:19:07 -08:00
295feb4e9a explicitly provide memory format when calling *_like operators (Redo of ce438f6967)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30002

Test Plan: Imported from OSS

Differential Revision: D18575983

Pulled By: VitalyFedyunin

fbshipit-source-id: f018c04c2799a42196077e9868f799cbb047ac6d
2019-11-19 16:19:03 -08:00
20b73e1805 explicitly provide memory format when calling *_like operators (Redo of 631b22d)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30001

Test Plan: Imported from OSS

Differential Revision: D18575979

Pulled By: VitalyFedyunin

fbshipit-source-id: d6fe8a6e1b45673f85a0dd49bd6becfadc5091b4
2019-11-19 16:18:58 -08:00
c15a4a0971 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/30000

Test Plan: Imported from OSS

Differential Revision: D18575980

Pulled By: VitalyFedyunin

fbshipit-source-id: b0e804fe84ada0617852025fa502c0fb93849cb9
2019-11-19 16:18:54 -08:00
2b1466e665 allow operator_range to take multiple ranges (#30124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30124

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operator_range a,b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cuda
# Input: M: 8, N: 32, K: 256, device: cuda
Forward Execution Time (us) : 71.683

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cuda
# Input: M: 1, N: 256, K: 3136, device: cuda
Forward Execution Time (us) : 118.840

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N8192_K1_cuda
# Input: M: 1, N: 8192, K: 1, device: cuda
Forward Execution Time (us) : 134.274

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M128_N128_K1_dim1_cuda
# Input: M: 128, N: 128, K: 1, dim: 1, device: cuda
Forward Execution Time (us) : 109.172
...
```

Reviewed By: hl475

Differential Revision: D18605640

fbshipit-source-id: 4ae9b91a50c4cdf1b161b6c5c58f365ba514050c
2019-11-19 16:15:46 -08:00
05a7aaa742 Pass Tensor instead of Tensor& to torch::nn functionals that can change input in place (#30112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30112

Currently, we have torch::nn functionals that take `input` as `Tensor&` in order to be able to change `input`'s value in place. We likely shouldn't do this, because it prevents the following use case:
```cpp
F::elu(torch::tensor(1), F::ELUFuncOptions().inplace(true))
```
The solution is to change the type of `input` to `Tensor`, so that we can pass an rvalue into the functional.

Test Plan: Imported from OSS

Differential Revision: D18601580

Pulled By: yf225

fbshipit-source-id: 639a86eb62f6c986b0f20bf7e201983e83126e73
2019-11-19 16:11:39 -08:00
a75b669b0f C++ API: torch::nn::ConvTranspose{1,2,3}d (#29721)
Summary:
Add torch::nn::ConvTranspose{1,2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29721

Differential Revision: D18588943

Pulled By: yf225

fbshipit-source-id: d4dbb091389367e70459399d5cda3778325c2120
2019-11-19 16:04:12 -08:00
c2e576e74b Per channel quantization support in insert_prepack_unpack (#29701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29701

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_prepack_unpack'

Imported from OSS

Differential Revision: D18580908

fbshipit-source-id: 2d1ce9b6279586198cb53a7fd2a35325fa20bf20
2019-11-19 15:53:04 -08:00
63c957cd94 Use std::shared_ptr for DistAutogradContext. (#29770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29770

We were passing around const and non-const references for
DistAutogradContext from DistAutogradContainer. This wasn't safe since the
context could be deleted from the container and a thread might still be using
the reference. This usually would happen when a backward pass fails on the node
driving the backward pass (resulting in delete context messages being sent to
all nodes) but other nodes are still executing code related to that autograd
context.

This was also the reason why `test_backward_autograd_engine_error` was flaky.

Using a std::shared_ptr everywhere ensures we're safe and never crash.

Closes #28928
Closes #26922
ghstack-source-id: 94201446

Differential Revision: D18494814

fbshipit-source-id: 0c925fdbd5755f6d876dad56885e2cbaf41fc5f0
2019-11-19 15:50:42 -08:00
79b797ccac Build time warning on windows for fbgemm (#29062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29062

Build time warning
ghstack-source-id: 94202405

Test Plan: None

Reviewed By: jianyuh

Differential Revision: D18279505

fbshipit-source-id: 873cdeb848d34849d6babc435b1a42171f0609a3
2019-11-19 14:30:20 -08:00
5aa50c7f3c Enable test_nested_rref in rpc_test.py (#30100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30100

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597002

Pulled By: mrshenli

fbshipit-source-id: 64aa6a59248e5d1b7e1ad1aebffb6a25248388d2
2019-11-19 13:28:05 -08:00
a243e0872e Enable test_nested_remote in rpc_test.py (#30099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30099

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597003

Pulled By: mrshenli

fbshipit-source-id: ebfb1f6f3f961d98351e06ce4b951793a9b95398
2019-11-19 13:28:01 -08:00
8912e6caf5 Enable test_nested_rpc in rpc_test.py (#30098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30098

Since, after #29827, we only test RPC using spawn, the multi-thread/fork error should disappear.

Test Plan: Imported from OSS

Differential Revision: D18597001

Pulled By: mrshenli

fbshipit-source-id: 68256289085fac1a9ca76d5b4882e97e2f81d1f4
2019-11-19 13:27:57 -08:00
a689e3a0c4 Support per channel quantization in insert_quant_dequant and fold_prepack (#29492)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29492

Previously graph mode quantization only works for per tensor quantization,
this PR added support for per channel quantization as well, changes include
- insert per channel quantization calls(insert_quant_dequant)
- add support of folding for prepacked per channel quantized weight (fold_prepack)

Test Plan:
Testing is not possible until we can script PerChannelObserver, which comes in https://github.com/pytorch/pytorch/pull/29416; we'll add tests in a separate PR after that.

Imported from OSS

Differential Revision: D18580444

fbshipit-source-id: 347c07f201648ec49f070523642a9170278f8aa4
2019-11-19 12:25:28 -08:00
0ab03d3283 only run embeddingbag benchmark on cpu (#30106)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30106

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_other_test -- --tag_filter all --iterations 1 --device cuda --operators embeddingbag
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all
```

Reviewed By: hl475

Differential Revision: D18598198

fbshipit-source-id: 9b7d103410f1183fdf6776047ea2ef8dba4b7831
2019-11-19 12:07:34 -08:00
4b0a6d299c test reporting (#29658)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29658

This PR makes our test scripts output artifacts that CircleCI can
understand. This has a few benefits:
1. We can actually see failed tests and their output in the job screen
(instead of having to scroll through logs)
2. We can use the CircleCI test metadata API to track failed tests
programmatically.

it looks like this (old ui):
https://circleci.com/gh/pytorch/pytorch/3546584?pipelines-ui-opt-out
or this (new ui):
https://app.circleci.com/jobs/github/pytorch/pytorch/3546584/tests

Test Plan: Imported from OSS

Differential Revision: D18597261

Pulled By: suo

fbshipit-source-id: 07fc7d26bbb834e13cc4cc0e48178645ae6579f5
2019-11-19 11:15:31 -08:00
1dbc84ab6d Remove unnecessary conditional (#29901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29901

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18594828

Pulled By: ezyang

fbshipit-source-id: cf4ade2da9bf8769cfb3149713941aa9e5e0d197
2019-11-19 11:06:30 -08:00
57acc2ff3a add a unit test target to TestApp (#29962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29962

### Summary

Recently we've found that the master branch was constantly broken due to unwanted changes being landed on mobile. The problem is that our CI was not able to detect the runtime errors. Starting from this PR, we'll add some unit tests to the iOS simulator build, as follows:

1. Add a unit test target to Xcode (this PR)
2. Use Fastlane to run the tests on CI
3. Modify the CI scripts to trigger tests

### Test Plan

- Don't break the existing CI jobs unless they are flaky.

Test Plan: Imported from OSS

Differential Revision: D18582908

Pulled By: xta0

fbshipit-source-id: f960c47d3bbda79e754a0513e8711867fd3588d2
2019-11-19 11:03:45 -08:00
23991e89cc change operator_range to work with lower and upper in op bench (#30096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30096

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test  -- --iterations 1 --operator_range a-a
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.quint8_contigTrue
# Input: N: 2, dtype: torch.quint8, contig: True
Forward Execution Time (us) : 22.251

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint8_contigTrue
# Input: N: 2, dtype: torch.qint8, contig: True
Forward Execution Time (us) : 17.247

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 29.653
...
```

Reviewed By: hl475

Differential Revision: D18596447

fbshipit-source-id: eac8d9d90db244aa9799293c22bb0d30cf3edf58
2019-11-19 11:01:02 -08:00
dca123e76d Add zipfile serialization (#29232)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
 * **https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization**

This adds a serialization method that uses a zipfile (https://github.com/pytorch/pytorch/issues/26567). Right now it is guarded behind the flag `_use_new_zipfile_serialization`. In release mode its performance is about the same as, or slightly better than, the current serialization in some simple benchmarks on large and small tensors.
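
Opting in looks roughly like this (a minimal sketch; the flag name comes from this PR):

```python
import torch

state = {"weight": torch.randn(3, 3)}

# Explicitly opt in to the zipfile-based format while it is behind the flag.
torch.save(state, "checkpoint.pt", _use_new_zipfile_serialization=True)

# Loading is unchanged; the reader detects the on-disk format.
state = torch.load("checkpoint.pt")
```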

Follow ups:
* Flip the `_use_new_zipfile_serialization` flag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29232

Differential Revision: D18332036

Pulled By: driazati

fbshipit-source-id: 1bac0847c4d599612cba905f2cac8248783be2f4
2019-11-19 10:17:32 -08:00
2b02d154db Implement fast pass for CPU scalars /number literals (#29915)
Summary:
The main changes in this PR are:
- skip device dispatch for CPU scalars (number literals also fall into this category). In most cases scalars should be on CPU for best perf, but if users explicitly put them on another device, we will respect that setting and exit the fast path.
- directly manipulate the Tensor data_ptr when filling a scalar into a 1-element tensor.

Some perf benchmark numbers:
```
## Before
In [4]: def test(x):
   ...:     x = x + 2
   ...:     return x
   ...:

In [5]: with torch.no_grad():
   ...:     x = torch.ones(100)
   ...:     %timeit {test(x)}
   ...:
79.8 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

## After
In [2]: def test(x):
   ...:     x = x + 2
   ...:     return x
   ...:

In [3]: with torch.no_grad():
   ...:     x = torch.ones(100)
   ...:     %timeit {test(x)}
   ...:
60.5 µs ± 334 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Before the patch `tensor_slow` took 15.74% of total time.
<img width="1186" alt="Screen Shot 2019-11-15 at 12 49 51 PM" src="https://user-images.githubusercontent.com/5248122/68976895-cc808c00-07ab-11ea-8f3c-7f15597d12cf.png">
After the patch `tensor_slow` takes 3.84% of total time.
<img width="1190" alt="Screen Shot 2019-11-15 at 1 13 03 PM" src="https://user-images.githubusercontent.com/5248122/68976925-e28e4c80-07ab-11ea-94c0-91172fc3bb53.png">

cc: roosephu who originally reported this issue to me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29915

Differential Revision: D18584251

Pulled By: ailzhang

fbshipit-source-id: 2353c8012450a81872e1e09717b3b181362be401
2019-11-19 10:14:38 -08:00
e88d096321 C++/Python API Parity: add AlphaDropout (#28424)
Summary:
- add `AlphaDropoutImpl` to `modules/dropout.h` and `modules/dropout.cpp`
 - add `functional/dropout.h` containing the `alpha_dropout` function
 - include `functional/dropout.h` in `nn/functional.h`
 - add functional and module tests
-  related issue https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28424

Differential Revision: D18589162

Pulled By: yf225

fbshipit-source-id: c85734e02431a6c052515e26b11ca30ad7303644
2019-11-19 10:05:51 -08:00
1597f22982 fix device check in op bench (#30091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30091

as title

Test Plan:
```
Before:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 91.190

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 27.062

After:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 28.154

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cuda
# Input: M: 512, N: 512, device: cuda
Forward Execution Time (us) : 15.959
...
```

Reviewed By: hl475

Differential Revision: D18595176

fbshipit-source-id: 048c5b7b2a5318c3687412e12e8d2d5f380a8139
2019-11-19 10:05:47 -08:00
37ca5a8a64 convert_sync_batchnorm should not convert _InstanceNorm instances (#29985)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29187

This introduces a new class `_NormBase` that `_InstanceNorm` and `_BatchNorm` inherit from separately. This means the `isinstance(module, _BatchNorm)` check won't falsely pass for `_InstanceNorm`.

The suggested fix of adding `and not isinstance(module, _InstanceNorm)` works as well, but requires introducing a cyclic dependency between `instancenorm.py` and `batchnorm.py`.
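
A minimal sketch of the resulting hierarchy (class bodies elided; only the isinstance behavior matters here):

```python
class _NormBase:                 # shared normalization state and logic
    pass

class _BatchNorm(_NormBase):
    pass

class _InstanceNorm(_NormBase):  # no longer a subclass of _BatchNorm
    pass

m = _InstanceNorm()
assert isinstance(m, _NormBase)
assert not isinstance(m, _BatchNorm)  # convert_sync_batchnorm now skips it
```
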
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29985

Differential Revision: D18588104

Pulled By: yf225

fbshipit-source-id: f599da3b902ad9c56836db4d429bfc462ed51338
2019-11-19 09:39:36 -08:00
45024e7a35 Support Exporting Bitshift to ONNX (#28210)
Summary:
Support exporting left/right bitshifts to ONNX for all opset versions.

ONNX has a bitshift operator in opset 11, but it only supports unsigned ints, so it can't be used in PyTorch (since uint8 is the only uint type).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28210

Reviewed By: hl475

Differential Revision: D18575512

Pulled By: houseroad

fbshipit-source-id: 74161db67f599996a0614981edcc171af6780d21
2019-11-19 09:25:50 -08:00
a3494bd56b CPU-Strided-Complex Fixes for real and imag ops (#29840)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

- [x]  Replaced std::real(a) with a.real() in kernel-level code.
- [x]  Fixed the Vec256_base implementation of complex ops so that it works correctly on non-AVX devices.
- [x]  Fix NumericUtils.h

cc: iotamudelta, ezyang, bddppq, zasdfgbnm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29840

Differential Revision: D18531274

Pulled By: ezyang

fbshipit-source-id: 0fa842c68e4bd55134fe0271880e2d15fe692b7f
2019-11-19 09:21:44 -08:00
7d287688eb Revert D5689636: Add RpcAgentTestFixture to extract duplicate code
Test Plan: revert-hammer

Differential Revision:
D5689636

Original commit changeset: f35eea1359ad

fbshipit-source-id: 31928fce5e96b3beceefbc9a03f54769f10b7e1a
2019-11-19 08:14:44 -08:00
1dda8186ae Revert D18549919: Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents
Test Plan: revert-hammer

Differential Revision:
D18549919

Original commit changeset: b9f3f1a41d1f

fbshipit-source-id: 2d5e578d18c0725b59eb99a0e942fbf7fe3341ee
2019-11-19 08:14:40 -08:00
861ef05015 Remove rpc fork and dist autograd fork tests from PyTorch repo (#29827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29827

There are known issues with "fork tests + OMP" in PyTorch; rpc and dist autograd tests use OMP thread pools, which caused the rpc fork and dist autograd fork tests to be flaky. So we remove these fork tests from the PyTorch repo. The rpc spawn and dist autograd spawn tests are still running.

Test Plan: unit tests

Differential Revision: D18507384

fbshipit-source-id: 9e239f13850832b4b84724828537f73512f3fca9
2019-11-19 07:02:59 -08:00
83513506c3 poll for timed out futures in process group agent (#29601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29601

Follow up from https://github.com/pytorch/pytorch/pull/28392. Adds a background thread to `ProcessGroupAgent` that polls for timed-out RPCs at a pre-set interval and marks them as completed with a timeout exception. It also deletes the futures from the corresponding maps `futures_` and `futureTimeouts`. Unit tests are added to ensure that timed-out RPCs are appropriately cleaned up.

Also adds a `shutdown` variable to the process group agent to control the shutting down of this background thread, which can eventually be extended to control a clean shutdown of the process group agent. A rough sketch of the polling idea follows.
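
Here is that sketch, in Python (the names and data structures are illustrative, not the C++ implementation):

```python
import threading
import time
from concurrent.futures import Future

class TimeoutPoller:
    def __init__(self, interval_secs=1.0):
        self._futures = {}       # request id -> (Future, deadline); cf. futures_ / futureTimeouts
        self._lock = threading.Lock()
        self._interval = interval_secs
        self._shutdown = False   # analogue of the agent's shutdown variable
        threading.Thread(target=self._poll_loop, daemon=True).start()

    def add(self, rid, fut: Future, timeout_secs: float):
        with self._lock:
            self._futures[rid] = (fut, time.monotonic() + timeout_secs)

    def _poll_loop(self):
        while not self._shutdown:
            now = time.monotonic()
            with self._lock:
                expired = [rid for rid, (_, dl) in self._futures.items() if dl <= now]
                for rid in expired:
                    fut, _ = self._futures.pop(rid)  # drop it from the map
                    fut.set_exception(TimeoutError("RPC timed out"))
            time.sleep(self._interval)
```
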
ghstack-source-id: 94175131

Test Plan: Added unit tests

Differential Revision: D18434215

fbshipit-source-id: c48abdb8759fe1447200ec66bb9d4b1c50ec4535
2019-11-19 06:42:04 -08:00
21dc1d4543 Add RpcAgentOptions struct type, which bundles different required arguments for different RpcAgents (#29972)
Summary:
https://github.com/pytorch/pytorch/pull/28226 introduced the `worker_to_id` arg to the `init_rpc` function for other `RpcAgent`s, while it's not really used by `ProcessGroupAgent`. Cleanup is wanted for this, as described in https://github.com/pytorch/pytorch/issues/29031.

To adapt to the differences between `RpcAgent`s, add a `RpcAgentOptions` base class, which allows leveraging inheritance to add extra fields.
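
The shape of the change, sketched in Python (the field names here are hypothetical, for illustration only):

```python
from dataclasses import dataclass

@dataclass
class RpcAgentOptions:                  # arguments every agent needs
    rpc_timeout_secs: float = 60.0

@dataclass
class ProcessGroupRpcAgentOptions(RpcAgentOptions):  # agent-specific extras
    num_send_recv_threads: int = 4

def init_rpc(name: str, rank: int, options: RpcAgentOptions):
    # Each backend reads only the options subtype it understands.
    ...
```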

closes https://github.com/pytorch/pytorch/issues/29031
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29972

Differential Revision: D18549919

Pulled By: xush6528

fbshipit-source-id: b9f3f1a41d1ff18498734081870820b055d56f5b
2019-11-19 01:00:08 -08:00
82b6300fea Disable openmp in static and dynamic histograms (#30072)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30072

Fix the test failure with mode/opt-lto by disabling openmp in both static and dynamic histograms. We will just use a single thread in histogram processing, as that is the common use case.

Test Plan:
```
buck run mode/opt caffe2/caffe2/fb/fbgemm/numerical_debugger/workflows:int8_static_quantization_exporter -- --model-dir /mnt/public/summerdeng/ads/ --model-name downsized_ins_97293388_0.predictor --run --iter 10  --dataset-path /mnt/public/summerdeng/ads/ctr_instagram_story_int8/dataset/train/dataset_115764229_10 --hive-path="hive://ad_delivery/ig_ad_prefiltered_training_data_orc_injected/ds=2019-09-09/pipeline=ctr_instagram_story_click_only_model_opt_out_df" --collect-histogram --activation-histogram-file=/mnt/public/summerdeng/ads/ctr_instagram_story_int8/activation_histograms/dummy_debug_OOM.txt
```
```
buck test mode/opt-lto caffe2/caffe2/quantization/server:dynamic_histogram_test -- --run-disabled
```

Reviewed By: hx89

Differential Revision: D18554614

fbshipit-source-id: cfff51174154e753b7123b4ec502b88ffc508917
2019-11-19 00:32:46 -08:00
a9ad2e2f00 fix batch norm for empty inputs (#30035)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/29578
The shape check is moved up as much as possible, because backends by and large don't correctly handle empty inputs, so the check needs to be done before backend selection. That also automatically takes care of backward: forward for an empty input is automatically differentiable, so no backend-specific backward routines are ever called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30035

Test Plan: tests for empty inputs are added.

Differential Revision: D18584427

Pulled By: ngimel

fbshipit-source-id: a42918f50eb1f6995921aafa92879cd42dd5e9e1
2019-11-18 23:08:12 -08:00
c272758b43 Mobile module forward() pass input by value. (#30060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30060

Mobile forward() passed inputs by reference, which is different from JIT's script::module. To make it consistent, change it to pass by value.

Test Plan: Imported from OSS

Differential Revision: D18587786

Pulled By: iseeyuan

fbshipit-source-id: fa398124fd0a5168f708733ff88f0ba327726f43
2019-11-18 22:33:38 -08:00
267fd4a06c Fix for batch norm 2D with affine=False (#29458)
Summary:
This is a fix for batch norm 2D with affine=False.
Repro: https://github.com/pytorch/pytorch/issues/29271
The error is because the output of the unsqueeze op does not have scalar type information, so I moved the references to scalar type after the unsqueeze line.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29458

Reviewed By: hl475

Differential Revision: D18400975

Pulled By: houseroad

fbshipit-source-id: f5c5633857c584edcef3b9e9946861dcfccccd75
2019-11-18 21:52:11 -08:00
a4f60b64dc explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29391

Test Plan: Imported from OSS

Differential Revision: D18429726

Pulled By: VitalyFedyunin

fbshipit-source-id: 07dfff568ad776cf792122913530566d53be55fa
2019-11-18 21:47:52 -08:00
2dba553990 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29390

Test Plan: Imported from OSS

Differential Revision: D18429722

Pulled By: VitalyFedyunin

fbshipit-source-id: e5f40da1550b4316e9c4725adbdf557c832b7563
2019-11-18 21:47:47 -08:00
3045b2a366 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29389

Test Plan: Imported from OSS

Differential Revision: D18429731

Pulled By: VitalyFedyunin

fbshipit-source-id: 99ee8ae11fbaf05c91903d7df7622c90369ce7ce
2019-11-18 21:47:43 -08:00
735517fa87 explicitly provide memory format when calling *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29388

Test Plan: Imported from OSS

Differential Revision: D18429725

Pulled By: VitalyFedyunin

fbshipit-source-id: 6b7662874e229e6fb0d4bbcf32ec15fc824d6118
2019-11-18 21:47:39 -08:00
5b15f32697 rename benchmark_all_other_test (#30048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30048

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_other_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 142.032
...
```

Reviewed By: hl475

Differential Revision: D18580754

fbshipit-source-id: 125482d2987cbdb1d019ccedf56a9da5a7cebaba
2019-11-18 21:39:31 -08:00
97156f548d Add hash and equality operators for WorkerInfo (#29958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29958

DistributedOptimizer relies on hashing WorkerInfo in order to coalesce fan-out RPCs. This will likely be a very common use case (EASGD will do the same, for example).
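
Why hash and equality matter here, in a short illustrative sketch (the real operators are C++; the frozen dataclass is just a stand-in):

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen dataclasses get __hash__ and __eq__ for free
class WorkerInfo:
    name: str
    id: int

# Coalescing fan-out RPCs: group parameters by destination worker.
rpcs = {}
for param, worker in [("p0", WorkerInfo("trainer0", 0)),
                      ("p1", WorkerInfo("trainer0", 0)),
                      ("p2", WorkerInfo("trainer1", 1))]:
    rpcs.setdefault(worker, []).append(param)

assert len(rpcs) == 2  # equal WorkerInfos collapse into a single entry each
```
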
ghstack-source-id: 94169198

Test Plan: unit test.

Differential Revision: D18548257

fbshipit-source-id: 7d67d4e1b9bc60403c372164982a75ae8c1d8389
2019-11-18 20:47:13 -08:00
8b9bac1fad add operator-range argument to the op bench (#30051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30051

This argument takes hyphen-delimited start and end chars to filter operators. If the first character of an operator is in the start-to-end range, it will be tested; otherwise it is skipped. A sketch of the rule follows.
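
A minimal Python sketch of that rule (the helper name is illustrative; it also accepts the comma-separated form added later by `allow operator_range to take multiple ranges`, e.g. `a,b-c`):

```python
def first_char_in_ranges(op_name, operator_range):
    """operator_range like 'b-c' or 'a,b-c'; None means no filtering."""
    if operator_range in (None, "None"):
        return True
    first = op_name[0].lower()
    for part in operator_range.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            if lo <= first <= hi:
                return True
        elif first == part:
            return True
    return False

assert first_char_in_ranges("ceil", "b-c")
assert not first_char_in_ranges("abs", "b-c")
assert first_char_in_ranges("add", "a,b-c")
```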

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range b-c
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ceil
# Mode: Eager
# Name: ceil_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 110.720

# Benchmarking PyTorch: ceil_
# Mode: Eager
# Name: ceil__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 51.128
...

buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iterations 1 --operator_range None
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 107.113

# Benchmarking PyTorch: abs_
# Mode: Eager
# Name: abs__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 54.259
...
```

Reviewed By: hl475

Differential Revision: D18581910

fbshipit-source-id: b1a1a7ba76f4d6a61c8a1659f15e9c66097654d4
2019-11-18 20:34:43 -08:00
64706e0a74 change conv, batchnorm input shapes (#30041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30041

as title

(Note: this ignores all push blocking failures!)

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 751635.354
```

Reviewed By: hl475

Differential Revision: D18579767

fbshipit-source-id: 53bfac704828a836412434a66000c17f6ac1c727
2019-11-18 20:34:28 -08:00
3250d5008f change the starting iters to reduce execution time (#30040)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30040

The benchmark runs each test in a loop of 200 iters, then keeps doubling the number of iters until the timing is significant. For operators with very large input shapes, the initial 200 iters take more time than is really necessary. This diff changed that 200 to 100; a sketch of the loop follows.
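
The auto-ranging loop in question looks roughly like this (an illustrative Python sketch with a hypothetical significance threshold):

```python
import time

def time_op(op, start_iters=100, min_elapsed_secs=0.2):
    iters = start_iters                   # was 200 before this diff
    while True:
        t0 = time.perf_counter()
        for _ in range(iters):
            op()
        elapsed = time.perf_counter() - t0
        if elapsed >= min_elapsed_secs:   # timing is significant enough
            return elapsed / iters * 1e6  # microseconds per call
        iters *= 2                        # otherwise double and retry
```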

(Note: this ignores all push blocking failures!)

Test Plan:
```
Before
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 729634.577

After
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : None

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_in_c512_out_c512_kernel3_stride2_N8_H64_W64_cpu
# Input: in_c: 512, out_c: 512, kernel: 3, stride: 2, N: 8, H: 64, W: 64, device: cpu
Forward Execution Time (us) : 718315.899
```

Reviewed By: hl475

Differential Revision: D18579588

fbshipit-source-id: ef52474cf77e7549bbab0a9ae7b1b0c04023d208
2019-11-18 20:34:16 -08:00
3bd0f476d4 Revert D18233037: C++ API parity: isfinite
Test Plan: revert-hammer

Differential Revision:
D18233037

Original commit changeset: c76b9467bbc1

fbshipit-source-id: 97d2cfa9de767a8c3a0ca919f9d768e959fa484e
2019-11-18 20:26:19 -08:00
63f4b607aa Ensure initializedContextIds_ map is cleaned up appropriately in DistEngine. (#29787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29787

The initializedContextIds_ map was never cleaned up in DistEngine and kept growing as we continued to run backward passes. To fix this, in this PR we ensure that the context id is removed from the map once we are done with the backward pass.
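
The fix boils down to erasing the entry when the pass finishes, whether it succeeds or fails; an illustrative Python sketch:

```python
initialized_context_ids = set()

def run_distributed_backward(context_id):
    pass  # stand-in for the actual distributed backward pass

def execute_backward_pass(context_id):
    initialized_context_ids.add(context_id)
    try:
        run_distributed_backward(context_id)
    finally:
        # Always remove the id, so the map no longer grows without bound.
        initialized_context_ids.discard(context_id)

execute_backward_pass(42)
assert not initialized_context_ids
```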

Closes #29083
ghstack-source-id: 94161770

Test Plan: waitforbuildbot

Differential Revision: D18498937

fbshipit-source-id: 8d31fc066f6994627766f2b6ca36efa1bef89840
2019-11-18 20:11:18 -08:00
26dabad5a4 Add LiteModule java class for lite interpreter. (#30061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30061
Create the INativePeer interface and move the NativePeer class out of Module.java. Create LiteModuleLoader and LiteNativePeer.java for the Lite Interpreter binding.
ghstack-source-id: 94169187

Reviewed By: dreiss

Differential Revision: D18511688

fbshipit-source-id: 1a69c94b28c8a02631f53079ca7ddcaa57eca38f
2019-11-18 19:53:20 -08:00
a1fc46d2b5 Updating submodules
Summary:
GitHub commits:

385acc503c
b35b183e45

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: c7eccd88c804f1afd1db8d52221665b87ab51837
2019-11-18 19:09:52 -08:00
8df5e10ee9 C++ API parity: isfinite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28918

Test Plan: Imported from OSS

Differential Revision: D18233037

Pulled By: pbelevich

fbshipit-source-id: c76b9467bbc1fbb2c9bf49855895c98438b36c12
2019-11-18 19:06:57 -08:00
5d69bc1eda Add docs for distributed optimizer. (#29971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29971

ghstack-source-id: 94132160

Test Plan: waitforbuildbot

Differential Revision: D18554631

fbshipit-source-id: c4485f7cff5159f423d0f35d1caf71074b62dc28
2019-11-18 18:51:26 -08:00
4f94aed8a3 Reformatting module class. (#29957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29957

Reformatting module class.
ghstack-source-id: 94058645

Test Plan: buck build xplat/caffe2/android:pytorch

Reviewed By: iseeyuan

Differential Revision: D18548185

fbshipit-source-id: 8c1f5cbf491d42915e091e6245b4f308eb162f93
2019-11-18 18:39:29 -08:00
ab93b3df60 Polish distributed autograd docs. (#29942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29942

1) Added links to the design.
2) Fixed function signatures.
3) Expanded examples
ghstack-source-id: 94162372

Test Plan: waitforbuildbot

Differential Revision: D18547103

fbshipit-source-id: 067ba166c107ed14085af8ee3306d3f8a9dcebe7
2019-11-18 18:13:08 -08:00
df6a1c0437 Remove rpc.sync_rpc from the public API. (#30033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30033

Removing this API for now since we don't have a concrete use-case for
this yet and as a result exposing this as a public API might result in users
depending on this API.

We can always add some variant of this API back if needed later.
ghstack-source-id: 94138302

Test Plan: waitforbuildbot

Differential Revision: D18578056

fbshipit-source-id: 078c62331725e03bd5702624afc16b1cdcdf26a4
2019-11-18 18:02:07 -08:00
905792af1f disabling persistent mode for cuDNN BN on NCHW (#30031)
Summary:
This is to help bisect the unstable convergence that https://github.com/pytorch/pytorch/issues/29997 targets. Compared to the other PR, this one is a smaller hammer (a few lines of code change) and would facilitate our future repro/fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30031

Differential Revision: D18577624

Pulled By: VitalyFedyunin

fbshipit-source-id: 92a76cf5db24b25105395f80086d90d8e51dcc4b
2019-11-18 17:28:27 -08:00
9c7e604c60 SyncBatchNorm Update on input dimension checks (#29626)
Summary:
update the requirements on input dimensions for `torch.nn.SyncBatchNorm`:
1. 2D inputs are now permissible, https://github.com/pytorch/pytorch/issues/20204;
2. at least two elements are required along the normalization plane (BatchNorm behavior);
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29626

Differential Revision: D18492531

Pulled By: albanD

fbshipit-source-id: f008e46a2d520d73c3c2730890a7424eba2ede9e
2019-11-18 16:09:51 -08:00
5b6dd52e3c Build Unit Test of SparseRAdam
Summary: We added a caffe2 Python wrapper and a unit test for the SparseRAdam C++ operator.

Test Plan:
The unit test is constructed following the design pattern of the [Wngrad optimizer](https://our.intern.facebook.com/intern/diff/D8655724/). The test passed smoothly.
buck test //caffe2/caffe2/python:optimizer_test -- TestSparseRAdam

Test result:
{F221144048}

Reviewed By: wx1988

Differential Revision: D18330650

fbshipit-source-id: e0f4724c2b616b665e2a0fe2e5c3430696cca7ee
2019-11-18 15:22:37 -08:00
64cdc648da fix submodule traversal in FoldPrepackedWeightIntoModule (#29925)
Summary:
similar to https://github.com/pytorch/pytorch/pull/29914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29925

Differential Revision: D18548029

Pulled By: jerryzh168

fbshipit-source-id: 7b36133454c5190be19380bf125203807ea0b129
2019-11-18 13:34:45 -08:00
b4f33c1c21 Updating submodules
Summary:
GitHub commits:

3acb25f216
b830bffa96
fc7064cb4e
aa3975852e
b2a3d8944d

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 34571d2a94a8fd93d744dc58a0ba7681f3fdc6b2
2019-11-18 13:08:32 -08:00
8dd67057f1 Add RpcAgentTestFixture to extract duplicate code (#29747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29747

There is duplicate code for components that rely on RpcAgent. Extract it into a reusable test fixture class.

Test Plan:
### RPC + RRef

```
buck test mode/dev-nosan //caffe2/test:rpc_fork

buck test mode/dev-nosan //caffe2/test:rpc_spawn
```

```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift

buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift
```

### Dist Autograd

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_autograd_spawn_thrift
```

### Dist Optimizer

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn
```

```
buck test mode/dev-nosan //caffe2/test:dist_optimizer_fork_thrift

buck test mode/dev-nosan //caffe2/test:dist_optimizer_spawn_thrift
```

Differential Revision: D5689636

fbshipit-source-id: f35eea1359addaaac9bd8d00d0a5df228a236511
2019-11-18 12:54:17 -08:00
6d6380fd4e Update CODEOWNERS for distributed and rpc modules
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29988

Test Plan: Imported from OSS

Differential Revision: D18576548

Pulled By: mrshenli

fbshipit-source-id: 1170b6970727c9698b6fdbf0c40fc317d17ea8ea
2019-11-18 12:45:52 -08:00
adfb8a4888 Fix bug in atomicAdd for int16_t (#29231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29231

Fixes: https://github.com/pytorch/pytorch/issues/29153

The bug is that atomicAdd doesn't correctly add values for some dtypes, due to incorrect casting; it was returning zeros.

Incorrect behavior before this PR:

```
In [23]: sparse=torch.sparse_coo_tensor(indices=torch.tensor([[0,0],[1,1]]), values=torch.tensor([5, 6], dtype=torch.int16), size=(2,2), device='cuda', dtype=torch.int16 )

In [24]: sparse
Out[24]:
tensor(indices=tensor([[0, 0],
                       [1, 1]]),
       values=tensor([5, 6]),
       device='cuda:0', size=(2, 2), nnz=2, dtype=torch.int16,
       layout=torch.sparse_coo)

In [25]: sparse.coalesce()
Out[25]:
tensor(indices=tensor([[0],
                       [1]]),
       values=tensor([11]),
       device='cuda:0', size=(2, 2), nnz=1, dtype=torch.int16,
       layout=torch.sparse_coo)

In [26]: sparse.to_dense()
Out[26]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)

In [27]: sparse.coalesce().to_dense()
Out[27]:
tensor([[ 0, 11],
        [ 0,  0]], device='cuda:0', dtype=torch.int16)

In [30]: torch.add(torch.zeros([2,2],dtype=torch.int16, device='cuda'), sparse)
Out[30]:
tensor([[0, 0],
        [0, 0]], device='cuda:0', dtype=torch.int16)
```

Test Plan: Imported from OSS

Differential Revision: D18575666

Pulled By: nairbv

fbshipit-source-id: 9b193b386bf4a9615014aa890d2e9f4f694940ac
2019-11-18 12:42:02 -08:00
45e980a243 Skip broken test test_cuda_kernel_loop_overflow_large (#30021)
Summary:
The previous "expectedFailure" decoration has broken ROCm CI

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/7674//console

```
16:23:52 test_cuda_kernel_loop_overflow_large (__main__.TestCuda) ... unexpected success

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30021

Differential Revision: D18574931

Pulled By: bddppq

fbshipit-source-id: 7b5240f9f3a610adda633f8b0dd9137e40b12e2f
2019-11-18 12:38:37 -08:00
189b24ebe9 reorganize test binaries of op bench (#30023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30023

This diff doesn't change how users run the benchmarks. But under the hood, we group all the tests into three groups: unary tests, quantized tests, and the remaining ops (named "others" here).

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 17914.301
...
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 66525.855
...
# Benchmarking PyTorch: mul
# Mode: Eager
# Name: mul_N2_dtypetorch.qint32_contigTrue
# Input: N: 2, dtype: torch.qint32, contig: True
Forward Execution Time (us) : 290.555
...
```

Reviewed By: hl475

Differential Revision: D18574719

fbshipit-source-id: f7ff1d952031129adde51ebf002e4891bd484680
2019-11-18 12:21:26 -08:00
91c6d2e51c Add support for quantized operator conversion from PT to C2 via ONNX (#29694)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29694

This PR adds the preliminary support required to run quantized PyTorch models on a C2 backend.
For quantized ops we use a custom domain name, 'caffe2', to register the ops if they are in the "quantized" namespace.
The change also adds a JIT pass to unpack the quantized weights and insert the unpacked values into the graph.
The actual tensor values are looked up from the params dict.

Test Plan:
python test/onnx/test_pytorch_onnx_caffe2.py TestQuantizedOps

Imported from OSS

Reviewed By: houseroad

Differential Revision: D18467130

fbshipit-source-id: 53ebd8c43935f7d7e74305dad6c231a2247df176
2019-11-18 12:12:40 -08:00
b45069b59f fix fc fp16 quantization (#29469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29469

The original approach was to save both fp16 and fp32 for all models, which increased the file size and memory usage.

This diff saves only the 'used' blobs into the predictor file.

Test Plan:
fc clone workflow :
f149878151

ctr mbl feed test with fc fp16 quantization:
f149996395

No fp32 in local file
{F221750392}

QRT after the fix:
https://fburl.com/qrt/cp8r8263

Reviewed By: wx1988

Differential Revision: D18382503

fbshipit-source-id: 231c41668f25b1d35ca8d4358ce9b12ba60a4f91
2019-11-18 11:26:49 -08:00
a3ee504c33 Integrate RAdam to SparseAdamOp
Summary:
T53944549 aims to integrate the [`RAdam`](https://arxiv.org/pdf/1908.03265.pdf) optimizer into `Adam`. In this diff, we first try to integrate `RAdam` into `SparseAdamOp` on the CPU platform.

Note that `adam_op.cc` and `adam_op_gpu.cu` may be implemented in other diffs.

The implementation of `RAdam` follows the algorithm below:
 {F220259279}

The algorithm of [`Adam`](https://arxiv.org/pdf/1412.6980.pdf) is attached:
{F220389971}

Test Plan: Run `buck build caffe2` successfully.

Reviewed By: wx1988

Differential Revision: D18239578

fbshipit-source-id: fdc028261ee20986cae1f30f1d26d8705587331a
2019-11-18 10:20:01 -08:00
82682b3e96 Revert D18531481: Remove input_channels / output_channels / with_bias from ConvOptions
Test Plan: revert-hammer

Differential Revision:
D18531481

Original commit changeset: e48d9e8cf110

fbshipit-source-id: a233425cc10278552674c48b6b577ef53fca0632
2019-11-18 09:10:54 -08:00
f6cadad174 Delete redefinitions of methods in Variable already present on Tensor. (#29667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29667

Some previous implementations are defined in native_functions.yaml. In this case, I don't define them explicitly in Tensor; instead they are placed in VariableTypeManual.cpp. Doing this would have deleted their documentation, so that documentation was moved to native_functions.yaml.

This also replaces `current_version` with just `_version`.

This is a carved out portion of #28287, rebased past Tensor-Variable
merge.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18504934

Pulled By: ezyang

fbshipit-source-id: be7adf45b637daffe2b0b1631eb31d967525fc31
2019-11-18 08:12:16 -08:00
1ab2f043ba Move most methods off Variable into torch::autograd::impl functions. (#29665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29665

Our intention is to merge the static distinction between Tensor and
Variable.  Ordinarily, this would entail merging the methods of Tensor
and Variable.  But there are a lot of "private"-ish methods on Variable
that we don't actually want to dump onto the Tensor class.  So, as prep
work, we move all of those methods off of Variable and into
the torch::autograd::impl namespace (impl as in: end users, please don't
use this).  This ends up being a fairly large patch because all of
the call sites have to play ball too.

While I was on the topic, I also moved any of the touched functions into
the C++ file, so that modifying them would not trigger a recompilation of
all of torch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496169

Pulled By: ezyang

fbshipit-source-id: afb203252620ec274be596b3e7b1d84d321bad3a
2019-11-18 08:12:12 -08:00
38340f59fd randint accept generator=None (#29748)
Summary:
This PR fixes the inconsistent behavior of `randint`'s `generator=` kwarg. It does not accept `None`, which is inconsistent with how other random functions behave:
```
In [12]: torch.randint(0, 4, size=(2,3), generator=torch.Generator())
Out[12]:
tensor([[2, 0, 1],
        [0, 1, 3]])

In [13]: torch.randint(0, 4, size=(2,3), generator=None)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-a6bc6525a1e1> in <module>
----> 1 torch.randint(0, 4, size=(2,3), generator=None)

TypeError: randint() received an invalid combination of arguments - got (int, int, generator=NoneType, size=tuple), but expected one of:
 * (int high, tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int high, tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int low, int high, tuple of ints size, torch.Generator generator, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
 * (int low, int high, tuple of ints size, Tensor out, torch.dtype dtype, torch.layout layout, torch.device device, bool requires_grad)
```

Other random functions work fine:
```
In [9]: torch.bernoulli(torch.ones(3))
Out[9]: tensor([1., 1., 1.])

In [10]: torch.bernoulli(torch.ones(3), generator=None)
Out[10]: tensor([1., 1., 1.])
```

This PR also documents the `generator=` kwarg, and fixes https://github.com/pytorch/pytorch/issues/29683 since it's a related easy fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29748

Differential Revision: D18529951

Pulled By: ezyang

fbshipit-source-id: e956cc989decc94e9483fd4a30f9255240d7c07e
2019-11-18 08:07:29 -08:00
94016b153a Fix typo in documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29755

Differential Revision: D18529963

Pulled By: ezyang

fbshipit-source-id: 8d9100f00c46238fa3210944864b1d178717499f
2019-11-18 07:44:12 -08:00
a573f8f7d7 Disable broken test_cuda_kernel_loop_overflow_large test (#29904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29904

See https://github.com/pytorch/pytorch/issues/26838

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18539740

Pulled By: ezyang

fbshipit-source-id: c3dcaaa0d8eedcfa4173c2b6ec139090bdace4b4
2019-11-18 07:38:34 -08:00
7782f4bc50 Updating submodules
Summary:
GitHub commits:

ea4aa9fc07
54e6aa5568
da41ae5048
da70fce0d3
0bec77c2d2
09fd20898f
b47c7f5c77
5762809397
241c174631

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1739f00a0f1e4ffe4b5ebb9e6f5dce403a5adf8b
2019-11-18 07:09:35 -08:00
0e5200adfe Refactor target_compile_options into torch_compile_options (#29730)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29730

Back in the day, Caffe2 had a good idea: instead of spattering
target_compile_options all over the codebase, define a helper
function which sets all the options for a target.  This is especially
helpful if I want to split libtorch.so into libtorch_cpu.so
and libtorch_cuda.so; I need a way to easily apply options
to multiple targets.  A shared helper function is just the ticket.

I moved every target_compile_options call in caffe2/CMakeLists.txt
that didn't seem target dependent (exclusions included OpenMP flags,
API-related macros, ONNX related macros and HIP flags) into
torch_compile_options.  I slavishly preserved the structure:
there's a nearly redundant WERROR if() in the output but I preserved
it.

There is one thing I don't like about this, which is that now
the compile options are off in a random directory that no one would
expect.  But c'est la vie...

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571166

Pulled By: ezyang

fbshipit-source-id: 21cd5f7663485077600782078fbb1787fab09035
2019-11-18 07:05:48 -08:00
1381301d46 Remove AT_LINK_STYLE entirely. (#29729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29729

It already errored when you built with CUDA/HIP support as no longer supported;
now I expunge it entirely.  Along the way, I delete useless INTERFACE
libraries (which aren't used anywhere else in the cmake.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18571167

Pulled By: ezyang

fbshipit-source-id: f88c73a16fad3b61eaa7745a2d15514c68704bec
2019-11-18 07:05:43 -08:00
639133d6d1 rename init_model_parallel to init_rpc (#29762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29762

Rename this API as discussed, since its use cases extend beyond model parallelism.
ghstack-source-id: 94020627

Test Plan: Unit tests pass

Differential Revision: D18491743

fbshipit-source-id: d07676bb14f072c64da0ce99ee818bcc582efc57
2019-11-18 06:07:44 -08:00
5f510374e7 Add torch.memory_format support to the TorchScript
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28544

Test Plan: Imported from OSS

Differential Revision: D18093801

Pulled By: VitalyFedyunin

fbshipit-source-id: 2c82a1508da50a24825b44939434d86546cf1e19
2019-11-18 05:35:49 -08:00
cb43170dcb Add memory format support to the resize_ op. (#28292)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28292

Allows simplifying patterns like the following (a Python sketch of the simplified form appears after the patterns):

1.
	output.resize_({sizeB, sizeC, osizeH, osizeW}).as_strided_({sizeB, sizeC, osizeH, osizeW}, {sizeC*osizeH*osizeW, 1, osizeW*sizeC, sizeC});

2.
	output.resize_({nbatch, nInputPlane, outputHeight, outputWidth});
	indices.resize_({nbatch, nInputPlane, outputHeight, outputWidth});
	output.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);
	indices.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);

3.
	gradInput.resize_as_(input);
  	gradInput.unsafeGetTensorImpl()->empty_tensor_restride(memory_format);
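
From Python, the new keyword makes the equivalent a one-liner; a minimal sketch:

```python
import torch

out = torch.empty(0)
out.resize_((8, 3, 32, 32), memory_format=torch.channels_last)
assert out.is_contiguous(memory_format=torch.channels_last)
```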

Test Plan: Imported from OSS

Differential Revision: D18044978

Pulled By: VitalyFedyunin

fbshipit-source-id: bbf67c25f9cf88bc6e949089a3b247df50f86dc4
2019-11-18 05:35:44 -08:00
a7df36964c TensorIterator: preserve format for binary and ternary operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28291

Test Plan: Imported from OSS

Differential Revision: D18044977

Pulled By: VitalyFedyunin

fbshipit-source-id: 793bab47d8cfc1b0d6229f1b0688352ee94c3e48
2019-11-18 05:35:40 -08:00
b80c4f60fb Add channels last support to cuda.comm.scatter and gather
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28077

Test Plan: Imported from OSS

Differential Revision: D17980305

Pulled By: VitalyFedyunin

fbshipit-source-id: e4741194baac3d93f2d53724582dc4c38f82ee84
2019-11-18 05:35:35 -08:00
026a2a4ec4 Kill operator== of TensorOptions, as it was confusing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28076

Test Plan: Imported from OSS

Differential Revision: D17980306

Pulled By: VitalyFedyunin

fbshipit-source-id: 2f206d5069ce0bd828d4e96f2e98cf2baa1dfec7
2019-11-18 05:35:29 -08:00
9f3b347874 Add memory format support to resize_as_ operator (#27979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27979

Adds a memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules (see the sketch after the definitions below):
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels last format, the output tensor is going to have channels last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is one that stores its values in a contiguous block of memory.
A non-overlapping tensor is one in which elements occupy individual, non-repeating memory locations.
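
A minimal Python sketch of rule (1); passing the keyword explicitly is assumed here, since the default may not be 'preserve':

```python
import torch

x = torch.empty(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
out = torch.empty(0)
out.resize_as_(x, memory_format=torch.preserve_format)

# Rule (1): x is non-overlapping and dense, so out inherits its strides.
assert out.stride() == x.stride()
```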

Test Plan: Imported from OSS

Differential Revision: D17980311

Pulled By: VitalyFedyunin

fbshipit-source-id: 12d013521091fcc9c045833577f6dc78d7b1e68f
2019-11-18 05:35:23 -08:00
a3588b6ed9 Updating submodules
Summary:
GitHub commits:

62c3b48cf4

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 41d1346f2405bce84984b02e3a951bb0e30868b7
2019-11-18 05:35:17 -08:00
bb217eee98 Updating submodules
Summary:
GitHub commits:

4624a94bf7

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 128a7f6b1e3207bcea19925e1709b0ecc0c957ab
2019-11-17 23:13:14 -08:00
18bdf97dbb Factor Module into Object and Module
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29500

Test Plan: Imported from OSS

Differential Revision: D18463064

Pulled By: jamesr66a

fbshipit-source-id: d37bef242a8626593d4b8754042152cfc0f0acb2
2019-11-17 22:58:50 -08:00
14946a8891 Updating submodules
Summary:
GitHub commits:

eeb38ffd62
f27f096824
d5c51096af
76432027c0
e6135854c5
83800eae9a
5a5b563db5

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1eab55cd73b143acedfad7bf6fcad44b8a2cc12e
2019-11-17 18:38:06 -08:00
6bf87dae90 Updating submodules
Summary:
GitHub commits:

ead3bceee0

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: cefa462dc00d8e9d43474689042cc5043c99644f
2019-11-17 18:38:01 -08:00
2b5213d94c Updating submodules
Summary:
GitHub commits:

f163b30ade

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 5350123e471a38585893e75adffbdedd05f72167
2019-11-17 02:20:28 -08:00
b011461c9f Add missing operators for pytext, v2 (#29970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29970

Add the operators and the JMP instruction used by the PyText model to the lite interpreter.

Test Plan: Imported from OSS

Differential Revision: D18555483

fbshipit-source-id: e5124d908762f78fb548505aecf33be8c8503275
2019-11-16 23:59:12 -08:00
6980cb2519 Add overload name to JIT prim operators, version 2 (#29960)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29960

Overload names are required for mobile operators that share a name but have different schemas. Since the overload name is not used in JIT, it's safe to add overload names to JIT operators.
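
For illustration, the overload name is the token after the dot in an operator schema; two operators may share a name as long as the (name, overload name) pair is unique. The aten add schemas show the pattern (quoted here as Python strings):

```python
schemas = [
    "aten::add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor",
    "aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor",
]
# The overload names "Tensor" and "Scalar" disambiguate the two on mobile.
```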

Test Plan: Imported from OSS

Differential Revision: D18555484

fbshipit-source-id: b451379af24e255d8b0c61b964ae32fd1a64ed34
2019-11-16 23:59:07 -08:00
689b4bea7b torch::nn::GLU and F::glu (#29922)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29922

* #29920 [C++ API] torch::nn::GroupNorm and F::group_norm

Test Plan: Imported from OSS

Differential Revision: D18558818

Pulled By: yf225

fbshipit-source-id: ff80d634309fcb55f53db8dcf86eb9cf8161b37e
2019-11-16 21:03:38 -08:00
d5bf51b684 torch::nn::GroupNorm and F::group_norm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29920

Test Plan: Imported from OSS

Differential Revision: D18539314

Pulled By: yf225

fbshipit-source-id: dabbbaac31796fe7bfde02487737971bde699c1c
2019-11-16 19:22:11 -08:00
93c5d79953 Updating submodules
Summary:
GitHub commits:

56fc7ed20e

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 6d48ebb6f5b631a7b3e5bcd96fa2fa92ca6c1ba5
2019-11-16 13:23:45 -08:00
30d37e82db Revert D18521937: Enable full error message for mobile builds
Test Plan: revert-hammer

Differential Revision:
D18521937

Original commit changeset: 99673b60a03d

fbshipit-source-id: 1946982201e4a21015bc9cd8abaa64a68ff8774f
2019-11-16 12:20:27 -08:00
e1d13f4f8b C++ API parity: NLLLoss & CrossEntropyLoss (#29812)
Summary:
Hi yf225, I have added **NLLLoss and CrossEntropyLoss.**

Also, while using log_softmax in cross_entropy_loss, I am getting an error:
```
../caffe2/../torch/csrc/api/include/torch/nn/functional/loss.h:537:63: error: no matching function for call to ‘log_softmax(const at::Tensor&)’
     const Tensor& log_softmax_input = torch::log_softmax(input);

aten/src/ATen/Functions.h:5551:22: note: candidate: at::Tensor at::log_softmax(const at::Tensor&, int64_t, c10::optional<c10::ScalarType>)
 static inline Tensor log_softmax(const Tensor & self, int64_t dim, c10::optional<ScalarType> dtype) {
                      ^~~~~~~~~~~
aten/src/ATen/Functions.h:5551:22: note:   candidate expects 3 arguments, 1 provided
```

I think the other two parameters should be optional, as in the Python frontend (shown in the documentation at https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.log_softmax ). Otherwise, there were no errors in the build, and the tests have passed.
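
For reference, a minimal Python sketch of the frontend behavior being matched; passing `dim` explicitly is the unambiguous form that the C++ call above needs to spell out:

```python
import torch
import torch.nn.functional as F

x = torch.randn(3, 5)
out = F.log_softmax(x, dim=1)   # dim (and dtype) are optional in Python
print(out.exp().sum(dim=1))     # each row sums to 1 after exponentiation
```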
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29812

Differential Revision: D18548249

Pulled By: yf225

fbshipit-source-id: 2ab350abd2a6f498d4dba2345f51ad87471f3038
2019-11-16 10:49:09 -08:00
890a3f8b8d Remove input_channels / output_channels / with_bias from ConvOptions (#29838)
Summary:
Since torchvision is not using input_channels / output_channels / with_bias in ConvOptions anymore (https://github.com/pytorch/vision/pull/1576), we can remove the bridges now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29838

Differential Revision: D18531481

Pulled By: yf225

fbshipit-source-id: e48d9e8cf110095f83d9ed18b9fec020ec725f3e
2019-11-16 10:46:50 -08:00
0995929971 Improve legacy QuantizedLinear functions to reduce overhead (#29773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29773

Improve legacy QuantizedLinear functions to reduce overhead.
Separate from the stack of D18381988.

Test Plan: buck test mode/dev-nosan //caffe2/test:jit -- "quant"

Reviewed By: lly-zero-one

Differential Revision: D18494988

fbshipit-source-id: 5627d7e8b0b7a750852eead9e28c5a9b3fa70559
2019-11-16 08:25:11 -08:00
66bd0ed940 Updating submodules
Summary:
GitHub commits:

207328497a
c272123098
cdcd46de4e
1c093d3fa7
e18b3c2e6e
746161a422

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 253c3a9d70da0cbaf34dc38414966ccccf40533c
2019-11-16 06:26:50 -08:00
649e7f057e fix comment index_size->output_size (#29831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29831

As title. Thanks Aleks Zi for finding this!

Test Plan: Just changing comments

Reviewed By: zlateski

Differential Revision: D18511259

fbshipit-source-id: 5f1ad9ba53db9b22622a556ec214ced361ec016a
2019-11-16 01:49:02 -08:00
58ee61176c SeqBlobReader Implementation (#29888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29888

Extract some common functions out of class LoadOp.

Reviewed By: yinghai, ipiszy

Differential Revision: D18456785

fbshipit-source-id: d0b8e86ad5709c35f1dc3821376000db1114dc95
2019-11-16 01:18:54 -08:00
455b5c1a7d minor updates to rpc docs (#29857)
Summary:
Small fixes to the rpc docs:
- mark as experimental and subject to change
- reference the distributed autograd design document in the PyTorch notes page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29857

Differential Revision: D18526252

Pulled By: rohan-varma

fbshipit-source-id: e09757fa60a9f8fe9c76a868a418a1cd1c300eae
2019-11-15 22:28:08 -08:00
4da509090e Disables TestNN.test_CTCLoss_1d_target (#29841)
Summary:
A variant of this test is flaky in CI. See https://github.com/pytorch/pytorch/issues/29380.

This disables the entire test until a fix is determined.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29841

Differential Revision: D18531542

Pulled By: mruberry

fbshipit-source-id: 3b033e3a7d55418cf459e7664d856d6dd4c98aa5
2019-11-15 22:03:04 -08:00
eb29276623 Update distributed autograd design doc with appropriate links. (#29927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29927

With the docs page now up, we can update the links in the design doc
to point to the docs page.
ghstack-source-id: 94055423

Test Plan: waitforbuildbot

Differential Revision: D18541878

fbshipit-source-id: f44702d9a8296ccc0a5d58d56c3b6dc8a822b520
2019-11-15 21:10:53 -08:00
4553d5e69b Fix submodule traversal in insertPackUnpack pass. (#29914)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29914

Currently we're visiting all submodules every time we're visiting a
method of a module.

Test Plan: Imported from OSS

Differential Revision: D18534602

Pulled By: ZolotukhinM

fbshipit-source-id: 38c5b0ab0bdd27599fd0a6af0eaa3603c68a97a8
2019-11-15 20:43:43 -08:00
27afac2134 C++ API parity: Dropout, Dropout2d, Dropout3d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29761

Test Plan: Imported from OSS

Differential Revision: D18530820

Pulled By: pbelevich

fbshipit-source-id: 9d351561692f7de099d7c6aaf2ecb930b5c867e9
2019-11-15 20:32:06 -08:00
fbabf72829 Add ONNX support for Logdet (#29767)
Summary:
Exported as combination of ONNX::Log and ONNX::Det.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29767

Reviewed By: hl475

Differential Revision: D18499762

Pulled By: houseroad

fbshipit-source-id: e6f2298635a995f01b2913d8958b5e1ca9d04058
2019-11-15 20:27:43 -08:00
b730d04ed2 Fix deadlock issues in ThreadPool (#29885)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29885

### Summary

Currently, we have a deadlock issue on iOS when running Resnet50. The problem happens when a task running in the ThreadPool calls `getNumThread()`, which tries to acquire the same mutex, thus causing the deadlock. The fix is to simply remove the guard for `_numThreads`, as it's not likely to change after initialization.
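
A minimal Python sketch of the deadlock pattern (hypothetical names; the real code is C++ with a `std::mutex`):

```python
import threading

_mutex = threading.Lock()   # non-reentrant, like a plain std::mutex
_num_threads = 4

def get_num_threads():
    with _mutex:            # second acquire by the same thread blocks forever
        return _num_threads

def run_task(task):
    with _mutex:            # the pool holds the mutex while the task runs
        return task()

# run_task(get_num_threads) would deadlock; the fix drops the guard in
# get_num_threads, since _num_threads doesn't change after initialization.
```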

### Test Plan

1. Generate a Resnet50 model using trace_model.py
2. Run `ios/TestApp/bootstrap.sh` to do the benchmark

cc shoumikhin AshkanAliabadi

Test Plan: Imported from OSS

Differential Revision: D18533505

Pulled By: xta0

fbshipit-source-id: 2a069d20b59833ec8b02ff05515c3739a85a15de
2019-11-15 19:27:52 -08:00
0a33c3f1a1 split module interface tests (#29917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29917

move test_module_interface to its own file, no code logic change

Test Plan: Imported from OSS

Differential Revision: D18543235

fbshipit-source-id: ab5e233061ba45cb0c05cafdd289b859036c207c
2019-11-15 19:09:36 -08:00
a5b4d78c6d Revert D18499600: Add overload name to JIT prim operators.
Test Plan: revert-hammer

Differential Revision:
D18499600

Original commit changeset: a1b49e64c908

fbshipit-source-id: 73e27b72f53799c0133850d2352ae8cd8a82d87c
2019-11-15 18:36:17 -08:00
2a442f5dca Revert D18499601: Add missing operators for PyText model.
Test Plan: revert-hammer

Differential Revision:
D18499601

Original commit changeset: 8a38d3d809ee

fbshipit-source-id: 4f28f291bd7020f1fc9fc313bc766b5dbf5b1b90
2019-11-15 18:36:11 -08:00
c543034531 add cuda sync when ops running on gpu (#29936)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29936

This diff adds synchronization after op execution to ensure all the CUDA streams complete.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 154.412

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 101.115
...
```

Reviewed By: hl475

Differential Revision: D18542732

fbshipit-source-id: b979d26a174f488e971074dc1e16b00e17179c80
2019-11-15 18:02:48 -08:00
f1860aea83 fix missing lock in profiling graph compilation (#29886)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29886

Fixes https://github.com/pytorch/pytorch/issues/29764

Test Plan: Imported from OSS

Differential Revision: D18523903

Pulled By: zdevito

fbshipit-source-id: 4e2b04102ee9f6312e4a7b48536392454e6c1b79
2019-11-15 17:51:46 -08:00
5cad7d42ef Enable full error message for mobile builds (#29926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29926

add a macro to enable full error message for mobile

Test Plan: buck build -c project.ignore= //xplat/experimental/pytorch/predictor:predictorAndroid#android-armv7

Reviewed By: dreiss

Differential Revision: D18521937

fbshipit-source-id: 99673b60a03da249236dc916bab3dff88d24bc25
2019-11-15 17:48:47 -08:00
c300f086a4 Turn off scalar_check for diag.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29877

Test Plan: Imported from OSS

Differential Revision: D18521734

Pulled By: gchanan

fbshipit-source-id: 646cc0bca5082a808deca3f5d6646bc6bf180484
2019-11-15 17:17:13 -08:00
a6a31c6dc2 Turn off scalar_check for _th_max, _th_min.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29876

Test Plan: Imported from OSS

Differential Revision: D18521737

Pulled By: gchanan

fbshipit-source-id: aeae6959c778eb6d935bcdb8bcf664a7c2404090
2019-11-15 17:17:08 -08:00
6c7a0c68f9 Turn off scalar_check for lstsq (gels), and test scalars for eig.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29875

Test Plan: Imported from OSS

Differential Revision: D18521740

Pulled By: gchanan

fbshipit-source-id: 98133aadaaa2f2010462517a2704395dad95817b
2019-11-15 17:17:04 -08:00
79f0636718 Turn off scalar_check for sort. (#29874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29874

It's handled correctly by the op.

Test Plan: Imported from OSS

Differential Revision: D18521744

Pulled By: gchanan

fbshipit-source-id: 0577670bebaec98e6549ad270ff0ebd3ed908231
2019-11-15 17:17:00 -08:00
ee5201cd7c Fix memory leak in CUDA renorm, turn off scalar_check for renorm. (#29873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29873

Renorm requires at least 2 dimensions, so the scalar_check could never succeed.

Test Plan: Imported from OSS

Differential Revision: D18521733

Pulled By: gchanan

fbshipit-source-id: 9701c750a14ce67e1bd63dd0753bd8863da42c17
2019-11-15 17:16:55 -08:00
d87655f515 Turn off scalar_checks for cumsum, cumprod.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29872

Test Plan: Imported from OSS

Differential Revision: D18521739

Pulled By: gchanan

fbshipit-source-id: 72d642bcc462e5b1317876bcae8b31f83a98467d
2019-11-15 17:16:51 -08:00
fe575b44ee Turn off scalar_check for fmod. (#29871)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29871

Generated code diff: https://gist.github.com/gchanan/cba4ac79afa00a48eaff0aabc60d17cc

Test Plan: Imported from OSS

Differential Revision: D18521736

Pulled By: gchanan

fbshipit-source-id: 364fc2aeba5315d0729a9f7f74c5e9ad64c30e45
2019-11-15 17:16:47 -08:00
98362977a0 Turn off scalar_check for remainder. (#29870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29870

Codegen diff: https://gist.github.com/gchanan/c7ceb5715e7cfa6266e948d598744131

Test Plan: Imported from OSS

Differential Revision: D18521738

Pulled By: gchanan

fbshipit-source-id: bee23d67e247d4e06fef41243f578247c4817300
2019-11-15 17:16:42 -08:00
61df98a083 Turn off scalar_checks for multinomial_alias_setup_, which requires 1d tensors. (#29869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29869

codegen changes: https://gist.github.com/gchanan/8e1b5184581fa37b27b6e856a75b470f

Test Plan: Imported from OSS

Differential Revision: D18521741

Pulled By: gchanan

fbshipit-source-id: a2674b55214b84032e7a821e8472d7df9e8a1dcb
2019-11-15 17:16:38 -08:00
92a512b583 Stop generating maybe_zero_dim calls for "scalar_check: false" with multiple outputs. (#29868)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29868

Codegen changes: https://gist.github.com/gchanan/b0db8ec1310d7e10435c75b951e7de83

Test Plan: Imported from OSS

Differential Revision: D18521735

Pulled By: gchanan

fbshipit-source-id: bc4c437b001b754868435fb642ab60415600f0ff
2019-11-15 17:16:33 -08:00
6c39e5033c Add missing operators for PyText model.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29664

Test Plan: Imported from OSS

Differential Revision: D18499601

fbshipit-source-id: 8a38d3d809ee5ef5b73b5a5ce1db612aea680e75
2019-11-15 16:22:52 -08:00
ff4e782e79 Add overload name to JIT prim operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29656

Test Plan: Imported from OSS

Differential Revision: D18499600

fbshipit-source-id: a1b49e64c908d16d40a6ddb048182d7bbe80bcd6
2019-11-15 16:22:47 -08:00
3003c5f91b OPN ops TupleConstruct/Unpack and format. (#29635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29635

TupleConstruct/Unpack as OPN ops.

Test Plan: Imported from OSS

Differential Revision: D18499602

fbshipit-source-id: 389b21d3ea532ef6fa729d67ce34214d86700cd2
2019-11-15 16:22:42 -08:00
d22f61432d Update fbjni and enable PyTorch JNI build
Summary:
- Add a "BUILD_JNI" option that enables building PyTorch JNI bindings and
  fbjni.  This is off by default because it adds a dependency on jni.h.
- Update to the latest fbjni so we can inhibit building its tests,
  because they depend on gtest.
- Set JAVA_HOME and BUILD_JNI in Linux binary build configurations if we
  can find jni.h in Docker.

Test Plan:
- Built on dev server.
- Verified that libpytorch_jni links after libtorch when both are built
  in a parallel build.

Differential Revision: D18536828

fbshipit-source-id: 19cb3be8298d3619352d02bb9446ab802c27ec66
2019-11-15 13:59:44 -08:00
3f5dc95b57 fix device check in op bench (#29918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29918

Some of the tests don't specify `device` in the input configs, so filtering by device won't work for them. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:qpool_test -- --iterations 1 --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QAdaptiveAvgPool2dBenchmark
# Mode: Eager
# Name: QAdaptiveAvgPool2dBenchmark_N4_C3_input_size(224,224)_output_size(112,112)_contigTrue_dtypetorch.qint32
# Input: N: 4, C: 3, input_size: (224, 224), output_size: (112, 112), contig: True, dtype: torch.qint32
Forward Execution Time (us) : 2891.172
```

Reviewed By: hl475

Differential Revision: D18535766

fbshipit-source-id: 09d89cf23b3caab6c0bc3b8a9ae55cc439b98e0f
2019-11-15 13:55:38 -08:00
acb8100810 Updating submodules
Summary:
GitHub commits:

4c18636f6b
efdb8c4731
b8881f9d9a

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7e3aeb7417c870ec2d8d46c3b83f1b7b5e9a98ec
2019-11-15 13:31:13 -08:00
7807d44934 Add TensorShapeAndType (#29848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29848

design doc: https://docs.google.com/document/d/15luH8R7a0WMiZzoKxu6cI0a1XDW4C0vyaW3-XQ_3G30/edit#heading=h.cyvbc4wtxkn7

Test Plan: buck build

Reviewed By: ipiszy

Differential Revision: D18513718

fbshipit-source-id: c3e3b30b58360b898528422ba9618b1dd3beb0a8
2019-11-15 13:06:06 -08:00
5ab6635de1 Stop binding _th_resize_as_, which isn't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29867

Test Plan: Imported from OSS

Differential Revision: D18521743

Pulled By: gchanan

fbshipit-source-id: 0c3f1bfabb29b2d20305657644edb2065a549bc3
2019-11-15 12:50:27 -08:00
8e61287d1b Skip outputting scalar_checks if they are false. (#29866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29866

This is a no-op anyway, so no reason to output.

Test Plan: Imported from OSS

Differential Revision: D18521742

Pulled By: gchanan

fbshipit-source-id: f695e453beeee609dbdf23d26f9b5eaf519e16b2
2019-11-15 12:50:22 -08:00
4442fa59c7 Avoid keeping old histograms in the histogram observer to fix the OOM issue (#29768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29768

The previous histogram observer saved all histograms for new data and merged them at the end. This could cause an OOM issue when collecting histograms over a large amount of data. In this diff, we assume the histogram observer runs in a single thread, and we remap the histogram after seeing new data.
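
A rough sketch of the single-running-histogram idea (hypothetical helper, not the observer's actual code): widen the range when a batch exceeds it, remap the old counts into the widened range, then add the batch's histogram.

```python
import torch

def update_hist(hist, lo, hi, batch, bins=256):
    # Keep one running histogram in O(bins) memory instead of one per batch.
    b_lo, b_hi = batch.min().item(), batch.max().item()
    new_lo, new_hi = min(lo, b_lo), max(hi, b_hi)
    if (new_lo, new_hi) != (lo, hi):
        # Remap old counts by dropping each old bin's count at its center
        # inside the widened range (coarse, but needs no extra histograms).
        centers = torch.linspace(lo, hi, bins)
        idx = ((centers - new_lo) / (new_hi - new_lo) * (bins - 1)).long()
        remapped = torch.zeros(bins)
        remapped.scatter_add_(0, idx.clamp(0, bins - 1), hist)
        hist, lo, hi = remapped, new_lo, new_hi
    return hist + torch.histc(batch.float(), bins=bins, min=lo, max=hi), lo, hi

hist, lo, hi = torch.zeros(256), 0.0, 1.0
for batch in [torch.rand(1000), torch.rand(1000) * 3 - 1]:
    hist, lo, hi = update_hist(hist, lo, hi, batch)
```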

Test Plan:
```
buck test mode/opt caffe2/caffe2/quantization/server:dynamic_histogram_test
```

```
buck run mode/opt caffe2/caffe2/fb/fbgemm/numerical_debugger/workflows:int8_static_quantization_exporter -- --model-dir /mnt/public/summerdeng/ads/ --model-name downsized_ins_97293388_0.predictor --run --iter 10  --dataset-path /mnt/public/summerdeng/ads/ctr_instagram_story_int8/dataset/train/dataset_115764229_10 --hive-path="hive://ad_delivery/ig_ad_prefiltered_training_data_orc_injected/ds=2019-09-09/pipeline=ctr_instagram_story_click_only_model_opt_out_df" --collect-histogram --activation-histogram-file=/mnt/public/summerdeng/ads/ctr_instagram_story_int8/activation_histograms/dummy_debug_OOM.txt
```

Reviewed By: jspark1105

Differential Revision: D18458764

fbshipit-source-id: c0e36fffe9bf021efd17d8494deef43727333da2
2019-11-15 12:30:44 -08:00
7889e1e3f9 Add torch.version.hip from cmake (#29815)
Summary:
This adds the HIP_VERSION cmake variable as hip_version.
This should help detect ROCm, e.g. in https://github.com/pytorch/pytorch/issues/22091.

To parallel CUDA, hip_version is a string.
An alternative variant might be to split by '.' and only take the first two parts.
The method suffers a bit from ROCm not being as monolithic as CUDA.
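
A sketch of the intended detection pattern (the exact version-string format is build-dependent; on non-ROCm builds the attribute is None):

```python
import torch

if getattr(torch.version, "hip", None) is not None:
    print("ROCm build, HIP version:", torch.version.hip)
else:
    print("not a ROCm build")
```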
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29815

Differential Revision: D18532267

Pulled By: bddppq

fbshipit-source-id: 1bde4ad0cfacc47bfd1c0945e130921d8575a5bf
2019-11-15 12:03:15 -08:00
69e343f2cc Expose is_signed for dtype (#29511)
Summary:
Changelog:
- Expose is_signed for torch.dtype by modifying torch/csrc/Dtype.cpp
- Allow half, bfloat16 and bool to also be "known" by the isSignedType function
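
For example, after this change the following holds (the last three dtypes are the newly covered ones):

```python
import torch

assert torch.int8.is_signed and not torch.uint8.is_signed
# Newly handled by isSignedType:
assert torch.half.is_signed
assert torch.bfloat16.is_signed
assert not torch.bool.is_signed
```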
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29511

Test Plan:
- Add tests in test/test_torch.py

Closes https://github.com/pytorch/pytorch/issues/29475

Differential Revision: D18439030

Pulled By: albanD

fbshipit-source-id: 4b1f9da70c1c8dfd0a5bc028b6936acd1c64af47
2019-11-15 11:16:45 -08:00
23fcc409d5 Revert "switch back to azure pipelines" (#29910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29910

This reverts commit 6de1016f9dbf624f93f8c8d45feb56f8c222b7a6.

Test Plan: Imported from OSS

Differential Revision: D18532474

Pulled By: suo

fbshipit-source-id: 852fdcf21bd4aa7ca94322d64e43aab5a822cabc
2019-11-15 11:00:14 -08:00
a9c719ba82 Set TORCH_CXX_FLAGS in minimal example (#29890)
Summary:
To avoid ABI issue

EDIT: After this PR, the example CMakeLists.txt will always use the `-D_GLIBCXX_USE_CXX11_ABI` value set in `share/cmake/Torch/TorchConfig.cmake`, regardless of the `-D_GLIBCXX_USE_CXX11_ABI` value passed to the `cmake` command by the user.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29890

Differential Revision: D18531391

Pulled By: yf225

fbshipit-source-id: 2db78ae7a33a4088b579e81c60b9a74861f1ccde
2019-11-15 09:57:15 -08:00
9ec1727ea6 Makes test_type_promotion generic (#29417)
Summary:
Test type promotion was already running on CUDA with its own (tiny) version of a generic test framework. This PR makes it use the actual generic test framework.

In addition, the tests previously set the default dtype (and did not reset it). A new decorator replaces the previous style and resets the default dtype after each test. This is still not thread-safe, but at least there's a comment to that effect now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29417

Differential Revision: D18514545

Pulled By: mruberry

fbshipit-source-id: 5aad43481ae71124cba99fb2e4a946894f591d68
2019-11-15 09:54:07 -08:00
0108f473ad Use c10::to_string in more places (#29839)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29839

std::to_string isn't reliably available on Android.  Use c10::to_string
instead in some more files that we want to add to some Android builds.

Test Plan: CI

Reviewed By: linbinyu

Differential Revision: D18509295

fbshipit-source-id: 678af1abbea05777310499634ab01afbe21134d8
2019-11-15 09:22:59 -08:00
60ad2a96f0 Update torchvision in CI (#29853)
Summary:
Update torchvision in CI to include 44a5bae933.

This PR is blocking https://github.com/pytorch/pytorch/pull/29838.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29853

Differential Revision: D18531096

Pulled By: yf225

fbshipit-source-id: 19ed7628d08854108a05e01696e09c9b03a3d5f4
2019-11-15 09:18:35 -08:00
5e53c1501a Update CircleCI config to use Docker images from "pytorch" account (#29835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29835

Using images from personal accounts restricts our ability to push
updates in a timely manner.

Test Plan: CI

Reviewed By: soumith

Differential Revision: D18524393

Pulled By: dreiss

fbshipit-source-id: f12dd3ce50c8362e152ed265e2d24bcb073dcfd4
2019-11-15 07:30:15 -08:00
510ef4b63a Add nn.quantized.Conv3d (#29813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29813

Add nn.quantized.Conv3d

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: jianyuh

Differential Revision: D18467749

fbshipit-source-id: 892f708179e9e836ad902851ac1838847009da15
2019-11-15 04:33:40 -08:00
e1a309a647 Always include autograd context id in rpc/remote requests (#29781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29781

Even though the request might not contain any requires_grad tensor,
the return value could. Therefore, we should always include the
autograd context id in the request.

closes #28819

Test Plan: Imported from OSS

Differential Revision: D18496709

Pulled By: mrshenli

fbshipit-source-id: 2f870c410291a1300952895b7488ea07e5574228
2019-11-14 23:02:11 -08:00
a34cc01dcc Implement backend level fallback for c10 (#28494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28494

Allow a backend-level fallback kernel that is called whenever an operator doesn't have a concrete kernel for the backend.
This is needed for the lazy backend.
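
A toy sketch of the dispatch logic this enables (illustrative names only, not the c10 API):

```python
kernels = {("aten::add", "CPU"): lambda x, y: x + y}
backend_fallbacks = {"Lazy": lambda op, *args: ("deferred", op, args)}

def dispatch(op, backend, *args):
    if (op, backend) in kernels:
        return kernels[(op, backend)](*args)
    if backend in backend_fallbacks:
        # One fallback kernel handles every op that lacks a concrete
        # kernel for this backend, which is what the lazy backend needs.
        return backend_fallbacks[backend](op, *args)
    raise RuntimeError(f"no kernel for {op} on {backend}")

print(dispatch("aten::add", "CPU", 1, 2))   # 3
print(dispatch("aten::add", "Lazy", 1, 2))  # ('deferred', 'aten::add', (1, 2))
```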
ghstack-source-id: 93872571

Test Plan: unit tests

Differential Revision: D18081495

fbshipit-source-id: 5f4964249cc226a39fd6e929a5be88a771c401a7
2019-11-14 21:35:49 -08:00
3fa5917530 Simplify c10 dispatcher (#28314)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28314

Simplify the c10 dispatcher, making it easier to understand.

Also, this moves the dispatch decision from the DispatchTable class into the Dispatcher class.
This is required because DispatchTable only knows things about one operator but the dispatch decision will (in future diffs) also need to look at backend-level fallbacks, for example for lazy.
ghstack-source-id: 93872575

Test Plan: unit tests

Differential Revision: D18018736

fbshipit-source-id: 375729d5e307e0622906f8cc9a0b087b94aea2b1
2019-11-14 21:35:44 -08:00
6dc8d72f94 Change from int64_t to jlong for mac build (#29861)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29861

Following https://github.com/pytorch/pytorch/issues/6570 to run ./run_host_tests.sh for the Mac build, we saw the error below:

```
error: cannot initialize a parameter of type 'const facebook::jni::JPrimitiveArray<_jlongArray *>::T *' (aka 'const long *') with an rvalue of type
      'std::__1::vector<long long, std::__1::allocator<long long> >::value_type *' (aka 'long long *')
    jTensorShape->setRegion(0, tensorShapeVec.size(), tensorShapeVec.data());
```
ghstack-source-id: 93961091

Test Plan: Run ./run_host_tests.sh and verify the build succeeds.

Reviewed By: dreiss

Differential Revision: D18519087

fbshipit-source-id: 869be12c82e6e0f64c878911dc12459defebf40b
2019-11-14 21:29:59 -08:00
893105b79e Add reset_parameters to torch::nn modules (#29832)
Summary:
This PR adds `reset_parameters` to the torch::nn modules whose Python counterparts also have `reset_parameters` defined, for better parity between the Python and C++ versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29832

Differential Revision: D18515939

Pulled By: yf225

fbshipit-source-id: 5aa23e5c7ce1026787c04ffeb6c7f167620dd491
2019-11-14 20:58:32 -08:00
831f25c53b add test/mobile/op_deps project for dependency analysis test (#29716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29716

Move the test project out of PR #29550 into this separate PR.

It serves these purposes:
- Defines the ".yaml" format to describe inter-op dependency.
- Can be used as a small testbed for us to quickly experiment, evaluate
  and test different dependency analysis techniques (llvm-pass, linker,
  etc).
- Covers various different c10 operator APIs and builds a runnable binary.

I created a 'mobile' folder under 'test/' because I feel we can create a
few other similar projects here to test mobile-specific yet platform-independent
stuff, e.g.:
- use the host toolchain + mobile build options to do continuous E2E testing;
- test the custom build workflow for mobile;

Test Plan:
- run build script and verify the binary is runnable:
```
scripts/build_mobile.sh
test/mobile/op_deps/build.sh
```

Differential Revision: D18474641

Pulled By: ljk53

fbshipit-source-id: 3fae9da5e0e3fe6cb17ada8783d5da2f144a6194
2019-11-14 20:41:38 -08:00
b508de6412 add static libraries to TorchConfig.cmake.in (#29837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29837

The current TorchConfig seems to only handle shared libraries. When
building static libraries, it doesn't provide the list of all needed
static libraries. This is especially a problem for mobile builds, as we
build static libraries first and then link them into a shared library /
binary to do "gc-sections". Today we have to manually import these
dependent libraries at each call site.

Test Plan:
- build_mobile.sh builds and runs;
- The baby test project in #29716 builds and runs;
- Will check CI for other platforms;

Differential Revision: D18513404

Pulled By: ljk53

fbshipit-source-id: c3dc2c01004c4c9c4574c71fd9a4253c9e19e1e9
2019-11-14 20:41:33 -08:00
9371b31818 set USE_STATIC_DISPATCH outside cmake (#29715)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29715

Previously we hard-coded static dispatch to be enabled when building the mobile
library. Since we are exploring approaches to deprecate static dispatch,
we should make it optional. This PR moves the setting from cmake to the bash
build scripts, where it can be overridden.

Test Plan: - verified it's still using static dispatch when building with these scripts.

Differential Revision: D18474640

Pulled By: ljk53

fbshipit-source-id: 7591acc22009bfba36302e3b2a330b1428d8e3f1
2019-11-14 20:41:29 -08:00
60a33cac2b reduce input shapes of long tag in op bench (#29865)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29865

For some operators, the number of tests (forward + backward) could easily go above 100. Many of them could be redundant, so this diff tries to reduce the number of shapes.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 28418.926
...
```

Reviewed By: hl475

Differential Revision: D18520946

fbshipit-source-id: 1056d6d5a9c46bc2d508ff133039aefeb9d11c27
2019-11-14 20:19:09 -08:00
90e3bbf3ab support all with tag_filter to run all shapes (#29864)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29864

This diff makes `all` a reserved keyword for tag_filter. When `all` is passed by the user, all the supported shapes will be run.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --tag_filter all
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : all

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N32_K256_cpu
# Input: M: 8, N: 32, K: 256, device: cpu
Forward Execution Time (us) : 6798.688

...
```

Reviewed By: hl475

Differential Revision: D18520249

fbshipit-source-id: 4d55af9f46f89b2fe8842e1a00dfa8e5acaf4fa2
2019-11-14 20:19:05 -08:00
5da2bf945e add embeddingbag to benchmark_all_test (#29830)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29830

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18506023

fbshipit-source-id: 15693894c0aa736ab3e818bc740099f0d629cb84
2019-11-14 20:13:57 -08:00
371da6acef move get_rpc_timeout to pybind (#29765)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29765

Instead of wrapping this C++ function in Python, which causes
unnecessary overhead, we can move it to pybind and use the `DefaultRpcAgent`
to get the timeout.
ghstack-source-id: 93879236

Test Plan: unit tests pass

Differential Revision: D18493195

fbshipit-source-id: fd0f1f13ee15acb5ea1ae7c696925c9b54304f6d
2019-11-14 19:39:22 -08:00
7a6c3b36a1 Switch ScriptModuleOp to use a unique_ptr
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29856

Test Plan: waitforsadcastle

Reviewed By: dzhulgakov

Differential Revision: D18516553

fbshipit-source-id: d1e2d49ec613d07b21cd30bd777fbd300032cba1
2019-11-14 19:36:00 -08:00
902c1f9ef1 Check for mutable default parameters (#29833)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/21545

We were silently giving wrong semantics previously:

Python behavior:
```
def test(x=[]):
   x.append(1)
   return len(x)

print(test()) # 1
print(test()) # 2
```

By checking at the Python layer, we prevent any new models from serializing this behavior, without breaking existing serialized models.
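
A minimal sketch of the workaround the check pushes users toward: an `Optional` default plus an explicit `None` check.

```python
from typing import List, Optional

import torch

@torch.jit.script
def test(x: Optional[List[int]] = None) -> int:
    if x is None:
        x = []          # a fresh list on every call
    x.append(1)
    return len(x)

print(test())  # 1
print(test())  # 1, unlike the shared-state mutable default above
```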
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29833

Differential Revision: D18513168

Pulled By: eellison

fbshipit-source-id: 6fe73f28e1f9d39dedeaf67a04718089d14401a1
2019-11-14 18:28:48 -08:00
77bb41c965 Rename dist_autograd_context and dist_autograd_container. (#29696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29696

The paths distributed/autograd/context/dist_autograd_context.h and
distributed/autograd/context/dist_autograd_container.h were repetitive.

Therefore renaming these to distributed/autograd/context/context.h and
distributed/autograd/context/container.h
ghstack-source-id: 93850266

Test Plan: waitforbuildbot

Differential Revision: D18467624

fbshipit-source-id: bbf3905396f553006851af296c880c1bd106ec47
2019-11-14 14:49:34 -08:00
06ef4a757d Add docs for RPC, dist autograd, and RRef modules (#29276)
Summary:
Closes https://github.com/pytorch/pytorch/issues/28983. Documentation for `torch.distributed.rpc` and `torch.distributed.autograd` modules. Also fixes/tidies up some of the docstrings in rpc/autograd, and moves some functions to be private so they don't show up in the documentation.

Note: Much of the text to describe/explain the RPC/RRef layers are taken from the following RFCs: https://github.com/pytorch/pytorch/issues/23110, https://github.com/pytorch/pytorch/issues/26759
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29276

Differential Revision: D18478754

Pulled By: rohan-varma

fbshipit-source-id: e9a7089baf5275304e5408d319eb9bf98e53fff8
2019-11-14 14:32:03 -08:00
ce7058337c Remove two unused TH definitions of rsqrt.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29688

Differential Revision: D18465222

Pulled By: VitalyFedyunin

fbshipit-source-id: a33880db389b82a8242c79723830e0a3afd3d498
2019-11-14 14:28:17 -08:00
bfedace5e3 Expose miniz to Python (#29228)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization
 * https://github.com/pytorch/pytorch/issues/29244 - Use custom CRC
 * **https://github.com/pytorch/pytorch/issues/29228 - Expose miniz to Python**

This adds the miniz wrapper to Python along with some functionality so that it can operate on both files and buffers. Python's `zipfile` module is pretty slow (see https://github.com/pytorch/pytorch/issues/26573), but miniz solves most of the perf issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29228

Differential Revision: D18330945

Pulled By: driazati

fbshipit-source-id: 455a19bcb23b871d56e4233edbf897134b2c2f1d
2019-11-14 13:37:31 -08:00
eef349a679 host build gradle publishing (#29749)
Summary:
To publish snapshots:
`gradle -p android pytorch_host:uploadArchives`
(for testing, the version was changed to 0.0.1-SNAPSHOT)
Result:
https://oss.sonatype.org/#nexus-search;quick~pytorch_java_only

https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/
jar:
https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/pytorch_java_only-0.0.1-20191113.211446-1.jar

sources:
https://oss.sonatype.org/service/local/repositories/snapshots/content/org/pytorch/pytorch_java_only/0.0.1-SNAPSHOT/pytorch_java_only-0.0.1-20191113.211446-1-sources.jar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29749

Differential Revision: D18496644

Pulled By: IvanKobzarev

fbshipit-source-id: 136213c23b9ab1e3e22059ad9c8b53822c026b3b
2019-11-14 11:44:02 -08:00
65bb34d885 Remove TensorImpl::is_variable, deprecate Tensor::is_variable (#29653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29653

I didn't remove is_variable from Tensor for BC reasons, but I did
remove as many uses as I could from the codebase.
at::impl::variable_excluded_from_dispatch got moved to TensorBody.h
so that it's more widely accessible.

This diff is NOT semantics preserving.  Here are the major differences:

- In a number of native operator implementations, we tested that arguments
  are not variable.  I replaced these with asserts that variable is
  excluded from dispatch.  I actually don't think these asserts are really
  necessary now (they should certainly be true, but it's hard to get
  it wrong), but I've kept them for old time's sake.  At least, they'll detect
  if you call these functions before you've processed variable (indicating
  a bug in your kernel.)

- There are a number of places where we do a per-tensor test for being a
  variable, for better error reporting when someone commits Tensor/Variable
  confusion.  Although these tests are substantively the same as the
  tests above, in these cases I decided to *delete* the test entirely.
  The reasoning is that in these cases, we didn't really care about
  dispatch (also, see above; I'm not too sure we really need the dispatch
  asserts), we cared about Tensor/Variable confusion.  Since Tensor/Variable
  confusion is impossible now, we don't need the tests.  One of the key
  factors which pushed me one way or another was whether or not a function
  was doing per-tensor validation; if I kept the assert in such functions,
  I'd repeatedly access the TLS.  Even if we want to bring back the asserts,
  they would have to go somewhere else.

  Another similar idiom is the number of places we do !x.defined() ||
  x.is_variable(); I treated this equivalently.

- nuclear_norm's computation of compute_uv is a bit weird, but I think
  it's OK to just delete the is_variable case (I *suspect* that it is
  always the case that self.is_variable(), but it doesn't really matter.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18496168

Pulled By: ezyang

fbshipit-source-id: 5a1ded931e0c10a6b758ba64a8380d34110e0c3e
2019-11-14 11:41:02 -08:00
8d23f7a3a8 Only print original SourceRange on highlight
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29708

Test Plan: Imported from OSS

Differential Revision: D18472089

Pulled By: jamesr66a

fbshipit-source-id: 89cbe8edf4e3c90d3795a1f3ea55cb234e2682e0
2019-11-14 11:38:02 -08:00
7f4d4254c3 Make sure we only run Profiling Graph Executor tests on windows (e.g. no simple, no legacy)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29782

Differential Revision: D18496848

Pulled By: Krovatkin

fbshipit-source-id: 9d5dbf0fc6a350138a0094f79eef2f9f25b308f5
2019-11-14 11:25:54 -08:00
90ac35b7bd Fix tracing of autograd functions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29791

Test Plan: Imported from OSS

Differential Revision: D18499142

Pulled By: jamesr66a

fbshipit-source-id: 6c2864dfbfa0419c8c888d55e082a619d058b3ee
2019-11-14 11:18:07 -08:00
747233e3bd minor edit to fix benchmark_all_test cuda error (#29829)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29829

This diff replaces the if-check on cuda with to(device...), which is a much cleaner interface.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 129.548

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 48.313
...
```

Reviewed By: bddppq

Differential Revision: D18507568

fbshipit-source-id: 32534e76b2e27d59a631a4d76a0d93700e975ea4
2019-11-14 11:13:36 -08:00
Jie
c5ac70a0ea AdaptiveAvgPooling nhwc cuda update (#29700)
Summary:
1. Add a clip on grid launch configs (tests added in test_nn.py)
2. Assert on the shared memory requirement, which gives a better hint when erroring out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29700

Differential Revision: D18482556

Pulled By: VitalyFedyunin

fbshipit-source-id: df3f653185d7b477b2241f2ef4779670e9a78899
2019-11-14 11:02:48 -08:00
ad95099f45 fix benchmark_all_test when running on gpu (#29818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29818

When some of the tests run on CUDA, there is a runtime error because of a missing data transfer from CPU to CUDA. This diff fixes that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 165.241

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 56.546
...
```

Reviewed By: hl475

Differential Revision: D18506269

fbshipit-source-id: 87942d7a52bd398600766c0f5363d791b74a6ca6
2019-11-14 10:10:48 -08:00
b70d571233 add embeddingbag operator the the benchmark suite (#29784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29784

Add the embeddingbag operator to the benchmark suite with different numbers of embeddings, dims, and inputs.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:embeddingbag_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Forward Execution Time (us) : 624.838

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags2300_dim64_modesum_input_size64_offset0_sparseTrue
# Input: embeddingbags: 2300, dim: 64, mode: sum, input_size: 64, offset: 0, sparse: True
Forward Execution Time (us) : 636.744

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size8_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 8, offset: 0, sparse: True
Backward Execution Time (us) : 2325.291

# Benchmarking PyTorch: embeddingbag
# Mode: Eager
# Name: embeddingbag_embeddingbags80_dim64_modesum_input_size16_offset0_sparseTrue
# Input: embeddingbags: 80, dim: 64, mode: sum, input_size: 16, offset: 0, sparse: True
Backward Execution Time (us) : 2528.658
...
```

Reviewed By: bddppq

Differential Revision: D18496340

fbshipit-source-id: 157dcff2ea4ec13416fe161382fcefd47ce4cc01
2019-11-14 10:05:47 -08:00
e53b510773 add addmm op to the benchmark suite (#29783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29783

Add the addmm operator, reusing the existing input shapes from the add operator.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 759.237

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 922.764

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 4689.546

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1700.093

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd2
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2947.427

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K64_cpu_bwd3
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 2518.043

# Benchmarking PyTorch: addmm
# Mode: Eager
# Name: addmm_M64_N64_K128_cpu_bwdall
# Input: M: 64, N: 64, K: 128, device: cpu
Backward Execution Time (us) : 5848.369
```

Reviewed By: bddppq

Differential Revision: D18496476

fbshipit-source-id: 4f1c116a2676a64106afa958e8c8a8e109f35a4a
2019-11-14 10:02:55 -08:00
dfa9c9e227 Replace make with cmake --build . in the docs (#29798)
Summary:
Inspired by https://discuss.pytorch.org/t/issues-with-tutorial-installing-c-distributions-of-pytorch/33295/11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29798

Differential Revision: D18504951

Pulled By: ezyang

fbshipit-source-id: 8e80d8891ca85196f00611fe784b2f55659e52ab
2019-11-14 08:23:19 -08:00
01d76145fc Fix typo: Caffe2_MAIN_LIB to Caffe2_MAIN_LIBS (#29746)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29746

I don't know if this actually broke anything because I just discovered
the typo while reading the cmake.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18504546

Pulled By: ezyang

fbshipit-source-id: 6cb5fb1e71721e5cf8fc2f7b5552dc7c514f065f
2019-11-14 07:55:09 -08:00
bf80664515 Add quantized conv3d function (#29686)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29686

Add quantized conv3d function

Test Plan: buck test mode/dev-nosan //caffe2/test:quantized -- "conv"

Reviewed By: hl475

Differential Revision: D18463090

fbshipit-source-id: f9c3d2920c3fc015bbb2b6a583a582c9f8397b08
2019-11-14 03:04:51 -08:00
2d7d53cd87 Updating submodules
Summary:
GitHub commits:

dc64b842b8
c31d13303f
2a71bbc69e
4bb251a6af
15b4a705e6

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 70fff211005d374d558de25eb4342b84b7bcba25
2019-11-14 01:48:43 -08:00
4a1fcc0b83 Allow rpc.remote to create RRef on self (#29634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29634

This implementation supports rpc.remote to self by doing the
following steps:

1. Create an owner RRef.
2. Add the owner RRef to owners_ in RRefContext, and keep it alive
   by using the RRefId as the ForkId.
3. Go through serde and insert the message into the caller's thread pool.
4. When the response message gets processed, remove itself from
   the RRef fork map.
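
A usage sketch, assuming `rpc.init_rpc("worker0", rank=0, world_size=1)` has already run on the calling worker:

```python
import torch
import torch.distributed.rpc as rpc

# From worker0 itself: create an RRef whose owner is the caller.
rref = rpc.remote("worker0", torch.add, args=(torch.ones(2), 1))
print(rref.to_here())  # tensor([2., 2.])
```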

Test Plan: Imported from OSS

Differential Revision: D18445812

Pulled By: mrshenli

fbshipit-source-id: e3b9aa98962c388acbc2ce294101a236d5cb2da6
2019-11-14 00:10:24 -08:00
9fd7db616a Disable Caffe2 RCCL tests (#29792)
Summary:
They are flaky on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29792

Differential Revision: D18500737

Pulled By: bddppq

fbshipit-source-id: 18a39b2d6117a7c3b48e1d6a635f24acb35fc497
2019-11-13 23:56:21 -08:00
ba74be0d3e Update CODEOWNERS for distributed rpc framework. (#29788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29788

ghstack-source-id: 93889545

Test Plan: waitforbuildbot

Differential Revision: D18498997

fbshipit-source-id: e1419f1a487f7fe4d5f6af9de66e930da067b70e
2019-11-13 23:42:09 -08:00
4a27d2be18 Enabling intra-op parallelism for fbgemm_linear_int8_weight_fp32_activation op (#29532)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29532

As we are migrating from `torch.jit.quantized` to the `torch.quantization.quantize_dynamic` API, we still need to temporarily add intra-op parallelism support in the legacy `fbgemm_linear_int8_weight_fp32_activation` API, for the parallelization of RNN operators and to help with performance debugging of legacy serialized models that use the old API.

```
from __future__ import absolute_import, division, print_function, unicode_literals

import time

import torch

K, N = 1024, 1024

print("M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16")

for M in (2, 20, 200, 500, 1024,):
    print(M, sep=",", end=", ")
    for num_threads in (1, 2, 4, 8, 16):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)
        b = torch.rand(N)

        NITER = 20

        W_int8, col_offsets, W_scale, W_zp = torch.fbgemm_linear_quantize_weight(w)
        W_prepack = torch.fbgemm_pack_quantized_matrix(W_int8, W_int8.size(1), W_int8.size(0))

        s = time.time()
        for _ in range(NITER):
            Y_fp32 = torch.fbgemm_linear_int8_weight(x, w, W_prepack, col_offsets, W_scale, W_zp, b)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER

        print(
            "{:0.2f}".format(2.0 * M * N * K / elapsed_per_iter_dyn_quant / 1e9),
            end=", ",
        )
    print("\n", end="")
```

On SKL T1 server:

Before the Diff:
```
[root@rtptest33418.frc2 ~/jhuang_test]# ./torch_fbgemm_linear_int8_weight_fp32_activation.par
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
2, 41.01, 51.51, 51.63, 51.49, 52.10,
20, 80.94, 81.43, 82.35, 82.27, 82.24,
200, 87.94, 87.61, 88.53, 88.43, 88.52,
500, 88.76, 89.60, 89.80, 89.65, 89.76,
1024, 88.01, 89.58, 90.11, 90.39, 89.96,
```
After the Diff:
```
[root@rtptest33418.frc2 ~/jhuang_test]# ./torch_fbgemm_linear_int8_weight_fp32_activation.par
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
2, 45.08, 70.38, 72.22, 61.59, 44.15,
20, 83.09, 137.86, 205.58, 254.19, 201.08,
200, 87.86, 157.85, 287.24, 420.26, 476.16,
500, 88.57, 162.19, 296.52, 500.91, 530.25,
1024, 88.34, 147.47, 296.78, 534.45, 482.10,
```

ghstack-source-id: 93666880

Test Plan: CI

Differential Revision: D18421371

fbshipit-source-id: 22cc1031ec9ee914c1508ba2aa9ed0281dfcd076
2019-11-13 23:12:06 -08:00
f3b15727c5 fix op benchmark OOM issue (#29794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29794

Before this diff, all tests of an operator were created at once before testing. Once an operator was benchmarked, the same process would move on to the next operator, and so on. The issue is that the number of tests for a single operator could be > 100, which can cause OOM issues. This diff avoids creating all of an operator's tests at once by using generators, which create/run the tests one by one.
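
A minimal sketch of the generator pattern (illustrative, not the benchmark framework's code):

```python
import torch

def generate_tests(shapes):
    # Yield one configured test at a time; nothing is materialized up
    # front, so each input becomes collectible after its iteration.
    for m, n in shapes:
        yield f"add_M{m}_N{n}", (torch.randn(m, n), torch.randn(m, n))

for name, (a, b) in generate_tests([(64, 64), (1024, 1024), (4096, 4096)]):
    _ = a + b  # stand-in for the timed benchmark body
    print(name, "done")
```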

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 52.493

# Benchmarking PyTorch: relu
# Mode: Eager
# Name: relu_dims(3,4,5)_contigFalse_inplaceFalse_dtypetorch.qint8
# Input: dims: (3, 4, 5), contig: False, inplace: False, dtype: torch.qint8
Forward Execution Time (us) : 44.945
...
```

Reviewed By: hl475

Differential Revision: D18500103

fbshipit-source-id: 747c0ad0d302177da04da36e112c67f154115b6e
2019-11-13 22:22:58 -08:00
aa6e992ffb Subscribe for record function and if android do atrace (#28708)
Summary:
ghstack-source-id: 5edaf471557c25098ca0547229f2763760866887
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28708

Some C++ formatting changes, as I ran `clang-format -i`.

Testing on devserver:
make assets (models):
```
pushd android/test_app/; python make_assets.py; popd
```
Build test_app apk:
```
TRACE_ENABLED=1 sh android/build_test_app.sh

find . -type f -name *apk
./android/test_app/app/build/outputs/apk/mobNet2Quant/debug/test_app-mobNet2Quant-debug.apk
./android/test_app/app/build/outputs/apk/resnet18/debug/test_app-resnet18-debug.apk
```

Install apk:
`adb install -r test_app-mobNet2Quant-debug.apk`
Run app on the device.
Systrace:
```
$ANDROID_HOME/platform-tools/systrace/systrace.py -t 10 -a org.pytorch.testapp.mobNet2Quant sched freq idle am wm gfx view binder_driver hal dalvik camera input res -o trace.html
```
trace.html contains sections like `jni::Module::forward`

![Screenshot 2019-11-12 18 36 30](https://user-images.githubusercontent.com/6638825/68728156-5d245580-057b-11ea-9e71-e47681894fe4.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28712

Differential Revision: D18495898

Pulled By: IvanKobzarev

fbshipit-source-id: 0bced4a442f9dd90525520972a2c1f5d51f57df3
2019-11-13 20:55:40 -08:00
a68c52494c Use F::*FuncOptions for embedding/embeddingbag functionals (#29673)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29673

Following https://github.com/pytorch/pytorch/pull/29364 and https://github.com/pytorch/pytorch/pull/29404, this PR makes `F::EmbeddingFuncOptions` and `F::EmbeddingBagFuncOptions` separate classes from `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`, so that it's easier to enforce that arguments such as `num_embeddings` and `embedding_dim` are required for `torch::nn::EmbeddingOptions` and `torch::nn::EmbeddingBagOptions`.

Test Plan: Imported from OSS

Differential Revision: D18462540

Pulled By: yf225

fbshipit-source-id: f2abf431e48675b0a9d7f6f398cdb90ff9037c35
2019-11-13 18:47:22 -08:00
9ee6fa0145 Use NNPACK for strided convolutions. (#29595)
Summary:
Use NNPACK for strided convolutions.

ResNet50 on Pixel 3:
- Before: 552.956 ms
- After: 402.947 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29595

Reviewed By: houseroad

Differential Revision: D18457472

Pulled By: AshkanAliabadi

fbshipit-source-id: 51f22ce120c39f197cd564bcc71bbad2951edf85
2019-11-13 17:10:41 -08:00
ed788ec780 Linearizable Label: Class Weights, Allow Missing Label, and Average by Batch Size (#29707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29707

In D17885977, Linearizable label (a multi-class classification) was implemented in MTML.

In this diff, we add several items for Linearizable label:

- Assigning different weights to each class through ```model_def.tasks[i].class_weights```.

  - This option is a dictionary, the keys of which are indices of the classes and the values of which are weights for each class.

  - For example, if a linearizable-label task has 4 classes and its ```class_weights = {"0": 1, "1": 0.1, "2": 0.1, "3": 0.01}```, it means that in the loss function of this task, we assign weight 1 to its first class, weight 0.1 to its second and third class, and weight 0.01 to its forth class. The index/order of classes follows the logic of linearizable label.

  - Note that when you assign different weights to different classes, you need to correct the calibration by setting an appropriate ```model_def.tasks[i].calibration.linearizable_class_weight```. Basically, the class weights in calibration should be the reciprocals of the class weights in the loss function, so ```calibration.linearizable_class_weight = {"0": 1, "1": 10, "2": 10, "3": 100}``` for the example above (see the sketch after this list).

  - Example FBLearner job: f150763093

- We also support ```model_def.allow_missing_label_with_zero_weight``` for linearizable label, which ignores examples whose first label is missing by assigning them zero weights in the loss function.

  - We need to set ```allow_missing_label_with_zero_weight = true``` to enable it.

  - Example FBLearner job: f150763093

- Last but not least, we update caffe2 operator ```SoftmaxWithLoss``` to support loss averaged by batch size.

  - We need to set ```model_def.tasks[i].loss.softmaxLoss.average_by_batch_size = true``` to enable it.

  - Previously, the loss was averaged by the weight sum of the examples in the batch, which is still the default behavior now (when ```average_by_batch_size = null``` or ```average_by_batch_size = false```).

  - Without this new feature, the calibration will be incorrect when applying non-equal-weight training among different classes to a linearizable task.

  - Example FBLearner job with ```average_by_batch_size = true``` results in a correct calibration: f150763093

  - Example FBLearner job with ```average_by_batch_size = null``` results in an incorrect calibration: f150762990
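
A one-line sketch of the reciprocal relationship described above (hypothetical dict names):

```python
class_weights = {"0": 1.0, "1": 0.1, "2": 0.1, "3": 0.01}
# Calibration weights are the reciprocals of the loss-function class weights.
linearizable_class_weight = {k: 1.0 / w for k, w in class_weights.items()}
print(linearizable_class_weight)  # {'0': 1.0, '1': 10.0, '2': 10.0, '3': 100.0}
```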

Test Plan:
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_class_weights
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_with_zero_weight
buck test caffe2/caffe2/fb/dper/layer_models/tests:mtml_test_2 -- test_linearizable_label_task_average_by_batch_size

All tests passed.

full canary: https://fburl.com/fblearner/troznfgh

Reviewed By: chenshouyuan

Differential Revision: D18461163

fbshipit-source-id: aaf3df031406ae94f74e2e365b57e47409ef0bfe
2019-11-13 16:52:27 -08:00
b8dca04f73 Add error message if CUDA startup fails (#29670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29670

This is the entry point for loading CUDA code; improve the error message to prompt users to check that GPU code is included.

Test Plan: Build without gpu code.  Run the binary.  Check that the new error message exists.

Reviewed By: yfeldblum

Differential Revision: D18453798

fbshipit-source-id: 63d9ec50acdf57ef4baf3f7d99c836c56bc1435e
2019-11-13 16:48:40 -08:00
5654eccfe2 Add pytorch_jni_lite for lite interpreter. (#29621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29621

Add pytorch_jni_lite for lite interpreter.
ghstack-source-id: 93867325

Test Plan:
buck build xplat/caffe2/android:pytorch-jni

buck build xplat/caffe2/android:pytorch

buck install -r fb4a

Reviewed By: dreiss

Differential Revision: D18438343

fbshipit-source-id: 7d4dee11d352cc9a67339c45d9d7f4a2ba285ebc
2019-11-13 16:16:29 -08:00
681b610f35 use new overload mechanism for rnns (#29614)
Summary:
Uses the new overload mechanism for RNNs, making Python and TorchScript go through the same path, with an API in line with the one specified
in https://docs.python.org/3/library/typing.html#typing.overload

This brings the TorchScriptable RNNs closer to the base implementation; unifying them should be done in a follow-up PR, but there are still a few limitations that make it difficult to do so.
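
A minimal sketch of the `typing.overload` pattern the summary refers to (illustrative only; the exact decorator TorchScript consumes may differ):

```python
from typing import Optional, Tuple, overload

import torch
from torch import Tensor

class MiniRNN(torch.nn.Module):
    # Stub signatures document the two calling conventions.
    @overload
    def forward(self, input: Tensor) -> Tuple[Tensor, Tensor]: ...

    @overload
    def forward(self, input: Tensor, hx: Tensor) -> Tuple[Tensor, Tensor]: ...

    # A single implementation dispatches on the optional hidden state.
    def forward(self, input: Tensor, hx: Optional[Tensor] = None) -> Tuple[Tensor, Tensor]:
        if hx is None:
            hx = torch.zeros_like(input)
        return input + hx, hx
```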
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29614

Differential Revision: D18486982

Pulled By: eellison

fbshipit-source-id: aaaea66a4a7f12d2e46199ca254f9e8f7475500e
2019-11-13 15:44:25 -08:00
91bef3d189 Simplify copy kernel with static_cast_with_inter_type (#29631)
Summary:
After https://github.com/pytorch/pytorch/pull/29612 is merged, `static_cast_with_inter_type` can automatically convert complex types to their real values, so there is no need to do it inside the copy kernel.

This should wait until https://github.com/pytorch/pytorch/pull/29612 is merged, otherwise it won't pass CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29631

Differential Revision: D18485676

Pulled By: ezyang

fbshipit-source-id: 0bbfd551e3d3010f87eef0fce23a1f8a094b7d31
2019-11-13 15:36:22 -08:00
65f691f2c2 Add more tests for torch::arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29689

Test Plan: Imported from OSS

Differential Revision: D18465818

Pulled By: yf225

fbshipit-source-id: 0cf0aaa7febcf4318abdaae7d17a43ab3acde017
2019-11-13 15:17:16 -08:00
2bcac59a30 Use default dtype for torch::tensor(floating_point_values) and torch::tensor(empty braced-init-list) when dtype is not specified (#29632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29632

This PR is BC-breaking in the following way:

Previously, C++ `torch::tensor` with a floating-point literal with no suffix (e.g. `torch::tensor(1.1)`) or a (nested) braced-init-list of
floating-point literals with no suffix (e.g. `torch::tensor({{1.1, 2.2}})`) produces a tensor with dtype `at::kDouble`. After this PR, it produces a tensor with dtype `torch::get_default_dtype()`, matching Python `torch.tensor` behavior.

Test Plan: Imported from OSS

Differential Revision: D18465819

Pulled By: yf225

fbshipit-source-id: 6834fe50335c677bc3832f2a5e9cf8d1ede9f665
2019-11-13 15:17:11 -08:00
3fb9bbc99b refactor and move createException function (#29605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29605

Adds a wrapper around the existing createException function that
allows passing an error string instead of a regular C++ exception. This
allows us to create exceptions for errors that aren't necessarily C++
exceptions. This function is used by
https://github.com/pytorch/pytorch/pull/29601 and
https://github.com/pytorch/pytorch/pull/26336.
ghstack-source-id: 93819039

Test Plan: Unit tests pass

Differential Revision: D18439216

fbshipit-source-id: 70b6a2e4f107304e322cdd2630847ad0071bc0c1
2019-11-13 14:53:22 -08:00
78bd0069d3 enable back 2 tests for simple exec
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29661

Differential Revision: D18456143

Pulled By: Krovatkin

fbshipit-source-id: 9e4ae3ae681e3c9a81ada1e8b39da1e1342ce394
2019-11-13 14:22:19 -08:00
71aacf7b82 Gradle build offline dependencies #2 (#29738)
Summary:
The issue with the previous build was that, after Phabricator's lint error about double quotes, I changed:
`$GRADLE_PATH $GRADLE_PARAMS` -> `"$GRADLE_PATH" "$GRADLE_PARAMS"`
which ended in error:
```
Nov 13 17:16:38 + /opt/gradle/gradle-4.10.3/bin/gradle '-p android assembleRelease --debug --stacktrace --offline'
Nov 13 17:16:40 Starting a Gradle Daemon (subsequent builds will be faster)
Nov 13 17:16:41
Nov 13 17:16:41 FAILURE: Build failed with an exception.
Nov 13 17:16:41
Nov 13 17:16:41 * What went wrong:
Nov 13 17:16:41 The specified project directory '/var/lib/jenkins/workspace/ android assembleRelease --debug --stacktrace --offline' does not exist.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29738

Differential Revision: D18486605

Pulled By: IvanKobzarev

fbshipit-source-id: 2b06600feb9db35b49e097a6d44422f50e46bb20
2019-11-13 13:56:37 -08:00
2b05ae0704 Revert "Enable test_distributed for ROCm but only with nccl backend" (#29736)
Summary:
This reverts commit 7073ee209000a7781c0c863c4ef39bb3bfdb4932.

They are flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6830//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6824//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/6802//console

cc jithunnair-amd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29736

Differential Revision: D18480543

Pulled By: bddppq

fbshipit-source-id: 9a1dd9aa5f5959dc6fbbfdab0df997514221217a
2019-11-13 13:53:05 -08:00
c800591030 Update ATen/native/README.md about broadcasting (#29742)
Summary:
Is this description still true? I have never seen any `s_` ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29742

Differential Revision: D18485707

Pulled By: ezyang

fbshipit-source-id: c5ce2587bb499561706c3c2817571ee11f7eb63c
2019-11-13 13:46:54 -08:00
b37c235d86 C++/Python API parity for Conv{1,2,3}d layers, and add F::conv{1,2,3}d functionals (#28917)
Summary:
This PR changes the implementation of C++ Conv{1,2,3}d layers to exactly match the Python version, and add F::conv{1,2,3}d functionals. For more thorough testing, I will rely on the parity test mechanism which uses values from `common_nn.py` to generate the inputs and options that we are interested in testing.

This PR is BC-breaking in the following way:

In `Conv{1,2,3}dOptions`:
- `with_bias` is renamed to `bias`.
- `input_channels` is renamed to `in_channels`.
- `output_channels` is renamed to `out_channels`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28917

Differential Revision: D18471526

Pulled By: yf225

fbshipit-source-id: 7a33f60654ad93cc2e043245e7ff9e0ef9da15b3
2019-11-13 12:53:31 -08:00
7f485121a6 Avoid MSVC _cvtsh_ss() workaround with clang-cl (#29726)
Summary:
We (me, fnabulsi, bmcdb) have a handful of fixes used locally to build and run with clang-cl. I am aware of https://github.com/pytorch/pytorch/issues/8784 but it has not been touched in almost a year.

It may be more practical to upstream the non-controversial fixes piecewise. For example, this one.

Here, the dummy version of `_cvtsh_ss` for MSVC is not required (and in fact causes conflicts) when using clang-cl, so it can be #ifdef'd out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29726

Differential Revision: D18478120

Pulled By: ezyang

fbshipit-source-id: cdcd94251e68347446f2ad1ac5a0e71089f7d0ab
2019-11-13 12:49:13 -08:00
ed215b1c03 named tensor support for torch.equal (#29322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29322

torch.equal checks if two tensors are equal in both size and values. For
named tensors, it also checks that the names are exactly equal. There is
an argument to be made for alternative semantics (check that the names
*match*), but for an API that is called "equal" I would expect it to
check equality on names as well.
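
A minimal sketch of the semantics described above, assuming the named-tensor API:

```python
import torch

a = torch.zeros(2, 3, names=('N', 'C'))
b = torch.zeros(2, 3, names=('N', 'C'))
c = torch.zeros(2, 3, names=('N', None))

torch.equal(a, b)  # True: same size, same values, same names
torch.equal(a, c)  # False: values match, but names differ
```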

Test Plan: - new tests

Differential Revision: D18453387

Pulled By: zou3519

fbshipit-source-id: d52bde4e3fdd7f331eef097a3b31d35c89c78049
2019-11-13 12:45:06 -08:00
5e64cfa663 Make TensorName::unifyFromRight in-place for efficiency (#29307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29307

In our name inference functions we currently create an extra TensorNames
every time we unify names. This isn't completely necessary.

To do this, I made the following changes:
- TensorName now has two states, initialized and uninitialized
- Renamed unifyFromRight to unifyFromRightInplace.

Test Plan: - `pytest test/test_namedtensor.py -v`

Differential Revision: D18453388

Pulled By: zou3519

fbshipit-source-id: 96c3c6fd9478d57e92e1cf770c864aeac6d29dd2
2019-11-13 12:45:01 -08:00
6de1016f9d switch back to azure pipelines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29740

Test Plan: Imported from OSS

Differential Revision: D18482697

Pulled By: suo

fbshipit-source-id: 72a454457a005f82683079b79a77343e20c34021
2019-11-13 11:50:38 -08:00
73a926fd5d Updating submodules
Summary:
GitHub commits:

756806e65b
9feea971d1
3eeb12badf
bb23bfe63c
4e8cee1305

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7b3adb4e20270aa7210e1a178ab26b0f47920861
2019-11-13 11:15:27 -08:00
f0dd7517f2 Add option to clean up allocated activations between c2 runs (#29619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29619

att

Reviewed By: houseroad

Differential Revision: D18415190

fbshipit-source-id: 739aaf436578fac635df10de42b35e2b4368df37
2019-11-13 10:30:10 -08:00
03d021ddb8 Allow unrelated histories when rebasing to master (#29699)
Summary:
Some PRs fail to merge.

For example, https://github.com/pytorch/pytorch/pull/29595
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29699

Reviewed By: hl475

Differential Revision: D18473531

Pulled By: houseroad

fbshipit-source-id: e7a4eb1b4be9d9da6dc281575eeb4d7ae685b531
2019-11-13 09:50:43 -08:00
5635a72069 Revert D18451046: CPU-Strided-Complex Fixes for real and imag ops
Test Plan: revert-hammer

Differential Revision:
D18451046

Original commit changeset: b9dcd8e25e91

fbshipit-source-id: efd30957fc551fe8bf335d66b69e30af63b71752
2019-11-13 09:00:16 -08:00
6d54c5ddd2 Missing host device (#29547)
Summary:
Missing `__device__` and `__host__` annotations in the complex case. Make it less UB.

Note that this is still rather unsavory code: `std::real` is only `constexpr` from C++14 onwards ( https://en.cppreference.com/w/cpp/numeric/complex/real2 ), which is the requirement for `__device__`.

What I am trying to say is: this particular piece of code should not have passed review and should not have been merged, IMHO, as it tries to codify UB.

Also note that the benchmarks referenced in the source were CPU- and CUDA-only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29547

Differential Revision: D18428156

Pulled By: bddppq

fbshipit-source-id: 855ced903ef91bd7f82fcd3a2167ae59bdd30d8b
2019-11-13 08:32:08 -08:00
9b1ff8090d CPU-Strided-Complex Fixes for real and imag ops (#29607)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

- [x]  Replaced std::real(a) with a.real() in kernel-level code.
- [x]  Fixed Vec256_base implementation of complex ops so that it works correctly on Non-AVX devices.
- [ ]  Clean up CopyKernel after https://github.com/pytorch/pytorch/issues/29612 is approved.
zasdfgbnm is fixing this issue in https://github.com/pytorch/pytorch/issues/29612. This should be added first.

cc: iotamudelta, ezyang, bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29607

Differential Revision: D18451046

Pulled By: ezyang

fbshipit-source-id: b9dcd8e25e91cab13bd131b070d027b090cdedc9
2019-11-13 08:19:40 -08:00
0c91ebb694 Delete all trivial uses of make_variable. (#29213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29213

A trivial use of make_variable is one where requires_grad=False.  This
transformation is not technically semantics preserving, as make_variable
will create a shallow copy of the tensor in question; however, I
am guessing that we have the invariant that we don't actually make
use of this shallow copy in a nontrivial way.

There were some cases where the surrounding code expected a Variable proper
to be returned; I retained those sites.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353503

Pulled By: ezyang

fbshipit-source-id: 57fe34d82e009c0cc852266fb0b79d6d9c62bb03
2019-11-13 07:43:41 -08:00
89e187a2f5 Miscellaneous follow up for code review comments (#29204)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29204

Code review comments from #28620

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353506

Pulled By: ezyang

fbshipit-source-id: 0432ce513eff257fd85cddff8bc3e41935127ed8
2019-11-13 07:43:36 -08:00
30092df15e Rename getNonVariableDeprecatedTypeProperties to getDeprecatedTypeProperties (#29203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29203

There is no more Variable/Tensor distinction, so fix the misleading name.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18353505

Pulled By: ezyang

fbshipit-source-id: dadc394d533ab7746f70bc186c6645441a784518
2019-11-13 07:43:32 -08:00
7da9ac5afd Revert D18455666: Gradle build with offline dependencies
Test Plan: revert-hammer

Differential Revision:
D18455666

Original commit changeset: 8fb0b54fd94e

fbshipit-source-id: 559903b42cf7e5763099cf33f02940035c8505df
2019-11-13 07:24:13 -08:00
715e951e3c Revert D18458751: use new overload mechanism for rnns
Test Plan: revert-hammer

Differential Revision:
D18458751

Original commit changeset: 07c71838f21c

fbshipit-source-id: 86acb02f3e022e93ea6c1ef23fe39c80ad43978f
2019-11-13 07:21:31 -08:00
e870a9a870 More checks on MSVC (#29709)
Summary:
The flags `/sdl` and `/permissive-` are switched on automatically when using the VS GUI. Adding those checks will ensure that those annoying errors won't appear when users use the VS GUI to build their project.

More info:
https://docs.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=vs-2017
https://docs.microsoft.com/en-us/cpp/build/reference/permissive-standards-conformance?view=vs-2017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29709

Differential Revision: D18473888

Pulled By: bddppq

fbshipit-source-id: 21156b0232a5dc3b566d14491d00bacb11493254
2019-11-13 00:15:40 -08:00
7b86199fc0 Switch XLA to only override abstract functions (#29636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29636

This is a followup of li-roy 's work https://github.com/pytorch/pytorch/pull/23282. (I messed up the rebase there :(

After https://github.com/pytorch/xla/issues/1225 is done we are good to move the integration to only override abstract functions.

This PR contains a TODO which I'll remove in next 2 followup PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29438

Reviewed By: ljk53

Differential Revision: D18445927

Pulled By: ailzhang

fbshipit-source-id: 52ea98626d6d6140241b5a4796a5c0d0c1b922ba
2019-11-13 00:09:37 -08:00
3a72662d01 Restructure comparison ops so as to better support XLA dispatch (#29591)
Summary:
Per ailzhang's suggestion in https://github.com/pytorch/pytorch/pull/28162#discussion_r344361926, this PR changes the implementation of binary comparison and logical ops
to follow that of the unary ops in UnaryOps.cpp. The reason is that the call should eventually go through
at::op_out (e.g., at::logical_xor_out).

The check for Boolean output tensor is also removed, because:

- This check should only apply to _out functions but not on other variants. However, other variants
  must go through the _out variant eventually.
- It does not have a clear motivation and seems unnecessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29591

Differential Revision: D18460113

Pulled By: ailzhang

fbshipit-source-id: 58d501e59335186b3b8cc7d80ee9eed74efeeac8
2019-11-12 23:42:30 -08:00
09d359dfd9 Changed default args in quantization observers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29640

Test Plan: Imported from OSS

Differential Revision: D18447297

Pulled By: z-a-f

fbshipit-source-id: 7c86a5bb467a2fad8fe30c935d9c031c69868296
2019-11-12 23:32:05 -08:00
d2aa4c611f observer benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29508

Test Plan: Imported from OSS

Differential Revision: D18415171

Pulled By: z-a-f

fbshipit-source-id: 5ebedee8c17448e36853e0c1bf778bb128975678
2019-11-12 23:28:10 -08:00
d8732b3b43 Gradle build with offline dependencies (#29262)
Summary:
https://github.com/pytorch/pytorch/issues/29159

Introducing a GRADLE_OFFLINE environment variable that passes the '--offline' Gradle argument, which uses only the local Gradle cache without network access.

Since the cache has expiration logic, we 'touch' its files before every Gradle run to update their last-access time.

Deploying new docker images that include prefetching all Android dependencies into the Gradle cache; commit with the docker image update: df07dd5681

Reenable android gradle jobs on CI (revert of 54e6a7eede)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29262

Differential Revision: D18455666

Pulled By: IvanKobzarev

fbshipit-source-id: 8fb0b54fd94e13b3144af2e345c6b00b258dcc0f
2019-11-12 22:48:23 -08:00
20fb8a814c PackedSequence support for quantized LSTM
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29585

Test Plan: Imported from OSS

Differential Revision: D18436569

Pulled By: jamesr66a

fbshipit-source-id: 0f32c0fcc897894e30d8e7ff203392c1a961ce60
2019-11-12 20:13:38 -08:00
87363a8102 Revert D18466043: Pin Linux image and modules version to 4.4.0-166
Test Plan: revert-hammer

Differential Revision:
D18466043

Original commit changeset: d3c69c9ab3bf

fbshipit-source-id: 49365be7edf82923ade9c17b862f6e942c62b1ac
2019-11-12 19:08:44 -08:00
5a8ad66354 Do not show cuda stats in autograd profiler when use_cuda=False (#29666)
Summary:
Example
```python
import torch
x = torch.randn(1)
with torch.autograd.profiler.profile(use_cuda=False) as prof:
    x + x
print(prof.key_averages().table(sort_by='cpu_time_total'))
```

Before:
```
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     CUDA total %     CUDA total       CUDA time avg    Number of Calls
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add      100.00%          25.781ms         100.00%          25.781ms         25.781ms         NaN              0.000us          0.000us          1
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 25.781ms
CUDA time total: 0.000us
```

After:
```
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Name     Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
add      100.00%          25.037ms         100.00%          25.037ms         25.037ms         1
-------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------
Self CPU time total: 25.037ms
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29666

Differential Revision: D18458828

Pulled By: bddppq

fbshipit-source-id: d96ef4cec8b1e85b77c211292a3099048882734d
2019-11-12 17:53:20 -08:00
95cad57340 Turn on named tensors for all builds (#29603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29603

Previously, named tensors were off for the internal caffe2 xplat builds.
However, we have since excised the caffe2 xplat build's dependencies on
PyTorch. This makes it so that we can turn on named tensors for all
builds.

Test Plan: - Wait for CI

Differential Revision: D18439084

Pulled By: zou3519

fbshipit-source-id: f1cc405d0ce9ffe991eff1bbb80575ce87c02d4a
2019-11-12 17:03:26 -08:00
907a29de70 Pin Linux image and modules version to 4.4.0-166 (#29690)
Summary:
When installing the 4.4.0-168 version, the following error is thrown (e.g. in https://app.circleci.com/jobs/github/pytorch/pytorch/3577840):
```
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 linux-image-generic : Depends: linux-image-4.4.0-168-generic but it is not going to be installed or
                                linux-image-unsigned-4.4.0-168-generic but it is not installable
                       Depends: linux-modules-extra-4.4.0-168-generic but it is not installable
                       Recommends: thermald but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
```
The (temporary) solution is to pin the Linux image and modules version to 4.4.0-166.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29690

Differential Revision: D18466043

Pulled By: yf225

fbshipit-source-id: d3c69c9ab3bf505c6eb3a2edd138e9789b62b6d6
2019-11-12 17:00:29 -08:00
29e509ff1d Fix a missing comma in quantized benchmark
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29685

Test Plan: Imported from OSS

Differential Revision: D18463246

Pulled By: z-a-f

fbshipit-source-id: c21fd7892f3701afcc5faa8bc03f98b6f6550d0f
2019-11-12 16:50:46 -08:00
8875120b54 Make dropout condition on training.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29436

Reviewed By: bddppq

Differential Revision: D18438288

Pulled By: ailzhang

fbshipit-source-id: d9c6fe4bd734dc87b2154b0ccd80efcb61740ec9
2019-11-12 16:32:02 -08:00
422fbfb108 Fix some issues for lite interpreter internal build. (#29620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29620

Modify buck for lite interpreter to build successfully on internal integration.
ghstack-source-id: 93733618

Test Plan: buck build xplat/caffe2:torch_mobile_coreAndroid

Reviewed By: iseeyuan

Differential Revision: D18438105

fbshipit-source-id: d6f6615623a385383105763733607c3872c89c42
2019-11-12 16:16:42 -08:00
bd0394d473 Add op bitwise_xor to replace __xor__ and __ixor__ (#25665)
Summary:
We define `bitwise_xor` instead of
`__xor__` and `__ixor__`. The reasons are that (a) it is not idiomatic to call
functions starting and ending with double underscores, (b) the
kinds of arguments we can add are limited (e.g., no out variant), and (c) it is consistent with the naming of `bitwise_not` and numpy.
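
A minimal sketch of what the named op enables compared to the dunder, the `out=` variant in particular:

```python
import torch

a = torch.tensor([0b1010, 0b1100], dtype=torch.uint8)
b = torch.tensor([0b0110, 0b1010], dtype=torch.uint8)

out = torch.empty_like(a)
torch.bitwise_xor(a, b, out=out)  # out= is not expressible via __xor__
a.bitwise_xor_(b)                 # in-place variant replaces __ixor__
```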

Fix https://github.com/pytorch/pytorch/issues/24513,  Fix https://github.com/pytorch/pytorch/issues/24517, Fix https://github.com/pytorch/pytorch/issues/24660, Fix https://github.com/pytorch/pytorch/issues/24664
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25665

Differential Revision: D17577143

Pulled By: VitalyFedyunin

fbshipit-source-id: 042f6385f9305bd66d50a8ce82e28f40a23a7266
2019-11-12 16:14:04 -08:00
8e7b406773 use new overload mechanism for rnns (#29614)
Summary:
Uses the new overload mechanism for RNNs, making Python and TorchScript go through the same path, with an API in line with the one specified
in https://docs.python.org/3/library/typing.html#typing.overload

This brings the TorchScriptable RNNs closer to the base implementation; unifying them should be done in a follow-up PR, but there are still a few limitations that make it difficult to do so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29614

Differential Revision: D18458751

Pulled By: eellison

fbshipit-source-id: 07c71838f21cb5425e8d6dbd4a512f774c8c2970
2019-11-12 16:12:04 -08:00
433baf1b90 Change arg dtype from float to double in LPPool and nn/utils/clip_grad.h (#29584)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29584

In Python, `float` dtype is always 64-bit (https://stackoverflow.com/a/8216110), and the C++ equivalent APIs should take `double` dtype to match the bit length.

Test Plan: Imported from OSS

Differential Revision: D18436616

Pulled By: yf225

fbshipit-source-id: ece510bba6f089ccada03af216f4805bbd03f5f2
2019-11-12 16:05:35 -08:00
65bfcde05e Use c10::variant-based enums for SmoothL1Loss module and functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29536

Test Plan: Imported from OSS

Differential Revision: D18432272

Pulled By: yf225

fbshipit-source-id: fa355145962e93025b7de98b99b0a4fc82e8c871
2019-11-12 16:05:31 -08:00
57eab22c6a Use c10::variant-based enums for F::grid_sample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29535

Test Plan: Imported from OSS

Differential Revision: D18432273

Pulled By: yf225

fbshipit-source-id: 11476f0431a9b544dfb62bc7a89bab84399f9b83
2019-11-12 16:05:26 -08:00
9f879ef532 Make all non-input arguments to functionals part of its options (#29404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29404

This PR makes all non-input arguments to functionals part of its options parameters, so that we won't break backward compatibility even if we add or reorder some of the non-input arguments to functionals in the future.

Test Plan: Imported from OSS

Differential Revision: D18378526

Pulled By: yf225

fbshipit-source-id: f5cf6bdfb844e75bf94fdee58c121e0955631b6e
2019-11-12 16:05:22 -08:00
c3b2c2e353 Design doc for distributed autograd. (#29175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29175

Updates our docs to include a design doc for distributed autograd.
Currently, this doc only covers the FAST mode algorithm. The Smart mode
algorithm section just refers to the original RFC.

There is a section for Distributed Optimizer that we can complete once we've
finalized the API for the same.
ghstack-source-id: 93701129

Test Plan: look at docs.

Differential Revision: D18318949

fbshipit-source-id: 670ea1b6bb84692f07facee26946bbc6ce8c650c
2019-11-12 15:04:23 -08:00
b0c245d52d Consolidate the places that find pybind11 include dirs (#29659)
Summary:
Also move the logic that installs the pybind11 headers from setup.py to cmake (to align with other headers).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29659

Differential Revision: D18458208

Pulled By: bddppq

fbshipit-source-id: cfd1e74b892d4a65591626ab321780c8c87b810d
2019-11-12 14:51:56 -08:00
fd8f74e688 Remove observer module after insert_quant_dequant (#29622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29622

Remove the observer module in the quantized model

Test Plan: python test/test_jit.py 'TestJit.test_insert_quant_dequant'

Differential Revision: D18442888

Pulled By: jerryzh168

fbshipit-source-id: 22c777569af0e814661fe51f76341b39600fae0d
2019-11-12 14:48:40 -08:00
fbe90b65fa Cleanup special handling of Containers, allowing custom forwards (#28988)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28988

Make ModuleList, Sequential, ModuleDict go through the same pathway as other modules, cleaning up a bunch of code and allowing them to define custom forwards and other methods.

EDIT: Previously, we would ignore an nn.Sequential attribute if it was not in `__constants__` ("did you forget to add it to Constants"). This PR scripts it even if it is not in `__constants__`. Is that what we want?

Test Plan: Imported from OSS

Differential Revision: D18402821

Pulled By: eellison

fbshipit-source-id: dd4f28fb0df0d1ba4ad1b3bc34ba141959a433f7
2019-11-12 14:10:38 -08:00
3175f5543a Make nn.Sequential iterable (#28987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28987

We have `__iter__` defined on nn.ModuleList. Chainer's `Sequential` defines `__iter__`. This will also be helpful in modules which extend `nn.Sequential` and define a custom forward, because they can use the `for x in self` syntax that is supported in both python & TorchScript.
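
A minimal sketch of the pattern this enables (module name is illustrative):

```python
import torch.nn as nn

class MySequential(nn.Sequential):
    # A custom forward can now use iteration, in Python and TorchScript alike.
    def forward(self, x):
        for module in self:
            x = module(x)
        return x

model = MySequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
```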

Test Plan: Imported from OSS

Differential Revision: D18402822

Pulled By: eellison

fbshipit-source-id: 1ece0f891a9d37f401e232320f58b056d5481856
2019-11-12 14:10:34 -08:00
eeb7199ccc updated name_inference doc for cumsum and cumprod (#29453)
Summary:
cumsum/cumprod perform their respective operations over a given dimension, but no dimension is reduced in the process; i.e., they are not reduction operations, and hence they simply keep the input names of the tensor the operation is performed on.
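
A minimal sketch of the contrast with a true reduction op, assuming the named-tensor API:

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
t.cumsum(1).names  # ('N', 'C'): names are kept, nothing is reduced
t.sum(1).names     # ('N',): sum removes the reduced dimension's name
```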
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29453

Differential Revision: D18455683

Pulled By: anjali411

fbshipit-source-id: 9e250d3077ff3d8f3405d20331f4b6ff05151a28
2019-11-12 13:43:47 -08:00
9bb0e2834d Fixing data type in quantized pool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29663

Test Plan: Imported from OSS

Differential Revision: D18456671

Pulled By: z-a-f

fbshipit-source-id: b36fc56e4f29937e458308f4c13f7a5e37665269
2019-11-12 13:22:53 -08:00
82913a266d Skip copy_same_type_transpose_ for quantized tensor (#29609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29609

We can enable this path later if there is a need.
Trying to fix: https://github.com/pytorch/pytorch/issues/29435

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18453723

fbshipit-source-id: 3dc774f6b7da5cdf33deb6676d8612d21ed4b5a9
2019-11-12 13:16:38 -08:00
3b43cfde80 Benchmarking per channel quantization
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29627

Test Plan: Imported from OSS

Differential Revision: D18443929

Pulled By: z-a-f

fbshipit-source-id: a0345cc5e259b4ce98589252719b8885326d43a3
2019-11-12 11:33:42 -08:00
5db361bd32 Quantized interpolation benchmarks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29509

Test Plan: Imported from OSS

Differential Revision: D18415367

Pulled By: z-a-f

fbshipit-source-id: 84d0aaa81b131b49762edde6ade27e61acb99a42
2019-11-12 11:23:03 -08:00
9c9c361f67 Separate out pytorch_jni into pytorch_jni_jit and pytorch_jni_common. (#29617)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29617

For the internal build we will use the mobile interpreter instead of the full JIT, so we need to separate the existing pytorch_jni.cpp into pytorch_jni_jit.cpp and pytorch_jni_common.cpp. pytorch_jni_common.cpp will be used both from pytorch_jni_jit.cpp (open source) and the future pytorch_jni_lite.cpp (internal).
ghstack-source-id: 93691214

Test Plan: buck build xplat/caffe2/android:pytorch

Reviewed By: dreiss

Differential Revision: D18387579

fbshipit-source-id: 26ab845c58a0959bc0fdf1a2b9a99f6ad6f2fc9c
2019-11-12 11:13:44 -08:00
f95e8ea1be Benchmarking quantized methods (#29625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29625

This PR also adds a template for benchmarking methods that require no input.

Test Plan: Imported from OSS

Differential Revision: D18443485

Pulled By: z-a-f

fbshipit-source-id: 6f25c3a7cd94e396c112b5f7c33307b71f78ecd3
2019-11-12 11:08:55 -08:00
f111f1b1a7 Suppress implicit int-float conversion warning in ROCm build (#29604)
Summary:
```
c10/util/Half.h:467:37: warning: implicit conversion from 'long' to 'double' changes value from 9223372036854775807 to 9223372036854775808 [-Wimplicit-int-float-conversion]
  return f < limit::lowest() || f > limit::max();
                                  ~ ^~~~~~~~~~~~
c10/util/Half.h:497:41: note: in instantiation of function template specialization 'c10::overflows<long, double>' requested here
  if (!std::is_same<To, bool>::value && overflows<To, From>(f)) {
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29604

Differential Revision: D18440713

Pulled By: bddppq

fbshipit-source-id: f059b4e37e90fa84308be52ff5e1070ffd04031e
2019-11-12 10:44:28 -08:00
949d6ae184 Fix jit tracing namedtuple (#29477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29477

When passing a namedtuple as tracing input, __clone_inputs calls into `torch.autograd.function._nested_map`, and https://github.com/pytorch/pytorch/blob/593bb14/torch/autograd/function.py#L256 runs into an error (because namedtuple doesn't support this style of constructor).
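
A minimal sketch of the underlying Python behavior (names are hypothetical): constructing a namedtuple from a single iterable fails, which is the style of reconstruction the summary describes:

```python
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)

# Fails: namedtuple's constructor takes positional fields, not one iterable.
# Point(v * 2 for v in p)  # TypeError: __new__() missing 1 required positional argument: 'y'

# Works: unpack the iterable into positional arguments.
doubled = Point(*(v * 2 for v in p))
```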
ghstack-source-id: 93586773

Differential Revision: D18405504

fbshipit-source-id: 8d0135cff0bdaaabcf6e06fac63df0f75c0c50b9
2019-11-12 10:38:20 -08:00
450949c7fe Complex support on GPU for dynamic casting (#29612)
Summary:
Currently, the dynamic casting mechanism is implemented assuming no support for complex on GPU. This will no longer be true in the near future.

https://github.com/pytorch/pytorch/pull/29547 could clear some clang warnings, but complex support on GPU is still not complete:
- fetch is not supported
- casting between complex64 and complex128 is not supported
- complex scalar types are not tested

This PR is what should be done for type promotion in order to add support for the complex dtypes on GPU, as suggested in https://github.com/pytorch/pytorch/issues/755#issuecomment-552631381

Note that what is newly added in this PR is not tested due to the lack of basic support for complex dtypes (I cannot construct a complex tensor). But this PR shouldn't break any existing part of PyTorch.

For merging this PR, consider two options:
- We could merge this PR now so that dylanbespalko could conveniently work based on master; if there is something wrong here that code review missed, dylanbespalko would find it when adding complex integration.
- Or, we could leave this PR open and not merge it. But then dylanbespalko might need to manually apply this to his branch in order to support type promotion of complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29612

Differential Revision: D18451061

Pulled By: ezyang

fbshipit-source-id: 6d4817e87f0cc2e844dc28c0355a7e53220933a6
2019-11-12 09:57:16 -08:00
7073ee2090 Enable test_distributed for ROCm but only with nccl backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28814

Differential Revision: D18437300

Pulled By: ezyang

fbshipit-source-id: bf1ab68e0fde683e0082f6c9fe2fc20e2bc8fc06
2019-11-12 07:52:30 -08:00
3b452ca428 quantized topk benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29505

Test Plan: Imported from OSS

Differential Revision: D18414851

Pulled By: z-a-f

fbshipit-source-id: 23999ef95c2f087066c4da36b2bf35516ebc0421
2019-11-12 00:33:47 -08:00
a0d4d5062b Quantized unary ops benchmarking (mostly template)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29503

Test Plan: Imported from OSS

Differential Revision: D18414589

Pulled By: z-a-f

fbshipit-source-id: ab5af490359b3e0a51642a46aef86f7be720deff
2019-11-11 23:48:36 -08:00
e651494d47 Updating submodules
Summary:
GitHub commits:

e27d9b5733

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 78ff76d3a979182d0f943bba85461ce80aa4b790
2019-11-11 23:26:47 -08:00
2fb4059652 change drop_on_export warning category (#29610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29610

`DeprecationWarning` is intended for developers (and so is ignored in
certain circumstances). `FutureWarning` is the user-facing deprecation
warning. This fixes fbcode failures.
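
A minimal sketch of the difference in default visibility (plain Python behavior):

```python
import warnings

def old_api():
    # Hidden by default outside __main__; aimed at developers.
    warnings.warn("use new_api instead", DeprecationWarning, stacklevel=2)

def old_api_user_facing():
    # Shown by default; aimed at end users.
    warnings.warn("use new_api instead", FutureWarning, stacklevel=2)
```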

Test Plan: Imported from OSS

Differential Revision: D18446393

Pulled By: suo

fbshipit-source-id: ded11a007f0a62132a9839b733157a97cf9006e9
2019-11-11 23:24:27 -08:00
bbff06ee96 Convert conv_prepack to conv2d_prepack and conv_unpack to conv2d_unpack (#29529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29529

Pull Request resolved: https://github.com/pytorch/glow/pull/3771

We would like to replace `conv_prepack` with `conv2d_prepack` and  `conv_unpack` with `conv2d_unpack`.

This makes the naming consistent between 2D and 3D conv:
```
torch.ops.quantized.conv2d_prepack
torch.ops.quantized.conv2d_unpack
torch.ops.quantized.conv2d
torch.ops.quantized.conv3d_prepack
torch.ops.quantized.conv3d_unpack
torch.ops.quantized.conv3d
```

We should do this sooner rather than later, before the quantized conv2d ops gain more users; it is better engineering.

The replacement bash command is as the follows:
```
find ./ -type f -exec sed -i -e 's/quantized::conv_prepack/quantized::conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/quantized::conv_unpack/quantized::conv2d_unpack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_prepack/torch.ops.quantized.conv2d_prepack/g' {} \;
find ./ -type f -exec sed -i -e 's/torch.ops.quantized.conv_unpack/torch.ops.quantized.conv2d_unpack/g' {} \;
```
ghstack-source-id: 93661879

Test Plan: CI

Reviewed By: jackm321

Differential Revision: D18421079

fbshipit-source-id: 17ae8b1ee79223bd2c5d4bbccd57af6580c4ab12
2019-11-11 21:54:10 -08:00
2acca09e1a Add Support for ONNX scripting Interpolate with missing shape (#29489)
Summary:
- Add support for the missing case where interpolate is exported with missing shape information in scripting
- Add warnings
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29489

Reviewed By: hl475

Differential Revision: D18438872

Pulled By: houseroad

fbshipit-source-id: d01f833bec0cc4e881ddc18e7054d22f54e9886b
2019-11-11 21:20:14 -08:00
8db06732bf Updating submodules
Summary:
GitHub commits:

a5381c4d13
a19de78da5
b5a7d0259c
8c4e217115

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 26a91452c36caab109dad713fb04b71551f36a90
2019-11-11 19:12:55 -08:00
0c9e672727 Apply the latest master docker images(jni.h in every image) (#29588)
Summary:
Applying the latest docker images from master: (the latest PR of docker images: https://github.com/pytorch/pytorch-ci-dockerfiles/pull/2 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29588

Differential Revision: D18442848

Pulled By: IvanKobzarev

fbshipit-source-id: bcb9cca54632d1e83f1b922ebb267b1122c1f56e
2019-11-11 18:41:38 -08:00
8b53515b8a Add ONNX Export Support for torch.scalar_tensor (#28713)
Summary:
Support exporting torch.scalar_tensor() to ONNX.
This will allow making operations on dynamic scalars (like x.size(dim) where x is a tensor of dynamic shape) and exporting them to ONNX.

This is a dummy example of operations that could not be exported dynamically before this PR:

```
size_x = x.size(0)
size_y = y.size(0)
size_x_y_static = torch.tensor([size_x , size_y])  # size_x_y_static is traced as constant

size_x = torch.scalar_tensor(size_x).unsqueeze(0)
size_y = torch.scalar_tensor(size_y).unsqueeze(0)
size_x_y_dynamic = torch.cat((size_x , size_y))  # size_x_y_dynamic is dynamic and depends on x and y's size
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28713

Reviewed By: hl475

Differential Revision: D18438880

Pulled By: houseroad

fbshipit-source-id: c1651e480a41602c7c7452ffc4acba40a2b3827c
2019-11-11 18:27:49 -08:00
5249c43d93 Disable android gradle jobs (#29606)
Summary:
# disabled until fixing https://github.com/pytorch/pytorch/issues/29159
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29606

Differential Revision: D18443452

Pulled By: IvanKobzarev

fbshipit-source-id: 5a12d7b3d214203037e78552b6289752ac1b8192
2019-11-11 18:27:44 -08:00
fb07098e2b Creating a base benchmarking class for activations.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29182

Test Plan: Imported from OSS

Differential Revision: D18319456

Pulled By: z-a-f

fbshipit-source-id: d2314bb30a584551b5f1c8610b36c4c10c27ac85
2019-11-11 18:24:44 -08:00
7df854bddd explicitly provide memory format when calling to clone() at prune.py (#29593)
Summary:
Currently clone() has a memory_format parameter whose default value is Contiguous.
In the future this default will change to a different memory format, Preserve.
To avoid any potential issues, specify memory_format explicitly.
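
A minimal sketch of the explicit call:

```python
import torch

x = torch.randn(8, 3, 32, 32)
# Explicit memory_format keeps behavior stable across the default change.
y = x.clone(memory_format=torch.contiguous_format)
```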
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29593

Differential Revision: D18439783

Pulled By: ifedan

fbshipit-source-id: e7ed6c19ee227990214d44c562c26a7250981324
2019-11-11 18:07:06 -08:00
bf61405ed6 explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29387

Test Plan: Imported from OSS

Differential Revision: D18429729

Pulled By: VitalyFedyunin

fbshipit-source-id: c71264ed5d64ed7e5d8ea907413b6b8e7b67769a
2019-11-11 17:57:34 -08:00
8df602400b explicitly provide memory format when calling to *_like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29386

Test Plan: Imported from OSS

Differential Revision: D18429727

Pulled By: VitalyFedyunin

fbshipit-source-id: 2e9d72d9168a81f7d7cc8f07d3be3a6480faec52
2019-11-11 17:57:30 -08:00
858b2010ae Updating submodules
Summary:
GitHub commits:

3f47103c72
72f73d40d8
5082d158b3
03ce7fb292
f0d0e0dc38
807685d4eb

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 30634d39f7f50212793d7abf3c0488c8822e17f5
2019-11-11 17:37:44 -08:00
1bb5209f7e Back out "Revert D18299298: [pytorch][PR] Migrate conv3d from TH to ATen (CPU)" (#29286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29286

Original commit changeset: 33057d5a91d1
ghstack-source-id: 93638554

Test Plan: sandcastle and ossci

Differential Revision: D18349945

fbshipit-source-id: 9d9ddb0c185248a2073ade1063bb69ffbfa48b46
2019-11-11 17:33:14 -08:00
ddeeb561c3 Revoking mutually exclusive requirement on channels last and contiguous tensor (#28466)
Summary:
The old implementation assumed `is_channels_last_contiguous_` to be mutually
exclusive to `is_contiguous_`, which is not true.
Properly set the flag by checking strides.

Original Pull Request resolved: https://github.com/pytorch/pytorch/pull/24113
Original GitHub Author: jjsjann123 <jiej@nvidia.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28466

Differential Revision: D16860715

Pulled By: VitalyFedyunin

fbshipit-source-id: facd19d3501b6566d77c46199567e0cd051a6b49
2019-11-11 17:29:39 -08:00
70f886ffa4 Revert D18253777: Remove observer module after insert_quant_dequant
Test Plan: revert-hammer

Differential Revision:
D18253777

Original commit changeset: 26081c4c3fd3

fbshipit-source-id: 88f330c34976030c9310e7982fa6ae74e093ebbf
2019-11-11 17:09:58 -08:00
af3468a1c7 change op bench input shape to reduce execution time (#29616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29616

1. Reduce the predefined_min_time, which is the minimum time each test needs to run. Based on the test results, the average time across epochs is pretty stable before exiting, so we can safely reduce the predefined time here.
2. Change the input shapes of several ops

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
200 256.044864655
400 165.850520134
800 163.579881191
1600 162.871927023
3200 160.3128016
# Mode: Eager
# Name: add_cpu_M64_K64_bwd1_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 164.715

# Benchmarking PyTorch: add
200 170.650482178
400 168.895125389
800 169.867575169
1600 163.400024176
3200 168.658420444
# Mode: Eager
# Name: add_cpu_M64_K64_bwd2_N64
# Input: device: cpu, K: 64, M: 64, N: 64
Backward Execution Time (us) : 168.777
```

Reviewed By: hl475

Differential Revision: D18438540

fbshipit-source-id: 1fd27cf4bbc34e46e74393af912ee2fcb75c33b2
2019-11-11 16:58:27 -08:00
7374dd0d52 remove SkipInputShape flag (#29615)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29615

Remove that flag as it's not needed any more.

Test Plan: na

Reviewed By: hl475

Differential Revision: D18440271

fbshipit-source-id: 41b0659c72ef746a1cc268174fd1e7dc2beb1ae2
2019-11-11 16:56:40 -08:00
fdcb203e8e Identify weights and bias by argument position in aten call (#29147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29147

Previously we used a vector of weight and bias values to record weight/bias, assuming we would
get them from GetAttr nodes, and then propagated these values through the function calls.

However, this doesn't work if we also apply transformations to these values; we would need
to mark every value produced from weight/bias as weight/bias, e.g.
```
%w = GetAttr[name="weight"](%conv)
%wt = aten::transpose(%w)
%r = aten::conv2d(..., %wt, ...)
```
we would mark both %w and %wt as weight. Supporting this is overly complicated.

Alternatively, we can identify weights by argument position, e.g.
for the call %r = aten::conv2d(..., %w, ...), we know that argument 1 is the weight and argument 2 is the bias.

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18362839

fbshipit-source-id: afbf07f48bab8d01c5be1c882561a0255730a6b9
2019-11-11 16:40:56 -08:00
587996ef04 Remove observer module after insert_quant_dequant (#28985)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28985

Remove the observer module in the quantized model

Test Plan:
python test/test_jit.py 'TestJit.test_insert_quant_dequant'

Imported from OSS

Differential Revision: D18253777

fbshipit-source-id: 26081c4c3fd3dc049cafa8c0383219bc4c233589
2019-11-11 16:31:01 -08:00
81116fd7cd Updating submodules
Summary:
GitHub commits:

2bdb5a4a7c
dfd5219816
66f868b745
0c4130d051
c912150192
c17384fea4
e0b2156829
7aef78fb2e

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 82552466afa665f0e335d5dce385dfcae9247b0b
2019-11-11 16:18:18 -08:00
a9308f9d8b py2 fix
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29613

Test Plan: Imported from OSS

Differential Revision: D18440212

Pulled By: suo

fbshipit-source-id: 4e25a599ea2c649d0b6b4531da5df9b00e7f6380
2019-11-11 16:15:51 -08:00
b5a38fa98e update op bench readme (#29596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29596

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18437811

fbshipit-source-id: 7996d1689d8a46849b62b2b3875c67cf8dc5861c
2019-11-11 15:33:29 -08:00
a09197561e correctly share types between traced modules (#29583)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29583

The normal flow for type sharing assumes that we will construct the
`ConcreteModuleType`, then use `operator==` to decide whether or not to
reuse an existing JIT type. In this case, `jitType_` is not populated,
so it doesn't make sense to compare it.

However, there is one exception to this flow: for traced modules, we
pre-compute the JIT type and poke it into the `ConcreteModuleType`
manually. To handle this case, we should compare the `jitType_`s in
`operator==` like everything else.

Test Plan: Imported from OSS

Differential Revision: D18435949

Pulled By: suo

fbshipit-source-id: 44b7672a686015aaf02f6664c6aff00e165fde65
2019-11-11 15:01:35 -08:00
1a9e5dad81 Improve ConcreteModuleType::dump() (#29582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29582

Give it more info, fix a segfault

Test Plan: Imported from OSS

Differential Revision: D18435950

Pulled By: suo

fbshipit-source-id: 43c695ffe1f13f33df69c6e51caa531f8b993208
2019-11-11 15:01:31 -08:00
00c224f0f2 move quantized tests from benchmark_all_test to benchmark_all_quantized_test (#29590)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29590

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1
Parsing buck files: finished in 1.0 sec
Creating action graph: finished in 43.0 sec
Building: finished in 16.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 01:00.0 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 45419.667
...

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test
Parsing buck files: finished in 1.0 sec
Building: finished in 6.0 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 7.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QReLU
# Mode: Eager
# Name: QReLU_dims(1,)_permute_dimsFalse_inplaceFalse_dtypetorch.quint8
# Input: dims: (1,), permute_dims: False, inplace: False, dtype: torch.quint8
Forward Execution Time (us) : 137.685
...
```

Reviewed By: hl475

Differential Revision: D18436727

fbshipit-source-id: 317ec0e4bd2a6e33c9a60830f01ed805ae412449
2019-11-11 14:59:29 -08:00
137eea5938 change module_name in chunk_test (#29589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29589

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:chunk_test  -- --iteration 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 148.345

# Benchmarking PyTorch: chunk
# Mode: Eager
# Name: chunk_M512_N512_chunks2_cpu
# Input: M: 512, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 125.239
```

Reviewed By: hl475

Differential Revision: D18436532

fbshipit-source-id: e7100f4605471e27703b2e2e863b971a93229854
2019-11-11 14:59:24 -08:00
6104f4e37c reduce input shapes for matmul (#29587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29587

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:matmul_test  -- --iteration 1
```

Reviewed By: hl475

Differential Revision: D18436317

fbshipit-source-id: 564143edc3d4400bcfafa0da11b7479562661b0c
2019-11-11 14:59:20 -08:00
0e5299a441 fix list_ops and list_tests (#29586)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29586

This diff is fixing the list_ops and list_tests issues caused by D18412342.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1 --list_tests
Parsing buck files: finished in 0.9 sec
Creating action graph: finished in 37.2 sec
Building: finished in 15.9 sec (100%) 10053/10053 jobs, 1 updated
  Total time: 54.0 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of tests:
# add_M8_N2_K1_cpu
# add_M8_N2_K8_cpu
..

buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test  -- --iteration 1 --list_ops
Parsing buck files: finished in 1.0 sec
Building: finished in 5.3 sec (100%) 10053/10053 jobs, 0 updated
  Total time: 6.3 sec
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of Operators to run:
# add
# batchnorm
# cat
# chunks
# Conv1d
# ConvTranspose1d
...
```

Reviewed By: hl475

Differential Revision: D18435994

fbshipit-source-id: 89ecfd55339b6e7687cdf8d90433d4767252e09f
2019-11-11 14:59:16 -08:00
85752df4a1 reduce conv_test input shapes (#29580)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29580

The input shapes of the Conv benchmark generate too many tests, which could take 40+ GB of memory. This diff reduces the input shapes to fix that issue.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:conv_test  -- --iteration 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv3d
# Mode: Eager
# Name: Conv3d_in_c64_out_c64_kernel3_stride1_N8_D4_H16_W16_cpu
# Input: in_c: 64, out_c: 64, kernel: 3, stride: 1, N: 8, D: 4, H: 16, W: 16, device: cpu
Forward Execution Time (us) : 383376.096
```

Reviewed By: hl475

Differential Revision: D18434627

fbshipit-source-id: a91a239394b034ff7b42e1b09e2f744a8ad671e9
2019-11-11 14:59:11 -08:00
01ad2bc5da Improving BinaryOpsKernel.cu (#29428)
Summary:
- Building `BinaryOpsKernel.cu` takes extremely long. Split the original file into 3 pieces, and copy-paste code into these files.
- Remove some useless logic
- Change some wrong op names: `*_cpu` -> `*_cuda`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29428

Differential Revision: D18408858

Pulled By: VitalyFedyunin

fbshipit-source-id: 29323b0bc40a928ae698345ad1ffe46c5851b012
2019-11-11 14:45:26 -08:00
6bfa7c0471 FakeQuantize benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29507

Test Plan: Imported from OSS

Differential Revision: D18415084

Pulled By: z-a-f

fbshipit-source-id: f758e45d5178ee5f80157772ab701a69f074a78b
2019-11-11 14:41:58 -08:00
627f2823e0 remove _register_* bindings from python (#29499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29499

This changes how DataParallel and trace module creation works so that
we no longer need to mutate Module class after it has been created.

The only remaining usage of register_* functions are now inside C++
tests.

Test Plan: Imported from OSS

Differential Revision: D18413652

Pulled By: zdevito

fbshipit-source-id: f039e5400cd016632768be4547892f6a69645c20
2019-11-11 13:52:46 -08:00
4e4e29a511 Simplify ScriptModule bindings. (#29432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29432

This removes a lot of the private methods on torch._C.ScriptModule,
and instead implements functionality in terms of slot_dict_impl views
to implement _parameter, _buffers, and _modules in nn.Module.

A followup PR should also remove the _register_attribute,
_register_module, and _register_parameter methods, but this requires
more refactoring of the way tracing creates modules and replication
for data parallel works.

Test Plan: Imported from OSS

Differential Revision: D18387963

Pulled By: zdevito

fbshipit-source-id: f10d47afeb30c1e05d704ae5ac4166830933125c
2019-11-11 13:52:36 -08:00
5b702ab52b switching to a simple/full executor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29230

Differential Revision: D18402229

Pulled By: Krovatkin

fbshipit-source-id: 62f4bc9bc89c0c7369359bba1359c22a2fa80f46
2019-11-11 13:41:35 -08:00
cedca377bd Re-enable TestNamedTensor.test_big_tensor_repr (#29407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29407

Fixes https://github.com/pytorch/pytorch/issues/27753.

The bug was that random tensors print subtly differently. This causes
the "names=" tag to appear in slightly different places; sometimes it is
on the same line as the data, sometimes it is on different lines.

For this test, we wanted to know the following:
- printing a big named tensor's repr doesn't crash
- a big named tensor's repr shows the names

This PR changes the test to check those two things.

Test Plan: - run existing tests

Differential Revision: D18428657

Pulled By: zou3519

fbshipit-source-id: 6bcf247ffba010520878a175e766a496028f87d9
2019-11-11 13:32:32 -08:00
b3b8f522e8 Disabling 'contig' in quantized arithmetic test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29576

Test Plan: Imported from OSS

Differential Revision: D18433052

Pulled By: z-a-f

fbshipit-source-id: 8082303faa368646ef6370b6cf348275526fd33b
2019-11-11 13:30:13 -08:00
5b43becfc5 per-tensor quantize/dequantize benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29506

Test Plan: Imported from OSS

Differential Revision: D18415017

Pulled By: z-a-f

fbshipit-source-id: 92a50706aafabdcaa79dd1f226f7f4ac63606c74
2019-11-11 13:19:46 -08:00
c49b324cbf Enable test_stress_light_rpc in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29473

Test Plan: Imported from OSS

Differential Revision: D18404820

Pulled By: mrshenli

fbshipit-source-id: de0f18db208d83794507c162483bb948056af533
2019-11-11 12:22:10 -08:00
bb90c18791 Enable test_py_rref_args_user_share in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29472

Test Plan: Imported from OSS

Differential Revision: D18404818

Pulled By: mrshenli

fbshipit-source-id: 1fcd19b178dc20540a210601cbb2c974be14a7cc
2019-11-11 12:22:05 -08:00
b885eff4be Enable test_multi_py_udf_remote in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29471

Test Plan: Imported from OSS

Differential Revision: D18404819

Pulled By: mrshenli

fbshipit-source-id: 8cf3e32d7980e34c48bfd8fb61cfd9a0acc9bd46
2019-11-11 12:22:01 -08:00
bc4457f5b6 Enable test_py_built_in in rpc_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29470

Test Plan: Imported from OSS

Differential Revision: D18404822

Pulled By: mrshenli

fbshipit-source-id: 01cb87dee39c3579a2e0961d67b627ca1dc87fc2
2019-11-11 12:21:56 -08:00
93b5c9d723 Allow to create local RRef with value (#28948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28948

Add the constructor RRef(value) in Python. This allows wrapping a local object with an RRef and passing or returning this RRef to users.
This enables returning, for example, a list of RRefs containing the parameters of a module to the user of the module.
ghstack-source-id: 93565010

Test Plan: unit test.

Differential Revision: D18241227

fbshipit-source-id: b9e9b958f40623348d62ee6fc9e7f0414b4215b7
2019-11-11 12:19:45 -08:00
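
A minimal sketch of the new constructor (hedged: this assumes `rpc.init_rpc` has already been called on the worker; the module and names are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

linear = torch.nn.Linear(4, 4)
# wrap each local parameter in an RRef so it can be passed over RPC
param_rrefs = [rpc.RRef(p) for p in linear.parameters()]
print(param_rrefs[0].local_value().shape)  # torch.Size([4, 4])
```
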
17b0ab4727 Add python API for get_gradients() method. (#28926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28926

The get_gradients method was a pybind only method without any
documentation for this method for users.

I've moved this method to our python distributed autograd API and ensured that
we have appropriate docs for this method.
ghstack-source-id: 93558845

Test Plan: waitforbuildbot

Differential Revision: D18234443

fbshipit-source-id: 317267d8c2416da75afd3f9d900a3cd74bb78dfb
2019-11-11 12:19:41 -08:00
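
A hedged sketch of the documented API (the distributed setup around it is elided and assumed to be initialized):

```python
import torch.distributed.autograd as dist_autograd

with dist_autograd.context() as context_id:
    # ... run the distributed forward pass and the distributed
    # backward pass for this context ...
    grads = dist_autograd.get_gradients(context_id)  # Dict[Tensor, Tensor]
```
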
9276cd449d qadaptive_avgpool2d benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29274

Test Plan: Imported from OSS

Differential Revision: D18343569

Pulled By: z-a-f

fbshipit-source-id: e5ab9c79965caa59a8e17069e70304c01be46104
2019-11-11 12:17:44 -08:00
b0cf43b2dd Simple distributed optimizer (#29304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29304

Implements a simple python distributed optimizer that takes rrefs to parameters that will be optimized.
It keeps optimizer instances remotely, and calling step on the distributed optimizer calls step on each of the remote optimizers in parallel.
ghstack-source-id: 93564364

Test Plan: unit tests.

Differential Revision: D18354586

fbshipit-source-id: 85d4c8bfec4aa38d2863cda704d024692511cff5
2019-11-11 12:02:24 -08:00
604fc9ec41 F::embedding, F::embedding_bag, moved Embedding and EmbeddingBag options to embedding.h in options
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28669

Differential Revision: D18377609

Pulled By: anjali411

fbshipit-source-id: 6a2c547368849ebd1a2f8828cfbe7252152b26a2
2019-11-11 11:51:26 -08:00
65f3b98c35 explicitly provide memory format when calling to clone() at ProcessGroupGloo.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28688

Test Plan: Imported from OSS

Differential Revision: D18333382

Pulled By: ifedan

fbshipit-source-id: b698b647eaa1e318210f445c864d6333e7d46a15
2019-11-11 11:48:53 -08:00
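
For reference, the Python-level analogue of passing an explicit memory format to clone() (a sketch only; the commit itself changes the C++ call site):

```python
import torch

x = torch.randn(8, 3, 4, 4).to(memory_format=torch.channels_last)
y = x.clone(memory_format=torch.preserve_format)    # keep x's strides
z = x.clone(memory_format=torch.contiguous_format)  # force NCHW layout
print(y.is_contiguous(memory_format=torch.channels_last))  # True
print(z.is_contiguous())                                   # True
```
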
310343e946 Properly shutdown RPC even in the case of clean_shutdown=False. (#29148)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29148

We would skip rpc.join_rpc() in the case of `clean_shutdown=False`.
This would exit the process without properly cleaning up the local RPCAgent,
resulting in a crash.

As a result, to fix this we still call rpc.join_rpc() even in an unclean
shutdown. Note that, rpc.join_rpc() needs to be replaced with a local
`shutdown` call eventually since we need a way to shutdown the local RPC agent
properly.

Test Plan: waitforbuildbot

Reviewed By: xush6528

Differential Revision: D18306941

fbshipit-source-id: 2685db3924f7aa4516f3b28f58d6c127bcd55ba9
2019-11-11 11:30:48 -08:00
e01fc56ecb move type inference for arange into c++ (#27629)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/17662

I'm not sure if `arange` needs to be in python_arg_parser at all, given the schemas in native_functions.yaml. In any case, this at least fixes the dtype mismatch.

In follow-up PRs I will try to handle some of the other ops that do type inference at the python level, like randint.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27629

Differential Revision: D17885939

Pulled By: eellison

fbshipit-source-id: f97a8bc722b7ab77de1c42a992e49a4a3175ad60
2019-11-11 11:26:21 -08:00
9de0b63554 Updating submodules
Summary:
GitHub commits:

cce78c4f99
0fced3e95c
79505f3059

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: bef569172d04f781b068e86c5246cf55dbde0321
2019-11-11 11:22:15 -08:00
72eff0021e Declare Dimname's kWildcard as extern instead of static (#29384)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27627

Declaring the variable as `static` in a header meant that the global variable was initialized in every source file that includes it. This is particularly problematic when the header is included in AVX source files, as the initialization generates SIGILL on older hardware.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29384

Differential Revision: D18380379

Pulled By: zou3519

fbshipit-source-id: 0dcd87db01c468a5c9ddb2c695528b85ed2e1504
2019-11-11 10:14:16 -08:00
344e7c26c4 Delete USE_CUDA macro use from data_parallel.h (#29483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29483

Somehow, these macros were not necessary!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18427851

Pulled By: ezyang

fbshipit-source-id: 86e1d75d98342461c9a5afa1c30c14346188f7cc
2019-11-11 09:21:12 -08:00
b141754b7f Give a better error message when people accidentally use unsupported devices (#29409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29409

Fixes #27875

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18396828

Pulled By: ezyang

fbshipit-source-id: 3f53cbbe620cd3445852273be90ff5744aa7a8cb
2019-11-11 08:10:53 -08:00
bb119d957e Move torch.cuda's atfork handler into C++ (#29101)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23401

We cannot rely on `multiprocessing.util.register_after_fork` since it is only
called for processes created by the `multiprocessing` module and not `os.fork()`.

Moving to `pthread_atfork` ensures the handler always gets called. However, I don't think it's safe to call Python functions inside the `atfork` handler, so the Python code has to be a bit more careful when checking `_initialized`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29101

Differential Revision: D18355451

Pulled By: ezyang

fbshipit-source-id: 4d4253a3669796212c099dad4e5bdfdb0df40469
2019-11-11 07:34:27 -08:00
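
A minimal sketch of the failure mode the atfork handler guards against (assumes a CUDA-capable machine; note `os.fork()` bypasses `multiprocessing`'s after-fork hooks entirely):

```python
import os
import torch

if torch.cuda.is_available():
    torch.cuda.init()   # initialize CUDA in the parent process
    pid = os.fork()     # not routed through multiprocessing
    if pid == 0:
        # the atfork handler marks CUDA as needing re-initialization
        # here, instead of silently reusing the parent's context
        print(torch.cuda.is_initialized())
        os._exit(0)
    os.waitpid(pid, 0)
```
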
be757957ba Support softmax with D == 0 (#29167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29167

As titled.

This fix is crucial as multi_channel splitting would create history that has no items (i.e., D == 0), which leads to flow failure.

Test Plan:
Unittest

flow test:

before fix: f148783160

after fix: f149082299

buck test mode/dev-nosan caffe2/caffe2/python/operator_test:softmax_ops_test

Reviewed By: xianjiec

Differential Revision: D18296081

fbshipit-source-id: e0bb2dc2c4e5b465e213f31e5c5ced3a7e1fd574
2019-11-11 00:46:10 -08:00
23483406aa Fix missing space in lr_scheduler warning msg
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29527

Differential Revision: D18422662

Pulled By: ngimel

fbshipit-source-id: 80191232ee0b639274ba3561e0d89ddcb40434e7
2019-11-10 22:51:35 -08:00
3e5af22650 Disable flaky RPC tests (#29485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29485

The flakiness is likely due to the problem with OMP and fork. We
should disable fork tests for good, but that would have negative
impact on internal test coverage. This commit disables the most
buggy nested tests for now, until we find a way to turn fork test
off.

Test Plan: Imported from OSS

Differential Revision: D18407529

Pulled By: mrshenli

fbshipit-source-id: dcbe49a9d104fcf1eaf83107d58904d49dc18aff
2019-11-10 21:33:27 -08:00
f6f428b675 Make smoke tests depend on s3 html update, to avoid race condition. (#29481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29481

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18422265

Pulled By: ezyang

fbshipit-source-id: b483cd5f688676444c83174a38c99cb1777a60b0
2019-11-10 19:08:50 -08:00
46c4ae5719 Fix BC CI (#29533)
Summary:
skip _nnpack_spatial_convolution for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29533

Reviewed By: hl475

Differential Revision: D18421636

Pulled By: houseroad

fbshipit-source-id: 74ceaa753cf2faa16db89ea028fe275497b673c1
2019-11-10 16:04:01 -08:00
466ab93ef5 Revert D18286473: Use NNPACK for strided convolutions.
Test Plan: revert-hammer

Differential Revision:
D18286473

Original commit changeset: accdfafa2c24

fbshipit-source-id: dc1347eb2738009c7f44699fc46b6cb80c54e2e3
2019-11-10 08:11:11 -08:00
2032482eb9 Use handle pool to manage cuparse handles (#29426)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29352

The newly added test fails consistently with illegal memory access without this PR, and now it succeeds consistently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29426

Differential Revision: D18407784

Pulled By: ngimel

fbshipit-source-id: 6cabb9a6674c25f7d7a3dc7b3bac99002018d8ee
2019-11-09 23:12:34 -08:00
5c9eae075f qavgpool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29268

Test Plan: Imported from OSS

Differential Revision: D18342589

Pulled By: z-a-f

fbshipit-source-id: cc6f0153a927672e0831200b58f5413c7db7bdb1
2019-11-09 22:30:24 -08:00
958d0cd4df Adding short tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29257

Test Plan: Imported from OSS

Differential Revision: D18340536

Pulled By: z-a-f

fbshipit-source-id: dce470fd0c7ef9c6f639de40f7e0713b335408d1
2019-11-09 21:33:41 -08:00
5ba9209755 Use NNPACK for strided convolutions. (#29084)
Summary:
Use NNPACK for strided convolutions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29084

Differential Revision: D18286473

Pulled By: AshkanAliabadi

fbshipit-source-id: accdfafa2c247f2750208a7af84c9e2c0374920b
2019-11-09 21:21:55 -08:00
cc6af45944 Fix writeable strings warnings.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29512

Differential Revision: D18417715

Pulled By: mruberry

fbshipit-source-id: 7029f0a73bcdf0ce8594b90b6f5af8be4e8b5e02
2019-11-09 21:16:35 -08:00
86fee25d99 nll_loss (cpu): Simplify index checking: rely on exception propagation in parallel_for (#29454)
Summary:
Replace the custom thread-safe invalid index checking and instead rely on the internal exception propagation of parallel_for. Use the `TORCH_CHECK_INDEX` macro when checking indices.

Align index check in `nll_loss` implementation with `nll_loss2d`, see https://github.com/pytorch/pytorch/issues/28304.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29454

Differential Revision: D18418169

Pulled By: ezyang

fbshipit-source-id: 273da5230dd4b66a51bf02386718b31d2dd41e66
2019-11-09 20:23:30 -08:00
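
A small sketch of the user-visible behavior: an invalid target index surfaces as a regular exception, propagated out of the parallel region:

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(2, 3), dim=1)
target = torch.tensor([0, 5])  # 5 is out of range for 3 classes
try:
    F.nll_loss(log_probs, target)
except Exception as e:
    # raised inside parallel_for and rethrown on the calling thread
    print(type(e).__name__, e)
```
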
c7ed89cf65 Migrate nll_loss2d from TH to ATen (CPU) (#28304)
Summary:
Added check for indicies in Reduction::None case.

### Benchmark results

Note: Due to the size of the input tensors, the random number generation this time is responsible for a significant portion of the total time. It is better to look at the individual per-net timings (which do not include the input preparation).
Script used for benchmark.: [nnl_loss2d_benchmark.py](https://gist.github.com/andreaskoepf/5864aa91e243317cb282c1e7fe576e1b)

#### WITH PR applied
```
using reduction:  none
CPU forward 1000 took 7.916500908322632e-05
CPU forward 10000 took 0.0002642290201038122
CPU forward 100000 took 0.003828087996225804
CPU forward 1000000 took 0.037140720000024885
CPU forward 10000000 took 0.33387596398824826
CPU forward TOTAL time 7.218988707987592

using reduction:  mean
CPU forward 1000 took 9.165197843685746e-05
CPU forward 10000 took 0.0005258890159893781
CPU forward 100000 took 0.0050761590246111155
CPU forward 1000000 took 0.047345594997750595
CPU forward 10000000 took 0.4790863030066248
CPU forward TOTAL time 7.9106070210109465
CPU for- & backward 1000 took 0.0005489500181283802
CPU for- & backward 10000 took 0.0015284279943443835
CPU for- & backward 100000 took 0.015138130984269083
CPU for- & backward 1000000 took 0.15741890601930209
CPU for- & backward 10000000 took 1.6703072849777527
CPU for- & backward TOTAL time 9.555764263990568

using reduction:  sum
CPU forward 1000 took 8.789298590272665e-05
CPU forward 10000 took 0.000514078012201935
CPU forward 100000 took 0.005135576997417957
CPU forward 1000000 took 0.04715992201818153
CPU forward 10000000 took 0.4821214270195924
CPU forward TOTAL time 7.9119505700073205
CPU for- & backward 1000 took 0.00047759301378391683
CPU for- & backward 10000 took 0.0015945070190355182
CPU for- & backward 100000 took 0.018208994006272405
CPU for- & backward 1000000 took 0.15904426100314595
CPU for- & backward 10000000 took 1.5679037219961174
CPU for- & backward TOTAL time 9.495157692988869
```

#### WITHOUT PR (original TH impl)
```
using reduction:  none
CPU forward 1000 took 0.0003981560003012419
CPU forward 10000 took 0.0035912430030293763
CPU forward 100000 took 0.035353766987100244
CPU forward 1000000 took 0.3428319719969295
CPU forward 10000000 took 3.364342701010173
CPU forward TOTAL time 11.166179805004504

using reduction:  mean
CPU forward 1000 took 8.63690220285207e-05
CPU forward 10000 took 0.0004704220045823604
CPU forward 100000 took 0.0045734510058537126
CPU forward 1000000 took 0.046232511987909675
CPU forward 10000000 took 0.4191019559802953
CPU forward TOTAL time 7.846049971994944
CPU for- & backward 1000 took 0.0005974550149403512
CPU for- & backward 10000 took 0.0014057719963602722
CPU for- & backward 100000 took 0.013776941981632262
CPU for- & backward 1000000 took 0.13876214998890646
CPU for- & backward 10000000 took 1.3666698939923663
CPU for- & backward TOTAL time 9.10526105100871

using reduction:  sum
CPU forward 1000 took 7.598899537697434e-05
CPU forward 10000 took 0.00046885499614290893
CPU forward 100000 took 0.0044489419960882515
CPU forward 1000000 took 0.04495517900795676
CPU forward 10000000 took 0.418376043002354
CPU forward TOTAL time 7.789334400993539
CPU for- & backward 1000 took 0.0004464260127861053
CPU for- & backward 10000 took 0.0017732900159899145
CPU for- & backward 100000 took 0.01626713399309665
CPU for- & backward 1000000 took 0.11790941300569102
CPU for- & backward 10000000 took 1.4346664609911386
CPU for- & backward TOTAL time 9.294745502003934
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28304

Differential Revision: D18350157

Pulled By: ezyang

fbshipit-source-id: e9437debe51386a483f4265193c475cdc90b28e4
2019-11-09 18:31:20 -08:00
a47fe40729 qpool benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29250

Test Plan: Imported from OSS

Differential Revision: D18339142

Pulled By: z-a-f

fbshipit-source-id: 1d2a3dda15ab300ffa63719158a4788b7fb17df5
2019-11-09 17:52:31 -08:00
aa658a2a68 Adding inplace quantized relu6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29245

Test Plan: Imported from OSS

Differential Revision: D18334541

Pulled By: z-a-f

fbshipit-source-id: 25b12cc88ee81434d96cf5c44c008c6f85da0673
2019-11-09 14:53:42 -08:00
4874120804 Added all binary arithmetic tests in QFunctional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29424

Test Plan: Imported from OSS

Differential Revision: D18385689

Pulled By: z-a-f

fbshipit-source-id: 5947e0edfcbe2b6eba984dc9da187e9fce5cd40f
2019-11-09 14:49:57 -08:00
687ea7460a quantized comparators benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29437

Test Plan: Imported from OSS

Differential Revision: D18389909

Pulled By: z-a-f

fbshipit-source-id: e007b50fc3905747f0e0a70ab438b790e63b023e
2019-11-09 14:23:41 -08:00
fb2eb01955 qadd benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29420

Test Plan: Imported from OSS

Differential Revision: D18383402

Pulled By: z-a-f

fbshipit-source-id: 8ea2f689b7df676ffb8adef0cbb058a7a2123938
2019-11-09 14:20:28 -08:00
f5074ccafe set the no_deadline for the adaptive_avg_pool_nhwc test (#29502)
Summary:
This test is reported to be flaky due to deadline expiration. This PR flags it as a no_deadline test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29502

Differential Revision: D18416632

Pulled By: lly-zero-one

fbshipit-source-id: 27cd7b28139f3f16ee0cf5802a0709385719d487
2019-11-09 09:30:46 -08:00
6c020673c9 Migrate acos from TH to ATen (CUDA) (#29323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29323

Benchmark (Debian Buster, gcc 7.4, Release build, P400, turbo off):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.acos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.acos(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3783099120009865
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.37258279799971206
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5627449999992677
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8581132070012245
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0164795860000595
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.644646360999104
```

After:

```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3873771430007764
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.38498222500038537
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5826049269999203
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8118497010000283
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0175845949997893
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.658536324999659
```

Close #24532

Test Plan: Imported from OSS

Differential Revision: D18406806

Pulled By: VitalyFedyunin

fbshipit-source-id: 2d012485f4747fae0ddbcf2e08b1d75ef5274a19
2019-11-09 09:11:53 -08:00
ebfe846ad2 Clean up many unused declaration/definitions in TH
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29046

Test Plan: Imported from OSS

Differential Revision: D18302767

Pulled By: VitalyFedyunin

fbshipit-source-id: 65f4df515426274b92f4405ed7aad44bd1c9141e
2019-11-09 09:11:49 -08:00
4606deb2be Migrate frac from TH to ATen (CUDA) (#28953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28953

Close #24566

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc
7.4):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.frac(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.frac(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3608182370007853
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3647012189976522
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.3889585220022127
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.622635444997286
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9595754649999435
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5590267750012572
```

After:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3675256470014574
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3703597319981782
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.372184894993552
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.60767333900003
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9645607889979146
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5542530329985311
```

Test Plan: Imported from OSS

Differential Revision: D18302768

Pulled By: VitalyFedyunin

fbshipit-source-id: 24198838dc903d455155f0819d0c7d58974aaecd
2019-11-09 09:11:45 -08:00
d00579da93 Revert D18399922: Switch XLA to only override abstract functions
Test Plan: revert-hammer

Differential Revision:
D18399922

Original commit changeset: b01761673f51

fbshipit-source-id: 2e19ca58f93dd05be3c3a2a125a154d8288db672
2019-11-09 00:08:38 -08:00
cb74ede59e Pass F::*FuncOptions instead of torch::nn::*Options to functionals, and make F::*FuncOptions a different class when necessary (#29364)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29364

Currently, we use `torch::nn::*Options` both as module options and functional options. However, this makes it very hard to manage the parameters in `torch::nn::*Options`, because a module's constructor can take a different set of arguments than the module's equivalent functional (e.g. `torch.nn.BatchNorm1d` takes `num_features, eps=1e-5, momentum=0.1, affine=True,
track_running_stats=True`, while `F::batch_norm` takes `running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-5`).

This PR resolves the above problem by making `F::*FuncOptions` a different class from `torch::nn::*Options` when necessary (i.e. when a module's constructor takes a different set of arguments than the module's equivalent functional). In the rest of the cases where the module constructor takes the same set of arguments as the module's equivalent functional, `F::*FuncOptions` is an alias of `torch::nn::*Options`.

Also as part of this PR, we change all functional options to pass-by-value, to make the semantics consistent across all functionals.

Test Plan: Imported from OSS

Differential Revision: D18376977

Pulled By: yf225

fbshipit-source-id: 8d9c240d93bfd5af0165b6884fdc912476b1d06b
2019-11-08 22:38:21 -08:00
5c29160c7c Switch XLA to only override abstract functions (#29438)
Summary:
This is a followup of li-roy's work https://github.com/pytorch/pytorch/pull/23282. (I messed up the rebase there :( )

After https://github.com/pytorch/xla/issues/1225 is done, we are good to move the integration to only override abstract functions.

This PR contains a TODO which I'll remove in the next 2 follow-up PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29438

Reviewed By: ljk53

Differential Revision: D18399922

Pulled By: ailzhang

fbshipit-source-id: b01761673f519dfb240681180d3f18a4518273ca
2019-11-08 22:33:09 -08:00
f31d6c70fe reduce op bench binary size (#29496)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29496

This diff reduces the binary size of the op benchmark by avoiding creating all tests at once.

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark:benchmark_all_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : long

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K1_cpu
# Input: M: 8, N: 2, K: 1, device: cpu
Forward Execution Time (us) : 160.781

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N2_K8_cpu
# Input: M: 8, N: 2, K: 8, device: cpu
Forward Execution Time (us) : 158.941
```

Reviewed By: hl475

Differential Revision: D18412342

fbshipit-source-id: 5db647019ae8c2e4d6ab361b54b63cf88236b1ae
2019-11-08 22:15:12 -08:00
8e8a5e0664 Pruning Functionality (#24076)
Summary:
Provides implementation for feature request issue https://github.com/pytorch/pytorch/issues/20402.

Adds pruning functionalities (structured and unstructured, local and global, as well as pruning from user-provided mask).

Associated tutorial here: https://github.com/pytorch/tutorials/pull/605

cc: soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24076

Differential Revision: D18400431

Pulled By: mickypaganini

fbshipit-source-id: a97bd6ca61f8600ae411da9ff6533c232aae1a51
2019-11-08 19:38:00 -08:00
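
A minimal usage sketch of the new `torch.nn.utils.prune` module:

```python
import torch
import torch.nn.utils.prune as prune

m = torch.nn.Linear(4, 2)
# unstructured pruning: zero out the 50% of weights with smallest |w|
prune.l1_unstructured(m, name="weight", amount=0.5)
print(m.weight)       # masked weights, recomputed from weight_orig
print(m.weight_mask)  # the binary mask, registered as a buffer
prune.remove(m, "weight")  # make the pruning permanent
```
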
3657df3836 don't set DEBUG=1 in py3.6-gcc5.4 CI build (#29491)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29491

Setting DEBUG=1 causes tests to run super slow. There are two reasons
why you might do it:
1. Testing `#NDEBUG` stuff. We don't really use this macro.
2. https://github.com/pytorch/pytorch/issues/4119. This is valid,
but I would prefer to allow internal contbuilds to test this, as the
infra is better there.

Test Plan: Imported from OSS

Differential Revision: D18411635

Pulled By: suo

fbshipit-source-id: 54e1d0f9cddaa448cd2dd11fe263d5001845bdd8
2019-11-08 16:53:12 -08:00
91e1f07967 Check for unrolled loop in break & continue (#29474)
Summary:
For the same reason we don't allow iteration over heterogeneous types (ModuleLists/tuples) whose length isn't statically known, we also can't break/continue within loops over them - we need to statically know all types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29474

Differential Revision: D18406097

Pulled By: eellison

fbshipit-source-id: 70ed3fc4947b6237cdd6703135a988a5c13ce786
2019-11-08 15:51:13 -08:00
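
A sketch of the compile-time restriction (illustrative module; the loop over a ModuleList is unrolled statically, see the comment below):

```python
import torch

class Stack(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mods = torch.nn.ModuleList([torch.nn.ReLU(), torch.nn.Tanh()])

    def forward(self, x):
        for m in self.mods:
            x = m(x)
            # a `break` or `continue` here would now raise a compilation
            # error, since this loop is unrolled at compile time
        return x

scripted = torch.jit.script(Stack())
print(scripted(torch.randn(3)))
```
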
4da3ac91b7 Add functional overloads for fold, linear, loss, normalization, padding (#29360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29360

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- fold
- linear
- loss
- normalization
- padding

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376975

Pulled By: yf225

fbshipit-source-id: 233cd940834dc9d0b5d4b89339ab7082ec042c3c
2019-11-08 15:10:49 -08:00
e80f7506c2 In torch::save(), make padding computation faster. (#29425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29425

This change saves roughly 5-6% in the TorchSaveSmallTensor benchmark
(torch::save() on a tensor with 64 random floats) by reusing the
padding string across records.
ghstack-source-id: 93517961

Test Plan:
Correctness: buck test mode/dev-nosan caffe2/test/...
   Benchmark buck build mode/opt experimental/jeremyl/c2/...
     buck-out/opt/gen/experimental/jeremy/c2/SerializationBench

Differential Revision: D18385731

fbshipit-source-id: 20bcbe1efd2fb7e3012dd68080542f2a74a7d4f2
2019-11-08 15:03:25 -08:00
675a4cb9fb Extracted quantize/dequantize out of linear.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29173

Test Plan: Imported from OSS

Differential Revision: D18318561

Pulled By: z-a-f

fbshipit-source-id: 89317bb5f56e31221ed9ed02bf727ce39f44ebf8
2019-11-08 14:35:15 -08:00
eae4a69069 Add quantized fbgemm headers to torch target (#29418)
Summary:
We didn't have ATen/native/quantized/cpu/*.h in the torch target before, and we would like these headers to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29418

Differential Revision: D18383534

Pulled By: zrphercule

fbshipit-source-id: 72c06ae2c10e8cc49e7256c9e9b89288263bbfde
2019-11-08 14:32:19 -08:00
c1140f20dc Rename PyTorch JNI library to pytorch_jni (#29412)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29412

Originally, this was going to be Android-only, so the name wasn't too
important.  But now that we're planning to distribute it with libtorch,
we should give it a more distinctive name.

Test Plan:
Ran tests according to
https://github.com/pytorch/pytorch/issues/6570#issuecomment-548537834

Reviewed By: IvanKobzarev

Differential Revision: D18405207

fbshipit-source-id: 0e6651cb34fb576438f24b8a9369e10adf9fecf9
2019-11-08 14:29:13 -08:00
0cfa4965a2 Clean up pytorch_android_torchvision test (#29455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29455

- Don't need to load native library.
- Shape is now private.

Test Plan: Ran test.

Reviewed By: IvanKobzarev

Differential Revision: D18405213

fbshipit-source-id: e1d1abcf2122332317693ce391e840904b69e135
2019-11-08 14:29:10 -08:00
abf55eb3a8 Pickler: convert std::stringstream cases. (#29351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29351

When torch::save()ing a smallish tensor, we spend ~5% of the time
still in std::stringstream constructors.

This removes the last couple of cases. Benchmark shows ~5% improvement:
  TorchSaveSmallTensor Pre: 13.12us
  TorchSaveSmallTensor Post: 12.48us
ghstack-source-id: 93517928

Test Plan:
buck build mode/opt experimental/jeremyl/c2:
   buck-out/opt/gen/experimental/jeremyl/c2/SerializationBench  --bm_regex=TorchSaveSmallTensor

Differential Revision: D18365066

fbshipit-source-id: a3284bec004751cedae1cdadf27f969422faff8e
2019-11-08 14:26:40 -08:00
92b9de1428 Test application for profiling, CMake params for debug symbols (#28406)
Summary:
Reason:
To have a one-step build of a test Android application, based on the current code state, that is ready for profiling with simpleperf, systrace, etc., to measure performance inside the application.

## Parameters to control debug symbols stripping
Introducing the CMakeLists parameter `ANDROID_DEBUG_SYMBOLS`, which makes it possible not to strip symbols for pytorch (i.e. not add the linker flag `-s`); it is checked in `scripts/build_android.sh`.

On the Gradle side, stripping happens by default, and to prevent it we have to specify
```
android {
  packagingOptions {
       doNotStrip "**/*.so"
  }
}
```
which is now controlled by the new Gradle property `nativeLibsDoNotStrip`

## Test_App
`android/test_app` - Android app with one MainActivity that runs inference in a loop

`android/build_test_app.sh` - script that builds libtorch with debug symbols for the specified Android ABIs, adding `NDK_DEBUG=1` and `-PnativeLibsDoNotStrip=true` to keep all debug symbols for profiling.
Script assembles all debug flavors:
```
└─ $ find . -type f -name *apk
./test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk
./test_app/app/build/outputs/apk/resnet/debug/test_app-resnet-debug.apk
```

## Different build configurations

The module used for inference can be set in `android/test_app/app/build.gradle` via BuildConfig parameters:
```
    productFlavors {
        mobilenetQuant {
            dimension "model"
            applicationIdSuffix ".mobilenetQuant"
            buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_MOBILENET_QUANT'))
            addManifestPlaceholders([APP_NAME: "PyMobileNetQuant"])
            buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-mobilenet\"")
        }
        resnet {
            dimension "model"
            applicationIdSuffix ".resnet"
            buildConfigField ("String", "MODULE_ASSET_NAME", buildConfigProps('MODULE_ASSET_NAME_RESNET18'))
            addManifestPlaceholders([APP_NAME: "PyResnet"])
            buildConfigField ("String", "LOGCAT_TAG", "\"pytorch-resnet\"")
        }
```

In that case we can set up several apps on the same device for comparison: `applicationIdSuffix` separates the packages ('org.pytorch.testapp.mobilenetQuant' and 'org.pytorch.testapp.resnet'), while a `manifestPlaceholder` and another BuildConfig parameter provide different application names and logcat tags:
```
─ $ adb shell pm list packages | grep pytorch
package:org.pytorch.testapp.mobilenetQuant
package:org.pytorch.testapp.resnet
```

In the future we can add other BuildConfig params, e.g. single/multi-threaded execution and other configurations for profiling.

At the moment there are 2 flavors - for resnet18 and for quantized MobileNet -
which can be installed on a connected device:

```
cd android
```
```
gradle test_app:installMobilenetQuantDebug
```
```
gradle test_app:installResnetDebug
```

## Testing:
```
cd android
sh build_test_app.sh
adb install -r test_app/app/build/outputs/apk/mobilenetQuant/debug/test_app-mobilenetQuant-debug.apk
```

```
cd $ANDROID_NDK
python simpleperf/run_simpleperf_on_device.py record --app org.pytorch.testapp.mobilenetQuant -g --duration 10 -o /data/local/tmp/perf.data
adb pull /data/local/tmp/perf.data
python simpleperf/report_html.py
```

Simpleperf report has all symbols:
![Screenshot 2019-10-22 11 06 21](https://user-images.githubusercontent.com/6638825/67315740-0bc50100-f4bc-11e9-8f9e-2499be13d63e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28406

Differential Revision: D18386622

Pulled By: IvanKobzarev

fbshipit-source-id: 3a751192bbc4bc3c6d7f126b0b55086b4d586e7a
2019-11-08 14:19:04 -08:00
52456b2eba add hasattr() (#29332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29332

Even though we're statically typed, this can be useful, e.g. as
shorthand when iterating through a module list.

Test Plan: Imported from OSS

Differential Revision: D18393097

Pulled By: suo

fbshipit-source-id: aa42e955f88d1b8a876d0727055eb596453b9839
2019-11-08 13:58:14 -08:00
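
A minimal sketch: `hasattr()` is resolved at compile time against the module's static type:

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = 2.0

    def forward(self, x):
        if hasattr(self, "scale"):  # resolved at compile time
            return x * self.scale
        return x

print(torch.jit.script(M())(torch.ones(2)))  # tensor([2., 2.])
```
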
7a63728d5f kill pytorch_linux_xenial_cuda9_cudnn7_py2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29479

Test Plan: Imported from OSS

Differential Revision: D18406234

Pulled By: suo

fbshipit-source-id: fb142b61ba39d0478632b3a4f7e9d96fe6efede9
2019-11-08 13:55:30 -08:00
98bb1d1f03 remove non-onnx caffe2 builds (#29478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29478

caffe2 is still tested internally, but this removes the OSS configurations.

ONNX remains, however. I will look at migrating those jobs to the pytorch
docker images so we can kill the entire caffe2 part of the config.

Test Plan: Imported from OSS

Differential Revision: D18406233

Pulled By: suo

fbshipit-source-id: c3a7d1c58a2828f04778497faa1b5d13b67acbbb
2019-11-08 13:55:26 -08:00
991c2ac383 Disables flaky test_rand_quantization (#29463)
Summary:
See https://github.com/pytorch/pytorch/issues/28550.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29463

Differential Revision: D18405669

Pulled By: mruberry

fbshipit-source-id: 2984c3896a9260a06fbf052afb06e0cb8d28b53d
2019-11-08 13:51:22 -08:00
3ab44c48d1 Add functional overloads for pixelshuffle, pooling, upsampling, vision (#29359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29359

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- pixelshuffle
- pooling
- upsampling
- vision

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376978

Pulled By: yf225

fbshipit-source-id: 4ea8d359e7efde0d741eff79faad6b24b2a5d804
2019-11-08 13:48:47 -08:00
5b1a1a17ed remove FunctionType as an allowed constant (#29405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29405

We never actually used this (function attributes are a separate pathway
in ConcreteModuleType).

Test Plan: Imported from OSS

Differential Revision: D18378392

Pulled By: suo

fbshipit-source-id: b06c4b6d70f0b2534be78a215125cffd22ab44f0
2019-11-08 13:38:02 -08:00
a4b872b65e Inline graph before writing the bytecode file. (#29421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29421

Inline graph before writing the bytecode file, so that all the instructions are emitted from the top-level methods.

Test Plan: Imported from OSS

Differential Revision: D18404180

fbshipit-source-id: 4759474a8dba3813616ebce8253bea09941f6bbb
2019-11-08 13:23:32 -08:00
f362ae1f72 Updating submodules
Summary:
GitHub commits:

e112d61a25
15098906d8
f59ddd8ca2
aa5a68f285
61b1f9d489
acddad22ce
ac0829cd6b
8fee33907f

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 9150a027e1ba74386cd5d1c1b0e43b1299b52023
2019-11-08 12:54:41 -08:00
2e5fc034fb Quantized concat benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29431

Test Plan: Imported from OSS

Differential Revision: D18387765

Pulled By: z-a-f

fbshipit-source-id: a14f69d1ceb0f63ce5eddfda8af342f672dfec69
2019-11-08 12:48:55 -08:00
3bc014ecf2 Implementation of cosine learning rate training policy (#29440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29440

as titled. same as diff: D18195868.
We fix the windows compiling issue by changing the marco, inspired from: D15511736

Test Plan:
buck test -v 2 caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test  -- test_composite_cosine_lr_policy
canary: https://fburl.com/fblearner/ky7wh3vg

Differential Revision: D18392276

fbshipit-source-id: 83c84c985cd23b1cc43efedfef176ff3c67acb6e
2019-11-08 12:27:59 -08:00
edcf659e42 Remove default values from functional overloads for activation, batchnorm, distance, embedding
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29456

Test Plan: Imported from OSS

Differential Revision: D18401483

Pulled By: yf225

fbshipit-source-id: 638ff72a60fb69e41bec6f468835654b208c2896
2019-11-08 12:24:51 -08:00
2cd4f86422 Support process_group_agent "sending to itself" (#29253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29253

Some operations can be simpler if a worker can send an rpc to itself.
The main reason for not doing this previously was that Gloo doesn't support
self-sending.

Hence, this changes the process_group_agent to skip the assert
check, and simply enqueue the rpc message in its receiving queue.
ghstack-source-id: 93518076

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18339715

fbshipit-source-id: 08ade40e81da378b003a550c898a726e99d50e34
2019-11-08 12:11:55 -08:00
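
A hedged sketch of the now-allowed pattern (assumes `rpc.init_rpc` has been called; previously this tripped the agent's assertion):

```python
import torch
import torch.distributed.rpc as rpc

# a worker may now address an RPC to itself
me = rpc.get_worker_info()
fut = rpc.rpc_async(me, torch.add, args=(torch.ones(2), torch.ones(2)))
print(fut.wait())  # tensor([2., 2.])
```
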
64a66e8320 fixed random gerenation export (#29354)
Summary:
Fixed random generator symbolics, and added rand_like.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29354

Reviewed By: hl475

Differential Revision: D18400995

Pulled By: houseroad

fbshipit-source-id: 4a891e91b6c87ebce57c35b2bfa11e32ab93a149
2019-11-08 11:43:30 -08:00
5e1983f90f Fix distributed autograd initialization. (#29069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29069

Distributed autograd was initialized after RPC, which could cause a
race in scenarios where one node had initialized distributed autograd
and called backward() while other nodes had not yet initialized
distributed autograd.

Moving this before `_init_rpc` fixes the problem since `_init_rpc` implicitly
has a sync between processes via the store.
ghstack-source-id: 93535922

Test Plan: waitforbuildbot

Differential Revision: D18280875

fbshipit-source-id: 739a1c22dec21df859738d074e6e497fa43257fd
2019-11-08 11:20:15 -08:00
36b73d5a1b Hipify contrib/nccl (#29385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29385

hipify contrib/gloo

Test Plan: OSS & sandcastle build

Reviewed By: bddppq

Differential Revision: D18373308

fbshipit-source-id: 39c232db36318af116c341f64d03642639575ecd
2019-11-08 10:39:17 -08:00
740c9da267 explicitly provide memory format when calling to clone() at SparseTensorUtils.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28697

Test Plan: Imported from OSS

Differential Revision: D18333347

Pulled By: ifedan

fbshipit-source-id: 5340e1829fed068976266089c55d91aa90afee22
2019-11-08 10:29:48 -08:00
c69c243d88 explicitly provide memory format when calling to clone() at spectral_norm.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28691

Test Plan: Imported from OSS

Differential Revision: D18333381

Pulled By: ifedan

fbshipit-source-id: 0f562fb6f5c728b93a20fbbe53135ae5ae25c875
2019-11-08 10:24:46 -08:00
587ec3f55f Decouple JIT and autograd codes (#28900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28900

Decouple the JIT and autograd code (and their dependencies). After this decoupling, the compressed torch mobile size is 548 KB total (compared to 2.98 MB with the full JIT).
ghstack-source-id: 93447313

Test Plan: buck build fbandroid/mode/dev_clang_libcxx //xplat/experimental/pytorch/mobile:lite_predictorAndroid#android-armv7 -c project.ignore= -c user.ndk_cxxflags=-g0 --show-output

Differential Revision: D18226237

fbshipit-source-id: a188329274b450f63eb6448f42adec28517e14fd
2019-11-08 10:16:18 -08:00
ab47465384 Remove SchemaRegistrationHandleRAII. (#29379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29379

ghstack-source-id: 93452912

Test Plan: buck build caffe2:aten_cpu

Differential Revision: D18365671

fbshipit-source-id: 0141930e50a4b519df866ce70b724d17601e29dd
2019-11-08 09:51:31 -08:00
f441bb1c20 check error status of CUDA launch after Magma kernels (#29003)
Summary:
as part of https://github.com/pytorch/hub/issues/62 I found that the stack-trace of a failed kernel launch was being recorded elsewhere, even with CUDA_LAUNCH_BLOCKING=1.

So, I started debugging, and found that magma launches don't do error checking.

I eventually found the issue to be that I didn't compile-in sm37 SASS into the magma binary and the failure was on `x.inverse()`, and that's somehow a problem for magma 2.5.1 (but not 2.5.0).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29003

Differential Revision: D18397358

Pulled By: soumith

fbshipit-source-id: 04baca68eac209d7af773daddd0193697d4ab0d9
2019-11-08 09:43:51 -08:00
4e21157e01 Revert "Revert D18171156: Merge Tensor and Variable." (#29299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29299

This reverts commit 9c43b16df9dad3dfb4da1efab68d8c88e6437e8f, but also
with the changes from D18348622.  Comments there:

thpp-compatibility is used by admarket/adreview/service:adreviewservice and
libtorch is too big for the service to deal with.

thpp-compatibility doesn't support autograd, so we hack around dispatching
variables by using AutoNonVariableTypeMode everywhere we call into ATen,
so we never attempt to call into Variable stubs.  If you get it wrong,
you'll get an error like:

```
what():  Could not run 'aten::empty' with arguments from the 'VariableTensorId' backend. 'aten::empty' is only available for these backends: [SparseCPUTensorId, CPUTensorId, MkldnnCPUTensorId]. (lookup_ at caffe2/aten/src/ATen/core/dispatch/DispatchTable.h:298)
```

Test Plan:
Imported from OSS

```
buck test //thpp-compatibility/...
buck build mode/opt-clang admarket/adreview/service:adreviewservice
```

adreviewservice canary: https://our.intern.facebook.com/intern/ads/canary/422290029716387895 (comparing against parent comment due to current breakage) ==> experiment store https://our.intern.facebook.com/intern/experiment_store/experiment/43990006/
adfinder canary: https://our.intern.facebook.com/intern/ads/canary/422268535840333934
adindexer canary: https://our.intern.facebook.com/intern/ads/canary/422268550559034675

adreview second canary:  https://our.intern.facebook.com/intern/ads/canary/422307863515591925

canary without thpp-compat fixups https://our.intern.facebook.com/intern/ads/canary/422308951649168772

Reviewed By: dreiss

Differential Revision: D18353504

Pulled By: ezyang

fbshipit-source-id: 65feaba39fa07bb66762810909aeb38868668a30
2019-11-08 09:11:20 -08:00
b24b967e00 Add functional overloads for activation, batchnorm, distance, embedding (#29358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29358

This PR adds functional overloads that take the full set of arguments (instead of just Options) for the following functionals:
- activation
- batchnorm
- distance
- embedding

These new functionals lives in the `torch::nn::functional::detail` namespace and they are only meant to be called from the module forward methods (i.e. they are not public API). This is in preparation for the future change where we make module Options and functional Options two different classes, because if the module forward method has to construct a new functional Options object every time it runs it will be pretty silly and bad performance.

Test Plan: Imported from OSS

Differential Revision: D18376976

Pulled By: yf225

fbshipit-source-id: 0b254dc6340b6d6ac08c9f95d2b1c02b791b2f38
2019-11-08 08:34:10 -08:00
63675b1969 Revert RRef.to_here()/local_value() return type (#29396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29396

The return types of RRef.to_here()/local_value() were recently
changed to Future, which triggers flakiness as the RRef could be
deleted before the future.wait() finishes. While we are still
discussing how we'd like to solve it, this commit reverts the
return type to stop the bleeding in tests.

closes #28885

Test Plan: Imported from OSS

Differential Revision: D18375571

Pulled By: mrshenli

fbshipit-source-id: 354dbf38b15ab804e44fc9968dd30888415c1fab
2019-11-08 08:31:18 -08:00
d75222f3f5 Dump operator names of a module and its submodules.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29374

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D18372073

Pulled By: ljk53

fbshipit-source-id: cf2df0d44ffe74dd24dc63f1f07f395e36b5393d
2019-11-08 08:22:05 -08:00
b7fc26a9ef Clean up the stale item in bc white list (#29439)
Summary:
keep the list clean
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29439

Reviewed By: hl475

Differential Revision: D18392445

Pulled By: houseroad

fbshipit-source-id: 2cfe66620e0e9275a0f9590e453c9be10c82124a
2019-11-08 07:06:48 -08:00
255b2340fc don't copy ignored/unused methods to ScriptModule (#29342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29342

This is not necessary, as we use `lazy_bind` to retrieve those methods
from the class anyway.

Test Plan: Imported from OSS

Differential Revision: D18383381

Pulled By: suo

fbshipit-source-id: e8b7c9e696087cc1e707ac38f7ae85f569f08371
2019-11-07 22:54:29 -08:00
5f03ad9698 Add note to docs of torch.unique (#29165)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/19151
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29165

Differential Revision: D18319890

Pulled By: soumith

fbshipit-source-id: 162afaecd5371446bec2a1769e0a8848ecffb002
2019-11-07 22:03:15 -08:00
baef925d5d Skips CUDA handle tests on Python2 (#29430)
Summary:
Per title. These tests aren't Python2 compatible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29430

Differential Revision: D18391211

Pulled By: mruberry

fbshipit-source-id: a3516796f6bd333de0415dd0ff0a2a161f963109
2019-11-07 21:33:20 -08:00
4bcf4796aa Make HistogramObserver scriptable with @torch.jit.ignore (#27950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27950

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18360139

fbshipit-source-id: 5459ae49c087886e4990de136198773a75b1c572
2019-11-07 18:02:44 -08:00
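
A minimal sketch of the flow this enables (module path as of this era; treat it as an assumption):

```python
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
obs(torch.randn(64))              # collect statistics
scripted = torch.jit.script(obs)  # scriptable via @torch.jit.ignore
print(obs.calculate_qparams())    # (scale, zero_point)
```
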
19d3a7ad02 fix negative string indexing (#22700)
Summary:
strings allow negative indexing in python
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22700

Differential Revision: D18382382

Pulled By: eellison

fbshipit-source-id: 05c3fa0890be6234ee1467da0e65697f51236523
2019-11-07 17:28:16 -08:00
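
A minimal sketch of the fixed behavior in TorchScript:

```python
import torch

@torch.jit.script
def last_char(s: str) -> str:
    return s[-1]  # negative indices now behave as in Python

print(last_char("pytorch"))  # 'h'
```
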
e66626ae5c Lift rpc_timeout to RpcAgent, for other RpcAgents to reuse. (#29341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29341

So that other RpcAgents can use this timeout setting as well.

ghstack-source-id: 93481902

Differential Revision: D5681951

fbshipit-source-id: 569c768dc342e8a2d9faf142ceccf696e12e41dc
2019-11-07 17:05:45 -08:00
7da11f4967 Export weight_norm (#28618)
Summary:
Export _weight_norm
Caffe2 tests are inplace

Looks like there is conflicting behavior in torch.nn.utils.weight_norm regarding dim=None:
dim can be negative for backwards axes, but when dim = None, it's overwritten to -1
0c48092b22/torch/nn/utils/weight_norm.py (L10)

For now, this symbolic to matches the current behavior. But this might need to be changed in the torch module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28618

Reviewed By: hl475

Differential Revision: D18354270

Pulled By: houseroad

fbshipit-source-id: 0d64ee9ee1156bb96d36ed0a25b2e8cc5058ce90
2019-11-07 16:55:56 -08:00
782e80e6e7 Make jit.trace_module reentrant (#29411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29411

Fixes https://github.com/pytorch/pytorch/issues/29367

Test Plan: Imported from OSS

Differential Revision: D18380559

Pulled By: jamesr66a

fbshipit-source-id: 5caf606ccbc5dc79dac14e3c28cc02dec19ce695
2019-11-07 16:29:06 -08:00
90f28c2756 enable fast path for TensorIterator for contiguous inputs/no broadcast (#29180)
Summary:
As title. Also allocates outputs with `empty` instead of `empty_strided` in the regular path when possible, thus avoiding resizing of outputs and taking an additional DeviceGuard for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29180

Test Plan: covered by existing tests

Differential Revision: D18327836

Pulled By: ngimel

fbshipit-source-id: e8d925f0fe915f327ec41aba83fd6857b09772b5
2019-11-07 16:23:33 -08:00
8a33f1150d Use nativeloader instead of system loader to load JNI library for soloader compatibility. (#29350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29350

ghstack-source-id: 93491099

Test Plan: P121597890

Reviewed By: dreiss

Differential Revision: D18352773

fbshipit-source-id: 712c3f5d10a3d4c815c5554bb62e1a95563ba7ff
2019-11-07 16:09:29 -08:00
fa66a1498e Simplify _calculate_fan_in_and_fan_out (#29370)
Summary:
The code checking `if dimensions == 2` is not needed
because the case of a 2D tensor (Linear) is already handled
by the statement:
`receptive_field_size = 1`
and this conditional:
`if tensor.dim() > 2:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29370

Differential Revision: D18372987

Pulled By: albanD

fbshipit-source-id: fcb4dddbc76b9f4414c6d88c0aa2fb4435bf3385
2019-11-07 15:53:05 -08:00
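
A sketch of the simplified helper as described above (an illustrative reimplementation, not the exact source):

```python
import torch

def calculate_fan_in_and_fan_out(tensor):
    if tensor.dim() < 2:
        raise ValueError("requires a tensor with at least 2 dimensions")
    num_input_fmaps = tensor.size(1)
    num_output_fmaps = tensor.size(0)
    receptive_field_size = 1          # already correct for 2D (Linear)
    if tensor.dim() > 2:
        receptive_field_size = tensor[0][0].numel()
    return (num_input_fmaps * receptive_field_size,
            num_output_fmaps * receptive_field_size)

print(calculate_fan_in_and_fan_out(torch.empty(16, 3, 5, 5)))  # (75, 400)
```
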
de9a54466d clone should preserve the type of attribute (#29269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29269

Hit this bug with an attribute of type `Optional[Tensor]` that is
initialized to None and reassigned later to some tensor.

Test Plan:
.

Imported from OSS

Differential Revision: D18364338

fbshipit-source-id: d8e1277a84ab7d80331cba83f5639469d398632e
2019-11-07 15:25:20 -08:00
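
The pattern that exposed the bug, as a sketch:

```python
import torch
from typing import Optional

class M(torch.nn.Module):
    bias: Optional[torch.Tensor]

    def __init__(self):
        super().__init__()
        self.bias = None  # later reassigned to a real tensor

    def forward(self, x):
        if self.bias is not None:
            return x + self.bias
        return x

print(torch.jit.script(M())(torch.ones(2)))
```
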
5a44107146 fix pytorch mobile build (#29414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29414

add a missing file and fix a std::to_string call.

Test Plan: buck build //xplat/caffe2:torchAndroid#android-armv7,shared

Reviewed By: ljk53

Differential Revision: D18351498

fbshipit-source-id: 41225bff974058eef485a9991d0cc16c67a4074a
2019-11-07 15:20:04 -08:00
0be2f12ef9 Updating submodules
Summary:
GitHub commits:

f80050fa8f
7acd9b86d2

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 8c0603b72028220d3ac2254b752cfc9c9f6011a4
2019-11-07 14:54:07 -08:00
821f8bfc2f Fix tracing for dynamic quantized LSTM (#29331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331

Closes #27954

This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:

1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-packed on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.

The `forward` of the traced LSTM before and after this change are as follows:

Before
```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  hx, hx0, = argument_2
  _0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_0, (_1, _2))
```

After

```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  _0 = self.cell._all_weight_values
  _1 = getattr(_0, "0").param
  _2 = getattr(_0, "1").param
  hx, hx0, = argument_2
  _3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_3, (_4, _5))

```

Test Plan: Imported from OSS

Differential Revision: D18374904

Pulled By: jamesr66a

fbshipit-source-id: f1a9b58998bc365b9baad38c21fd4bb510dd639c
2019-11-07 13:45:39 -08:00
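
A sketch of the end-to-end flow this fixes (dynamic quantization, then tracing):

```python
import torch

lstm = torch.nn.LSTM(4, 4, num_layers=1)
qlstm = torch.quantization.quantize_dynamic(
    lstm, {torch.nn.LSTM}, dtype=torch.qint8)

x = torch.randn(3, 1, 4)
hx = (torch.zeros(1, 1, 4), torch.zeros(1, 1, 4))
# packed weights are now traced as attributes, not baked-in constants
traced = torch.jit.trace(qlstm, (x, hx))
out, (h, c) = traced(x, hx)
```
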
5bb35fe923 Updating submodules
Summary:
GitHub commits:

07a0ad3c29
aa35e8c58b

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: fe673a44e0a23ba0a4dc588a9ae036857874f203
2019-11-07 13:42:45 -08:00
1dd3c8e539 Skip flaky test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29403

Test Plan: Imported from OSS

Differential Revision: D18377162

Pulled By: jamesr66a

fbshipit-source-id: 69052a7466d03468146e99da45f1ee2c9e85dfa8
2019-11-07 12:52:47 -08:00
02921e7985 Use cuDNN's handle pool mechanism to manage cublas handles (#29233)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/6962

The PR implements the handle pool mechanism for cublas as suggested by mcarilli  in https://github.com/pytorch/pytorch/issues/6962#issuecomment-530563872.

~~I didn't add any unit test here yet because as mcarilli mentioned:~~
> ~~On my local machine, out of curiosity I also rewrote that test to use gemms instead of convolutions. The race condition seemed rarer, but the test did show that cublas use is not thread safe. I can share the script if you want.~~

~~Please share your script with me mcarilli. And if the race condition is rare, would it still be possible for the CI to detect it?~~

cc: colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29233

Differential Revision: D18372007

Pulled By: ezyang

fbshipit-source-id: 3492bf13410598e8452e89cf4e3e63e8df9c8c3d
2019-11-07 12:50:18 -08:00
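
A sketch of the kind of multi-threaded GEMM workload this makes safe (assumes a CUDA device):

```python
import threading
import torch

def worker():
    a = torch.randn(256, 256, device="cuda")
    # each thread checks a cublas handle out of the per-device pool
    for _ in range(10):
        (a @ a).sum().item()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
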
b008b34bd8 explicitly provide memory format when calling to clone() at SparseTensor.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28699

Test Plan: Imported from OSS

Differential Revision: D18333354

Pulled By: ifedan

fbshipit-source-id: e5634cd6f2e5d24867f4bb6730670303e70dea52
2019-11-07 12:26:50 -08:00
09822a1d62 explicitly provide memory format when calling to clone() at SparseTensorMath.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28700

Test Plan: Imported from OSS

Differential Revision: D18333349

Pulled By: ifedan

fbshipit-source-id: 23780c6a60f366de2b8f563b477df35cf52f88b4
2019-11-07 12:15:15 -08:00
564384fe12 Automatic update of fbcode/onnx to fea8568cac61a482ed208748fdc0e1a8e47f62f5 (#29363)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29363

Previous import was 2891e1459745933f4bba9a8cb3371cf3c9eb1d16

Included changes:
- **[fea8568c](https://github.com/onnx/onnx/commit/fea8568c)**: minor changes to NonZero and Slice (#2429) <Ashwini Khade>
- **[79bd5042](https://github.com/onnx/onnx/commit/79bd5042)**: fix test bugs for resize op version 11 (#2425) <Ashwini Khade>
- **[3ea3b0e0](https://github.com/onnx/onnx/commit/3ea3b0e0)**: Add shape existence check in GatherElements shape inference logic (#2402) <Hariharan Seshadri>
- **[192ad8c8](https://github.com/onnx/onnx/commit/192ad8c8)**: add invite for next workshop (#2407) <Prasanth Pulavarthi>
- **[eea60812](https://github.com/onnx/onnx/commit/eea60812)**: Fix missing comma in exception message. Causes invalid message depending on what's in memory prior to the constant char string. (#2403) <Scott McKay>
- **[dd082c99](https://github.com/onnx/onnx/commit/dd082c99)**: Add section headers for easier linking (#2400) <Prasanth Pulavarthi>
- **[ca1d5b7e](https://github.com/onnx/onnx/commit/ca1d5b7e)**: Add type check for node inputs (#2367) <RandySheriffH>
- **[e5600091](https://github.com/onnx/onnx/commit/e5600091)**: Update doc loop op (#2337) <G. Ramalingam>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D18365923

fbshipit-source-id: 8ac138e3ff9d4fbc5fdf85d06785190334c346a1
2019-11-07 12:09:17 -08:00
255505f232 Updating submodules
Summary:
GitHub commits:

d752e52a31
d7cc18d7c7
6e26fa9d03
76fcc9469a
1da1f04231
17ff03e136
d8fde1c7fc
8a07c4270c
50fdf05973
affb36ec7c

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: dfcf050e5eaa6e5077ea9b4c21326f127ec6066c
2019-11-07 11:58:24 -08:00
d5d524dadb explicitly provide memory format when calling to clone() at TensorShape.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28698

Test Plan: Imported from OSS

Differential Revision: D18333352

Pulled By: ifedan

fbshipit-source-id: cb31d4bbda50a6bfa7a25d0cae9953bec03e7c46
2019-11-07 11:55:42 -08:00
Jie
fdab1cf0d4 NHWC support in cuDNN BatchNorm & Conv2d (#29361)
Summary:
This reverts the 9a9bb448ee49a1493f22bbbeed4af92b1364fce9

Fixes the broken case that caused the previous commit to be reverted.
Details of the fix:
	modified:   aten/src/ATen/native/Convolution.cpp

Called contiguous() on the 3D input tensor. This prevents the code path from
accidentally recognizing the input as having channels_last strides, due to the
unsqueezing of a permuted 3D tensor.
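
A minimal sketch (not from the PR; shapes are illustrative) of the NHWC path this enables:

```python
import torch

# Channels-last (NHWC) input through cuDNN Conv2d + BatchNorm2d.
x = torch.randn(8, 3, 32, 32, device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1).cuda()
bn = torch.nn.BatchNorm2d(16).cuda()
y = bn(conv(x))
print(y.is_contiguous(memory_format=torch.channels_last))
```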
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29361

Differential Revision: D18371964

Pulled By: VitalyFedyunin

fbshipit-source-id: a5985f4687b37e183649fa35b8ccdb50368ebfdf
2019-11-07 10:39:58 -08:00
0aba5ba13c Add unsafeRemoveAttr and unsafeRemoveSlot to ivalue::Object (#29048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29048

In order to support remove_attribute in Module, we need to support
removing a slot in ivalue::Object. It is the caller's responsibility to
guarantee the safety of the remove operation.

Test Plan:
build/bin/test_jit

Imported from OSS

Differential Revision: D18343464

fbshipit-source-id: c1ba3a06afc40d928e59500b7b35c9e6c8720028
2019-11-07 10:35:57 -08:00
abbe6347ff CPU-strided-complex support for ComplexFloat (#29294)
Summary:
Re-submit of https://github.com/pytorch/pytorch/issues/29133

In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes
- [x]  Fixed Vec256 Permute operations for Complex Float
- [x]  Fixed copy_kernel_cast between complex data types
  -  copy_kernel_cast should not call std::real during inter-complex dtype conversion.
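
A hedged illustration (assuming in-tree complex tensor support; the values are made up) of what the copy_kernel_cast fix preserves:

```python
import torch

# Inter-complex dtype conversion must keep the imaginary part intact;
# the buggy path effectively dropped it by calling std::real.
z = torch.tensor([1 + 2j, 3 - 4j], dtype=torch.complex128)
w = z.to(torch.complex64)
print(w)  # imaginary parts preserved: tensor([1.+2.j, 3.-4.j])
```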
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29294

Differential Revision: D18371928

Pulled By: ezyang

fbshipit-source-id: a80a894eeaeb68540054ccfe405c4d0338fa4350
2019-11-07 09:35:19 -08:00
86c64440c9 Make PyTorch Python 3.8 compatible (#29302)
Summary:
PEP 590 replaces the `tp_print` slot with `tp_vectorcall_offset`, which requires a `Py_ssize_t` value.
Passing a `nullptr` caused compatibility issues with Python 3.8.

Changelog:
- Modify all occurrences of `nullptr /* tp_print */` to `0 /* tp_vectorcall_offset */`
- Minor formatting changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29302

Test Plan:
- Local fresh build with Python 3.8 completed successfully.

Fixes https://github.com/pytorch/pytorch/issues/28060.
Fixes https://github.com/pytorch/pytorch/issues/29162.

Supersedes https://github.com/pytorch/pytorch/pull/28364

Differential Revision: D18372022

Pulled By: ezyang

fbshipit-source-id: 8e9a15b0d0f72101ccc69bd489f5efa216b880bb
2019-11-07 09:20:19 -08:00
ca20b569be Move unboxed dispatch decision into dispatcher (#29200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29200

Before, the dispatch key for unboxed operators from native_functions.yaml was generated in codegen and passed to the c10 dispatcher.
Now, we generate it inside of the dispatcher, right next to where the same thing happens for boxed calls.
ghstack-source-id: 93371022

Test Plan: unit tests

Differential Revision: D18282747

fbshipit-source-id: 96a97fe83778d0a9e61b4441d6e2aed10d73209c
2019-11-07 09:03:19 -08:00
43d4d019c4 explicitly provide memory format when calling to clone() at rprop.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28693

Test Plan: Imported from OSS

Differential Revision: D18333379

Pulled By: ifedan

fbshipit-source-id: 4430efc0602a3fc6ef05adac07df845a696449f7
2019-11-07 09:00:37 -08:00
2704af0970 AsyncIf op implementation
Summary:
This diff adds the following:
- An AsyncIf op to support conditional async execution. It assumes that then_net and else_net are async scheduling nets. The op itself completes when every async op in the active net completes. Cancellation cancels the inner nets and their async ops.
- Unit tests targeting asynchronicity and error/cancellation handling.

Test Plan:
New unit tests

With --stress-runs=2000:
https://our.intern.facebook.com/intern/testinfra/testrun/4785074616784325

Reviewed By: ilia-cher

Differential Revision: D18051357

fbshipit-source-id: 1399a437b3ca63fd4ea0cf08d173f85b9242cc1f
2019-11-07 08:51:31 -08:00
b14c5943d4 Handle warning in torchscript (#27154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27154

Fix for #25859

* #28283 Fix clang-tidy errors in csrc/Module.cpp

Test Plan: Imported from OSS

Differential Revision: D18249631

Pulled By: albanD

fbshipit-source-id: 4e9bbad07cc39e7c7f0546ef7587bd4ab2dd644e
2019-11-07 08:35:16 -08:00
0ff1696c75 add pybind version of HANDLE_TH_ERRORS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26614

Test Plan: Imported from OSS

Differential Revision: D18249634

Pulled By: albanD

fbshipit-source-id: 25503f368926e0f3633c5af0f222c9bb4729f342
2019-11-07 08:35:11 -08:00
9b875e1256 Buffer python warning to avoid deadlocks
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26613

Test Plan: Imported from OSS

Differential Revision: D18249633

Pulled By: albanD

fbshipit-source-id: 863f52400e1b97943a67a9e1abb09ae8d045e7f0
2019-11-07 08:35:06 -08:00
cb3232fdb9 Fix clang-tidy errors in csrc/Module.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28283

Test Plan: Imported from OSS

Differential Revision: D18249632

Pulled By: albanD

fbshipit-source-id: 0c7c71b3b7c74d338a90850e06c841b399f5709f
2019-11-07 08:34:58 -08:00
528a0cfb96 Allow setting tolerances in testing math functions. (#29297)
Summary:
May be needed by https://github.com/pytorch/pytorch/issues/25287
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29297

Differential Revision: D18371907

Pulled By: ezyang

fbshipit-source-id: 4b90ae2b9867d21401498b780428dd009741b6bc
2019-11-07 08:26:53 -08:00
b05e9d4521 explicitly provide memory format when calling to clone() at lbfgs.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28692

Test Plan: Imported from OSS

Differential Revision: D18333356

Pulled By: ifedan

fbshipit-source-id: ca0de6b721f695893c0756ea1b3b469df1a2b249
2019-11-07 08:20:11 -08:00
5d70b11d36 Fix the issue when NHWC Tensor has height or width larger than max CUDA grid (#28931)
Summary:
When an NHWC tensor has height or width larger than the max CUDA grid size, max_pool fails with error code 0.

The example is: https://github.com/pytorch/pytorch/issues/28714

This change limits the grid size to the CUDA maximum and chunks the input so it can still be processed.
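
A hedged repro sketch (sizes are hypothetical, chosen so one spatial dimension exceeds the 65535 grid-dimension limit):

```python
import torch
import torch.nn.functional as F

# An NHWC tensor whose width exceeds the max CUDA grid dimension;
# before this change, max_pool2d failed with error code 0 on such inputs.
x = torch.randn(1, 8, 4, 70000, device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
y = F.max_pool2d(x, kernel_size=2)
torch.cuda.synchronize()
print(y.shape)
```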
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28931

Differential Revision: D18358892

Pulled By: ifedan

fbshipit-source-id: 2fd65448bd644f1588a0e208edaaea5bcb6a7d52
2019-11-07 08:17:54 -08:00
4926a51010 explicitly provide memory format when calling to clone() at parameter.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28690

Test Plan: Imported from OSS

Differential Revision: D18333355

Pulled By: ifedan

fbshipit-source-id: e02bd556e7b336bb02cd9ec89029a0e5f4f7cbe7
2019-11-07 07:38:44 -08:00
8498a1555b Add some non-contiguous tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28905

Test Plan: Imported from OSS

Differential Revision: D18357843

Pulled By: nairbv

fbshipit-source-id: d411517d702023618dce7f501d3e2a4eea8901ff
2019-11-07 07:10:22 -08:00
9dcf5191d5 explicitly provide memory format when calling to clone() at batchnorm.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28689

Test Plan: Imported from OSS

Differential Revision: D18333368

Pulled By: ifedan

fbshipit-source-id: e440c80ce8a64e1aae709cd935b14c7024a17787
2019-11-07 06:42:14 -08:00
75309b45f3 explicitly provide memory format when calling to clone() at Indexing.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28660

Test Plan: Imported from OSS

Differential Revision: D18333346

Pulled By: ifedan

fbshipit-source-id: 06590205d883a5096388a4ae318389244130972d
2019-11-07 05:38:32 -08:00
78a34d3205 Revert D18350353: dump operator names of a module and its sub-modules.
Test Plan: revert-hammer

Differential Revision:
D18350353

Original commit changeset: 2026c8ab7650

fbshipit-source-id: 401f34cb276c3ea34a5439de4c3415969a04ab2a
2019-11-07 05:28:33 -08:00
58005382c8 fix @property (#28395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28395

Currently property methods are broken in TorchScript because we
basically treat them as attributes in the existing path: we evaluate
the method once and store that as the value forever.

Since the lack of property support is easily worked around (just make it
a method), I've opted to explicitly error out to avoid confusion. If
people want it, they can file an issue and we can look at their use
case.

This also helps us nicely clean up some parts of the ScriptModule conversion
path.
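
A minimal sketch of the suggested workaround (module and names are illustrative):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4))

    # A @property here would now raise an explicit error under scripting;
    # the workaround is to express it as a regular method instead.
    def scaled_weight(self):
        return self.weight * 2.0

    def forward(self, x):
        return x + self.scaled_weight()

scripted = torch.jit.script(M())
print(scripted(torch.randn(4)))
```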

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D18054946

Pulled By: suo

fbshipit-source-id: 7e927836ae687cd2f13a94b9f0af399437fae422
2019-11-06 23:51:07 -08:00
796363147f Implement more of of the nn.Module API (#28828)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28828

This updates torch::script::Module to more closely match the behavior
of nn.Module. In particular, it implements the (optionally recursive)
iterators that retrieve submodules, parameters, and buffers and makes
their names match the python versions.

This also removes the individual accessors for Parameter, Module, Buffer, etc.
and replaces them with a single `attr` function which is equivalent to
writing `a.foo` in Python (`setattr` emulates `a.foo = v`).
As we build out the user-facing API for TorchScript values this will end
up matching how an  attribute is accessed on general objects.

This PR preserves the Python bindings for script::Module by emulating the
old API at the binding level. A followup will clean up the usage to more
directly match the C++ API.

Test Plan: Imported from OSS

Differential Revision: D18197611

Pulled By: zdevito

fbshipit-source-id: 7ee4dcbb258605d1c988314b05d938423f1ccee5
2019-11-06 22:58:25 -08:00
509d9630ca Disabling ONNX IR v4 sematics for opset 8 or lower. (#28990)
Summary:
Currently, the `keep_initializers_as_input` argument in the `torch.onnx.export` API can be used to choose whether to export an ONNX model with IR v3 or v4 semantics. The implementation does not check which opset is being used for export. This is an issue because ONNX IR v4 is valid only for opset 9 and above (as listed [here](https://github.com/onnx/onnx/releases/tag/v1.4.0)), so exporting opset 8 or lower with `keep_initializers_as_input=False` will create an illegal ONNX graph.

This change fixes the issue by introducing a check on the opset version when deciding whether to export ONNX IR v3 or v4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28990

Reviewed By: hl475

Differential Revision: D18352523

Pulled By: houseroad

fbshipit-source-id: 7e9055d405c3faf52b80a8de0d04186d4c350c15
2019-11-06 21:57:21 -08:00
4515edfe15 Disable QNNPACK tests on MacOS (#29328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29328

Tests are flaky as seen in issue #29326.
Disable until we fix the kernels.

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D18358200

fbshipit-source-id: 58f1981799fe8253234fcc7b0540e1c0b6babc15
2019-11-06 21:30:11 -08:00
84a6583ba1 Revert D18359880: Fix tracing for dynamic quantized LSTM
Test Plan: revert-hammer

Differential Revision:
D18359880

Original commit changeset: 0ff2cad294a1

fbshipit-source-id: 834cd43b39fb754f90c8b18b8ab9b837f2b511ab
2019-11-06 21:10:33 -08:00
dc7552f9ca dump operator names of a module and its sub-modules. (#29208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29208

A binary to dump operator names from a script model and its sub models.
Usage:
dump_operator_names path/to/script_model.pt path/to/output.yaml

Test Plan: Imported from OSS

Differential Revision: D18350353

fbshipit-source-id: 2026c8ab765069ad059ab2ca44fc27b79315b973
2019-11-06 20:57:28 -08:00
6572d0d174 add a new flag to select machine for op benchmark (#29349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29349

This diff adds a new flag to pick CPU/GPU machines to run op benchmarks. The default is None, which will try to run on all supported devices.

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 124.283
...
# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cuda_bwdall
# Input: M: 64, N: 64, K: 128, device: cuda
Backward Execution Time (us) : 176.592

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cpu
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 121.884

buck run mode/opt caffe2/benchmarks/operator_benchmark/pt:add_test -- --device cuda
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cuda
# Input: M: 64, N: 64, K: 64, device: cuda
Forward Execution Time (us) : 26.002

Reviewed By: hl475

Differential Revision: D18363942

fbshipit-source-id: fccd1fd09bcd6d7725e6fa4063559a27d9cc3065
2019-11-06 20:13:25 -08:00
fff4f16e45 Clean up file opening for serialization (#29221)
Summary:
Stacked PRs
 * https://github.com/pytorch/pytorch/issues/29232 - Add zipfile serialization
 * https://github.com/pytorch/pytorch/issues/29228 - Expose miniz to Python
 * **https://github.com/pytorch/pytorch/issues/29221 - Clean up file opening for serialization**

This is a small refactor to get things started for zipfile-based serialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29221

Differential Revision: D18330932

Pulled By: driazati

fbshipit-source-id: ce91542faf987ae5aa6dfd322e633a0c7335e678
2019-11-06 18:41:40 -08:00
ae12630508 getFuncName take func_value as argument (#29146)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29146

getFuncName takes the Value that represents the function as its argument,
e.g.
for CallFunction(%1, %a, %b, %c), it takes %1 as the argument

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18362840

fbshipit-source-id: fc90ebe7db702aec9b50cec6db454d0eb8ee5612
2019-11-06 18:20:04 -08:00
9a9bb448ee Revert cudnn changes #23861 (#29329)
Summary:
Broken case:

```python
x = torch.randn(192,16,50).cuda()
x = x.permute(0,2,1).contiguous().permute(0,2,1)
m = torch.nn.Conv1d(
       in_channels=16,
       out_channels=32,
       kernel_size=2,
       bias=True,
  ).cuda()

m(x)
```

This reverts commit 8160f390cf678b3b98e0c1f73bd289ee3c96afcb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29329

Differential Revision: D18357674

Pulled By: VitalyFedyunin

fbshipit-source-id: cdd7e77e8dcbfc5f2ab3df54eb53ccfbf703b245
2019-11-06 17:38:46 -08:00
f17e02fd94 Fix tracing for dynamic quantized LSTM (#29331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29331

Closes #27954

This fixes the hard-coding of packed parameter values for the dynamic quantized LSTM by orchestrating the following dance:

1) Each variadic parameter on the module has its own Module. That Module defines the `__getstate__` and `__setstate__` methods s.t. packed weights are properly re-packed on model load.
2) Each of these modules is wrapped into a `torch.nn.ModuleList`, s.t. the parameters appear as attributes in the hierarchy. Then, `gatherParametersAndBuffers` (9c43b16df9/torch/csrc/jit/tracer.cpp (L285)) can see these parameters and create a `Value*` for them in the traced graph.
3) In forward, we need to convert from ModuleList -> Module -> Parameter to a simple TensorList of the parameters. We just use a loop here. In tracing, we simply record a `ListConstruct` with each of the proper parameter values. In scripting, the `ModuleList` is const, so it can be unrolled into the graph and a subsequent `ListConstruct` does its business.

The `forward` of the traced LSTM before and after this change are as follows:

Before
```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  hx, hx0, = argument_2
  _0, _1, _2 = torch.quantized_lstm(input, [hx, hx0], [CONSTANTS.c0, CONSTANTS.c1], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_0, (_1, _2))
```

After

```
def forward(self,
    input: Tensor,
    argument_2: Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]:
  _0 = self.cell._all_weight_values
  _1 = getattr(_0, "0").param
  _2 = getattr(_0, "1").param
  hx, hx0, = argument_2
  _3, _4, _5 = torch.quantized_lstm(input, [hx, hx0], [_1, _2], True, 1, 0., True, False, False, dtype=12, use_dynamic=True)
  return (_3, (_4, _5))

```

Test Plan: Imported from OSS

Differential Revision: D18359880

Pulled By: jamesr66a

fbshipit-source-id: 0ff2cad294a1871123015dfc704eaf73a7ac1d9e
2019-11-06 17:02:12 -08:00
6c4fd602ff Revert D18350224: Fixed export for random
Test Plan: revert-hammer

Differential Revision:
D18350224

Original commit changeset: 540a07f7def3

fbshipit-source-id: c5755c819191b858f0de4aab8196cf5a46b8f750
2019-11-06 16:01:49 -08:00
309b28ee3a Trace module calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29261

Test Plan: Imported from OSS

Differential Revision: D18343363

Pulled By: jamesr66a

fbshipit-source-id: 0c6394205e2c0ea8708028d20df83fe17b466ff4
2019-11-06 15:05:49 -08:00
0f4b226afb API for finding a common ancestor block for a pair of nodes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28864

Test Plan: Imported from OSS

Differential Revision: D18219786

Pulled By: jamesr66a

fbshipit-source-id: fb19ed5732dd714cef7a924bc42c156065b926d5
2019-11-06 15:05:45 -08:00
adb7df7117 Consistently use TORCH_CUDA_API for all files that live in cuda targets. (#29158)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29158

My plan is to split out libtorch_cuda.so from libtorch.so.  To do this,
I need accurate _API annotations for files in these directories.

I determined the correct set of annotations by looking at
tools/build_variables.py and making sure every file that was a member
of the libtorch_cuda/ATen-cu targets had these annotations.  (torch-cpp-cuda
doesn't count since that's going to be where the stuff that has explicit
USE_CUDA lives, so it's going to be in a separate dynamic library).

As future work, it would be good to setup a lint rule to help people
understand what the correct _API annotation to use in a file is; it
would also be good to reorganize folder structure so that the library
structure is clearer.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18309593

Pulled By: ezyang

fbshipit-source-id: de710e721b6013a09dad17b35f9a358c95a91030
2019-11-06 15:02:07 -08:00
a5d356cb39 Delete THP_CORE macro; partially replace with THP_BUILD_MAIN_LIB (#29143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29143

THP_CORE macro is a very old macro that appeared to have served
two purposes:

1. The torch-python equivalent of CAFFE2_BUILD_MAIN_LIB, to toggle
   symbol visibility headers

2. Some sort of ad hoc way of hiding certain definitions from headers
   so external clients can't get at them.

It did (2) in a very confusing manner, because we set THP_CORE in both
torch and torch-python (it shouldn't do anything in torch).  In this
PR I just get rid of use case (2) entirely (so everything shows up in
headers all the time), and then redo (1) using a new THP_BUILD_MAIN_LIB
macro.  This cleans up some of the macro definitions and makes my life
easier for working on #27215.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18309594

Pulled By: ezyang

fbshipit-source-id: adcb6d7cb387cd818480137e2b94e5e761dbfefc
2019-11-06 15:02:02 -08:00
f227530c88 Clean up named tensor propagate_names API (#29239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29239

There were a few main changes, summarized below.

Rename `propagate_names`
----------------------------------------------

There are two main APIs now, `propagate_names_if_nonempty(Tensor&,
ArrayRef<Dimname>)` and `propagate_names(Tensor&, ArrayRef<Dimname>)`

The former propagates names if they are not empty and the latter
unconditionally tries to propagate names.

`names` can be empty if name inference did not occur (see the next
section).

Removed usages of `optional` in name inference
----------------------------------------------

Previously, we used `optional<ArrayRef<Dimname>>` and
`optional<vector<Dimname>>`. `nullopt` represents that no name inference
happened.

The problem with this is that these types are not implicitly convertible
to each other and dealing with them is painful as a result (users have
to manually unwrap `optional<vector>` and convert to
`optional<arrayref>`.

To fix this, I rewrote most name inference functions to use an empty array as an
indicator value:
- If an array is empty, then no name inference occurred.
- If an array is not empty, then name inference occurred.

Removed `vector<Dimname>&&` overloads
----------------------------------------------

These were originally meant for efficiency: instead of copying a vector
of names we could move it directly inside the tensor and replace the old
names. However, looking around the code base, we do copies for
`IntArrayRef` for sizes and strides instead of optimizing them, so the
perf gain is probably not critical. I removed `vector<Dimname>&&` overloads
to stop optimizing prematurely.

Furthermore, one potential design for a faster named inference api is
to construct names directly on a tensor's names object; in this design
there is also no `vector<Dimname>&&` overload.

Plans
----------------------------------------------

After this PR I'll keep attempting to clean up the `propagate_names`
functions. There are a lot of `propagate_names_for_{blah}` functions
that probably don't need to exist.

Test Plan: - `python test/test_namedtensor.py -v`

Differential Revision: D18350090

Pulled By: zou3519

fbshipit-source-id: eb5dd6cbd2d4f1838431db5edbdb207204c5791d
2019-11-06 14:45:39 -08:00
364e525f55 Fixed export for random (#28470)
Summary:
[ONNX] Fixed export for random generator ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28470

Reviewed By: hl475

Differential Revision: D18350224

Pulled By: houseroad

fbshipit-source-id: 540a07f7def335f66808af8c360b72261d15635b
2019-11-06 14:42:20 -08:00
8ed84a9123 skip broken custom op test (#29334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29334

As title

Test Plan: Imported from OSS

Differential Revision: D18358592

Pulled By: suo

fbshipit-source-id: d7afbce52ddd008ae9c42aeda6be24e35086ef01
2019-11-06 14:33:01 -08:00
7d01d5efd7 update op bench readme (#29289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29289

as title

Test Plan: na

Reviewed By: hl475

Differential Revision: D18350580

fbshipit-source-id: 80f41cbbfda9cbcd8988b451cdfb199f2b89e49b
2019-11-06 14:08:02 -08:00
e51d937e91 move cuda abs to Aten (#25857)
Summary:
VitalyFedyunin, this PR fixes https://github.com/pytorch/pytorch/issues/24531

Benchmark script :
```
import timeit

device = "cuda"
for n, t in [(10, 100000),(1000, 10000)]:
    print('a.abs() (a.numel() == {}) for {} times'.format(n, t))
    for dtype in ('torch.int8', 'torch.uint8', 'torch.int16', 'torch.int32', 'torch.int64', 'torch.float', 'torch.double', 'torch.half'):
        print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
        print(timeit.timeit(f'a.abs()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
```
Device: **Tesla P4**
CUDA version: **9.0.176**

Before this change:
```
a.abs() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.int8, 100000 times           1.8391285985708237
device: cuda, dtype: torch.uint8, 100000 times          1.8831938095390797
device: cuda, dtype: torch.int16, 100000 times          1.8131775446236134
device: cuda, dtype: torch.int32, 100000 times          1.832334715873003
device: cuda, dtype: torch.int64, 100000 times          1.8218239657580853
device: cuda, dtype: torch.float, 100000 times          1.7942761108279228
device: cuda, dtype: torch.double, 100000 times         1.8193779103457928
device: cuda, dtype: torch.half, 100000 times           1.796515878289938
a.abs() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.int8, 10000 times            0.18348361551761627
device: cuda, dtype: torch.uint8, 10000 times           0.1892806850373745
device: cuda, dtype: torch.int16, 10000 times           0.18253886327147484
device: cuda, dtype: torch.int32, 10000 times           0.18509215489029884
device: cuda, dtype: torch.int64, 10000 times           0.18291602283716202
device: cuda, dtype: torch.float, 10000 times           0.1796952784061432
device: cuda, dtype: torch.double, 10000 times          0.18088893592357635
device: cuda, dtype: torch.half, 10000 times            0.18222836777567863
```
After change:
```
a.abs() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.int8, 100000 times           1.7365420907735825
device: cuda, dtype: torch.uint8, 100000 times          1.7433889284729958
device: cuda, dtype: torch.int16, 100000 times          1.7034666128456593
device: cuda, dtype: torch.int32, 100000 times          1.6825932636857033
device: cuda, dtype: torch.int64, 100000 times          1.6896217577159405
device: cuda, dtype: torch.float, 100000 times          1.7211194895207882
device: cuda, dtype: torch.double, 100000 times         1.6823345720767975
device: cuda, dtype: torch.half, 100000 times           1.7027524448931217
a.abs() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.int8, 10000 times            0.17180879414081573
device: cuda, dtype: torch.uint8, 10000 times           0.17316896095871925
device: cuda, dtype: torch.int16, 10000 times           0.16990498825907707
device: cuda, dtype: torch.int32, 10000 times           0.1681906059384346
device: cuda, dtype: torch.int64, 10000 times           0.16994905844330788
device: cuda, dtype: torch.float, 10000 times           0.1719626784324646
device: cuda, dtype: torch.double, 10000 times          0.16886932775378227
device: cuda, dtype: torch.half, 10000 times            0.16957201063632965
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25857

Differential Revision: D18299368

Pulled By: VitalyFedyunin

fbshipit-source-id: 173eb0f6ca5a12a27f3d53466ff373a5f81f1da8
2019-11-06 13:41:32 -08:00
74b2d9ed2e Skips test_equiv_recurrent (#29255)
Summary:
This test is flaky, per issue https://github.com/pytorch/pytorch/issues/10322.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29255

Differential Revision: D18350782

Pulled By: mruberry

fbshipit-source-id: 53a7d33e17428c2484211618cb71e870ce2d6a03
2019-11-06 13:29:23 -08:00
cc457ca30f split remaining "easy" tests (#29249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29249

This splits out all the tests that are "easy", leaving `TestJit`,
`TestScript`, the autogenerated tests, and a small docs test.

Splitting those into reasonable chunks requires more effort and is less
mechanical.

Differential Revision: D18339007

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 69164b9f9a2c379fe8923a846c98dd3c37ccb70e
2019-11-06 13:23:01 -08:00
f93a6e54b9 Add removeAttribute to ClassType (#28984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28984

Support removing an attribute in `ClassType`. `ClassType` is
considered a low-level API, and users of this function
need to guarantee the safety of calling this method.

Test Plan:
tbd

Imported from OSS

Differential Revision: D18253776

fbshipit-source-id: 5814baa3fdf6de6c71d3cc1be225ded9116c961a
2019-11-06 11:49:29 -08:00
7069eee227 update gloo submodule (#29248)
Summary:
Update gloo submodule to use the new APIs introduced in https://github.com/facebookincubator/gloo/pull/232. Done by `cd third_party/gloo && git checkout 7c54124` which is gloo's latest commit.

Next step would be to consume the introduced APIs in `ProcessGroup::Work`. Then we can use this layer to be able to interrupt `ProcessGroupAgent` (only for the gloo backend).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29248

Reviewed By: xush6528

Differential Revision: D18350654

Pulled By: rohan-varma

fbshipit-source-id: e41f7446bbb500087a0ca3919173b2e8379c7ce7
2019-11-06 11:33:50 -08:00
eb46d64740 Remove CollisionChecker from typeids (#29242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29242

I don't really know why this is crashing, but it is crashing on iOS with an EXC_BAD_ACCESS / KERN_INVALID_ADDRESS.
(see attached task).

Removing it.

ghstack-source-id: 93304255

Test Plan: waitforsandcastle

Differential Revision: D18333464

fbshipit-source-id: 166012fabe1e1b1d84c10f3d3dcc2c1e24bff3aa
2019-11-06 11:28:38 -08:00
ab855d06fb Print aars content detailed size info (#28438)
Summary:
Output:
```
Oct 22 20:22:04 + find . -type f -name '*.a'
Oct 22 20:22:04 + xargs ls -lah
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:21 ./android/pytorch_android/build/intermediates/jniLibs/release/x86/libtorch.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 19:45 ./build_android/install/lib/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 19:44 ./build_android/install/lib/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 19:44 ./build_android/install/lib/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 19:45 ./build_android/install/lib/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 19:45 ./build_android/install/lib/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 19:45 ./build_android/install/lib/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:06 ./build_android/install/lib/libtorch.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  12M Oct 22 19:45 ./build_android/lib/libc10.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 5.9K Oct 22 19:44 ./build_android/lib/libclog.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 282K Oct 22 19:44 ./build_android/lib/libcpuinfo.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 274K Oct 22 19:44 ./build_android/lib/libcpuinfo_internals.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  67M Oct 22 19:45 ./build_android/lib/libeigen_blas.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.4M Oct 22 19:45 ./build_android/lib/libnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  69K Oct 22 19:45 ./build_android/lib/libnnpack_reference_layers.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins  51K Oct 22 19:44 ./build_android/lib/libpthreadpool.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 966K Oct 22 19:45 ./build_android/lib/libpytorch_qnnpack.a
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 644M Oct 22 20:06 ./build_android/lib/libtorch.a
Oct 22 20:22:05 ++ find . -type f -name '*.aar'
Oct 22 20:22:05
Oct 22 20:22:05 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + ls -lah ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 20K Oct 22 20:22 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 Archive:  ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 Zip file size: 20260 bytes, number of entries: 7
Oct 22 20:22:05 -rw-r--r--  2.0 unx      281 b-      177 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx    81895 b-    14629 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx     4816 b-     4632 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/values/
Oct 22 20:22:05 -rw-r--r--  2.0 unx      128 b-      106 defN 80-Feb-01 00:00 res/values/values.xml
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 values/
Oct 22 20:22:05 7 files, 87120 bytes uncompressed, 19550 bytes compressed:  77.6%
Oct 22 20:22:05
Oct 22 20:22:05 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + ls -lah ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 6.6M Oct 22 20:21 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 Archive:  ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 Zip file size: 6827798 bytes, number of entries: 12
Oct 22 20:22:05 -rw-r--r--  2.0 unx      269 b-      171 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx    81895 b-    14629 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx    16007 b-    14295 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 res/values/
Oct 22 20:22:05 -rw-r--r--  2.0 unx      116 b-      100 defN 80-Feb-01 00:00 res/values/values.xml
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1017704 b-   326504 defN 80-Feb-01 00:00 jni/x86/libfbjni.so
Oct 22 20:22:05 -rw-r--r--  2.0 unx 22309852 b-  6470885 defN 80-Feb-01 00:00 jni/x86/libpytorch.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 values/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86/
Oct 22 20:22:05 12 files, 23425843 bytes uncompressed, 6826596 bytes compressed:  70.9%
Oct 22 20:22:05 + for f in '`find . -type f -name "*.aar"`'
Oct 22 20:22:05 + echo
Oct 22 20:22:05 + echo ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 + ls -lah ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05
Oct 22 20:22:05 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 -rw-r--r-- 1 jenkins jenkins 1.2M Oct 22 20:21 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 + zipinfo -l ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 Archive:  ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
Oct 22 20:22:05 Zip file size: 1172812 bytes, number of entries: 16
Oct 22 20:22:05 -rw-r--r--  2.0 unx      246 b-      162 defN 80-Feb-01 00:00 AndroidManifest.xml
Oct 22 20:22:05 -rw-r--r--  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 R.txt
Oct 22 20:22:05 -rw-r--r--  2.0 unx    12582 b-     9896 defN 80-Feb-01 00:00 classes.jar
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/arm64-v8a/
Oct 22 20:22:05 -rw-r--r--  2.0 unx   997768 b-   288617 defN 80-Feb-01 00:00 jni/arm64-v8a/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/armeabi-v7a/
Oct 22 20:22:05 -rw-r--r--  2.0 unx   599848 b-   219234 defN 80-Feb-01 00:00 jni/armeabi-v7a/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1017704 b-   326504 defN 80-Feb-01 00:00 jni/x86/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 jni/x86_64/
Oct 22 20:22:05 -rw-r--r--  2.0 unx  1055384 b-   326713 defN 80-Feb-01 00:00 jni/x86_64/libfbjni.so
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86_64/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 x86/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 arm64-v8a/
Oct 22 20:22:05 drwxr-xr-x  2.0 unx        0 b-        2 defN 80-Feb-01 00:00 armeabi-v7a/
Oct 22 20:22:05 16 files, 3683532 bytes uncompressed, 1171146 bytes compressed:  68.2%
Oct 22 20:22:05 + xargs tar cfvz /var/lib/jenkins/workspace/android/artifacts.tgz
Oct 22 20:22:05 + find . -type f -name '*aar' -print
Oct 22 20:22:05 ./android/pytorch_android_torchvision/build/outputs/aar/pytorch_android_torchvision-release.aar
Oct 22 20:22:05 ./android/pytorch_android/build/outputs/aar/pytorch_android-release.aar
Oct 22 20:22:05 ./android/libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni-release.aar
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28438

Differential Revision: D18153674

Pulled By: IvanKobzarev

fbshipit-source-id: dce51c61e59a8423fe390405d0c71efc8ffa7deb
2019-11-06 11:24:48 -08:00
9c43b16df9 Revert D18171156: Merge Tensor and Variable.
Test Plan: revert-hammer

Differential Revision:
D18171156

Original commit changeset: 5b6a045beba3

fbshipit-source-id: f5581d902c2305018ea49f8473592be2a465560b
2019-11-06 10:57:00 -08:00
6a4b51aec1 Add the intra-op parallelism for equal operator (#28810)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28810

Similar to https://github.com/pytorch/pytorch/pull/28464 and https://github.com/pytorch/pytorch/pull/28477, we would like to enable intra-op parallelism for the equal operator. This translates into a parallel performance win for the BERT/RoBERTa model.

Test Plan: CI

Differential Revision: D18165752

fbshipit-source-id: 354cede4c36893acbd69711f49aa6a51dc94397f
2019-11-06 10:30:44 -08:00
9ae6fd2599 explicitly provide memory format when calling to clone() at TensorFactories.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28665

Test Plan: Imported from OSS

Differential Revision: D18333361

Pulled By: ifedan

fbshipit-source-id: 88f19649e708c3e04decc1ca34c7a1faabe6c434
2019-11-06 09:53:55 -08:00
e4c4ff079c group quantized op benchmarks into a new binary (#29288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29288

More quantized operators have been added to the benchmark suite. We want to split them from the un-quantized ones for easier benchmarking.

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_quantized_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: QConv2d
# Mode: Eager
# Name: QConv2d_kernel3_G32_H56_OC512_N1_stride2_pad1_W56_IC512
# Input: kernel: 3, G: 32, H: 56, OC: 512, N: 1, stride: 2, pad: 1, W: 56, IC: 512
Forward Execution Time (us) : 5614.996

# Benchmarking PyTorch: QLinear
# Mode: Eager
# Name: QLinear_N6400_IN141_OUT15
# Input: N: 6400, IN: 141, OUT: 15
Forward Execution Time (us) : 2829.075

Reviewed By: hl475

Differential Revision: D18349850

fbshipit-source-id: 5b2fd9c1d5a25068592e5059909bb6c14095f397
2019-11-06 09:48:53 -08:00
114e7382b6 skip cuda test if not on GPU machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29287

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151

Reviewed By: hl475

Differential Revision: D18344574

fbshipit-source-id: 881c857cf901c4539ee1a61171ab41df1c476db7
2019-11-06 09:37:04 -08:00
e86450620d add cuda to all op benchmark (#29285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29285

as title

Test Plan:
```
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --iterations 1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: ConvTranspose2d
# Mode: Eager
# Name: ConvTranspose2d_kernel3_out_c256_H16_in_c256_N1_stride1_W16_cpu
# Input: kernel: 3, out_c: 256, H: 16, in_c: 256, N: 1, stride: 1, W: 16, device: cpu
Forward Execution Time (us) : 10434.151

Reviewed By: hl475

Differential Revision: D18338258

fbshipit-source-id: 944e87d1ec70daadb205faaf2825d4a2202086c5
2019-11-06 09:37:00 -08:00
27115612ab add execution mode to the test name (#29284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29284

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1 --ai_pep_format true
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "PyTorch_add_M64_N64_K64_cpu_Eager", "metric": "latency", "unit": "ms", "value": "26.64516019518487"}

Reviewed By: hl475

Differential Revision: D18336980

fbshipit-source-id: 1f9d5147a56afeb68cd526a57f7375c5ec39efa4
2019-11-06 09:32:54 -08:00
50fa132bd1 explicitly provide memory format when calling to clone() at SortingKthValue.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28666

Test Plan: Imported from OSS

Differential Revision: D18333371

Pulled By: ifedan

fbshipit-source-id: 11d4bbdaf8e57c97a1c47181ce7e953f2ad5b49e
2019-11-06 09:20:12 -08:00
af45801f0d explicitly provide memory format when calling to clone() at SpectralOps.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28667

Test Plan: Imported from OSS

Differential Revision: D18333358

Pulled By: ifedan

fbshipit-source-id: 6e5d035517e2b9de811c80ec8255dafceb1a511e
2019-11-06 09:17:05 -08:00
d05da7dad3 Fix virtualenv builds on Windows (#29273)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29058.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29273

Differential Revision: D18349822

Pulled By: ezyang

fbshipit-source-id: c4d76521cc0742d890f22f1d7f32dede5600b651
2019-11-06 09:02:30 -08:00
4e53f3bcfe explicitly provide memory format when calling to clone() at Unique.cu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28668

Test Plan: Imported from OSS

Differential Revision: D18333364

Pulled By: ifedan

fbshipit-source-id: 9e9ce3287021d63d35c2db8b954f0ae548fd19d4
2019-11-06 08:41:53 -08:00
5e0cf05585 explicitly provide memory format when calling to clone() at TensorTransformations.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28664

Test Plan: Imported from OSS

Differential Revision: D18333351

Pulled By: ifedan

fbshipit-source-id: 8a42b65330c55e23e699f2c1ae58824e74cdd1e1
2019-11-06 08:00:16 -08:00
abe05a16ac Revert D18195868: Implementation of cosine learning rate training policy
Test Plan: revert-hammer

Differential Revision:
D18195868

Original commit changeset: 67bdb0b8dd31

fbshipit-source-id: f26761c82788f4c06f624fbd968fb966db8ecb47
2019-11-06 07:50:04 -08:00
689599d07d explicitly provide memory format when calling to clone() at LinearAlgebra.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28661

Test Plan: Imported from OSS

Differential Revision: D18333362

Pulled By: ifedan

fbshipit-source-id: dcb7a1c63473415654d7a964aa732c8f0d5480ec
2019-11-06 07:45:19 -08:00
81bf73643b Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29227

Test Plan: Imported from OSS

Differential Revision: D18330969

Pulled By: VitalyFedyunin

fbshipit-source-id: 54d75c025b40520866b2480ce86e6483e2dcb002
2019-11-06 07:24:42 -08:00
cc1c0120bc Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29226

Test Plan: Imported from OSS

Differential Revision: D18330965

Pulled By: VitalyFedyunin

fbshipit-source-id: 7029848bc1379a50caba6961c7a6e1d56c1fc0ad
2019-11-06 07:24:38 -08:00
e3e06549c1 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29225

Test Plan: Imported from OSS

Differential Revision: D18330964

Pulled By: VitalyFedyunin

fbshipit-source-id: f357a0cc125bd90a62575bd461722b9e36e75cbf
2019-11-06 07:24:34 -08:00
47f94d5393 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29224

Test Plan: Imported from OSS

Differential Revision: D18330968

Pulled By: VitalyFedyunin

fbshipit-source-id: 42a5553248bfe4c7084b56850df4bcd323bad638
2019-11-06 07:24:30 -08:00
aeae0d8403 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29223

Test Plan: Imported from OSS

Differential Revision: D18330967

Pulled By: VitalyFedyunin

fbshipit-source-id: 25c740dd66c64fb533a0a410801ea2a53905c282
2019-11-06 07:24:25 -08:00
d410fc5a81 Autogenerated contiguous memory format for old *_like calls
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29222

Test Plan: Imported from OSS

Differential Revision: D18330966

Pulled By: VitalyFedyunin

fbshipit-source-id: 9e8da4e826cc43fac9828737ef744606491812a4
2019-11-06 07:24:21 -08:00
a248ef7b9c fix autograd support for torch.mean(tensor, dimname) (#29199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29199

Previously, we called `native::mean_cpu_gpu` inside `mean(Tensor, Dimname)`;
`native::mean_cpu_gpu` is not supported by autograd. This PR replaces
`native::mean_cpu_gpu` with `at::mean(Tensor, int)` so that the dimname
overload can piggyback off of autograd support for `at::mean(Tensor,
int)`.

Also added tests (those didn't exist before) for autograd support for
named tensor reduction functions.
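
A minimal sketch of the behavior this enables (names and shapes are illustrative):

```python
import torch

# mean over a named dimension now routes through at::mean(Tensor, int),
# so gradients flow through the reduction.
t = torch.randn(2, 3, names=('N', 'C'), requires_grad=True)
out = t.mean('C')
out.sum().backward()
print(t.grad.shape)  # torch.Size([2, 3])
```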

Test Plan: - `python test/test_namedtensor.py -v`

Differential Revision: D18334617

Pulled By: zou3519

fbshipit-source-id: 1714eb3fd93714fe860f208831e8d910f01c1c78
2019-11-06 07:21:30 -08:00
ff9d508b88 Remove tools/setup_helpers/cuda.py. (#28617)
Summary:
Except for the Windows default path, everything it does has been done in
FindCUDA.cmake. The search for nvcc in PATH has been added to FindCUDA.cmake (https://github.com/pytorch/pytorch/issues/29160). The Windows default path part is moved to
build_pytorch_libs.py. CUDA_HOME is kept for now because other parts of
the build system are still using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28617

Differential Revision: D18347814

Pulled By: ezyang

fbshipit-source-id: 22bb7eccc17b559ce3efc1ca964e3fbb270b5b0f
2019-11-06 07:12:01 -08:00
bc91e19861 Enable ONNX constant folding for opset 11. (#29011)
Summary:
Currently ONNX constant folding (`do_constant_folding=True` arg in `torch.onnx.export` API) supports only opset 9 and 10 of ONNX. Opset 11 support was recently introduced in the ONNX exporter. For opset 11, it is currently a no-op. This change enables ONNX constant folding for opset 11. Specifically there are three main changes:
1) Turn on constant folding ONNX pass for opset 11.
2) Enable constant folding tests in `test/onnx/test_utility_funs.py` and `test/onnx/test_pytorch_onnx_onnxruntime.py` for opset 11.
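
A minimal usage sketch (the model and filename are illustrative):

```python
import torch

# Exporting at opset 11 with constant folding enabled; before this change,
# do_constant_folding=True was a no-op for opset 11.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
dummy = torch.randn(1, 4)
torch.onnx.export(model, dummy, "model.onnx",
                  opset_version=11,
                  do_constant_folding=True)
```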
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29011

Reviewed By: hl475

Differential Revision: D18306998

Pulled By: houseroad

fbshipit-source-id: afeed21ca29e01c278612e51dacd93397dd6e2d8
2019-11-05 23:22:39 -08:00
ee21142e40 Move custom passes to last optimization step (#29256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29256

..

Test Plan: ..

Reviewed By: ZolotukhinM

Differential Revision: D18340212

fbshipit-source-id: 30f4850c8a21bdab42c7cf04b4b92b1787449ee2
2019-11-05 20:10:33 -08:00
6ea4219d20 Temporarily disable qnnpack tests on MACOS (#29176)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29176

Captured in issue  #27326

Test Plan:
python test/test_quantized.py test_qconv

Imported from OSS

Differential Revision: D18336184

fbshipit-source-id: 7394b04215b6c8b7bc0508f1648f23022bd031cb
2019-11-05 18:52:45 -08:00
ee8d5e5249 Implementation of cosine learning rate training policy (#29017)
Summary:
Implementation of the cosine learning rate policy from https://arxiv.org/pdf/1608.03983.pdf (SGDR: Stochastic Gradient Descent with Warm Restarts).

Mostly inspired by:
https://github.com/pytorch/fairseq/blob/master/fairseq/optim/lr_scheduler/cosine_lr_scheduler.py
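
A minimal sketch of the schedule (not the diff's actual implementation; parameter names follow the test plan below, and the restart bookkeeping is simplified):

```python
import math

def cosine_lr(step, max_lr=0.3, min_lr=0.0, initial_period=20,
              t_mult=0.95, lr_shrink=0.95):
    # Walk the restart cycles: each cycle's length is scaled by t_mult
    # and its peak learning rate is shrunk by lr_shrink.
    period, start = initial_period, 0
    while step >= start + period:
        start += period
        period = max(1, int(period * t_mult))
        max_lr *= lr_shrink
    t = (step - start) / period  # position within the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * t))

print([round(cosine_lr(s), 4) for s in range(0, 60, 10)])
```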
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29017

Test Plan:
buck test -v 2 caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test  -- test_composite_cosine_lr_policy

learning rate log with max_lr=0.3, initial_period=20, t_mult=0.95, lr_shrink=0.95: P120327179

https://pxl.cl/PrcP

full canary: https://fburl.com/fblearner/mw69ylsd

Differential Revision: D18195868

Pulled By: grantlj

fbshipit-source-id: 67bdb0b8dd31d040d16b29d0da3115907bd141ef
2019-11-05 18:19:41 -08:00
d545e4f155 qrelu benchmarking
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29174

Test Plan: Imported from OSS

Differential Revision: D18319345

Pulled By: z-a-f

fbshipit-source-id: b64f0131296771ed201d85664930cceb7be185bd
2019-11-05 17:20:40 -08:00
13f53d0fea Updating submodules
Summary:
GitHub commits:

de05e0e7ac
b6641eb7fa
ec1aa6936b
80479de3f7

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 815f4c5a06826e1a508e5d5016f2be42e96b7fea
2019-11-05 17:07:23 -08:00
6e38c3b89e Make get_trace_graph private
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29149

Test Plan: Imported from OSS

Differential Revision: D18307559

Pulled By: jamesr66a

fbshipit-source-id: 0b6aec2a1d10810d4e7f6b30b256cca79fc4e854
2019-11-05 17:04:36 -08:00
2f2a0d1607 Disables test_atomic_ops and testInputOrder (#29145)
Summary:
These tests have been flaky for some time, see:

- https://github.com/pytorch/pytorch/issues/28179
- https://github.com/pytorch/pytorch/issues/9064

This PR disables them. The actual tests were added/updated 2+ years ago. It's unclear who, if anyone, would own them now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29145

Differential Revision: D18327937

Pulled By: mruberry

fbshipit-source-id: d02731d662aff3545b581272e5ae8db4e3097d87
2019-11-05 16:53:53 -08:00
30f88bb05a Fix the TestApp (#29247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29247

### Summary

If you run the TestApp using Cocoapods, you'll likely run into an error due to the lack of `config.json` in the main bundle. This PR fixes this crash and updates the README as well.

### Test Plan

- Don't break CIs

Test Plan: Imported from OSS

Differential Revision: D18339047

Pulled By: xta0

fbshipit-source-id: 244cf1ca8729c7ac918258d4eff14d34363e8389
2019-11-05 16:28:51 -08:00
003cb8595b skip more flaky rpc tests (#29157)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29157

As reported, these tests are flaky and time out. Skip them
while we investigate further.
ghstack-source-id: 93287663

Test Plan: CI

Differential Revision: D18309204

fbshipit-source-id: 95f0ea5e0c1162b78da412a34db446a01dfc33bf
2019-11-05 15:49:13 -08:00
35f8b450fc explicitly provide memory format when calling to clone() at SobolEngineOps.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28662

Test Plan: Imported from OSS

Differential Revision: D18333374

Pulled By: ifedan

fbshipit-source-id: c8e18e9937b373daba0ead819622350b693c4bfa
2019-11-05 15:45:50 -08:00
9232143d6a explicitly provide memory format when calling to clone() at Sorting.cpp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28663

Test Plan: Imported from OSS

Differential Revision: D18333373

Pulled By: ifedan

fbshipit-source-id: 908880dd58d5e795db661a7249a11028f610c328
2019-11-05 15:35:55 -08:00
6389c18709 C++ parity, nn::CrossMapLRN2d (#29039)
Summary:
yf225 https://github.com/pytorch/pytorch/issues/25883
re-opened pull request because of a rebase mistake!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29039

Differential Revision: D18326829

Pulled By: yf225

fbshipit-source-id: 5ed737f6275e4463efa4951d9b7f45c6f2723c82
2019-11-05 15:27:08 -08:00
492764b18f Enable the intra-op parallelism for layer norm (#28464)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28464

We would like to enable intra-op parallelism for layer norm. This translates into a parallel performance win for the BERT/RoBERTa models.

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"

Reviewed By: BIT-silence

Differential Revision: D18063407

fbshipit-source-id: c116e744d78ea50b3aadf2e9a819e5b876a944bf
2019-11-05 15:24:32 -08:00
a5aeb37493 Don't throw when type is used in TorchScript (#28053)
Summary:
Type objects in python have an attribute `__abstractmethods__` that throws when it is accessed, so we were failing with an AttributeError whenever a type was used in TorchScript.

This PR prevents that error from happening. We can't just throw when a type is used because it could be used to access a static method: https://github.com/pytorch/pytorch/pull/27163
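
A quick REPL demonstration of the underlying Python behavior described above:

```python
>>> '__abstractmethods__' in dir(int)
True
>>> int.__abstractmethods__   # merely accessing the attribute raises
Traceback (most recent call last):
  ...
AttributeError: __abstractmethods__
```
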
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28053

Differential Revision: D18332347

Pulled By: eellison

fbshipit-source-id: 9c7f2220f92674ad4d903621d9762cecc566ab0d
2019-11-05 15:15:12 -08:00
ac027d30d5 Half test time, test_asymmetric_load_with_join, to avoid flakiness (#29139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29139

Each test has 100 sec timeout.

Currently this test takes 90-110 secs to finish, causing flakiness.

Halve the load so the test is not on the edge of the timeout.
ghstack-source-id: 93203670

Differential Revision: D5644012

fbshipit-source-id: 2a85999cf1ae6d18e9a871cd76ce194e1ce7b3e8
2019-11-05 14:54:19 -08:00
ebf5dd447e Cocoapods 1.3.1 release (#29240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29240

### Summary

The 1.3.1 binary has been uploaded to AWS - https://ossci-ios.s3.amazonaws.com/libtorch_ios_1.3.1.zip. This PR updates the cocoapods version to 1.3.1

### Test Plan

- The 1.3.1 binary works well

Test Plan: Imported from OSS

Differential Revision: D18333750

Pulled By: xta0

fbshipit-source-id: fe6e42c51f3902ad42cab33f473dffb0f6f33333
2019-11-05 14:50:46 -08:00
8a2dcff189 Add cuda version for operators BatchSparseToDense and BatchDenseToSparse (#29166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29166

As titled

Test Plan:
unittest

 buck test  mode/dev-nosan  caffe2/caffe2/python/operator_test:batch_sparse_to_dense_op_test

Reviewed By: xianjiec

Differential Revision: D18197966

fbshipit-source-id: 7486300c509dd552ddb7484c2d83099f62878278
2019-11-05 13:06:23 -08:00
fd4f22e4ea Generalized LU factorization (#28608)
Summary:
This PR implements support for generalized LU factorization that is required for various algorithms such as PCA (see issue https://github.com/pytorch/pytorch/issues/8049).
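
A small usage sketch of the newly supported rectangular case, assuming the torch.lu API of this era:

```python
import torch

A = torch.randn(4, 6)          # rectangular input, previously unsupported
LU, pivots = torch.lu(A)
print(LU.shape, pivots.shape)  # torch.Size([4, 6]) torch.Size([4])
```
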
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28608

Differential Revision: D18326449

Pulled By: ezyang

fbshipit-source-id: d4011d75710e06e87ddbf5ad9afae42ba3330548
2019-11-05 12:27:40 -08:00
9492994feb submodule swapping via module interface (#28409)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28409

This PR enables submodule swapping via module interfaces. A user can
declare a submodule as a module interface type in the ScriptModule;
during compilation we record the module interface type in the
ModuleInfo of ConcreteModuleType, the associated JIT type will have the
correct interface type, and the CppModule will get the correct module list.

Given that we still keep the module interface type in the type system,
the graph is not inlined when we call Module::Attr; it uses
prim::CallMethod to call the method instead. This allows us to swap
modules on any ScriptModule that satisfies the same module interface,
and module swapping is only allowed through the module interface
approach.
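
A minimal sketch of the pattern this enables, assuming the module-interface API introduced in this stack (all names here are hypothetical):

```python
import torch
from torch import nn, Tensor

@torch.jit.interface
class BackboneInterface(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        pass

class ImplA(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        return x + 1

class ImplB(nn.Module):
    def forward(self, x: Tensor) -> Tensor:
        return x * 2

class Wrapper(nn.Module):
    backbone: BackboneInterface  # submodule declared by its interface type

    def __init__(self):
        super().__init__()
        self.backbone = ImplA()

    def forward(self, x: Tensor) -> Tensor:
        return self.backbone.forward(x)

m = torch.jit.script(Wrapper())
# Any ScriptModule satisfying the same interface can be swapped in:
m.backbone = torch.jit.script(ImplB())
```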

Test Plan: Imported from OSS

Reviewed By: driazati

Differential Revision: D18284309

fbshipit-source-id: 2cb843e4b75fa3fcd8c6020832a81014dbff4f03
2019-11-05 11:31:40 -08:00
f1c78492f8 Revert D18299298: Migrate conv3d from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D18299298

Original commit changeset: 97d53e8c976a

fbshipit-source-id: 33057d5a91d11bca136f69bc2d6ff0699d31492a
2019-11-05 11:26:48 -08:00
eb4189089a README (#28533)
Summary:
Copy of android.md from the site + information about Nightly builds

It duplicates some content from the separate pytorch.github.io repo, but I think more people will find it here, and we can iterate on it faster and keep it in sync with the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28533

Reviewed By: dreiss

Differential Revision: D18153638

Pulled By: IvanKobzarev

fbshipit-source-id: 288ef3f153d8e239795a85e3b8992e99f072f3b7
2019-11-05 11:06:23 -08:00
26f57cbe5e Revert D18309297: CPU-strided-complex support for ComplexFloat
Test Plan: revert-hammer

Differential Revision:
D18309297

Original commit changeset: adf4bc3a45ba

fbshipit-source-id: de45d9d7863a7f530be6773635b05bc4a7251d96
2019-11-05 10:26:30 -08:00
25e261d6d5 assertEquals is deprecated, use assertEqual instead
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28335
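
For reference, the supported spelling; a trivial sketch:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_values(self):
        # assertEqual is the supported name; assertEquals is a deprecated
        # alias that emits a DeprecationWarning under Python 3.
        self.assertEqual(1 + 1, 2)
```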

Differential Revision: D18263456

Pulled By: ngimel

fbshipit-source-id: c0f79071feaa5a4c3c4b20505013bf7c4b5455d5
2019-11-05 09:52:21 -08:00
c99cdfeb7d link to documentation for RNNBase.flatten_parameters() (#29196)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28658

I have added the link to the docs for `flatten_parameters`.

RNNBase is a superclass of the RNN, LSTM and GRU classes. Should I add a link to `flatten_parameters()` in those sections as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29196

Differential Revision: D18326815

Pulled By: ezyang

fbshipit-source-id: 4239019112e77753a0820aea95c981a2c868f5b0
2019-11-05 09:45:21 -08:00
f32ab6157b CPU-strided-complex support for ComplexFloat (#29133)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes
- [x]  Fixed Vec256 Permute operations for Complex Float
- [x]  Fixed copy_kernel_cast between complex data types
  -  copy_kernel_cast should not call std::real during inter-complex dtype conversion.
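
A pure-Python illustration (no torch) of why an inter-complex cast must not go through the real part first:

```python
z = 1 + 2j              # think: a complex64 element being cast to complex128
right = complex(z)      # (1+2j) -- the imaginary part survives the cast
wrong = complex(z.real) # (1+0j) -- what taking std::real first would give
assert right != wrong
```
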
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29133

Differential Revision: D18309297

Pulled By: ezyang

fbshipit-source-id: adf4bc3a45ba2918c8998d59fa94a52f89663e94
2019-11-05 09:17:54 -08:00
21d11e0b64 FindCUDA: Use find_program instead of find_path to find nvcc (#29160)
Summary:
Otherwise nvcc is not found if it is in env PATH but a non-standard
location.

Import from my patch for CMake:
https://gitlab.kitware.com/cmake/cmake/merge_requests/3990

Although we currently do nvcc search in a Python script, it will be removed soon in https://github.com/pytorch/pytorch/issues/28617.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29160

Differential Revision: D18326693

Pulled By: ezyang

fbshipit-source-id: dc7ff3f6026f0655386ff685bce7372e2b061a4b
2019-11-05 08:51:35 -08:00
a02681f804 Cleaned up func removed unused variable (#29179)
Summary:
I don't see `_frames_up` being used anywhere. Just to clean up the code, I thought it should be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29179

Differential Revision: D18319876

Pulled By: suo

fbshipit-source-id: 5e612ff94ccc88fc85288ffc26213e1d11580c36
2019-11-05 08:48:45 -08:00
7434da2c3f value assigned but never used in _recursive.py (#29181)
Summary:
# Description
I'm new to this project and just wanted to start with small bug fixes. I found some unused local variables and I've removed them in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29181

Differential Revision: D18319893

Pulled By: suo

fbshipit-source-id: e4f9f13b6db2ca213015569deb12d3fd9beb74a8
2019-11-05 08:48:41 -08:00
c6d908d491 Support Conv+BatchNorm fusion for 1d/3d (#29113)
Summary:
Support Conv+BatchNorm fusion for 1d/3d by being adaptive to number of dimensions (partially fixes https://github.com/pytorch/pytorch/issues/28757)
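
The folding arithmetic at the heart of the fusion; a minimal sketch (the helper name is hypothetical) showing how reshaping the per-channel scale makes it dimension-agnostic:

```python
import torch

def fuse_conv_bn_weights(conv_w, conv_b, bn_mean, bn_var, bn_eps, bn_gamma, bn_beta):
    # Fold y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta into the conv.
    scale = bn_gamma / torch.sqrt(bn_var + bn_eps)
    # Broadcasting the scale over all trailing weight dims makes this
    # work for 1d, 2d, and 3d convolutions alike.
    fused_w = conv_w * scale.reshape(-1, *([1] * (conv_w.dim() - 1)))
    fused_b = (conv_b - bn_mean) * scale + bn_beta
    return fused_w, fused_b
```
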
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29113

Differential Revision: D18298248

Pulled By: soumith

fbshipit-source-id: 2fc75353aecc0e315c90e63476481acef6ebf784
2019-11-05 08:43:51 -08:00
546ae3002d Migrate conv3d from TH to ATen (CPU) (#29007)
Summary:
This is a port of the VolumetricConvolutionMM TH (CPU) implementation to ATen as `slow_conv3d`.

- [x] unfolded3d_copy & unfolded3d_acc
- [x] forward
- [x] backward
- [x] basic sanity cross check with 1.3 impl
- [ ] systematic testing
- [ ] performance comparison & optimization

Script used for performance testing: [benchmark_conv3d.py](https://gist.github.com/andreaskoepf/8865eea4bb05220f78fc6d9d408c49fc)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29007

Differential Revision: D18299298

Pulled By: ezyang

fbshipit-source-id: 97d53e8c976a09aecbc6f05dd8e982cc58cdf6d8
2019-11-05 08:09:20 -08:00
f2a35db2d3 batch_norm_cpu_inference for channel last (#28982)
Summary:
channels last version for batch_norm_cpu_inference_contiguous

Benchmark:
The benchmark uses a fixed batch size n=20, channel counts in [1,3,10,100,1000], and height/width sizes in [1,4,16,64,256]; height and width are always equal in this test.

We use the following code for the benchmark. Each loop iteration times contiguous, channels-last, and non-contiguous tensors and prints the results. It also compares the outputs within each loop to verify the correctness of the new change.

        for c in [1,3,10,100,1000]:
            for hw in [1,4,16,64,256]:
                print('Benchmark n=20 c={0} h={1} w={2}'.format(c, hw, hw))

                m = nn.BatchNorm2d(c, affine=False)
                m.eval()
                input = torch.randn(20, c, hw, hw)
                output = m(input)
                %timeit m(input)

                for name, param in m.named_parameters():
                    if param.requires_grad:
                        if param.data.dim() == 4:
                            param.data = param.data.contiguous(memory_format=torch.channels_last)
                m.eval()
                input = input.contiguous(memory_format=torch.channels_last)
                output1 = m(input)
                %timeit m(input)

                m = nn.BatchNorm2d(c, affine=False)
                m.eval()
                input = input.permute(0,1,3,2)
                output2 = m(input)
                %timeit m(input)
                output2 = output2.permute(0,1,3,2)

        print(output.equal(output1), output.equal(output2))

Sample output:
Benchmark n=20 c=100 h=256 w=256 -> title line
101 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> contiguous tensor
100 ms ± 898 µs per loop (mean ± std. dev. of 7 runs, 1 loop each) -> channels last tensor
1.3 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) -> non-contiguous tensor
True True -> 1st output vs. 2nd output, and 1st output vs. 3rd output; both expected to be True

**Benchmark Before this change:**
Benchmark n=20 c=1 h=1 w=1
10.1 µs ± 158 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.2 µs ± 305 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.7 µs ± 784 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=4 w=4
10.2 µs ± 152 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.1 µs ± 98 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.5 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=16 w=16
11 µs ± 133 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 148 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.3 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=64 w=64
24.2 µs ± 536 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
23.9 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
66 µs ± 409 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=1 h=256 w=256
539 µs ± 7.85 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
539 µs ± 15.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.42 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=1 w=1
10 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.97 µs ± 93 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.4 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=4 w=4
10.4 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16.1 µs ± 601 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.1 µs ± 658 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=16 w=16
13.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
25.3 µs ± 558 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
32.4 µs ± 625 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=64 w=64
51.1 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
159 µs ± 7.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
199 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=256 w=256
1.25 ms ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.95 ms ± 203 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.14 ms ± 42.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=10 h=1 w=1
9.97 µs ± 132 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 852 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.7 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=4 w=4
11.2 µs ± 84.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
29.7 µs ± 343 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
39.4 µs ± 396 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=16 w=16
19.7 µs ± 632 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
68.3 µs ± 912 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
90.3 µs ± 4.76 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=64 w=64
325 µs ± 5.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
918 µs ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
991 µs ± 44.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=10 h=256 w=256
9.47 ms ± 73.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
34.7 ms ± 2.12 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
91.5 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=1 w=1
11.8 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.1 µs ± 800 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12 µs ± 533 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=100 h=4 w=4
26.7 µs ± 2.83 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
231 µs ± 8.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
335 µs ± 15.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=16 w=16
178 µs ± 20.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
1.45 ms ± 187 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.52 ms ± 94.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=64 w=64
6.9 ms ± 554 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
30.3 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
27 ms ± 272 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=256 w=256
98.9 ms ± 818 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.29 s ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.32 s ± 9.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=1 w=1
18.6 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
18.7 µs ± 947 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
15.8 µs ± 261 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1000 h=4 w=4
111 µs ± 2.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
2.07 ms ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.19 ms ± 163 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=1000 h=16 w=16
3.87 ms ± 336 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
25.6 ms ± 394 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
27 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=1000 h=64 w=64
70.1 ms ± 1.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
467 ms ± 26.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
444 ms ± 25.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=256 w=256
2.39 s ± 19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
19.2 s ± 181 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
22.1 s ± 1.13 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True

**Benchmark After this change:**
Benchmark n=20 c=1 h=1 w=1
10.4 µs ± 247 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 149 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.7 µs ± 237 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=4 w=4
11.8 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 108 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
13.6 µs ± 142 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=16 w=16
11.9 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.1 µs ± 181 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.2 µs ± 205 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1 h=64 w=64
27.6 µs ± 2.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
32.2 µs ± 8.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
68.9 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=1 h=256 w=256
601 µs ± 49 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
597 µs ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.48 ms ± 24.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=1 w=1
10.8 µs ± 127 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.6 µs ± 194 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.5 µs ± 137 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=4 w=4
11.6 µs ± 551 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11.7 µs ± 266 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.9 µs ± 340 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=3 h=16 w=16
13.7 µs ± 223 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
24.7 µs ± 424 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
33.7 µs ± 1.23 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=3 h=64 w=64
53.3 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
212 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
204 µs ± 5.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=3 h=256 w=256
1.49 ms ± 295 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
3.27 ms ± 136 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7.08 ms ± 290 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=10 h=1 w=1
10.7 µs ± 166 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.8 µs ± 225 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
10.8 µs ± 192 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=10 h=4 w=4
11.6 µs ± 129 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.9 µs ± 503 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
43.7 µs ± 3.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=16 w=16
20.7 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
37.2 µs ± 795 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
92.5 µs ± 1.21 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
True True
Benchmark n=20 c=10 h=64 w=64
342 µs ± 9.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
622 µs ± 37.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.03 ms ± 37.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=10 h=256 w=256
9.49 ms ± 130 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.9 ms ± 408 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
90.5 ms ± 1.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=1 w=1
12 µs ± 575 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 216 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
11 µs ± 182 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=100 h=4 w=4
22.3 µs ± 451 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
18.7 µs ± 255 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
323 µs ± 6.22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=16 w=16
211 µs ± 22 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
222 µs ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.5 ms ± 59.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
True True
Benchmark n=20 c=100 h=64 w=64
7.2 ms ± 1e+03 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6.51 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
27.4 ms ± 695 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=100 h=256 w=256
101 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100 ms ± 898 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
1.3 s ± 10.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=1 w=1
16.9 µs ± 589 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
16.5 µs ± 113 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
16.5 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
True True
Benchmark n=20 c=1000 h=4 w=4
116 µs ± 6.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
67 µs ± 1.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.23 ms ± 80 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
True True
Benchmark n=20 c=1000 h=16 w=16
3.53 ms ± 72.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3.53 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
27 ms ± 129 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
True True
Benchmark n=20 c=1000 h=64 w=64
68.6 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
68 ms ± 288 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
425 ms ± 1.25 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True
Benchmark n=20 c=1000 h=256 w=256
2.51 s ± 97.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
2.84 s ± 471 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
21.5 s ± 933 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
True True

Channels-last batch normalization gets faster with this change, and the pre-existing code paths are unaffected, based on the benchmarks above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28982

Reviewed By: VitalyFedyunin

Differential Revision: D18253305

Pulled By: glaringlee

fbshipit-source-id: a0fcac65544f10d736141ee70edeab8a3f1b3e02
2019-11-05 07:59:39 -08:00
cb6d9deec6 support for cdist (#29129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29129

cdist(x1, x2) does the following:
- assume x1, x2 are 2-dimensional. Then x1, x2 are each considered to be
a list of vectors.
- The operation returns a matrix that is the pairwise distance between
each vector in x1 and each vector in x2. The matrix has first dimension
size equal to the number of vectors in x1 and second dimension size equal
to the number of vectors in x2.
- cdist also supports arbitrary left-hand broadcastable batch
dimensions. In this case, x1 and x2 are each considered to be a batch
of a list of vectors.

The above leads to the following name inference rule for cdist:
- In the 2D case, propagate x1.names[-2] and x2.names[-2] (because
the final result has size (x1.size[-2], x2.size[-2])).
- In the ND case, unify all the batch dimensions together to produce the
output batch dimensions and then apply the rule for the 2D case.

Furthermore, I moved all of the name checking in the implementation to
occur before name inference because name inference assumes that the
shapes are valid.
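
A small sketch of the resulting name propagation (shapes and names here are hypothetical):

```python
import torch

x1 = torch.randn(5, 3, names=('A', 'D'))
x2 = torch.randn(7, 3, names=('B', 'D'))
out = torch.cdist(x1, x2)
print(out.shape)   # torch.Size([5, 7])
print(out.names)   # ('A', 'B')
```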

Test Plan: - new test: `pytest test/test_namedtensor.py -v -k "cdist"`

Differential Revision: D18311867

Pulled By: zou3519

fbshipit-source-id: 713d7cdda93c8fe92e7f1bd7f7c5c6e20a8138e3
2019-11-05 07:24:23 -08:00
3233a058fa Add TensorNames::checkUnique, operator<< (TensorName) (#29124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29124

TensorNames::checkUnique gives a nice error message if there are
duplicate names.

Adding operator<< on TensorName cleans up some code. A TensorName gets
printed out as: "'H' (index 2 of ['N', 'C', 'H', 'W'])" for example.

Test Plan: - New c++ tests. test with `build/bin/NamedTensor_test`.

Differential Revision: D18311868

Pulled By: zou3519

fbshipit-source-id: 5be197dba227f0328b40d7f66e78fffefe4dbd00
2019-11-05 07:24:19 -08:00
2c3c702d29 Fix poisson_nll_loss with full option (#28637)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/28575.

It seems `poisson_nll_loss` was implemented with an incorrect assumption about `masked_select`, which does not actually return a tensor sharing storage with the input, so the in-place operation used there didn't work as intended.
Here I used `masked_fill` instead.
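
A short demonstration of the behavior at issue:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
sel = x.masked_select(x > 1)  # returns a new tensor (a copy), not a view
sel.zero_()                   # in-place edit does NOT touch x
print(x)                      # tensor([1., 2., 3.])

# masked_fill produces the intended result directly:
print(x.masked_fill(x > 1, 0.0))  # tensor([1., 0., 0.])
```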

Also, the existing test didn't have `reference_fn`, so I added it (although it's not fundamentally useful, since the current cpp `poisson_nll_loss` itself implements exactly the same algorithm as `reference_fn`).

Thanks in advance for reviewing this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28637

Differential Revision: D18299724

Pulled By: albanD

fbshipit-source-id: 1aac5b20e77bf54874b79018207ba8f743766232
2019-11-05 07:10:35 -08:00
49fba35208 Run clang-format for torch/distributed/rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27531

Test Plan: Imported from OSS

Differential Revision: D17808206

Pulled By: pietern

fbshipit-source-id: 7d23327bfba42dab4b60779c9f03b7952ff0db7a
2019-11-05 06:25:30 -08:00
6c3915643b Rename PythonUDF{Call,Resp} (#27530)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27530

Per discussion in #27286, the `UDF` part is superfluous.

This makes the naming consistent with the `MessageType` enum.

Test Plan: Imported from OSS

Differential Revision: D17808211

Pulled By: pietern

fbshipit-source-id: 0ff925de26d027951ce285750ad276ed17fee4c6
2019-11-05 06:25:26 -08:00
b4df413712 Scope pybind11 functions to torch.distributed.{autograd,rpc}
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27529

Test Plan: Imported from OSS

Differential Revision: D17808209

Pulled By: pietern

fbshipit-source-id: 1e3e086085167320c3fc369467f5d75ce39fa4ea
2019-11-05 06:25:22 -08:00
69f845cb77 C++ API parity: MarginRankingLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29000

Test Plan: Imported from OSS

Differential Revision: D18271855

Pulled By: pbelevich

fbshipit-source-id: cbafc7f059173306c83673d7be374c2d3700911f
2019-11-05 05:41:40 -08:00
0d056e75e9 Updating submodules
Summary:
GitHub commits:

d70aa3c904

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 2c18456a882470f185946af2749a4e0c2e6f9cde
2019-11-05 02:10:50 -08:00
ca7d0803e9 use fbgemm's 3d group conv fast path (#29085)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29085

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/159

Change DNNLOWP operators to use fbgemm's new 3D groupwise convolution (D18192339)

This diff also fixes an issue when column offsets are fused into bias.
In this case, we construct ReQuantizeOutput with col_offsets == 0 and A_zero_point == 0 even if the real A_zero_point is not 0.
In fbgemmGroupwiseConv, when we call dispatchOutputProcessing, we shouldn't pass the original A_zero_point.

Test Plan: https://github.com/pytorch/pytorch/pull/29134

Reviewed By: dskhudia

Differential Revision: D18282373

fbshipit-source-id: 993d584e7fa8e07c74597304c0fd9386f7ed0e41
2019-11-05 00:58:49 -08:00
9e314f557f Fix for torch.save not saving source files (#28965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28965

Fixed the reference to correct object

Test Plan:
Added new unit test test_serialization_save_warnings in test_torch
    Verified by running the test_torch tests

Imported from OSS

Differential Revision: D18306797

fbshipit-source-id: bbdc7a1aa59a395fcbb736bcc7c3f96db45454d3
2019-11-04 23:16:51 -08:00
026fd36c71 Use at::kLong for torch::tensor(integer_value) when dtype is not specified (#29066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29066

This PR is BC-breaking in the following way:

Previously, C++ `torch::tensor` with an integer literal or a braced-init-list of
integer literals produces a tensor with dtype being the type of the integer literal(s). After this PR, it always produces a tensor of dtype `at::kLong` (aka. int64_t), matching Python `torch.tensor` behavior.
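
For reference, the Python behavior being matched:

```python
import torch

print(torch.tensor(42).dtype)         # torch.int64
print(torch.tensor([1, 2, 3]).dtype)  # torch.int64
```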

Test Plan: Imported from OSS

Differential Revision: D18307248

Pulled By: yf225

fbshipit-source-id: 7a8a2eefa113cbb238f23264843bdb3b77fec668
2019-11-04 21:39:10 -08:00
1189f559cc Creating new layer FCWithBootstrap used in bootstrapping uncertainty approach (#29152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29152

Bootstrapping uncertainty approach: bootstrap the last layer before the last fully-connected layer. FCWithBootstrap is a new layer to handle the logic for the bootstrapping process.

Goal:
- return a struct with the bootstrapped indices and bootstrapped predictions from this layer
- separate the functionality in the train_net and eval_net
- save the bootstrapped FC in this object so that the eval_net can use them during prediction time

Reviewed By: wx1988

Differential Revision: D17822429

fbshipit-source-id: 15dec501503d581aeb69cb9ae9e8c3a3fbc7e7b5
2019-11-04 21:18:15 -08:00
56f7415795 L0 norm approx with budget (#29155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29155

Update the L0 norm regularizer with a budget feature to penalize features over this limit

Formula and summary:

{F212248495}

Test Plan: * Unit test located in: ~/fbsource/fbcode/caffe2/caffe2/fb/dper/layer_models/tests/split_1/fsparse_nn_test.py

Reviewed By: un-disclosed, wx1988

Differential Revision: D17458138

fbshipit-source-id: 2ed9ce6f55573b0bfc0fefbfd392f90c7542a0fd
2019-11-04 21:09:53 -08:00
64cbea0fbb Updating submodules
Summary:
GitHub commits:

0432ab3260

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 77234d9f3baf213270258a6cc21bf4e3cb75ca7f
2019-11-04 20:00:42 -08:00
974702fba0 Removing quantization from the dispatcher. Changing the message.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29054

Test Plan: Imported from OSS

Differential Revision: D18276881

Pulled By: z-a-f

fbshipit-source-id: 3adee0bc784b13f2e00a643d2a96447cb666806d
2019-11-04 17:31:20 -08:00
0d9dc469cc Introduce math_compat.h for older Android versions (#28567)
Summary:
When building with Android NDK platforms prior to android-21,
and when building for Android with libstdc++, there are some
gaps in the C and C++ standard libraries.  We use both for our
internal 32-bit builds, so we need PyTorch to support this platform.

All of the gaps are filled with this math_compat.h header, which
needs to be included in any file that uses one of the functions
that are not properly defined on Android.  The file is a bit
hack-tastic, but it is only used on a platform that is not receiving
updates, so there shouldn't be a risk of breakage in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28567

Test Plan: Internal android build.

Differential Revision: D18099513

Pulled By: dreiss

fbshipit-source-id: 020aab19c6fa083206310b018925d92275d4a548
2019-11-04 17:26:17 -08:00
cb72c9f5b1 Make caffe2/fb folder compatible with AMD (#29131)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29131

caffe2_pb2.CUDA --> workspace.GpuDeviceType
workspace.NumCudaDevices() --> workspace.NumGpuDevices()

Also added the totalGlobalMem into get_device_properties(), which is needed by multi_gpu_utils.py

Test Plan:
sandcastle

f148921769

Reviewed By: bddppq

Differential Revision: D18290090

fbshipit-source-id: bde7c175d1fb6ff59a062266c1b17de39d113b24
2019-11-04 16:40:29 -08:00
02e34919ae Bring back the stack #28426 with Windows build fixed (#28843)
Summary:
ezyang This brings back the stack https://github.com/pytorch/pytorch/pull/28426 with the Windows build hopefully fixed. Let's wait for the CI to see what happens.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28843

Differential Revision: D18224616

Pulled By: ezyang

fbshipit-source-id: e13051e9ff9cb8d437a733b2c89b4172a379cafc
2019-11-04 16:32:56 -08:00
df22e4c157 Remove Unicode characters from header, fixing lint. (#29126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18300420

Pulled By: ezyang

fbshipit-source-id: d9b3ec75098cdb54624e4f98d4c66db1f4ff62bd
2019-11-04 15:07:37 -08:00
379f3ae3ea Double fetch depth. (#29030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29030

Might fix #27648

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18300252

Pulled By: ezyang

fbshipit-source-id: 542c16b6c1e78c2f9cc45e567f2e0cd1d4272ee3
2019-11-04 15:04:48 -08:00
25261a4776 Merge Tensor and Variable. (#28620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28620

All Tensors are Variables now; they just happen to have requires_grad=False. Tensors ALWAYS have `VariableTensorId` in their type set.

When constructing this patch, I had to make decisions about what I would fix in this patch, and what I would leave for follow up PRs. Here is the cleanup that happens in this patch:

- The `is_variable` property is removed from TensorOptions. I removed this immediately because unlike Tensor::is_variable, TensorOptions::is_variable doesn't respect our VariableTensorId thread-local state. This means that there were a bunch of places where TensorOptions::is_variable was false, which is obviously bogus in the world when tensor and variable are merged. Instead of keeping the method as a function that always returns true, I just opted to remove it entirely (it's not public API.) All places we set `is_variable` are deleted.
  - Knock on effect: there is no longer a separate DeprecatedTypeProperties for the variable and non-variable versions of type.
  - Knock on effect: instead of asserting on TensorOptions::is_variable, instead we just test `at::impl::variable_is_excluded()`
- There is now only one copy of the cuDNN RNN dropout cache, not two (I'm not sure why we had two to begin with)

Some cleanup that doesn't happen in this patch:
- Eliminating unnecessary uses of `make_variable`
- Eliminating `Tensor::is_variable`

The most subtle part of this patch is retaining tracing behavior: the fact that everything is a Variable means that more code gets routed to VariableType than before; this can change traces. I identified two places where we didn't appropriately turn off VariableType, mostly factory functions:

- `torch.tensor` must turn off VariableType before invoking `at::empty` to construct the tensor, as it subsequently does direct data access
- `tensor_slow` (invoked when you pass a Python scalar to a tensor argument) must turn off VariableType before calling `scalar_to_tensor` so the scalar gets traced as constant, rather than as a call to `scalar_to_tensor`.

Honestly, these are all giant hacks, and should be replaced with a more specialized guard that just toggles tracing.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: dreiss

Differential Revision: D18171156

Pulled By: ezyang

fbshipit-source-id: 5b6a045beba37492647e350190f495114e86504d
2019-11-04 14:59:57 -08:00
215ac1065a Print which output didn't have dependence. (#29047)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29047

When a tuple is returned, it is helpful to know specifically
which output was the culprit.

Actually, it was somewhat /more/ helpful to actually see the
contents of the tensor which didn't have dependence (or, e.g.,
the backtrace of the code that populated it), but that seemed
a step too far.
ghstack-source-id: 93091993

Test Plan:
manually tested because I was debugging an incorrect
trace and looked to see that the output number was indeed identifying
the correct tensor.

Reviewed By: dreiss

Differential Revision: D18274323

fbshipit-source-id: f1551bb03a3cdfa58b9e7f95736d53f317f53d5e
2019-11-04 14:59:53 -08:00
150357c887 Updating submodules
Summary:
GitHub commits:

f746854f94
99e8fc1fc4
1c0794abd7
126d5bb8c5

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 51255fdf8b51237d08afa5362d9dc19e6961ea28
2019-11-04 14:56:13 -08:00
fd0f9811ad add timeout for RPC futures, and ability to set timeout when initializing rpc (#28392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28392

Per #25531, we want to clean up futures when we detect that there are
failures/timeouts. As a first step, this diff adds timers to the future object,
provides functionality to check if a future is timed out, and allows
specification of the timeout when initializing rpc. A future diff will check for these timeouts and mark the future completed with an exception indicating that it has timed out.
ghstack-source-id: 93192622

Test Plan: Added unit tests.

Differential Revision: D18025163

fbshipit-source-id: 195fb50c736caf5c7b2bada9a5f6116bb106ed33
2019-11-04 14:43:03 -08:00
60cb56d128 Refactor iterables (#29138)
Summary:
Refactor list comprehensions so they go through the same path as other for loops, making list comprehensions work with ModuleLists, and also fixing https://github.com/pytorch/pytorch/issues/27255
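
A minimal sketch of the pattern this enables in TorchScript:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The comprehension over a ModuleList is unrolled at compile time,
        # just like an explicit for loop over it.
        outs = [layer(x) for layer in self.layers]
        return torch.stack(outs).sum(dim=0)

scripted = torch.jit.script(Net())
```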

Replacing https://github.com/pytorch/pytorch/pull/28296 which was gh-poisoned and previously accepted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29138

Differential Revision: D18303432

Pulled By: eellison

fbshipit-source-id: 8e4c0ba6f800142d5c4d921d56917cfae0c74655
2019-11-04 14:39:22 -08:00
7560b8c5a7 Modify ONNX constant folding test point in test_utility_funs.py for clarity (#28861)
Summary:
This is a minor update to the test point `TestUtilityFuns.test_constant_fold_concat` in `test/onnx/test_utility_fun.py` for clarity. Unlike before, the test model forward() method now uses the input `x`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28861

Differential Revision: D18306881

Pulled By: houseroad

fbshipit-source-id: dda8b4123e7646c2e416ce914a4698f9b96e2a6c
2019-11-04 14:37:01 -08:00
7102aceaf8 Default to not build Caffe2 operators on Windows. (#29061)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29061

It looks like we are too close to the maximum library size on
Windows.  Kill Caffe2 operators to get us lower again.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: smessmer

Differential Revision: D18281083

Pulled By: ezyang

fbshipit-source-id: 8a11f9059dbf330f659bd96cc0cc2abc947723a8
2019-11-04 14:32:47 -08:00
044ff91950 reduce predefined_min_secs for execution time (#29142)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29142

As titled.

Test Plan:
```
Before this diff:
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 122.965

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 229.735

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 950.455

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 826.893

After this diff:
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test;
Parsing buck files: finished in 0.7 sec
Building: finished in 02:35.7 min (100%) 7281/7281 jobs, 1 updated
  Total time: 02:36.4 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.021

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 244.076

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 946.280

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu_bwd1
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 863.835
```

Reviewed By: hl475

Differential Revision: D18305676

fbshipit-source-id: d382084e39b87c554084891f87701b87cd2d3800
2019-11-04 14:29:00 -08:00
20e8634999 pass more arguments to Int8ConvPackWeight op in unit tests (#29086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29086

For Int8ConvPackWeight to decide which convolution implementation should be used, we need to pass more arguments.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D18286931

fbshipit-source-id: d178cc6d696d0e83aad18bb34eb071f44b0c2015
2019-11-04 13:55:24 -08:00
7fb2ccaed8 Update type definitions for nn.Identity (#29135)
Summary:
Updated PR instead of https://github.com/pytorch/pytorch/issues/29114

Running mypy on the following code is throwing an error, Module has no attribute Identity:
```
import torch.nn as nn
layer = nn.Identity()
```
Using the following instead does not give an error:
```
import torch
layer = torch.nn.Identity()
```

CC: ezyang soumith (Sorry for causing the revert previously! Hope this one works fine!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29135

Differential Revision: D18306331

Pulled By: ezyang

fbshipit-source-id: f10be8a0cccecef423184d009bad8be6d54098a5
2019-11-04 13:27:13 -08:00
e01324d058 Port l1_loss to Aten (#26795)
Summary:
VitalyFedyunin, this PR ports L1 loss to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
loss = nn.L1Loss(reduction = 'sum')
if torch.cuda.is_available():
    device = "cuda"
    loss = loss.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(1000):
        output = loss(input, target)
        output.backward()

#get running time
for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = loss(input, target)
        t2 = _time()
        output.backward()
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P100.

**Performance:**
Before:
```
GPU:
reduction=’mean’
nput size(128, 100) forward time is 0.31 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 0.33 (ms); backwad avg time is 0.14 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.31 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 0.34 (ms); backwad avg time is 0.14 (ms).

CPU:
reduction=’mean’
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 1.92 (ms); backwad avg time is 2.96 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 1.96 (ms); backwad avg time is 2.79 (ms).

num_threads = 1:
reduction=’mean’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.67 (ms); backwad avg time is 2.50 (ms).
reduction=’sum’:
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.67 (ms); backwad avg time is 2.51 (ms).
```
After:
```
GPU:
reduction=’mean’
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 0.11 (ms); backwad avg time is 0.17 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.08 (ms).
input size(128, 10000) forward time is 0.11 (ms); backwad avg time is 0.16 (ms).

CPU:
reduction=’mean’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.14 (ms); backwad avg time is 0.18 (ms).
reduction=’sum’
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.15 (ms); backwad avg time is 0.17 (ms).

num_threads = 1:
reduction=’mean’:
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.06 (ms).
input size(128, 10000) forward time is 1.05 (ms); backwad avg time is 1.72 (ms).
reduction=’sum’:
input size(128, 100) forward time is 0.03 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 1.03 (ms); backwad avg time is 1.71 (ms).
```

How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`

echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"

export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0

numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run `./run.sh 1 L1loss.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26795

Differential Revision: D18140434

Pulled By: VitalyFedyunin

fbshipit-source-id: d0b976ec36797f2e6b4e58fbbac89688d29e736f
2019-11-04 13:20:07 -08:00
ebc216a076 Opset 11 updates (#28225)
Summary:
This PR contains:
1- pad updates for opset11 symbolic
2- Updated avg_pool for opset11
3- TopK updates for opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28225

Reviewed By: hl475

Differential Revision: D18282928

Pulled By: houseroad

fbshipit-source-id: aff2cabca9a155a9b475e35fed69a678544d6669
2019-11-04 12:16:12 -08:00
669662cd2f Updating submodules
Summary:
GitHub commits:

c72ce78355
3c1420258d

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 48e6a2175e768dbbaf67dfec557c7741808a9458
2019-11-04 12:08:18 -08:00
7190789f58 Handling of failing and terminal async cpu ops (#29052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29052

Make sure we handle the case of multiple, async, terminal (no children)
and failing cpu ops.

Test Plan: AsyncIf tests

Reviewed By: yyetim

Differential Revision: D18276401

Pulled By: ilia-cher

fbshipit-source-id: 35b175dd025bc7e392056ac1331b159376a29e60
2019-11-04 12:01:21 -08:00
19ac5929e2 Remove definitions of acosh and asinh from TH (#28696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28696

They are not used anywhere.

Test Plan: Imported from OSS

Differential Revision: D18302769

Pulled By: VitalyFedyunin

fbshipit-source-id: 8680951cbceb607ef545f92cbfa9204ce8f7ac4a
2019-11-04 11:56:25 -08:00
24d43750ee Updating submodules
Summary:
GitHub commits:

b4d85028d8

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 0417fe507561cde8b7739ae289a8d16d1429bea5
2019-11-04 11:04:26 -08:00
69b1d71427 Fix GELU module docs (#29112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29112

Fix GELU module docs

Test Plan: unittest

Reviewed By: hl475

Differential Revision: D18297681

fbshipit-source-id: 6b86a1a58c62fbb3b1395639271ee16c4043d03d
2019-11-04 10:45:00 -08:00
00a561a23a Fix build error caused by recent commits. (#29056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29056

There are a couple of recently published diffs that break the internal pytorch build, so fix it here.
ghstack-source-id: 93101569

Test Plan:
buck install -r aidemos-android
buck install -r fb4a

Reviewed By: iseeyuan

Differential Revision: D18236331

fbshipit-source-id: e1cecae8c30fd9b23b6bf379f652b4926542618d
2019-11-04 10:13:09 -08:00
93acd1998f Revert D18249048: Moved VonMises distribution with sampling upstream from Pyro.
Test Plan: revert-hammer

Differential Revision:
D18249048

Original commit changeset: 3e6df9006c7b

fbshipit-source-id: 001666e4b5b9879d36147bacfc761ea661ded900
2019-11-04 09:50:50 -08:00
0a4433750e Updating submodules
Summary:
GitHub commits:

9588c8bbf9

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: b882fe57643c1983e9859290e2dddec198a78ed0
2019-11-04 09:24:25 -08:00
fdeef45852 Add Support For Module Containers as Iterables (#28255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28255

Add support for treating Sequentials, ModuleLists, and ModuleDicts as iterables.

As before, when emitting a for loop over a module container we unroll the loop over all elements. We require that any SugaredValue in an iterable alongside a module container has a statically determinable length.

Otherwise, if you zipped over a list of varying length and an nn.Sequential that alternated between returning a Tensor and a Dictionary, the output type would change based on the length of the list.
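
A minimal sketch of iteration over a Sequential, which is unrolled over its statically known length at compile time:

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 4))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each unrolled iteration is specialized to the concrete
        # submodule type (Linear, ReLU, Linear).
        for layer in self.seq:
            x = layer(x)
        return x

scripted = torch.jit.script(Net())
```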

Fix for #17179
And https://github.com/pytorch/pytorch/issues/27401
and https://github.com/pytorch/pytorch/issues/27506

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D18278124

Pulled By: eellison

fbshipit-source-id: aca336a5b8da89c756b1f0884883649510cbde3c
2019-11-04 09:19:40 -08:00
Jie
8160f390cf (#23861)
Summary:
Added nhwc support for:
1. cudnn_batch_norm & cudnn_batch_norm_backward
2. cudnn_convolution_forward & cudnn_convolution_backward
3. cudnn_convolution_transpose & cudnn_convolution_transpose_backward

patching suggest_memory_format for convolution

suggest_memory_format has ambiguous meaning for two cases:
1. tensor with NCHW where C = 1.
   we could use stride of C as a hint to tell the intended memory format.
2. tensor with NCHW where H == W == 1.
   there's no way to identify the intended memory format from strides.

Currently we fall back to NCHW whenever we see a contiguous tensor, hence avoiding
ambiguity for some of the special cases.
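
A quick demonstration of the ambiguity for case 2 (H == W == 1), where both contiguity checks pass and strides alone cannot reveal the intended layout:

```python
import torch

x = torch.randn(2, 3, 1, 1)
print(x.is_contiguous())                                   # True
print(x.is_contiguous(memory_format=torch.channels_last))  # True
```
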
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23861

Differential Revision: D18263434

Pulled By: VitalyFedyunin

fbshipit-source-id: dd9f69576ec12fec879cd87a3d446931371360d9
2019-11-04 09:11:50 -08:00
Jie
70f3f23e3a (#29016)
Summary:
Adding limitation on launch config for grid size
Test added in test_cuda;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29016

Differential Revision: D18293788

Pulled By: ngimel

fbshipit-source-id: 44de308b05a4fe44bfffc2f3713fd9fa67ef74fa
2019-11-04 08:50:18 -08:00
0f97e08a36 Moved VonMises distribution with sampling upstream from Pyro. (#17168)
Summary:
At the encouragement of Pyro developers and https://github.com/pytorch/pytorch/issues/13811, I have opened this PR to move the (2D) von Mises distribution upstream.
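
A minimal usage sketch of the distribution being upstreamed:

```python
import torch
from torch.distributions import VonMises

d = VonMises(loc=torch.tensor(0.0), concentration=torch.tensor(1.0))
samples = d.sample((5,))     # angles on the circle
log_p = d.log_prob(samples)
```
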
CC: fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/17168

Differential Revision: D18249048

Pulled By: ezyang

fbshipit-source-id: 3e6df9006c7b85da7c4f55307c5bfd54c2e254e6
2019-11-04 08:44:11 -08:00
7ff39d2942 LayerNorm: Handling if batch size is zero (#28614)
Summary:
Handling of an empty example was giving a CUDA error.
Added a getLastError check to make sure CUDA errors are attributed to the
correct function (previously the error was attributed to the next
CUDA operator).
Added a special case for batch-size zero, also on the CPU side to keep
things consistent.
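
The empty-batch case this change handles; it should now produce an empty output rather than a CUDA error:

```python
import torch
import torch.nn as nn

ln = nn.LayerNorm(8)
out = ln(torch.empty(0, 8))
print(out.shape)   # torch.Size([0, 8])
```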

Resubmit of D18085429 without stacked commits
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28614

Test Plan: test included

Differential Revision: D18122212

Pulled By: ggoossen

fbshipit-source-id: 8c6741a157a9fbbc82685d81a6f8021452b650d4
2019-11-04 08:37:19 -08:00
23695ab23f Moving python allgather_coalesced impl from Py to C. (#29059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29059
This is a resubmit of reverted diff D18209289 ( PR #28857 ).

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: pietern

Differential Revision: D18277097

fbshipit-source-id: aecfd7206d70829f0cac66182bf02fccee410fed
2019-11-04 08:34:34 -08:00
a1386bd950 Fix smoketests by running them with postnightly job. (#28994)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28994

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18273476

Pulled By: ezyang

fbshipit-source-id: de59faa49c13198c18e61fdb05ab1d3d7cc16e08
2019-11-04 08:30:17 -08:00
0fbce15828 Retry conda installation on OS X. (#28979)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28979

Fixes #28969

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18273477

Pulled By: ezyang

fbshipit-source-id: 9bcc10034a4ad7d55709dd54735d60500043da65
2019-11-04 08:30:13 -08:00
a90389f20e Port cuda sigmoid to Aten(CUDA) (#26643)
Summary:
VitalyFedyunin, this PR ports CUDA sigmoid to ATen: https://github.com/pytorch/pytorch/issues/24624; the TH/THC sigmoid code can't be removed yet because sigmoid_backward in THNN/THCUNN relies on it. I will port sigmoid_backward to ATen next, including CPU and CUDA, which will remove the sigmoid code in TH/THC.

Test script:
```
import timeit

 device = "cuda"
 for n, t in [(10, 100000),(1000, 10000)]:
     print('a.sigmoid() (a.numel() == {}) for {} times'.format(n, t))
     for dtype in ('torch.float', 'torch.double', 'torch.half'):
         print(f'device: {device}, dtype: {dtype}, {t} times', end='\t\t')
         print(timeit.timeit(f'a.sigmoid()\nif "{device}" == "cuda": torch.cuda.synchronize()', setup=f'import torch; a = torch.ones({n}, device="{device}", dtype={dtype})', number=t))
```

Device: **Tesla P40**

Before:
```
a.sigmoid() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.float, 100000 times          1.2853778750286438
device: cuda, dtype: torch.double, 100000 times         1.2787265420192853
device: cuda, dtype: torch.half, 100000 times           1.2610833930084482
a.sigmoid() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.float, 10000 times           0.1274153349804692
device: cuda, dtype: torch.double, 10000 times          0.13953313598176464
device: cuda, dtype: torch.half, 10000 times            0.1265286349807866
```
After:
```
a.sigmoid() (a.numel() == 10) for 100000 times
device: cuda, dtype: torch.float, 100000 times          1.275270765996538
device: cuda, dtype: torch.double, 100000 times         1.285128042974975
device: cuda, dtype: torch.half, 100000 times           1.2761492819990963
a.sigmoid() (a.numel() == 1000) for 10000 times
device: cuda, dtype: torch.float, 10000 times           0.12851508799940348
device: cuda, dtype: torch.double, 10000 times          0.13738596899202093
device: cuda, dtype: torch.half, 10000 times            0.12715664599090815
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26643

Differential Revision: D17666550

Pulled By: VitalyFedyunin

fbshipit-source-id: 376479d94d0649c171fd0b2557699bbdd050fec3
2019-11-04 07:40:06 -08:00
bbada862dc Revert D18298225: Update modules/__init__.pyi.in to include Identity
Test Plan: revert-hammer

Differential Revision:
D18298225

Original commit changeset: b271bf000868

fbshipit-source-id: 77667adf6817a242f4f2e4eaa7ea8190f5090c49
2019-11-04 07:28:56 -08:00
a0dc060682 Update modules/__init__.pyi.in to include Identity (#29114)
Summary:
Running mypy on the following code throws the error `Module has no attribute Identity`:

```
import torch.nn as nn
layer = nn.Identity()
```

Using the following instead does not give an error:

```
import torch
layer = torch.nn.Identity()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29114

Differential Revision: D18298225

Pulled By: soumith

fbshipit-source-id: b271bf00086876cca8d63ae0cde6cebf69a7051e
2019-11-04 06:33:03 -08:00
2460dced8f Add torch.nn.GELU for GELU activation (#28944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28944

Add torch.nn.GELU for GELU activation
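
A minimal usage sketch (illustrative, not part of the original commit):

```python
import torch

gelu = torch.nn.GELU()
x = torch.randn(4)
y = gelu(x)  # elementwise x * Phi(x), where Phi is the standard normal CDF
```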

Test Plan: buck test mode/dev-nosan //caffe2/test:nn -- "GELU"

Reviewed By: hl475, houseroad

Differential Revision: D18240946

fbshipit-source-id: 6284b30def9bd4c12bf7fb2ed08b1b2f0310bb78
2019-11-03 21:55:05 -08:00
3bffb730b6 Add note about when to install typing package (#29103)
Summary:
Was just trying to build pytorch from source and had a small hiccup because the instructions say to `conda install typing`. Because `typing` is a built-in module in recent Python 3 versions, conda interpreted that to mean that I want Python 2. So I added a note to the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29103

Differential Revision: D18294139

Pulled By: soumith

fbshipit-source-id: 621a2f62ebe870520197baec8f8bcdc1a0c57de9
2019-11-03 19:38:55 -08:00
e95dc9814e introduce module interface declaration (#28408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28408

This enables an interface to be defined on an nn.Module, and InterfaceType
now has a field is_module_ to distinguish whether it is a module interface
or a normal interface (similar to how ClassType distinguishes between
modules and TorchScript classes).

A module interface can be assigned any ScriptModule that has
compatible signatures on its schemas. A normal object that is not a
ScriptModule cannot be assigned to a module interface and
will error out when the user explicitly does so. Assigning a ScriptModule
to a class interface makes it available only in attribute_list, not
module_list. More details on the subtyping relationship are documented in
jit_type.h.

If you declare a module interface inside an nn.Module that is being
compiled to a ScriptModule, the behavior of our internal compilation will
be:

1. ConcreteModuleType will record it as a module attribute and add it to
   the attributes_ list.
2. The JitType created from the ConcreteModuleType will record it as an
   attribute and pre-generate the slot. The slot will still be marked as
   EntityType::MODULE to make sure JitType records it as a Module
   slot.
3. cpp_module will also register it as a Module, as the Slot type is the
   source of truth.

Since JitType records it as an attribute and stores its type, it
behaves the way a class interface attribute behaves now. This means
the submodule assigned to this module interface is not inlined
into the graph the way a normal `Module::attr` is; instead, it generates an
interface callMethod and allows us to later swap it for another
ScriptModule that implicitly implements this module interface.
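
A minimal sketch of a module interface declaration, assuming the `torch.jit.interface` decorator spelling; class names are illustrative:

```python
import torch

@torch.jit.interface
class ModuleInterface(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Impl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class Wrapper(torch.nn.Module):
    proxy: ModuleInterface  # module interface attribute; swappable after scripting

    def __init__(self):
        super().__init__()
        self.proxy = Impl()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # generates an interface callMethod rather than inlining Impl's graph
        return self.proxy.forward(x)

scripted = torch.jit.script(Wrapper())
```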

Test Plan: Imported from OSS

Differential Revision: D18284311

fbshipit-source-id: e0b8f6e8c34b2087fab337a969e5ea3fb37ec209
2019-11-02 16:39:00 -07:00
1e904049ca guard against inheritance on torchscript classes (#28407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28407

Given that we do not yet have support for inheritance or any polymorphism
strategy, we should guard against users using it until we get
full support, so that users won't be confused by the weird behaviors.

Test Plan: Imported from OSS

Differential Revision: D18284310

fbshipit-source-id: f55a224f4190d57926d91ed98f6168d787387eb8
2019-11-02 16:38:56 -07:00
73d77626b8 Check device connection before running xcodebuild (#28996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28996

### Summary

It'd be frustrating to realize the device is not connected only after waiting for the build to finish. This PR checks the device connection status before running xcodebuild.

### Test Plan

- Don't break `bootstrap.sh`

Test Plan: Imported from OSS

Differential Revision: D18258348

Pulled By: xta0

fbshipit-source-id: dda90e7194114e99b2774a3b64ed41f78221f827
2019-11-02 14:38:08 -07:00
0c5e738cf7 Updating submodules
Summary:
GitHub commits:

612ae995a6
a6f5d4d621
799f6a8c0d

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 365832bf627e855c0aa15e083a873894368b0cfd
2019-11-02 14:38:04 -07:00
496d23224f Updating submodules
Summary:
GitHub commits:

7cb2e01c52
0d91a981e9
adeb2b0e38

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9dbe4512464819932e1a82ae08c3ab37e7f7c1ff
2019-11-01 19:03:44 -07:00
1345dabb1d Only set CCACHE_WRAPPER_PATH in the build scripts if it is not already passed in.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29002

Test Plan: Imported from OSS

Differential Revision: D18277225

Pulled By: AshkanAliabadi

fbshipit-source-id: eb70607790754cd5d214133967404242c05dd5d5
2019-11-01 18:39:12 -07:00
e8e7d93293 Additional autograd unit tests for Python UDFs. (#29041)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29041

1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 93098679
ghstack-source-id: 93098679

Test Plan: waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D18273041

fbshipit-source-id: 85d3932fed6337668a812367fdfce233c1b3ff8e
2019-11-01 18:30:09 -07:00
a68c1e109e C++ API: torch::nn::BatchNorm{2,3}d (#28936)
Summary:
Add torch::nn::BatchNorm{2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #28176

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28936

Differential Revision: D18274584

Pulled By: yf225

fbshipit-source-id: 3784eee9f8947f6c7c9f1699544a3d36a1a019b7
2019-11-01 17:50:33 -07:00
23193c155f Quantized Tensor support copy (#28612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28612

att

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D18255247

fbshipit-source-id: 814b12640fdf9d79b27482ee642ce430dbaeea68
2019-11-01 17:40:17 -07:00
41e42c34d6 Revert D17989951: Move unboxed dispatch decision into dispatcher
Test Plan: revert-hammer

Differential Revision:
D17989951

Original commit changeset: b343d9650deb

fbshipit-source-id: 0d2f470bab47e40fcffd5ec23f88549da15af873
2019-11-01 14:11:59 -07:00
cddda17394 ParallelWorkersTest.testParallelWorkersInitFun is flaky (#29045)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29045

Addressing an issue seen in GitHub https://github.com/pytorch/pytorch/issues/28958

It seems sometimes the workers in this test don't stop cleanly.  The purpose of this test is to check that the init_fun in init_workers works as expected, which is captured by the assertEqual in the for loop in the test.  The behavior of stop() is not really important here.

The fact that it returns false probably indicates that a worker is getting blocked, but that doesn't affect the correctness of the test.

Test Plan: Ran the test 100 times, it consistently succeeds.

Reviewed By: akyrola

Differential Revision: D18273064

fbshipit-source-id: 5fdff8cf80ec7ba04acf4666a3116e081d96ffec
2019-11-01 13:59:02 -07:00
314066bd74 Making torch/csrc/cuda nccl usage safe for nccl 2.5 (#29014)
Summary:
Thanks to AddyLaddy ptrblck for tracking this fix down.

In torch/csrc/cuda/nccl.cpp and torch/csrc/cuda/python_nccl.cpp, construction of the `AutoNcclGroup` guard (which calls `ncclGroupStart()`) [precedes](https://github.com/pytorch/pytorch/pull/29014/files#diff-3b6a42619dd44000cf58c0328b679a1cL239-L241) a possible call to `get_communicators`, which may call `ncclCommInitAll()`.  Calling `ncclCommInitAll()` within a `ncclGroupStart()/End()` is incorrect according to our Nccl people.

It seemed ok (relevant tests were silently passing) as long as Pytorch was compiled/linked against Nccl 2.4.x (which is currently what's locked into your third_party/nccl subrepo).  However, when we tried to compile and link against Nccl 2.5.x in internal builds, we began to see test hangs (TestAutogradDeviceTypeCUDA.test_unused_output_device_cuda was what initially brought it to our attention).

The present PR fixes those hangs, as far as we know, and will prevent a nasty future surprise when you start building against nccl 2.5.

The backend affected by this PR is exposed via https://github.com/pytorch/pytorch/blob/master/torch/cuda/nccl.py.  I'm not sure if the exposure is actually used anywhere (I think the distributed frontend is now backed by ProcessGroupNCCL in torch/lib/c10d).  So this PR may affect code that is already dead or dying, but still tested, it seems.

I skimmed ProcessGroupNCCL.cpp for potential similar vulnerabilities and didn't spot anything obvious.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29014

Differential Revision: D18274799

Pulled By: ezyang

fbshipit-source-id: c5f88cf187960d61736be14458be01e3675c6702
2019-11-01 13:53:31 -07:00
d8d7af0811 Fix CUDA shared memory out of bound access in findPattern (#28989)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/28789

Only the first two elements of `smem` are used in this function, but at the beginning it resets all `C10_WARP_SIZE` elements to 0. When `scalar_t` is 64-bit, this writes past the total shared memory size, which is `sizeof(int) * C10_WARP_SIZE`, although it does not lead to any failure in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28989

Differential Revision: D18271598

Pulled By: ngimel

fbshipit-source-id: 38cc863722509892646f719efb05e2730a7d9ae1
2019-11-01 13:50:25 -07:00
bace0c8d7a remove a redundant move preventing a copy elision
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/29040

Differential Revision: D18272902

Pulled By: Krovatkin

fbshipit-source-id: 23d4546aeb8945b7c7a5d472f543171699fc08b9
2019-11-01 13:13:00 -07:00
b693c5d6a0 replace add benchmark with add_ (#29050)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29050

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add_
# Mode: Eager
# Name: add__M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 31475.766
```

Reviewed By: hl475

Differential Revision: D18265767

fbshipit-source-id: 7aaa04f5fa5b2dd58bbc1aa045693314032e0ff0
2019-11-01 13:08:27 -07:00
1e2049c566 #26426 fixed (#28715)
Summary:
This is the fix for reverted https://github.com/pytorch/pytorch/issues/26426
houseroad bddppq soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28715

Reviewed By: hl475

Differential Revision: D18146731

Pulled By: houseroad

fbshipit-source-id: 247366451a6334e84df82d00339521f797b33130
2019-11-01 12:53:01 -07:00
4a94eaa60b C++ API parity: PoissonNLLLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28755

Test Plan: Imported from OSS

Differential Revision: D18202436

Pulled By: pbelevich

fbshipit-source-id: a7a27d5f3cdbcbbd9bbbffa02b576609d5fdc9b3
2019-11-01 12:35:59 -07:00
7ea83120df Fixing the shape calculation for pool tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28853

Test Plan: Imported from OSS

Differential Revision: D18212290

Pulled By: z-a-f

fbshipit-source-id: 44a41f3192c8b168a8a0fb68eb33b68400917c7a
2019-11-01 12:29:27 -07:00
5ac3df7712 Minor fix and turn off fold_convbn (#27403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27403

In the fold_convbn pass, we need to recompute the parameters (weight, bias) for
conv, update the attributes of conv, and update the access of bias in conv,
because if the original conv has no bias, the `self.bias` access will be
inlined and replaced by the Constant node `None = prim::Constant()`; we need to
update this to use `GetAttr[name="bias"]` to make this work. But there is
also some work going on to handle constants, so we'll fix this pass after
that is done.
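
For reference, the parameter recomputation is the standard conv+BN folding identity; a sketch (not the actual pass code):

```python
import torch

def fold_conv_bn(w, b, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    # y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
    scale = bn_gamma / torch.sqrt(bn_var + eps)
    w_folded = w * scale.reshape(-1, 1, 1, 1)  # scale each output channel
    if b is None:                              # the "conv has no bias" case discussed above
        b = torch.zeros_like(bn_mean)
    b_folded = (b - bn_mean) * scale + bn_beta
    return w_folded, b_folded
```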

Test Plan:
.

Imported from OSS

Differential Revision: D18182918

fbshipit-source-id: bba510bc41ab58e0eb76f7b77335b6e3ffe2862d
2019-11-01 12:15:38 -07:00
d690521cf6 Add e2e test for conv+bn (#27348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27348

att

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18182920

fbshipit-source-id: 40edc4d85903f979cd4755d6785d2842faa4d566
2019-11-01 11:28:47 -07:00
9041e29d94 Revert D18209289: Moving python allgather_coalesced impl from Py to C
Test Plan: revert-hammer

Differential Revision:
D18209289

Original commit changeset: c5a4c4a1aaa0

fbshipit-source-id: d4865e3f8c4eeee285c711e5c2250b8c9f9b0d25
2019-11-01 11:23:41 -07:00
dbbb2fc9e5 Remove the linkage to CUDA libraries when ROCM is used. (#29009)
Summary:
Currently when ROCm is used, CUDA libraries are still linked. There has
been no error because USE_CUDA is set to OFF by a preliminary check in
tools/setup_helper/cuda.py, and no CUDA variable is set. Hence, these
lines can pass simply because those variables are always undefined and thus expand to empty strings. But this
cannot be safely relied on, and it is causing https://github.com/pytorch/pytorch/issues/28617 to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29009

Differential Revision: D18273472

Pulled By: ezyang

fbshipit-source-id: b8b6580e8a44d874ac678ed9073412d4d2e393ee
2019-11-01 11:18:21 -07:00
a49a656264 Updating submodules
Summary:
GitHub commits:

efdfedc749
15a29e620b

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: df306a4b693299f76d904bf15f24bb2cf367ab30
2019-11-01 11:11:58 -07:00
71be5fe54e add support for {ones,zeros,full,rand,randn}_like ops (#28981)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28981

This PR adds support for calling those functions on named tensors. The
implementation is not the nicest: in the future we have plans to merge
names into TensorOptions, at which point we won't need the extra
branches that check if the tensor has names. Right now, however, these
functions are very useful to have (in particular, ones_like is used by
autograd to generate gradients).
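
A sketch of the expected behavior per this commit (dimension names are illustrative):

```python
import torch

x = torch.randn(2, 3, names=('N', 'C'))
y = torch.ones_like(x)
print(y.names)  # ('N', 'C') -- names propagate to the new tensor
```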

Test Plan: - Added tests for each of these

Differential Revision: D18270937

Pulled By: zou3519

fbshipit-source-id: 720739ff0474449a960b81728345a4250becbfc3
2019-11-01 11:04:42 -07:00
0a101bf8d5 Improve name inference API by introducing a TensorName helper struct (#28904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28904

Motivation
============

Before this PR, a core problem with writing name inference rules was
that each rule needed to handle misalignment by itself. A misaligned
name occurs when we are matching None with a non-None name, but the
non-None name already exists in the first tensor.

For example, `A` is misaligned in `Tensor[A, None] + Tensor[None, A]`.

Each op handled this in a custom way
- align_from_right (used by broadcasting) handles misalignment
- compute_matmul_outnames checks for misalignment across batch and
feature dimensions.

We can actually codify "misalignment" into something more rigorous by
folding it into the definition of `match` and eliminate special handling
of "misalignment". That is what this PR attempts to do.

Approach
============

Definition: Two names in two tensors *match* if they are equal, or if at
least one of them is a wildcard that can be *refined* to the other name.

With this new definition, to check if two names match, we need to know
about the names list that each name came from to determine if a wildcard
can successfully be *refined* to the other name.

For example, consider the following:
```
tensor: Tensor[A, None]
other: Tensor[None, A]
```
when unifying `tensor.names[-1]` with `other.names[-1]`, we see that
`tensor.names[-1]` is None and `other.names[-1]` is A. Then we check to
see if `tensor.names[-1]` can be refined to `A`; it can't be refined if
there is already an `A` in `tensor.names`.

Enter `TensorNames`.
A TensorName represents a Dimname associated with some DimnameList
(that came from a Tensor).

`TensorNames` is a list of such TensorName objects with some helper
functions attached.

One can perform the following operations:
- unify two `TensorName` objects
- unify two `TensorNames` objects with right alignment.

Plan
============

This PR changes `compute_matmul_outnames` to use `TensorNames` to
demonstrate how they make writing name inference rules easier. In the
future I'll convert other name inference rules to use `TensorNames` as
well.

Test Plan
- run all tests

Test Plan: Imported from OSS

Differential Revision: D18270666

Pulled By: zou3519

fbshipit-source-id: 3ec96cc957747eb4cfe4ea17fd02ef3d8828a20c
2019-11-01 11:01:48 -07:00
d0204ea92a Remove dead includes in caffe2/binaries
Reviewed By: ezyang

Differential Revision: D18136357

fbshipit-source-id: df357c9d4b344b5621b838c2a2657658e10f7000
2019-11-01 10:58:42 -07:00
bbea34f283 Revert D18266918: C++ API: torch::nn::BatchNorm{2,3}d
Test Plan: revert-hammer

Differential Revision:
D18266918

Original commit changeset: f432904c7298

fbshipit-source-id: 0e1c596b2e2f13b59082ff422c67ba025df4be07
2019-11-01 10:46:49 -07:00
88a34ef690 Move unboxed dispatch decision into dispatcher (#28251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28251

Before, the dispatch key for unboxed operators from native_functions.yaml was generated in codegen and passed to the c10 dispatcher.
Now, we generate it inside of the dispatcher, right next to where the same thing happens for boxed calls.
ghstack-source-id: 93085152

Test Plan: unit tests

Differential Revision: D17989951

fbshipit-source-id: b343d9650debc62bfcff84cf4d6bdaf9dacc9d16
2019-11-01 10:37:52 -07:00
22a346ee34 Moving python allgather_coalesced impl from Py to C
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28857

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: mrshenli

Differential Revision: D18209289

fbshipit-source-id: c5a4c4a1aaa07286a05a7c842dda428eeb46f696
2019-11-01 10:34:23 -07:00
b7c5b3d398 C++ API: torch::nn::BatchNorm{2,3}d (#28936)
Summary:
Add torch::nn::BatchNorm{2,3}d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #28176

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28936

Differential Revision: D18266918

Pulled By: yf225

fbshipit-source-id: f432904c72985d52ec52cb992cceb372b6ff0244
2019-11-01 09:28:58 -07:00
c447941bda Migrate conv2d from TH to ATen (CPU) (#28793)
Summary:
This is a port of the SpatialConvolutionMM TH (CPU) implementation to ATen as `slow_conv2d`. In practice it is invoked for ungrouped, non-dilated, non-float32 convolutions (e.g. float64, long, bfloat16).

- [x] unfolded_copy & unfolded_acc
- [x] forward
- [x] backward
- [x] basic sanity cross check with 1.3 impl
- [x] systematic testing
- [x] performance comparison & optimization

File used for performance testing: [benchmark_conv2d.py](https://gist.github.com/andreaskoepf/c2777b2e5e9d11610f9fc74372930527)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28793

Differential Revision: D18256451

Pulled By: ezyang

fbshipit-source-id: d09e84eef11ccf8a6178dfad485fe6fd0ddf0c86
2019-11-01 08:17:53 -07:00
31c932d9ab fixed replicate typo in torch/nn/parallel/__init__.pyi (#29005)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/29004
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29005

Differential Revision: D18264637

Pulled By: pbelevich

fbshipit-source-id: 03013f668235deca35a58f70732111b53d792de5
2019-11-01 08:00:41 -07:00
a5d65d1f8f Fix embedding renormalization on cpu (#28546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28546

Fix #28370 repro

Test Plan: Imported from OSS

Differential Revision: D18251533

Pulled By: albanD

fbshipit-source-id: cd9ab609797b8c887ec9128752cc6a2f58a9aee6
2019-11-01 07:37:15 -07:00
7776d5bfe9 Update parallel_for/reduce doc (#28545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28545

* **#28545 Update parallel_for/reduce doc**

Test Plan: Imported from OSS

Differential Revision: D18251534

Pulled By: albanD

fbshipit-source-id: e743e4acfe1a4b5a329c11f7d03efd34d19efda8
2019-11-01 07:37:11 -07:00
dd288d3b21 support addcmul, addcdiv (#28975)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28975

TensorIterator supports propagating names so we just needed to enable
them with support_named_tensor: True
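
A sketch of the enabled behavior (dimension names are illustrative):

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
a = torch.randn(2, 3, names=('N', 'C'))
b = torch.randn(2, 3, names=('N', 'C'))
out = torch.addcmul(t, a, b, value=0.5)  # t + 0.5 * a * b
print(out.names)  # ('N', 'C') -- names propagate via TensorIterator
```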

Test Plan:
- really basic tests to test that each variant (outplace, inplace, out=)
supports named tensors.

Differential Revision: D18252421

Pulled By: zou3519

fbshipit-source-id: ea7fb59dcf8c708b6e45d03b9c2ba27fa6b6ce98
2019-11-01 07:11:58 -07:00
08860721ad Revert D18195584: Additional autograd unit tests for Python UDFs.
Test Plan: revert-hammer

Differential Revision:
D18195584

Original commit changeset: b795daf644ba

fbshipit-source-id: 413dac34f1a28e0a591893f43e116f006fd3f2be
2019-11-01 06:59:54 -07:00
72b9bda9e5 Smooth L1 loss (#27661)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `SmoothL1Loss` module and `smooth_l1_loss` functional.
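
For reference, a minimal Python-side usage of the module this C++ addition mirrors:

```python
import torch

loss = torch.nn.SmoothL1Loss()
pred = torch.randn(3, requires_grad=True)
target = torch.randn(3)
out = loss(pred, target)  # quadratic near zero, L1 far from zero
out.backward()
```
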
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27661

Differential Revision: D18002332

Pulled By: yf225

fbshipit-source-id: b382df8becb0de14986ec16ee0dc953d7b10e917
2019-10-31 23:41:35 -07:00
1c8ef29ac5 Remove copy-pasted code in THCTensorTopK.cuh (#28995)
Summary:
This is independent from https://github.com/pytorch/pytorch/pull/28989, but once https://github.com/pytorch/pytorch/issues/28989 lands, this fixes https://github.com/pytorch/pytorch/issues/28792 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28995

Differential Revision: D18265797

Pulled By: soumith

fbshipit-source-id: 6dd7cffd05aa65e4b366f1c40b8bda0a633e3154
2019-10-31 21:26:50 -07:00
cd3ed4db76 Update README.md (#28971)
Summary:
Fixed some grammar.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28971

Differential Revision: D18265791

Pulled By: soumith

fbshipit-source-id: 778ab3e8a31f5f520a048c089c719c618427eaa6
2019-10-31 21:04:21 -07:00
aa30176c68 Add C++ API clip_grad_value_ for nn:utils (#28736)
Summary:
Adds the C++ API clip_grad_value_ for the torch::nn::utils module.
Also fixes a for-loop indentation error in the original test/test_nn.py.
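
For reference, a minimal sketch of the Python-side utility this mirrors:

```python
import torch

p = torch.nn.Parameter(torch.randn(4))
p.grad = torch.tensor([0.5, -3.0, 2.0, -0.1])
torch.nn.utils.clip_grad_value_([p], clip_value=1.0)
print(p.grad)  # every entry clamped to the range [-1.0, 1.0]
```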

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28736

Differential Revision: D18263807

Pulled By: yf225

fbshipit-source-id: 29282450bd2099df16925e1d0edd3d933f6eeb9b
2019-10-31 19:11:54 -07:00
8a1f42b81e Speed up threshold on CPU. (#27155)
Summary:
This is a small fix, but the runtime improvement does seem consistent (a bit less than 10%):

Benchmark (no turbo, Release build, gcc 8.3, RHEL 7.7, Intel(R) Core(TM) i7-8850H):

```python
import timeit

for dtype in ('torch.double', 'torch.float', 'torch.int16', 'torch.int32', 'torch.int64'):
    print(f'dtype={dtype}')
    for n, t in [(70_000, 200000),
                (700_000, 20000)]:
        print(f'torch.nn.Threshold(0.1, 20)(a), numel() == {n} for {t} times')
        print(timeit.timeit(f'm(a)', setup=f'import torch; m=torch.nn.Threshold(0.1, 20); a = torch.arange({n}, dtype={dtype})', number=t))
```

Before:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.88117562699972
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.525143070000013
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.673380930000349
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.677610996000112
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.957677209999929
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.8512293700005102
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.624350482999944
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.670380037000541
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.86375758200029
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.468234717999621
```

After:

```
dtype=torch.double
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.64173036200009
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.456986365000375
dtype=torch.float
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.431988049000211
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.446968590000324
dtype=torch.int16
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
3.743787463999979
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
1.823233144000369
dtype=torch.int32
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
5.42801834400052
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
3.4600211680008215
dtype=torch.int64
torch.nn.Threshold(0.1, 20)(a), numel() == 70000 for 200000 times
8.562551314000302
torch.nn.Threshold(0.1, 20)(a), numel() == 700000 for 20000 times
9.37924196699987
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27155

Differential Revision: D17790768

Pulled By: VitalyFedyunin

fbshipit-source-id: 3281eaff77ddddd658048c9e73824dd68c548591
2019-10-31 17:47:11 -07:00
d3cd64d71d PyRRef.owner() to return WorkerInfo (#28909)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28909

This allows chaining calls on RRef, as exemplified in the newly added test case.
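
A minimal sketch of the chaining this enables, assuming an initialized RPC worker group (worker names are illustrative):

```python
import torch
import torch.distributed.rpc as rpc

# assumes rpc.init_rpc(...) has been called on a group containing "worker1"
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
info = rref.owner()          # a WorkerInfo, not just a name string
print(info.name, info.id)
rpc.rpc_sync(info, torch.add, args=(torch.ones(2), 2))  # WorkerInfo usable as destination
```
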
ghstack-source-id: 92996018

Test Plan: unit test.

Differential Revision: D18231081

fbshipit-source-id: deeac044ef6d63f18ea241760ac17a3e644cb3d7
2019-10-31 17:11:24 -07:00
59c5de4d0e Don't permute in quantized::conv2d pattern (#27347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27347

It's already done in the op, so we don't need to permute again.

Test Plan:
test_jit.py
we'll test in e2e tests

Imported from OSS

Differential Revision: D18182919

fbshipit-source-id: 04dd2a19a719828fbc7b62e451b81752187e0fcb
2019-10-31 15:58:28 -07:00
ba6defeb07 Revert D18254898: Revert D18202646: [pytorch][PR] Use aten's GRAIN_SIZE for TH Tensor ops
Test Plan: revert-hammer

Differential Revision:
D18254898

Original commit changeset: df19992db610

fbshipit-source-id: 4da5b3b2c4f6fb8f490a319cce50d619d54af0e1
2019-10-31 14:45:59 -07:00
3bba751cd6 Additional autograd unit tests for Python UDFs. (#28824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28824

1) Enhanced autograd unit tests to test the
torch.distributed.autograd.backward() API more thoroughly on Python UDFs.
2) Enhanced `python_error` to override `what` such that it returns an
appropriate error string if we call `what()` on this error. This ensures we can
propagate exceptions over the wire during RPCs (since we get the error string
by calling what() on the exception)
ghstack-source-id: 92972494

Test Plan: waitforbuildbot

Differential Revision: D18195584

fbshipit-source-id: b795daf644ba1816fdec484545192ab55a2f71e7
2019-10-31 14:03:00 -07:00
579ffb647d Add HashStore to c10d (#28921)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28921

This implementation is quite similar to the HashStore in gloo -
an ephemeral in-process store with a lock and unordered_map<>.

There are a few tweaks/differences based on c10d vs gloo:
  - c10d expects add/check methods
  - c10d get() use cases expect to wait up to super::timeout_ if the value isn't present
  - c10d set() isn't expected to throw if the value is present.
  - c10d uses uint8_t vs char

It's potentially a better choice for some cases than FileStore when we
don't need cross-process access, or care about the backing file.
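
A minimal sketch, assuming HashStore is exposed to Python like the other c10d stores (`FileStore`/`TCPStore`):

```python
import torch.distributed as dist

store = dist.HashStore()     # ephemeral, in-process
store.set("key0", "value0")  # set() does not throw if the key already exists
store.add("counter", 1)      # the add/check style methods c10d expects
value = store.get("key0")    # get() waits up to the store timeout if the key is absent
```
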
ghstack-source-id: 92992341

Test Plan:
buck build mode/dev-nosan caffe2/torch/lib/c10d/...
    buck-out/dev/gen/caffe2/torch/lib/c10d/HashStoreTest

Differential Revision: D18233713

fbshipit-source-id: ab23f3f93d3148c1337f2cc6a8f2aff4aa6549f3
2019-10-31 13:55:22 -07:00
4654795d13 Revert D18202646: Use aten's GRAIN_SIZE for TH Tensor ops
Test Plan: revert-hammer

Differential Revision:
D18202646

Original commit changeset: ab30e5ef24e6

fbshipit-source-id: df19992db61055541fc0131426421038dea32a48
2019-10-31 13:42:40 -07:00
f63cbf3ae2 change op benchmark forward_only flag (#28967)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28967

Change the forward_only flag to take True or False so it can be integrated with PEP.

Test Plan:
```
[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only True  --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 152.489

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 236.608

[mingzhe0908@devgpu203.prn2 ~/fbsource/fbcode] ~/fbsource/fbcode/buck-out/opt/gen/caffe2/benchmarks/operator_benchmark/pt/add_test.par --forward_only False   --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 147.174

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 253.437

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu_bwdall
# Input: M: 64, N: 64, K: 64, device: cpu
Backward Execution Time (us) : 1044.082
```

Reviewed By: hl475

Differential Revision: D18247416

fbshipit-source-id: 1c6cff1ac98233d4f0ca298e0cb4a0d3466e5834
2019-10-31 13:28:58 -07:00
fcd6a8252c add shapes for fill benchmark (#28966)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28966

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:fill_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: fill_
# Mode: Eager
# Name: fill__N1024_cpu_dtypetorch.int32
# Input: N: 1024, device: cpu, dtype: torch.int32
Forward Execution Time (us) : 2.008
```

Reviewed By: hl475

Differential Revision: D18241521

fbshipit-source-id: 6eb6e1ab7e8a2f461c6fc537f5bb971d12f594c3
2019-10-31 13:28:49 -07:00
9034762a7d add more operators to benchmark_all_test (#28968)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28968

Add fill and as_strided operators.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --list_ops
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# List of Operators to run:
# round_
# exponential_
# QLinear
...
```

Reviewed By: hl475

Differential Revision: D18241522

fbshipit-source-id: aade1d68a68a660d19d8dfd980eb4d5d0891488b
2019-10-31 13:28:39 -07:00
4bfe2f0900 Fix jit outplace tracing and reapply changes to *_like operators. (#28839)
Summary:
Reapply reverted and fix files `gen_variable_type.py` `test_jit.py`

https://github.com/pytorch/pytorch/issues/27891 Cleanup testing of _like operators
https://github.com/pytorch/pytorch/issues/27890 Add memory format support to randn_like operator
https://github.com/pytorch/pytorch/issues/27889 Add memory format support to randint_like operator
https://github.com/pytorch/pytorch/issues/27562 Add memory format support to zeros_like operator
https://github.com/pytorch/pytorch/issues/27561 Add memory format support to rand_like operator
https://github.com/pytorch/pytorch/issues/27270 Add memory format support to ones_like operator
https://github.com/pytorch/pytorch/issues/27262 Add memory format support to full_like operator
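
A usage sketch of the memory format argument these issues add to the `_like` operators (values are illustrative):

```python
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = torch.zeros_like(x, memory_format=torch.preserve_format)   # keeps channels_last layout
z = torch.rand_like(x, memory_format=torch.contiguous_format)  # forces standard layout
```
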
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28839

Test Plan:
Imported from GitHub, without a `Test Plan:` line.

buck test mode/dev //language_technology/neural_mt/os/pytorch_translate/test:test_onnx -- 'test_forced_decoder_export_vocab_reduction \(language_technology\.neural_mt\.os\.pytorch_translate\.test\.test_onnx\.TestONNX\)'

Differential Revision: D18203397

Pulled By: VitalyFedyunin

fbshipit-source-id: eea41cbd4c232cf5a54172b1e1b16b173798f298
2019-10-31 13:23:08 -07:00
0e441dd386 flip the "don't inline" switch (#26706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26706

This has been ready for some time, just waiting on services to push with
the new code.

#forceTDhashing

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D17543304

fbshipit-source-id: baad22f4abc5af724ebde8507e948bee3e8bf6d4
2019-10-31 13:02:32 -07:00
595209bddc Fix bugs in torch::tensor constructor (#28523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28523

New features:
1. Previously, `torch::tensor({true, false, true})` throws `"tensor_cpu" not implemented for 'Bool'`. After this PR, it produces the correct bool tensor, matching the Python API behavior.
2. Tensors with zero-size dimensions are now supported, e.g. `torch::tensor({{}, {}})` produces a tensor with sizes `{2, 0}`, matching the Python API behavior.

BC-breaking bug fixes:
1. Previously, `torch::tensor({{1}, {2}})` produces a tensor of sizes `{2}`. After this PR, it produces a tensor of sizes `{2, 1}`, matching the Python API behavior.
2. Fixed semantics of `torch::tensor(1.1)`: it now returns a 0-dim tensor instead of a 1-dim tensor, matching the Python API behavior.
3. Previously, when passed a non-dtype `TensorOptions` to the `torch::tensor` constructor, it always produces a tensor of dtype `float`. After this PR, it produces tensor of different dtypes based on the dtype of the braced-init-list, matching the behavior of the no-options case.
```cpp
// Previously:
torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({{1, 2, 3}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({1., 2., 3.}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float
torch::tensor({{1., 2., 3.}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> float

// Now:
torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> int
torch::tensor({{1, 2, 3}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> int
torch::tensor({1., 2., 3.}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> double
torch::tensor({{1., 2., 3.}}, torch::TensorOptions(/*non-dtype-options*/)).dtype() -> double

// As comparison, currently:
torch::tensor({1, 2, 3}).dtype() -> int
torch::tensor({{1, 2, 3}}).dtype() -> int
torch::tensor({1., 2., 3.}).dtype() -> double
torch::tensor({{1., 2., 3.}}).dtype() -> double
```

Notes:
1. From now on, the behavior of `at::tensor(scalar_value)` (which produces a 1-dim tensor) would be different from `torch::tensor(scalar_value)` (which produces a 0-dim tensor). I will fix the behavior of `at::tensor(scalar_value)` in a follow-up PR.
2. From now on, the behavior of `at::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/))` (which produces a `float` tensor) would be different from `torch::tensor({1, 2, 3}, torch::TensorOptions(/*non-dtype-options*/))` (which produces an `int` tensor). I will fix this behavior of the `at::tensor` constructor in a follow-up PR.

Context for the changes in this PR:

The motivation comes from fixing the "`torch::tensor({{1}, {2}})` gives tensor of wrong sizes" bug - in order to fix it, I had to move the handling of `at::ArrayRef` and `std::vector` into `InitListTensor` (see below for why we need to do this) and rename `InitListTensor` to `TensorDataContainer`. After these changes, support for bool values comes out of the box without extra effort, and support for tensors with zero-size dimensions only requires adding a default constructor for `TensorDataContainer`, so I added those two in this PR.

For the semantic change of `torch::tensor(1.1)`, it's actually more effort to preserve the original wrong behavior (i.e. we need to check the sizes of the tensor converted from `TensorDataContainer` and reshape any scalar tensor to a 1-D tensor). I think preserving the original wrong behavior doesn't give us much value, and since the above changes naturally fix the problem, we should just start using the right behavior instead.

For the "constructor with non-dtype options behavior" fix, the code looks simpler and easier to reason about with the fix, so I included it in this PR.

--------

Why we need to move the handling of `at::ArrayRef` and `std::vector` into `TensorDataContainer`:

`torch::tensor({{1}, {2}})` can match this function overload:
`torch::tensor(at::ArrayRef<int> values)`, because `{1}` and `{2}` can be treated as
a list-initialization of an `int` value. However, this will produce a Tensor with sizes `{2}`,
but we actually want a Tensor with sizes `{2, 1}`. In order to avoid matching this function overload,
we removed the function overload and moved the ability to convert `at::ArrayRef<T>`
(and similarly `std::vector<T>`) into `TensorDataContainer`, and since for braced-init-list the
`TensorDataContainer(std::initializer_list<TensorDataContainer>)` constructor is always preferred over all other constructors, it will take the `std::initializer_list` path, and all is good.

Test Plan: Imported from OSS

Differential Revision: D18234625

Pulled By: yf225

fbshipit-source-id: 0f3f6912e82e2117d2103e31b74e7e97baaa8693
2019-10-31 12:53:06 -07:00
c8771f5a44 Port mse_lose to ATen (#26529)
Summary:
VitalyFedyunin, this PR ports MSE loss to ATen:

**Test script:**
```
import torch
import torch.nn as nn
import time

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
loss = nn.MSELoss(reduction = 'sum')
if torch.cuda.is_available():
    device = "cuda"
    loss = loss.cuda()

#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(1000):
        output = loss(input, target)
        output.backward()

#get running time
for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    target = torch.randn(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = loss(input, target)
        t2 = _time()
        output.backward()
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Test Device:** CPU: skx-8180, GPU: Tesla P40.

### Performance:

**Before:**
```
GPU:
reduction='mean'
input size(128, 100) forward time is 0.08 (ms); backward avg time is 0.14 (ms).
input size(128, 10000) forward time is 0.12 (ms); backward avg time is 0.21 (ms).
reduction='sum'
input size(128, 100) forward time is 0.09 (ms); backward avg time is 0.15 (ms).
input size(128, 10000) forward time is 0.11 (ms); backward avg time is 0.20 (ms).

CPU:
OMP_NUM_THREADS=56
reduction='mean'
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.09 (ms).
input size(128, 10000) forward time is 3.49 (ms); backward avg time is 3.23 (ms).
reduction='sum'
input size(128, 100) forward time is 0.06 (ms); backward avg time is 0.09 (ms).
input size(128, 10000) forward time is 3.49 (ms); backward avg time is 3.23 (ms).

OMP_NUM_THREADS=1
reduction='mean'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 1.41 (ms); backward avg time is 1.66 (ms).
reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 1.44 (ms); backward avg time is 1.68 (ms).
```

**After:**
```
GPU:
reduction='mean'
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.13 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.20 (ms).

reduction='sum'
input size(128, 100) forward time is 0.07 (ms); backward avg time is 0.14 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.20 (ms).

CPU:
OMP_NUM_THREADS=56
reduction='mean'
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.14 (ms); backward avg time is 0.30 (ms).

reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.13 (ms); backward avg time is 0.30 (ms).

OMP_NUM_THREADS=1
reduction='mean'
input size(128, 100) forward time is 0.04 (ms); backward avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.85 (ms); backward avg time is 1.27 (ms).
reduction='sum'
input size(128, 100) forward time is 0.03 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.83 (ms); backward avg time is 1.27 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26529

Differential Revision: D18225144

Pulled By: VitalyFedyunin

fbshipit-source-id: ce837a297c70398a3ffa22f26ee9e812cf60d128
2019-10-31 12:37:54 -07:00
42faf961c8 Update fbjni submodule to new upstream and latest version
Summary:
The central fbjni repository is now public, so point to it and
take the latest version, which includes support for host builds
and some condensed syntax.

Test Plan: CI

Differential Revision: D18217840

fbshipit-source-id: 454e3e081f7e3155704fed692506251c4018b2a1
2019-10-31 11:48:25 -07:00
80b46ca35a Null AutogradMeta optimization (#28610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28610

The basic idea is, in some cases where we stored a pointer to a full AutogradMeta object, instead store a nullptr. We let a nullptr represent a default-constructed AutogradMeta object, and simply populate it with a real AutogradMeta if there is ever a situation where we need to modify it.

The primary technical contrivance in this diff is I have to use AutogradMetaFactory to lazily initialize the AutogradMeta, as it is not available in the dynamic library that TensorImpl is in. (I spent a while trying to put them in the same compilation unit, but gave up in the end as it pushed us over the Windows linking binary size limit. Eep.)

Some other notes:
- `set_autograd_meta` now unconditionally turns a tensor into a variable. I audited all call sites and observed there are no occurrences where nullptr is passed (after this patch, there are now!)
- `copy_tensor_metadata` is updated to unconditionally preserve the VariableTensorId-ness of the destination tensor. I think this is the more correct semantics; we can't do the old semantics anymore.
- There's a bunch of places in the API where we return const references to objects. This is pretty weird to me, but I didn't feel like cleaning it up. But sometimes I don't conveniently have something that's the right lifetime, so I introduced a number of singletons to handle this correctly.

You might wonder why I'm doing the optimization before the variable-tensor dynamic merge. The reason is simple: this change is semantics preserving, while variable-tensor dynamic merge is not. So it is easier to get right, and prevents us from regressing performance if we do it the other way.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171162

Pulled By: ezyang

fbshipit-source-id: 580df729e4d04881b2b9caa0f0c00785b3afbb92
2019-10-31 11:45:16 -07:00
85e72edf3e Delete dead TensorImpl::detach_autograd_meta (#28609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28609

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171159

Pulled By: ezyang

fbshipit-source-id: 509061ca56186c7762da9634abecbafad0277d94
2019-10-31 11:45:12 -07:00
b52ceec80b Remove unused gradient_edge argument from make_variable_view (#28602)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28602

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171163

Pulled By: ezyang

fbshipit-source-id: 3f3d4cf0bd05c302f502795a04ecace0fc064255
2019-10-31 11:45:07 -07:00
335bfa24e0 Add an AutogradMeta factory. (#28593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28593

When I turn on Variable everywhere, I will need to be able to construct
AutogradMetas from TensorImpl.  But I cannot call the constructor directly
as it lives in another dynamic library. So I need another virtual factory interface
to be able to do this.

I also adjust the AutogradMeta constructor so that the TensorImpl argument is
optional. This argument is only needed if `requires_grad == True`, as we use it
to test if the variable is valid (only floating point tensors can have requires grad true).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171161

Pulled By: ezyang

fbshipit-source-id: 3f2e86720899b3bda36ddd90244c2624645cc519
2019-10-31 11:45:03 -07:00
18f2efa997 Unfriend Variable factory functions. (#28601)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28601

In the process, I moved AutogradMeta out of the Variable class. The
intent here is that I'm going to delete Variable class entirely,
so I had better not be putting stuff in it!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171160

Pulled By: ezyang

fbshipit-source-id: 9c0bcdc82797eca0577d1b0745b4a2ae962f3010
2019-10-31 11:44:58 -07:00
9643f066cf Move all autograd_meta_ manipulating operations out-of-line. (#28592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28592

These aren't perf critical, and putting them in a cpp file makes it easier to
work on them.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171158

Pulled By: ezyang

fbshipit-source-id: 4aad434ad4aecba7ed46761f676df6bbec37733e
2019-10-31 11:44:54 -07:00
a844809a2c Test TensorTypeSet instead of autograd_meta_ for variable-ness. (#28543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28543

By the current autograd_meta_ <=> type_set_ invariant (now explicitly documented
in the right place!), these are equivalent.  But when I introduce null
autograd_meta_ optimization, they won't be equivalent anymore: TensorTypeSet is
going to give me the right information no matter what.

In the long run, this patch will be a wash, because everything will "be a variable"
in the long term.  But I am making this change now to make sure that the invariant
actually holds.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18171157

Pulled By: ezyang

fbshipit-source-id: cbba8fd5df9e6873a8757925db5f578fecbd2486
2019-10-31 11:44:50 -07:00
38388b9b3c Updating submodules
Summary:
GitHub commits:

41e219e542
1a853c0fb4
727113485b
0bf264f1fc

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 771f54489bc6680b15df540dbfb789615a1a4a3f
2019-10-31 11:41:01 -07:00
00bd9eae33 Fix typo in Dataset and IterableDataset docs (#28960)
Summary:
Replaced "overrite" with "overwrite".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28960

Differential Revision: D18246411

Pulled By: soumith

fbshipit-source-id: dc0979a44b7c621a316823061760e0358c227727
2019-10-31 11:34:52 -07:00
b1bf595e54 Update generated test model
Summary:
The Java and Python code were updated, but the test currently fails
because the model was not regenerated.

Test Plan: Ran test.

Reviewed By: xcheng16

Differential Revision: D18217841

fbshipit-source-id: 002eb2d3ed0eaa14b3d7b087b621a6970acf1378
2019-10-31 11:03:20 -07:00
c60bf2704a Support Offline Tensors through ONNXIFI layer
Summary:
Previous import was b2ec1a8041879b7be98d81387a14cae895f952f4

Included changes:
- **[97fe555](https://github.com/houseroad/foxi/commit/97fe555)**: Add deferred weight reader pointer when initializing the graph (#15) <Yinghai Lu>
- **[ba2faf7](https://github.com/houseroad/foxi/commit/ba2faf7)**: Add status and timeout to events (#14) <Jack Montgomery>

Test Plan: kicksandcastle

Reviewed By: ipiszy

Differential Revision: D18231697

fbshipit-source-id: 7566e2438d2b57f0feaadcd51f55a03552adeab9
2019-10-31 10:33:42 -07:00
05e88dc4fe skip additional flaky rpc tests (#28934)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28934

These tests are flaky; skip them while we investigate the root cause
ghstack-source-id: 92945898

Test Plan: tests pass

Differential Revision: D18235766

fbshipit-source-id: 9bff65653954b767e32bcc1d25c65b0cea2c4331
2019-10-31 10:12:59 -07:00
275adb143e fix printing a node header (a kind wasn't being printed)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28887

Differential Revision: D18226435

Pulled By: Krovatkin

fbshipit-source-id: b8edf8bb52ff45ab625ccedf66263d3ab5895faa
2019-10-31 09:55:02 -07:00
fca99e96e8 Move cuda functions to cuda folder.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28818

Differential Revision: D18232782

Pulled By: ailzhang

fbshipit-source-id: 936a0635bccc7c759bbbff438f43f3812e34faed
2019-10-31 09:41:56 -07:00
c63e15aef8 Revert D18241759:
Test Plan: revert-hammer

Differential Revision:
D18241759

Original commit changeset: 8f2535bb0bc4

fbshipit-source-id: 870ac8e860e31f32138d42d470321e225a19990d
2019-10-31 07:54:26 -07:00
1dcf1b8938 Update pinverse doc for recent commit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28877

Differential Revision: D18225510

Pulled By: albanD

fbshipit-source-id: 698af06ac9e4259eed93d146edb3a7fb13e39242
2019-10-31 07:36:35 -07:00
fe8804695b Use aten's GRAIN_SIZE for TH Tensor ops (#28770)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28198 in my tests on a 24 core AMD threadripper.

Profiling the benchmark showed that most of the slowdown in https://github.com/pytorch/pytorch/issues/28198 was from `THFloatTensor_fill` not being distributed across threads. It internally uses `TH_TENSOR_APPLY_CONTIG` which is a thin wrapper around `at::parallel_for` and uses `TH_OMP_OVERHEAD_THRESHOLD` or 100,000 as the grain size.

Here I've changed it to use `at::internal::GRAIN_SIZE` which is 32,768 so ~1/3 of the old value. I think it makes sense to unify these two values so any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`       | `addmm`    |
|:----------:|---------------:|-------------:|---------:|---------:|
| master     | 3505.85 ms/it  | 184.302 ms   | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it  |   184.456 ms | 5.810 ms | 5.069 ms |
| this PR    | 3453.23 ms/it  |   184.526 ms | 5.824 ms | 5.202 ms |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28770

Differential Revision: D18202646

Pulled By: ezyang

fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
2019-10-31 07:18:46 -07:00
9630b78c49 Pow() : Use lightweight operations for predefined scalar exponent values (#28903)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28903

Use predefined, less compute-intensive functions instead of pow() for predefined scalar exponent values.
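
An illustrative sketch of the special-casing idea in Python (the actual change is in the kernels, not this code):

```python
import torch

def pow_scalar(x, exponent):
    # dispatch common exponents to cheaper dedicated ops
    if exponent == 2.0:
        return x * x
    if exponent == 0.5:
        return x.sqrt()
    if exponent == -1.0:
        return x.reciprocal()
    return x.pow(exponent)  # general fallback
```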

Test Plan: automated tests

Reviewed By: jspark1105

Differential Revision: D18227280

fbshipit-source-id: 0a443832c3ff8372e64dbe04de4f7fb4ce7c0740
2019-10-31 05:39:39 -07:00
Jie
1b1e3d565c (#28927)
Summary:
This is to fix https://github.com/pytorch/pytorch/issues/22526

Adds a limitation on the launch config for grid sizes as well; the previous code asked to launch more blocks than the hardware supports.
Test added in test_cuda.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28927

Differential Revision: D18241759

Pulled By: soumith

fbshipit-source-id: 8f2535bb0bc4ea7998024b137576a38067668999
2019-10-31 01:00:47 -07:00
9fb0079036 merge some of the lint checks (#28933)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28933

Merge all the things that don't add annotations into a single
`quick-checks` job. This helps reduce concurrency and clutter
at the top of the status check page.

This doesn't touch the actually important items (flake8 + clang-tidy),
but those are a little trickier to handle because of how annotations are
added.

Test Plan: Imported from OSS

Differential Revision: D18235396

Pulled By: suo

fbshipit-source-id: 8fba44f3f5d398b1dce0f39f51d6652f3e0c1bf7
2019-10-30 23:02:34 -07:00
d071ca2972 Improve reshape backward when the op is a view (#28901)
Summary:
Currently, `reshape` does an `as_strided` when the geometry is viewable. However, `as_strided` backward is not very optimized and cannot always detect such cases. Improvements are planned in https://github.com/pytorch/pytorch/pull/8965, and I will finish it some day. But the current situation is that, in these cases, backward through `reshape` will copy the gradient while a simple `view` will not. This is unnecessary.

Notably this affects `flatten` and a whole bunch of other ops implemented on top of `reshape`.

```py
In [15]: x = torch.randn(3, 4, requires_grad=True)

In [16]: y = x.reshape(x.shape)

In [17]: assert y._base is not None

In [18]: gy = torch.randn_like(y)

In [20]: gx = torch.autograd.grad(y, x, gy)[0]

In [21]: gx
Out[21]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

In [22]: gx._base  # not gy
Out[22]:
tensor([ 0.2189,  0.3396, -0.1108,  1.7703,  1.0737, -0.1222,  1.0765, -1.3363,
        -1.3798, -0.2950,  0.0800,  0.2501])

In [23]: gy.zero_()
Out[23]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [24]: gx  # not sharing storage with gy
Out[24]:
tensor([[ 0.2189,  0.3396, -0.1108,  1.7703],
        [ 1.0737, -0.1222,  1.0765, -1.3363],
        [-1.3798, -0.2950,  0.0800,  0.2501]])

# but everything is optimized with view, which should be equivalent to reshape in this case
In [25]: y = x.view(x.shape)

In [26]: assert y._base is not None

In [27]: gy = torch.randn_like(y)

In [28]: gx = torch.autograd.grad(y, x, gy)[0]

In [29]: gx
Out[29]:
tensor([[-2.4463,  1.1446,  0.1501,  0.1212],
        [-1.1125,  1.4661,  0.9092, -0.2153],
        [-0.1937, -0.3381, -1.3883, -0.7329]])

In [30]: gy.zero_()
Out[30]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [31]: gx  # sharing storage with gy
Out[31]:
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28901

Differential Revision: D18240868

Pulled By: ezyang

fbshipit-source-id: 28fdaa0c7014a9dae6731dfe8b67784d38fc27f0
2019-10-30 22:38:41 -07:00
47301a153b Eliminate unnecessary Tensor ref count bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28773

Differential Revision: D18229349

fbshipit-source-id: 4d0bc22ae827d8f207a08f9f08d8fe13ad700656
2019-10-30 21:14:13 -07:00
64c7ac233e Disable flaky remote tests in dist_autograd_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28920

Test Plan: Imported from OSS

Differential Revision: D18233625

Pulled By: mrshenli

fbshipit-source-id: d4b04ea3629d0828756ebb118f5763677d62729b
2019-10-30 18:43:10 -07:00
fd5c68b5e4 Revert D18231741: Enable PyTorch Probot as a GitHub Action.
Test Plan: revert-hammer

Differential Revision:
D18231741

Original commit changeset: d49711ad41d7

fbshipit-source-id: f390ec3ca8c55bfc308d8eacad5dd7dfae36500e
2019-10-30 18:24:40 -07:00
5e94e66c6f unify unary ops benchmark (#28913)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28913

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:unary_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: abs
# Mode: Eager
# Name: abs_M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 90.233

...
```

Reviewed By: hl475

Differential Revision: D18231641

fbshipit-source-id: 3093db47d0356b927768f15dc63af6ad8aadd430
2019-10-30 17:46:13 -07:00
2ffc4cca67 unify split benchmark (#28912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28912

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:split_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: split
# Mode: Eager
# Name: split_M256_N512_parts2_cpu
# Input: M: 256, N: 512, parts: 2, device: cpu
Forward Execution Time (us) : 3.434
```

Reviewed By: hl475

Differential Revision: D18231542

fbshipit-source-id: 84898db55996aa3faf156d4fb14f32d6db780e7a
2019-10-30 17:46:09 -07:00
94d2599d77 unify softmax benchmark (#28911)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28911

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: Softmax
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256_cpu
# Input: N: 4, C: 3, H: 256, W: 256, device: cpu
Forward Execution Time (us) : 17929.381
...
```

Reviewed By: hl475

Differential Revision: D18231517

fbshipit-source-id: 61f35849e1f4cf44cf09e60a7b618f8e9fc67b9c
2019-10-30 17:46:05 -07:00
ed4a978d79 unify pool benchmark (#28898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28898

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:pool_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: MaxPool1d
# Mode: Eager
# Name: MaxPool1d_kernel3_stride1_N8_C256_L256_cpu
# Input: kernel: 3, stride: 1, N: 8, C: 256, L: 256, device: cpu
Forward Execution Time (us) : 7133.492
```

Reviewed By: hl475

Differential Revision: D18228351

fbshipit-source-id: 47af93d5dd3776384f89b1289fbbe01c572ba9fc
2019-10-30 17:46:01 -07:00
f5e99b3249 unify matmul benchmark (#28899)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28899

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:matmul_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: matmul
# Mode: Eager
# Name: matmul_M128_N128_K128_trans_aTrue_trans_bFalse_cpu
# Input: M: 128, N: 128, K: 128, trans_a: True, trans_b: False, device: cpu
Forward Execution Time (us) : 39.535
```

Reviewed By: hl475

Differential Revision: D18228271

fbshipit-source-id: 681ed2745c25a122997346a23acdbc67e55e5ec4
2019-10-30 17:45:57 -07:00
28be2d4994 Better error message for quantized dispatch (#28635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28635

Fixes #28518

Test Plan: Imported from OSS

Differential Revision: D18132566

Pulled By: z-a-f

fbshipit-source-id: 08acc3033b12a0b79b43a5346b7af100416ffa94
2019-10-30 16:51:22 -07:00
6e1c18303b unify linear benchmark (#28897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28897

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N4_IN256_OUT128_cpu
# Input: N: 4, IN: 256, OUT: 128, device: cpu
Forward Execution Time (us) : 39.275
```

Reviewed By: hl475

Differential Revision: D18228070

fbshipit-source-id: 9c209eb74e574c6ef85ebcd78b824ef7d5e65dde
2019-10-30 16:25:48 -07:00
a7b235f968 unify gather benchmark (#28895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28895

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
```

Reviewed By: hl475

Differential Revision: D18227757

fbshipit-source-id: 493dd81108848fe3d48fb5ad940eb6aef84b639c
2019-10-30 16:25:43 -07:00
6e4147c72c unify conv benchmark (#28894)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28894

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:conv_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Conv1d
# Mode: Eager
# Name: Conv1d_in_c256_out_c256_kernel3_stride1_N1_L64_cpu
# Input: in_c: 256, out_c: 256, kernel: 3, stride: 1, N: 1, L: 64, device: cpu
Forward Execution Time (us) : 208.936
```

Reviewed By: hl475

Differential Revision: D18227626

fbshipit-source-id: 1ae768f529aa888415840ca10197323407e47d39
2019-10-30 16:25:39 -07:00
dbf8f535fc unify chunk benchmark (#28892)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28892

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:chunk_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: chunks
# Mode: Eager
# Name: chunks_M256_N512_chunks2_cpu
# Input: M: 256, N: 512, chunks: 2, device: cpu
Forward Execution Time (us) : 4.098
```

Reviewed By: hl475

Differential Revision: D18227499

fbshipit-source-id: 72268b7fe94a7d92d6e47f58f33902a33367c68b
2019-10-30 16:25:35 -07:00
15deee25bc Fix aten::format regex for clang8 (#28916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28916

The previous regex caused a `std::regex_error` under clang8 complaining about `error_brack`, which is strange because the square brackets are balanced. Seems like a stdlib bug to me. So to work around this, I've switched to the older regex with a non-greedy match in the inner atom.

Test Plan: Imported from OSS

Differential Revision: D18232654

Pulled By: jamesr66a

fbshipit-source-id: f82a9a24acf090010b03f23454d2b0f7a1e3589e
2019-10-30 16:14:46 -07:00
88b2bfd706 unify cat benchmark (#28893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28893

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:cat_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0_cpu
# Input: M: 256, N: 512, K: 1, dim: 0, device: cpu
Forward Execution Time (us) : 78.607
```

Reviewed By: hl475

Differential Revision: D18227341

fbshipit-source-id: d383709a5aab600f99b37d07e4d4393645289101
2019-10-30 15:53:37 -07:00
aa30b37d2e unify batchnorm benchmark (#28889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28889

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:batchnorm_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: batchnorm
# Mode: Eager
# Name: batchnorm_M1_N256_K3136_cpu
# Input: M: 1, N: 256, K: 3136, device: cpu
Forward Execution Time (us) : 276.192
```

Reviewed By: hl475

Differential Revision: D18227180

fbshipit-source-id: d8abe56237bb84903315332a5ecdaa1dff613110
2019-10-30 15:53:33 -07:00
740474838f unify as_strided benchmark (#28890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28890

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:as_strided_test

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0_cpu
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0, device: cpu
Forward Execution Time (us) : 2.792
...
```

Reviewed By: hl475

Differential Revision: D18227052

fbshipit-source-id: e17d9335ec89b47706a363bdb31451a01d4cbc5b
2019-10-30 15:53:29 -07:00
db15c2ba20 unify add benchmark format (#28891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28891

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 125.279
...
```

Reviewed By: hl475

Differential Revision: D18226789

fbshipit-source-id: 0cc51c6691533b02f662d4b6108916455f3a5b95
2019-10-30 15:53:25 -07:00
d6f1e49c4a C++ API parity: CTCLoss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28654

Test Plan: Imported from OSS

Differential Revision: D18202437

Pulled By: pbelevich

fbshipit-source-id: a4b80a57e65da84f3988002a026c648fa52a0fde
2019-10-30 14:35:02 -07:00
2466dc8544 Migrate nll_loss from TH to ATen (CPU) (#28270)
Summary:
This is a port of the TH negative log likelihood loss implementation to ATen; it is used by `torch.nn.functional.nll_loss()` for 2d inputs (N, C).
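
For reference, the call path this port covers (a usage sketch):

```py
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5, requires_grad=True)  # N=8 samples, C=5 classes
log_probs = F.log_softmax(logits, dim=1)        # nll_loss expects log-probabilities
target = torch.randint(0, 5, (8,))
loss = F.nll_loss(log_probs, target)            # scalar (mean reduction)
loss.backward()
```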

## Performance Impact

I measured no significant performance difference between the port and the original implementation when using this [benchmark test script](https://gist.github.com/andreaskoepf/3c8e3698607773db2788dfd8885a9ed9).

### WITH PR applied:
```
CPU forward 1000 took 2.5290995836257935e-05
CPU forward 10000 took 5.757302278652787e-05
CPU forward 100000 took 0.0004873779835179448
CPU forward 1000000 took 0.0051894880016334355
CPU forward 10000000 took 0.026263039995683357
CPU forward TOTAL time 0.8068871730065439
CPU for- & backward 1000 took 0.00018794499919749796
CPU for- & backward 10000 took 0.0002642899926286191
CPU for- & backward 100000 took 0.0011828370043076575
CPU for- & backward 1000000 took 0.01250307000009343
CPU for- & backward 10000000 took 0.11453165800776333
CPU for- & backward TOTAL time 0.824805997981457
```

### Original TH version:
```
CPU forward 1000 took 2.1958985598757863e-05
CPU forward 10000 took 6.608400144614279e-05
CPU forward 100000 took 0.0004632119962479919
CPU forward 1000000 took 0.005477247992530465
CPU forward 10000000 took 0.02681165697867982
CPU forward TOTAL time 0.8073387439944781
CPU for- & backward 1000 took 0.00020634100656025112
CPU for- & backward 10000 took 0.00031720998231321573
CPU for- & backward 100000 took 0.0011843869870062917
CPU for- & backward 1000000 took 0.010876987013034523
CPU for- & backward 10000000 took 0.09893897600704804
CPU for- & backward TOTAL time 0.8271351839939598
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28270

Differential Revision: D18009584

Pulled By: ezyang

fbshipit-source-id: 77daf47c61a9dd9bb3b5a8d3e48585bbb665e979
2019-10-30 14:12:09 -07:00
732a3d8f8c Fix UNICODE conflict on Windows (#28782)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27568.
cc IlyaOvodov.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28782

Differential Revision: D18201449

Pulled By: ezyang

fbshipit-source-id: 404e7c0abdfeef52a0e81ab2acd1b61e86c28f39
2019-10-30 14:09:31 -07:00
e3a24ba6d5 Enable PyTorch Probot as a GitHub Action. (#28879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28879

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18231741

Pulled By: ezyang

fbshipit-source-id: d49711ad41d7ff7e527326c68fd8db86da10a818
2019-10-30 13:59:23 -07:00
f5edb62a7f Clean extending autograd doc for output size 1 (#28860)
Summary:
Fix https://github.com/pytorch/pytorch/issues/28583
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28860

Differential Revision: D18224497

Pulled By: albanD

fbshipit-source-id: 0fa4eacce6f6092d555e509dc23bd75206f78d41
2019-10-30 13:57:10 -07:00
5821b9bf0f Remove error logging of high empty range ratio
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28854

Reviewed By: xianjiec

Differential Revision: D18206695

fbshipit-source-id: 4ce471f0236b2ceaf54ba1b1ce96e193feca720b
2019-10-30 12:55:25 -07:00
1d3d9ec7d4 C++ API Parity: functional::fold and Fold::pretty_print (#28732)
Summary:
Adds `torch::nn::functional::fold` support and updates `Fold::pretty_print` in the C++ API for more thorough Python parity.
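
For reference, the Python counterpart that the new C++ functional mirrors:

```py
import torch
import torch.nn.functional as F

# fold combines sliding local blocks back into a tensor: input is
# (N, C * kH * kW, L); with output_size (4, 5) and kernel (2, 2) there
# are L = 3 * 4 = 12 block positions.
inp = torch.randn(1, 3 * 2 * 2, 12)
out = F.fold(inp, output_size=(4, 5), kernel_size=(2, 2))
assert out.shape == (1, 3, 4, 5)
```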

Note: Small updates in source files to maintain consistency elsewhere.

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28732

Differential Revision: D18219955

Pulled By: yf225

fbshipit-source-id: fd2e9be8f17db77c1b1f384c0d2e16cc34858c0c
2019-10-30 11:37:39 -07:00
807fbf8816 Disable flaky tests in dist_autograd_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28876

Test Plan: Imported from OSS

Differential Revision: D18224445

Pulled By: mrshenli

fbshipit-source-id: 4de2c24ac6e9ffb004457e2dc43730dc7e478e5a
2019-10-30 11:34:35 -07:00
a465b033fd Local response norm (#28759)
Summary:
Implemented LocalResponseNorm and some initial tests for modules and functional. Reference https://github.com/pytorch/pytorch/issues/25883
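
For reference, the Python module this mirrors:

```py
import torch

# Normalizes each element across a local neighborhood of `size` channels
# (AlexNet-style local response normalization).
lrn = torch.nn.LocalResponseNorm(size=2)
x = torch.randn(4, 8, 16, 16)  # (N, C, H, W)
assert lrn(x).shape == x.shape
```
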
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28759

Differential Revision: D18219745

Pulled By: yf225

fbshipit-source-id: e6aad568a8b1e81f54752decaefd4f9044029da9
2019-10-30 11:31:00 -07:00
3073785f4c Fix when giving jit format warning about unsupported options (#28616)
Summary:
The current regex also matches strings containing a bare '{}', so the warning is always given.
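
The intended distinction, sketched with Python regexes (not the actual C++ pattern):

```py
import re

# Warn only when the braces carry a format option such as "{:.2f}",
# not for a plain "{}" placeholder.
def has_unsupported_option(fmt):
    return re.search(r"\{[^{}]+\}", fmt) is not None

assert not has_unsupported_option("value: {}")
assert has_unsupported_option("value: {:.2f}")
```
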
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28616

Test Plan: The previous code gave a warning about unsupported options; these warnings disappeared. When adding something inside '{}', the warning came back.

Differential Revision: D18039443

Pulled By: ggoossen

fbshipit-source-id: bb3a2892d5707a32030d43250c40f3058aa1d18b
2019-10-30 11:24:14 -07:00
50fd20b64a fix bug on setup.py to include header files on caffe2/utils/math (#28869)
Summary:
This problem comes from issue [https://github.com/pytorch/pytorch/issues/28753](https://github.com/pytorch/pytorch/issues/28753).

The header files in the `math` and `threadpool` directories should be included in the built package because they are referenced by other header files, such as `torch/include/caffe2/utils/math.h`:
```
#include "caffe2/core/common.h"
#include "caffe2/core/types.h"
#include "caffe2/utils/math/broadcast.h"
#include "caffe2/utils/math/elementwise.h"
#include "caffe2/utils/math/reduce.h"
#include "caffe2/utils/math/transpose.h"
#include "caffe2/utils/math/utils.h"
```
But `setup.py` on the `master` branch does not include these header files. The header files currently in the `utils` directory of a built `torch` package are the following:
```
> ls include/caffe2/utils
bench_utils.h  conversions.h  eigen_utils.h    map_utils.h    murmur_hash3.h   proto_wrap.h      smart_tensor_printer.h
cast.h         cpuid.h        filler.h         math-detail.h  proto_convert.h  signal_handler.h  string_utils.h
cblas.h        cpu_neon.h     fixed_divisor.h  math.h         proto_utils.h    simple_queue.h    zmq_helper.h
```
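
The shape of the fix, sketched (the actual `setup.py` change may differ; the `package_data` globs here are assumptions):

```py
# Extend the header globs shipped with the wheel so the caffe2/utils
# subdirectories are included as well.
package_data = {
    "torch": [
        "include/caffe2/utils/*.h",
        "include/caffe2/utils/math/*.h",        # previously missing
        "include/caffe2/utils/threadpool/*.h",  # previously missing
    ],
}
```
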
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28869

Differential Revision: D18226319

Pulled By: soumith

fbshipit-source-id: 51575ddc559181c069b3324aa9b2d1669310ba25
2019-10-30 11:11:15 -07:00
331e09eca4 Make FileStore not segfault with concurrent accesses. (#28812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28812

FileStore isn't thread-safe. We've observed a few FB unittests
already using this class in an unsafe manner.

This change enforces at most a single concurrent use of
the various file operations from this specific Store instance.
This protects `cache_`, `pos_`, and the overall integrity
of the operations.

An alternative would be simply to explicitly document this
class as non-thread-safe, though perhaps not everybody will
read the warning.
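
A sketch of the serialization idea in Python terms (the real change uses a mutex inside the C++ FileStore):

```py
import threading

class FileStoreLike:
    """Guard every file operation with a per-instance lock so concurrent
    set()/get() calls on one Store cannot corrupt cache_ or pos_."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cache = {}
        self._pos = 0

    def set(self, key, value):
        with self._lock:
            # ... append to the backing file, then update _pos and _cache
            self._cache[key] = value

    def get(self, key):
        with self._lock:
            # ... re-read the file from _pos to refresh the cache first
            return self._cache.get(key)
```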

ghstack-source-id: 92874098

Test Plan:
buck test mode/dev-nosan caffe2/...
  Actual observed failures were in ThreadRpcAgentTest

Differential Revision: D18187821

fbshipit-source-id: 67c765da74c836a9ac9f887cdf1a28a75247e04b
2019-10-30 11:03:00 -07:00
e0009fdeb1 Migrate sinh and sinh_ from the TH to Aten (CUDA) (#28527)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28527

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc 7.4):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sinh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.sinh(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.sinh(a) a.numel() == 10000 for 20000 times torch.half
0.3807680979998622
torch.sinh(a) a.numel() == 10000 for 20000 times torch.float
0.37430476099962107
torch.sinh(a) a.numel() == 10000 for 20000 times torch.double
1.0580407639999976
torch.sinh(a) a.numel() == 100000 for 20000 times torch.half
0.7996397469996737
torch.sinh(a) a.numel() == 100000 for 20000 times torch.float
1.010930432999885
torch.sinh(a) a.numel() == 100000 for 20000 times torch.double
7.310400856999877
```

After:

```
torch.sinh(a) a.numel() == 10000 for 20000 times torch.half
0.3720399889998589
torch.sinh(a) a.numel() == 10000 for 20000 times torch.float
0.3694016069994177
torch.sinh(a) a.numel() == 10000 for 20000 times torch.double
1.0551542660004998
torch.sinh(a) a.numel() == 100000 for 20000 times torch.half
0.7431191599998783
torch.sinh(a) a.numel() == 100000 for 20000 times torch.float
0.9953043630002867
torch.sinh(a) a.numel() == 100000 for 20000 times torch.double
7.3146168890007175
```

Close #24628

Test Plan: Imported from OSS

Differential Revision: D18124732

Pulled By: VitalyFedyunin

fbshipit-source-id: 054b0c0884ac12de2dd1a92c5de916aaf047f9e9
2019-10-30 10:57:14 -07:00
a7166ae448 Migrate asin and asin_ from the TH to Aten (CUDA) (#28482)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28482

Benchmark (RHEL 7.3, Release, P1000, gcc 8.3):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.asin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.asin(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.475854377997166
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.4772826389998954
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6297428649995709
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5475849750000634
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6156488769993302
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.728912709000724
```

After:

```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.5107104659982724
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.509122366001975
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6929216960015765
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5914848840002378
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6518679289983993
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.916458261999651
```

Close #24537

Test Plan: Imported from OSS

Differential Revision: D18089074

Pulled By: VitalyFedyunin

fbshipit-source-id: f27515dd1ee73b6e2391ebcc0004af28bcb82234
2019-10-30 10:57:10 -07:00
d0bd8a3afc Migrate sin and sin_ from the TH to Aten (CUDA) (#28237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28237

Benchmark (RHEL 7, gcc 8.3.1, P1000):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.sin(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4649172620011086
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.4616892600006395
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5166665920005471
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5376560490003612
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6207812359989475
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.873208982999131
```

After:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4796977340010926
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.48329569199995603
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5380683220009814
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5299932739999349
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6144487999990815
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.8838113630008593
```

Close #24627

Test Plan: Imported from OSS

Differential Revision: D18089072

Pulled By: VitalyFedyunin

fbshipit-source-id: 4824804960309fe7fdb16073d021388704986993
2019-10-30 10:57:06 -07:00
2526f97464 Include hierarchy information in C++ API loading error messages (#28499)
Summary:
Before, we would only give the key we were looking for (i.e., typically
just "No such serialized tensor 'weight'"), no matter which submodule's
weight we were looking for.
Now we error with "No such serialized tensor '0.conv1.weight'" or
similar.
The analogous information is added to missing module error messages.

I threw in a test, and it saved me already...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28499

Differential Revision: D18122442

Pulled By: yf225

fbshipit-source-id: a134b6d06ca33de984a11d6fea923244bcd9fb95
2019-10-30 08:41:37 -07:00
726f0ce946 Increase verbosity of Hypothesis on CI. (#28799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28799

When the verbosity is quiet, hypothesis no longer prints the real
error when it finds multiple falsifying examples: it just says
that there are two failures. This is supremely unhelpful. Make
it print more.
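
For reference, the knob being turned (standard Hypothesis API):

```py
from hypothesis import settings, Verbosity

# A verbose profile prints each falsifying example instead of only
# reporting that multiple failures were found.
settings.register_profile("ci", verbosity=Verbosity.verbose)
settings.load_profile("ci")
```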

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18206936

Pulled By: ezyang

fbshipit-source-id: 03bb60ba24cee28706bb3d1f0858c32b6743a109
2019-10-30 08:28:20 -07:00
496f740824 Connect with clip range gather operator (#28866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28866

While working on the fix for int32 instead of int64, we also need to take care of ClipRangesGatherSigridHash, since this is the operator that actually gets used during inference.

Test Plan: Added unittest to cover for the new case

Reviewed By: ipiszy

Differential Revision: D17147237

fbshipit-source-id: 2b562b72a6ae8f7282e54d822467b8204fb1055e
2019-10-29 23:32:08 -07:00
eb00af37bd insert_prepack_unpack for conv (#27346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27346

att

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18182915

fbshipit-source-id: d646ae76ce44f5d12e974c776a3e92e5e163493c
2019-10-29 22:03:07 -07:00
790563b374 Add OfflineTensor (#28855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28855

Resubmit:
OfflineTensor will be a shell to just carry the shape and dtype. No data will be stored. This should help us plumb through the onnxifi process.

Test Plan:
```
buck test caffe2/caffe2/fb/opt:onnxifi_with_offline_tensor_test
```

Reviewed By: ipiszy, ChunliF

Differential Revision: D18212824

fbshipit-source-id: 5c8aaed2ef11d719dfa2a2901875efd66806ea56
2019-10-29 21:59:57 -07:00
a8b63cacbc Updating submodules
Summary:
GitHub commits:

4b2da87ee6
b997eec151
9f34d1f643
a3960fc875
541c404784
b2438faaf0
06335bac7c
2ac6f45e20

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: a6fb756d2d210d0505c889ba6c0e207e6a2d074d
2019-10-29 19:54:36 -07:00
043530a9b9 Support remote for Python UDF in distributed autograd
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28656

Test Plan: Imported from OSS

Differential Revision: D18138561

Pulled By: mrshenli

fbshipit-source-id: 798e7c00465b5a299f7b4642683bc407895bc7da
2019-10-29 19:39:04 -07:00
400293fcc6 Support remote for builtin operators in distributed autograd (#28630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28630

This includes:
1. Respect autograd context in rpc.remote for builtin ops
2. Force setting autograd context in RRef.to_here() even if the
message for to_here() does not contain any tensor.
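
A usage sketch of what this enables (worker names are placeholders, and `rpc.init_rpc` is assumed to have been called already):

```py
import torch
import torch.distributed.autograd as dist_autograd
import torch.distributed.rpc as rpc

with dist_autograd.context() as context_id:
    t = torch.ones(2, 2, requires_grad=True)
    # rpc.remote for a builtin op now records the autograd context, and
    # to_here() attaches it even if the reply carries no tensors.
    rref = rpc.remote("worker1", torch.add, args=(t, t))
    result = rref.to_here()
```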

Test Plan: Imported from OSS

Differential Revision: D18138562

Pulled By: mrshenli

fbshipit-source-id: a39ec83e556d19130f22eb317927241a017000ba
2019-10-29 19:39:00 -07:00
ec81cd55fc Migrate implementations of triu and tril to a separate file (#28750)
Summary:
Having them in BatchLinearAlgebra.cpp/.cu seemed out of place, since they are more general-purpose, and the code was interspersed with the LAPACK and MAGMA wrappers.

Changelog:
- Move tril* / triu* to TriangularOps.cpp/.cu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28750

Test Plan:
- Builds should complete successfully to ensure that the migration is error-free
- Tests should pass to ensure that the front-end is unaffected.

Differential Revision: D18205456

Pulled By: soumith

fbshipit-source-id: 41966b9ddfe9f196f4d7c6a5e466782c1985d3d9
2019-10-29 19:24:05 -07:00
1c436ded44 Remove test_quantizer.py and reuse one of its test in test_quantization.py (#27269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27269

Remove `test_quantizer.py`; add and rewrite one of its tests
in `test_quantization.py`.
The conv test is removed for now since the conv pattern is still broken; we'll add another test
later.
ghstack-source-id: 92869823

Test Plan:
python test/test_quantization.py

Imported from OSS

Differential Revision: D18182916

fbshipit-source-id: 325b5d8e877228d6a513e3ddf52c974479250d42
2019-10-29 19:04:21 -07:00
dfe7b25eaf Add nn::Flatten to C++ Frontend (#28072)
Summary:
Adds torch::nn::Flatten module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28072

Differential Revision: D18202778

Pulled By: yf225

fbshipit-source-id: 43345dcbdf2f50d75746bf9a0ba293b84df275ab
2019-10-29 17:52:47 -07:00
57c9b1cefc Enabling inplace relu
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28710

Test Plan: Imported from OSS

Differential Revision: D18146120

Pulled By: z-a-f

fbshipit-source-id: d8f0982f5a2ae35f7deb34e67cdb64be700a9d6c
2019-10-29 17:33:48 -07:00
cbc234bceb C++ API: torch::nn::BatchNorm1d (#28176)
Summary:
Add torch::nn::BatchNorm1d function/module support for the C++ API.
torch::nn::BatchNorm{2,3}d will be added after this PR is merged.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225

I would like to discuss the items below.

* Necessity of `num_batches_tracked` in `BatchNormImplBase`
  * `num_batches_tracked` is needed to calculate the effective `momentum` when the `momentum` argument is not given in the Python API (see the sketch after this list). But in the C++ API, the `momentum` argument has a default value.
  * `num_batches_tracked` is otherwise only used to count `BatchNorm1d::forward()` calls. I think it is no longer necessary for the user.
* The design of `BatchNorm{1,2,3}dOptions`
  * We already have `BatchNormOptions`, used for the deprecated `BatchNorm` module. However, it is hard to reuse for `BatchNorm{1,2,3}dOptions` because the arguments of the modules disagree.
  * In this PR, I introduce a `BatchNormOptionsv2` template class for `BatchNorm{1,2,3}dOptions`, but I'm not sure whether this design is good.
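
A sketch of the Python-side behavior under discussion (simplified from how `_BatchNorm` computes its averaging factor; details here are assumed):

```py
# When momentum is None, the Python BatchNorm modules fall back to a
# cumulative moving average driven by num_batches_tracked (which is
# incremented before this factor is computed).
def average_factor(momentum, num_batches_tracked):
    if momentum is None:
        return 1.0 / float(num_batches_tracked)  # cumulative average
    return momentum                              # exponential average
```
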
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28176

Differential Revision: D18196843

Pulled By: yf225

fbshipit-source-id: 667e2b5de4150d5776c41b9088c9e6c2ead24cd4
2019-10-29 17:29:42 -07:00
8f1564b8ab Add enum type to rpc registry for consolidating RPC initialization code path (#28628)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28628

Consolidate code paths of ProcessGroupAgent construction and other RPC Backend construction.
ghstack-source-id: 92845348

Differential Revision: D5516188

fbshipit-source-id: 151d9b7b74f68631d6673fecc74dec525949b8f0
2019-10-29 17:26:15 -07:00
b1ea19ca17 Update the misleading comments for zero_points and scale in dynamic quant linear module [1/2] (#28767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28767

The scale and zero_point are for the output activation tensor, not for the weight tensor. We removed them here because we don't need the zero points and scales for the output tensors in dynamic quantization.

ghstack-source-id: 92807318

Test Plan: CI

Differential Revision: D18164949

fbshipit-source-id: 0f9172bfef615c30dc28e1dd4448a9f3cc897c2e
2019-10-29 17:20:32 -07:00
4e56455b09 whitelist autogradanynonzero (#28852)
Summary:
prim::AutogradAnyNonZero is optimized away under normal circumstances (a graph executor specializes tensor arguments and runs `specializeAutogradZero`), so the change should be backward compatible for as long as we are running the original executor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28852

Differential Revision: D18213118

Pulled By: Krovatkin

fbshipit-source-id: 223f172c59e5f2b05460db7de98edbadc45dd73d
2019-10-29 17:00:27 -07:00
f1f86994bc Fix implementation of F::kl_div / F::mse_loss / F::binary_cross_entropy
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28806

Test Plan: Imported from OSS

Differential Revision: D18202859

Pulled By: yf225

fbshipit-source-id: 1aa19111cd5111dd5f2779f7f00f07f2f2e16d4d
2019-10-29 16:54:27 -07:00
d201ff8925 Factor out insertPrepackUnpackForLinear (#27239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27239

att

Test Plan:
python test/test_jit.py 'TestJit.test_insert_prepack_unpack'

Imported from OSS

Differential Revision: D18182913

fbshipit-source-id: 7cbaac9159520d9e873079d10bf80764f2ec27ae
2019-10-29 16:06:16 -07:00
80e270a76c Add support for host build to pytorch_android native code (#27664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27664

When ANDROID_ABI is not set, find libtorch headers and libraries from
the LIBTORCH_HOME build variable (which must be set by hand), place
output under a "host" directory, and use dynamic linking instead of
static.

This doesn't actually work without some local changes to fbjni, but I
want to get the changes landed to avoid unnecessary merge conflicts.

Test Plan: Imported from OSS

Differential Revision: D18210315

Pulled By: dreiss

fbshipit-source-id: 685a62de3c2a0a52bec7fd6fb95113058456bac8
2019-10-29 16:04:18 -07:00
34455c68b5 Remove unnecessary BUILD_DIR variable in Android CMake build (#27663)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27663

CMake sets CMAKE_BINARY_DIR and creates it automatically.  Using this
allows us to use the -B command-line flag to CMake to specify an
alternate output directory.

Test Plan: Imported from OSS

Differential Revision: D18210316

Pulled By: dreiss

fbshipit-source-id: ba2f6bd4b881ddd00de73fe9c33d82645ad5495d
2019-10-29 16:04:13 -07:00
c9423c30b3 Add host build for pytorch_android (#27662)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27662

This adds a new gradle subproject at pytorch_android/host and tweaks
the top-level build.gradle to only run some Android bits on the other
projects.

Referencing Java sources from inside the host directory feels a bit
hacky, but getting host and Android Gradle builds to coexist in the same
directory hit several roadblocks.  We can try a bigger refactor to
separate the Android-specific and non-Android-specific parts of the
code, but that seems overkill at this point for 4 Java files.

This doesn't actually run without some local changes to fbjni, but I
want to get the files landed to avoid unnecessary merge conflicts.

Test Plan: Imported from OSS

Differential Revision: D18210317

Pulled By: dreiss

fbshipit-source-id: dafb54dde06a5a9a48fc7b7065d9359c5c480795
2019-10-29 16:04:09 -07:00
793e2914e4 Support full id interactions (#28769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28769

Support full id interaction.

Test Plan:
* unit-tests
  * buck test caffe2/caffe2/python/operator_test:pack_ops_test --
  * buck test caffe2/caffe2/fb/dper/layer_models/tests:sparse_nn_attention_test -- test_sparse_nn_full_id

* canary
  * apply SUM + full id with max_length as 20 on SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID: f147253340 (v1: f146340704)

# of embeddings for this feature is 20:
{F219139816}

The corresponding ops: two lookups, which is as expected.
```
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_0/Repeat_0/sparse_lookup/w"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:values"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:lengths"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_0/Repeat_0/sparse_lookup/output"
  name: ""
  type: "SparseLengthsSum"
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/w"
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:values"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/output"
  name: ""
  type: "Gather"
}
op {
  input: "feature_preproc/output_features:SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM:lengths"
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/sparse_lookup/output"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/PackSegments/embedding_packed"
  name: ""
  type: "PackSegments"
  arg {
    name: "max_length"
    i: 20
  }
  arg {
    name: "pad_minf"
    i: 0
  }
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/PackSegments/embedding_packed"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/reshaped_record"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/old_shape"
  name: ""
  type: "Reshape"
  arg {
    name: "shape"
    ints: -1
    ints: 1280
  }
}
op {
  input: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/Reshape/reshaped_record"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_0"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_1"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_2"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_3"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_4"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_5"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_6"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_7"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_8"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_9"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_10"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_11"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_12"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_13"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_14"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_15"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_16"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_17"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_18"
  output: "nested/dot/SPARSE_AD_MEDIA_XRAY_V11_TOPIC_ID_AUTO_FIRST_X_AUTO_UNIGRAM/Pool_Option_1/Repeat_0/full_id/split/output_19"
  name: ""
  type: "Split"
  arg {
    name: "axis"
    i: 1
  }
}
```

Reviewed By: chonglinsun

Differential Revision: D18083520

fbshipit-source-id: f592fb7734dd4e3e712ba42dc0afcd0b32a4afa0
2019-10-29 14:56:18 -07:00
aa949b12b3 InsertObservers (#27238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27238

att

Test Plan:
test_jit.py insert_observers

Imported from OSS

Differential Revision: D18182914

fbshipit-source-id: 718300f259a2e38e730d3e7cc6308813fd1112af
2019-10-29 14:24:16 -07:00
4045d6c3fa Revert D18187208: Add OfflineTensor
Test Plan: revert-hammer

Differential Revision:
D18187208

Original commit changeset: 57c70f6f9897

fbshipit-source-id: d13b089ceb645b2a9852923cd21a752a2f45a15b
2019-10-29 14:20:46 -07:00
e33b4b6761 Use c10::variant-based enums for Reduction
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27942

Test Plan: Imported from OSS

Differential Revision: D18202857

Pulled By: yf225

fbshipit-source-id: 0303ce2508e3b7665c6a91ae270a7d0ef0e45900
2019-10-29 14:15:48 -07:00
d8c368bd62 CPU-strided-complex support for compare and pointwise ops (#28735)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

These changes optimize complex Vec256 math kernels so that they are within 2X of real-number performance on average. [Benchmarks are here](https://docs.google.com/spreadsheets/d/17pObcrSTpV4BOOX9FYf1vIX3QUlEgQhLvL1IBEyJyzs/edit#gid=0)

Changes so far:

- [x]  Added complex support for eq, neq, max, and min ops.
   - max/min ops need to compare the absolute value for complex numbers (using zabs); see the sketch after this list.
- [x] Added complex support for is_nonzero and where.
   - where op compares the absolute value for complex numbers (using zabs).
- [x] Added complex support for linear interp and and pointwise ops.
- [x] Added complex support for check_convert and Linspace/Logspace.
   - std::complex does not support ++operator.
   - All compilers (clang, g++, c++) on aarch64 and x86 produce the same assembly code when using `+= 1` instead of `++`. [example for loop](https://godbolt.org/z/O6NW_p)
- [x] Added complex support for log, log2, log10.
- [x] Optimized Vec256 operators using various logarithmic identities.
  - `asin()`, `acos()`, `atan()` is optimized using a `ln()` identity.
  - `sqrt()` is optimized by splitting the computation into real and imag parts.
  - several `_mm256_mul_pd` are avoided by using `_mm256_xor_pd` ops instead.
- [x] Added complex support for pow.
  - exp is cast to `std::complex<double>`.
  - no special optimization is added when the `exp` is real because the `std::pow()` operator expects a std::complex number.
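
A minimal sketch of the zabs-based ordering (complex numbers have no natural ordering, so max/min compare moduli; the tie-breaking rule here is an assumption):

```py
# Compare complex values by absolute value (|z|, i.e. zabs).
def zmax(a, b):
    return a if abs(a) >= abs(b) else b

assert zmax(1 + 1j, 3 + 0j) == 3 + 0j  # |1+1j| ~= 1.41 < |3+0j| == 3.0
```
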
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28735

Differential Revision: D18170691

Pulled By: ezyang

fbshipit-source-id: 6f167398e112cdeab02fcfde8b543cb6629c865a
2019-10-29 13:37:01 -07:00
22d70bc1ec Add OfflineTensor
Summary: OfflineTensor will be a shell to just carry the shape and dtype. No data will be stored. This should help us plumb through the onnxifi process.

Test Plan:
```
buck test caffe2/caffe2/fb/opt:onnxifi_with_offline_tensor_test
```

Reviewed By: ChunliF, zrphercule

Differential Revision: D18187208

fbshipit-source-id: 57c70f6f9897a5fc66580c81295db108acd03862
2019-10-29 13:04:00 -07:00
6b5bfd4cfc Make inserted child module names unique (#27237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27237

Making inserted observer module and wrapper module names unique

Test Plan:
test_jit.py

Imported from OSS

Differential Revision: D18182917

fbshipit-source-id: 77aa5997fbf024c73085866550372b5e68ad9ae1
2019-10-29 12:30:49 -07:00
7e8c48bff5 argmax for half datatype fix (#28787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28787

Stack from [ghstack](https://github.com/ezyang/ghstack):
* **#28787 argmax for half datatype fix**

Test Plan: Imported from OSS

Differential Revision: D18194420

Pulled By: pbelevich

fbshipit-source-id: d2abec1ea8a9ce3a93aec5a2c5bba57d163197e6
2019-10-29 12:25:43 -07:00
e57a119773 Remove autograd copy_ specific isFloatingPoint (#28279)
Summary:
Remove autograd copy_ specific isFloatingPoint and use
c10's isFloatingType (and isComplexType).
Before this, .to or .copy_ would drop requires_grad for bfloat16,
as the only recognized floating types were double, float,
and half.
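
A small repro of the behavior being fixed (expected to pass after this change):

```py
import torch

a = torch.randn(3, requires_grad=True)
b = a.to(torch.bfloat16)
# Previously b.requires_grad came back False because bfloat16 was not
# counted as a floating type; with isFloatingType it is preserved.
assert b.requires_grad
```
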
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28279

Differential Revision: D18176084

Pulled By: izdeby

fbshipit-source-id: 8a005a6105e4a827be5c8163135e693a7daae4f4
2019-10-29 12:25:39 -07:00
83331bf123 don't overspecify required python version (#28842)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28842

We don't care which Python version, and GitHub Actions has changed the
versions available, breaking our CI. So just pin it to 3-something to
make it more future-proof.

Test Plan: Imported from OSS

Differential Revision: D18205349

Pulled By: suo

fbshipit-source-id: bf260dc29a138dd8bf8c85081a182aae298fe86d
2019-10-29 12:08:47 -07:00
b7d472a109 Some fixes for jit overview doc (#28112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28112

att

Test Plan:
reading

Imported from OSS

Differential Revision: D18173102

fbshipit-source-id: d8574758288bfce08eaf0f4f6163284defb56d6e
2019-10-29 12:08:42 -07:00
efbaa8a563 added a check for zero stride
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28784

Differential Revision: D18178889

Pulled By: anjali411

fbshipit-source-id: 976810bf3f9def3a8f5ca6885b1e049b831f06f3
2019-10-29 12:08:38 -07:00
607defa8a9 print per block avg time when running on AI-PEP machines (#28838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28838

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test -- --ai_pep_format true
  Total time: 02:36.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  """
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.83197245048359"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.839232977246866"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.7970924858236685"}
PyTorchObserver {"type": "PyTorch_Softmax_N4_C3_H128_W128", "metric": "latency", "unit": "ms", "value": "4.708389271399938"}
# Benchmarking PyTorch: Softmax
...
```

Reviewed By: hl475

Differential Revision: D18202504

fbshipit-source-id: 4a332763432b3b5886f241bb2ce49d4df481a6f3
2019-10-29 12:08:33 -07:00
0a68e8bab0 fix op bench runtime error when use_jit is enabled (#28837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28837

The JIT code used in the op bench is not compatible with the latest JIT code path. This diff aims to resolve that issue.

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:add_test -- --use_jit
Building: finished in 02:29.8 min (100%) 7055/7055 jobs, 1 updated
  Total time: 02:30.3 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: JIT
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 118.052
```

Reviewed By: hl475

Differential Revision: D18197057

fbshipit-source-id: 92edae8a48abc4115a558a91ba46cc9c3edb2eb8
2019-10-29 12:08:28 -07:00
ac4c72db3b add DNNLOWP static qparam choosing to pybind
Summary: as title

Test Plan: test in stacked diff

Reviewed By: csummersea

Differential Revision: D18123726

fbshipit-source-id: ce75db1e6f314a822a94ebdfc11988fab50ee836
2019-10-29 12:05:33 -07:00
f42768f8c0 Add scripts to run cuda-memcheck (#28127)
Summary:
This PR adds scripts that could be used for https://github.com/pytorch/pytorch/issues/26052

Example output:

```
Success: TestTorchDeviceTypeCPU.test_advancedindex_big_cpu
Success: TestTorchDeviceTypeCPU.test_addcmul_cpu
Success: TestTorchDeviceTypeCPU.test_addbmm_cpu_float32
Success: TestTorchDeviceTypeCPU.test_advancedindex_cpu_float16
Success: TestTorchDeviceTypeCPU.test_addmv_cpu
Success: TestTorchDeviceTypeCPU.test_addcdiv_cpu
Success: TestTorchDeviceTypeCPU.test_all_any_empty_cpu
Success: TestTorchDeviceTypeCPU.test_atan2_cpu
Success: TestTorchDeviceTypeCPU.test_advancedindex_cpu_float64
Success: TestTorchDeviceTypeCPU.test_baddbmm_cpu_float32
Success: TestTorchDeviceTypeCPU.test_atan2_edgecases_cpu
Success: TestTorchDeviceTypeCPU.test_add_cpu
Success: TestTorchDeviceTypeCPU.test_addr_cpu_bfloat16
Success: TestTorchDeviceTypeCPU.test_addr_cpu_float32
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28127

Differential Revision: D18184255

Pulled By: mruberry

fbshipit-source-id: 7fd4bd9faf9f8b37b369f631c63f26eb965b16e7
2019-10-29 12:05:29 -07:00
4703854321 change softmax input shape (#28836)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28836

as title

Test Plan:
```
buck run mode/opt //caffe2/benchmarks/operator_benchmark/pt:softmax_test
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 56 more. See logs for all changes
Parsing buck files: finished in 6.2 sec
Creating action graph: finished in 8.8 sec
Building: finished in 05:42.6 min (100%) 28336/28336 jobs, 23707 updated
  Total time: 05:57.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: Softmax
/proc/self/fd/4/softmax_test.py:57: UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.
  """
# Mode: Eager
# Name: Softmax_N4_C3_H256_W256
# Input: N: 4, C: 3, H: 256, W: 256
Forward Execution Time (us) : 18422.487
```

Reviewed By: hl475

Differential Revision: D18202335

fbshipit-source-id: 0bb376cb465d998a49196e148d48d436126ae334
2019-10-29 12:05:25 -07:00
ef5a6b2262 Avoid the misleading zero_point and scale [2/2] (#28827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28827

When we print the `DynamicLinear` module, we don't want to print the scale and zero point, as they are not needed for dynamic quantization.

Let's take the output of RoBERTa model as an example:

Before this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072, scale=1.0, zero_point=0)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024, scale=1.0, zero_point=0)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096, scale=1.0, zero_point=0)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024, scale=1.0, zero_point=0)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```

After this PR:
```
      (19): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (20): TransformerEncoderLayer(
        (dropout): Dropout(p=0.1, inplace=False)
        (attention): MultiheadAttention(
          (dropout): Dropout(p=0.1, inplace=False)
          (input_projection): DynamicQuantizedLinear(in_features=1024, out_features=3072)
          (output_projection): DynamicQuantizedLinear(in_features=1024, out_features=1024)
        )
        (residual_mlp): ResidualMLP(
          (mlp): Sequential(
            (0): DynamicQuantizedLinear(in_features=1024, out_features=4096)
            (1): GeLU()
            (2): Dropout(p=0.1, inplace=False)
            (3): DynamicQuantizedLinear(in_features=4096, out_features=1024)
            (4): Dropout(p=0.1, inplace=False)
          )
        )
        (attention_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
```
ghstack-source-id: 92807317

Test Plan: CI

Differential Revision: D18197022

fbshipit-source-id: e41635330cfdfb008a0468d6a8ff67a06f7e1c59
2019-10-29 12:02:45 -07:00
47faee2fae Switching tests to ProfilingExecutor (rebased)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28535

Differential Revision: D18197932

Pulled By: Krovatkin

fbshipit-source-id: 2639b205e899f800787ee57c157447d54e4669c3
2019-10-29 11:41:42 -07:00
eb55104185 Updating submodules
Summary:
GitHub commits:

214b370edb

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: aa03f9a37d316c232fdf2e4289c32ec68a22b469
2019-10-29 10:13:21 -07:00
5fbec1f55d Revert D18170996: Move type casting to c10/util/TypeCast.h
Test Plan: revert-hammer

Differential Revision:
D18170996

Original commit changeset: 41658afd5c0a

fbshipit-source-id: 394e84bbc52bdd708609304261ffa1513a771d57
2019-10-29 07:43:01 -07:00
0301f5f30b Revert D18170997: Make TensorIterator stop promoting types by copying
Test Plan: revert-hammer

Differential Revision:
D18170997

Original commit changeset: 9c82c1c89583

fbshipit-source-id: 8862d9628864d23a087f2895870386772a634e45
2019-10-29 07:42:56 -07:00
dff159804f Revert D18170995: Simplify copy kernel
Test Plan: revert-hammer

Differential Revision:
D18170995

Original commit changeset: 461b53641813

fbshipit-source-id: 1ebb119325d746a153982ac3209d3570a7e18d88
2019-10-29 07:42:52 -07:00
f6692146e7 Add Conv3dInt8 (#28768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28768

Add Conv3dInt8

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: jianyuh

Differential Revision: D18023661

fbshipit-source-id: 8fc7a4350baf29271dfd6fa3c1c4b10e60e2fdbf
2019-10-28 23:28:11 -07:00
295401f04c Updating submodules
Summary:
GitHub commits:

edee4921c4

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: b69770ac1a801b372fba0e112124b25ad1572821
2019-10-28 22:24:35 -07:00
a0339c8d8f bootstrap.sh refactor (#28809)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28809

### Summary

This PR adds an interactive mode to `bootstrap.sh`. Instead of passing the credential information via command-line parameters (`-t`, `-p`), we now ask the user to enter that information and save it to a config file, so that next time you don't have to enter it again. All you need now is a one-line command

```shell
./bootstrap
```

### Test Plan

- TestApp.ipa can be installed on any devices
- Don't break CI jobs

Test Plan: Imported from OSS

Differential Revision: D18194032

Pulled By: xta0

fbshipit-source-id: a416ef7f13fa565e2c10bb55f94a8ce994b4e869
2019-10-28 22:20:29 -07:00
097da55249 Fix BC check CI (#28816)
Summary:
Skip the functions which were reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28816

Reviewed By: hl475

Differential Revision: D18196628

Pulled By: houseroad

fbshipit-source-id: 30d43fcd57efb21b870c6a630b7ee305604dc603
2019-10-28 21:47:18 -07:00
52dd587123 C++ API parity: Upsample (#28413)
Summary:
Adds `interpolate` functional and `Upsample` module support for the C++ API.

**Issue**: https://github.com/pytorch/pytorch/issues/25883

**Reviewer**: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28413

Differential Revision: D18165014

Pulled By: yf225

fbshipit-source-id: ecae2f432a301b1f4afa7c038b2d104cbad139f2
2019-10-28 21:34:44 -07:00
1e3e1f5bf9 Fix build error in VariableTypeManual
Summary:
build error in internal pt mobile build

```
xplat/caffe2/torch/csrc/autograd/VariableTypeManual.cpp:118:49: error: address of function 'requires_grad' will always evaluate to 'true' [-Werror,-Wpointer-bool-conversion]
      autograd::utils::requires_grad_leaf_error(requires_grad)
      ~~~~~~~~                                  ^~~~~~~~~~~~~
xplat/caffe2/torch/csrc/autograd/VariableTypeManual.cpp:118:49: note: prefix with the address-of operator to silence this warning
```

I think the variable name in requires_grad_leaf_error is wrong.

Test Plan: mobile build works

Reviewed By: pbelevich

Differential Revision: D18192663

fbshipit-source-id: a3d3ebb9039022eb228c1d183a1076f65f9e84e0
2019-10-28 17:55:41 -07:00
c6ad68cf10 Updating submodules
Summary:
GitHub commits:

724e939772
f4fb4266c0
95d4b19724
8b8131450e
ac8faa6528
5487e2b1a2

Test Plan: n/a

Reviewed By: zpao

fbshipit-source-id: 9b9f4cccd869638215c17111361a6f6c480c73af
2019-10-28 17:55:35 -07:00
6f90567e0c Add the unittest import for test_fake_quant.py (#28815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28815

Add the unittest import
ghstack-source-id: 92789329

Test Plan: CI

Differential Revision: D18191989

fbshipit-source-id: c54e0309e21156c33e4fec01bfba17a1c30463c9
2019-10-28 17:52:57 -07:00
949678bd9e Small fixes for torchbind (#28800)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28800

Fix up namespaces and emit a friendly error message when a registered class doesn't inherit from the right base

Test Plan: Imported from OSS

Differential Revision: D18175067

Pulled By: jamesr66a

fbshipit-source-id: 5c7cf3a49fb45db502d84eb3f9a69be126ee59fb
2019-10-28 16:45:24 -07:00
f63cf96c4d Update C++ parity table for torch::nn::Linear (#28804)
Summary:
Since we have merged https://github.com/pytorch/pytorch/pull/27382 (thanks pbelevich!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28804

Differential Revision: D18185714

Pulled By: yf225

fbshipit-source-id: 1148f5837fbf578843b989fc53fd334519943cdd
2019-10-28 14:55:25 -07:00
5c5b2c68db Simplify copy kernel (#28428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28428

Using the new type promotion and dynamic casting added to
`TensorIterator`, the copy kernels could be greatly simplified.

Benchmark on CUDA:
```python
import torch
import timeit
import pandas
import itertools
from tqdm.notebook import tqdm
import math
print(torch.__version__)
print()

_10M = 10 * 1024 ** 2

d = {}

for from_, to in tqdm(itertools.product(torch.testing.get_all_dtypes(), repeat=2)):
    if from_ not in d:
        d[from_] = {}
    a = torch.empty(_10M, dtype=from_, device='cuda')
    min_ = math.inf
    for i in range(100):
        torch.cuda.synchronize()
        start = timeit.default_timer()
        a.to(to)
        torch.cuda.synchronize()
        end = timeit.default_timer()
        elapsed = end - start
        if elapsed < min_:
            min_ = elapsed
    d[from_][to] = int(min_ * 1000 * 1000)

pandas.DataFrame(d)
```

original:
![image](https://user-images.githubusercontent.com/1032377/67623519-e3e6dd80-f7da-11e9-86ea-9cc9f237123b.png)

new:
![image](https://user-images.githubusercontent.com/1032377/67623527-fc56f800-f7da-11e9-82bd-dc1ff9821b68.png)

Test Plan: Imported from OSS

Differential Revision: D18170995

Pulled By: ezyang

fbshipit-source-id: 461b53641813dc6cfa872a094ae917e750c60759
2019-10-28 14:49:06 -07:00
b9f099ed93 Make TensorIterator stop promoting types by copying (#28427)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28427

Fixes: https://github.com/pytorch/pytorch/issues/26401

This PR fixes the issue by using the newly added dynamic cast inside
`TensorIterator`: instead of converting the type at the beginning
(which generates extra kernel launches), the `TensorIterator` does a
load-cast-compute-store for each element while looping, so there is only
one read and one write of memory.

**nvprof:**
```python
import torch

_100M = 100 * 1024 ** 2
r = torch.randn(_100M, dtype=torch.float32, device='cuda')
d = torch.randn(_100M, dtype=torch.float64, device='cuda')
torch.cuda.synchronize()
torch.cuda.profiler.start()
r.add_(d)
torch.cuda.profiler.stop()
torch.cuda.synchronize()
```

```
==11407== NVPROF is profiling process 11407, command: /home/xgao/anaconda3/bin/python simple.py
==11407== Profiling application: /home/xgao/anaconda3/bin/python simple.py
==11407== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  2.0611ms         1  2.0611ms  2.0611ms  2.0611ms  _ZN2at6native18elementwise_kernelILi512ELi1EZNS0_15gpu_kernel_implIZZZNS0_15add_kernel_cudaERNS_14TensorIteratorEN3c106ScalarEENKUlvE_clEvENKUlvE1_clEvEUlddE_EEvS4_RKT_EUliE_EEviT1_
      API calls:  100.00%  1.05006s         1  1.05006s  1.05006s  1.05006s  cudaLaunchKernel
                    0.00%  2.7740us         2  1.3870us     673ns  2.1010us  cudaGetDevice
                    0.00%  2.3730us         1  2.3730us  2.3730us  2.3730us  cudaSetDevice
                    0.00%     830ns         1     830ns     830ns     830ns  cudaGetLastError
```

**benchmark**
```python
import torch
print(torch.__version__)
print(torch.version.git_version)

_100M = 100 * 1024 ** 2
r = torch.randn(_100M, dtype=torch.float32, device='cuda')
d = torch.randn(_100M, dtype=torch.float64, device='cuda')
torch.cuda.synchronize()
%timeit r.add_(d); torch.cuda.synchronize()
```

original
```
1.4.0a0+7d277b0
7d277b0670eb1f9098a7e098e93b20453e8b5c9f
6.83 ms ± 1.12 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

after
```
1.4.0a0+f0f2f65
f0f2f654cba9b8c569f0bcd583732bbc891f80b2
2.08 ms ± 139 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

For more benchmark, see: https://github.com/pytorch/pytorch/pull/28344

Test Plan: Imported from OSS

Differential Revision: D18170997

Pulled By: ezyang

fbshipit-source-id: 9c82c1c89583f3e6202c5d790b9b73ad9f960fad
2019-10-28 14:49:02 -07:00
688a9dbe3c Move type casting to c10/util/TypeCast.h (#28426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28426

Type casting is used in copy, and will also be used in the tensor iterator
in the next stacked diff. I moved it to c10 so it can serve as a common
util for different things.

I also added two dynamic casting functions:
- fetch_and_cast
- cast_and_store

fetch_and_cast fetches a value whose dynamic type is specified by a ScalarType
from a void pointer and casts it to a static type.

cast_and_store casts a statically typed value into a dynamic type specified
by a ScalarType, and stores it into a void pointer.

Test Plan: Imported from OSS

Differential Revision: D18170996

Pulled By: ezyang

fbshipit-source-id: 41658afd5c0ab58c6b6c510424893d9a2a0c059e
2019-10-28 14:48:57 -07:00
f33813d589 Return NotImplemented from all binary math ops (#27423)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26333

Fixes the operators missed in https://github.com/pytorch/pytorch/issues/26507 and includes a test for all operators.
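
A sketch of the interoperability this enables for user-defined types (the class here is illustrative):

```python
import torch

class Wrapper:
    # Tensor.__add__ now returns NotImplemented for operand types it does
    # not recognize, so Python falls back to this reflected method.
    def __radd__(self, other):
        return f"absorbed a tensor of shape {tuple(other.shape)}"

torch.ones(2) + Wrapper()  # 'absorbed a tensor of shape (2,)' instead of a TypeError
```
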
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27423

Differential Revision: D17835390

Pulled By: ezyang

fbshipit-source-id: 7a1351c7ccc8ad11454dbaa00d3701dcee4f06a8
2019-10-28 14:28:33 -07:00
9e64c54c01 Add the warning message for API with linear modules (#28766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28766

Add a warning message that explicitly asks users to migrate from the deprecated `torch.jit.quantized` API to the new `torch.quantization.quantize_dynamic` API.
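
For reference, a minimal sketch of the suggested migration (the model and dtype here are illustrative):

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_dynamic

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

# New API: dynamically quantize the Linear submodules to int8.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
out = qmodel(torch.randn(2, 16))
```
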
ghstack-source-id: 92711620

Test Plan: CI

Differential Revision: D18164903

fbshipit-source-id: e6aff2527f335c2d9f362e6856ce8597edb52aaa
2019-10-28 14:24:44 -07:00
02d318461e Temporarily disable test_numerical_consistency_per_channel due to failure (#28807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28807

`FAIL: test_numerical_consistency_per_channel (__main__.TestFakeQuantizePerChannel)`

This test is failing consistently on master; we can't find a clean blame.
ghstack-source-id: 92763176

Test Plan: CI

Differential Revision: D18181496

fbshipit-source-id: 5948af06c4cb7dea9a8db1366deb7c12f6ec1c72
2019-10-28 13:51:10 -07:00
9f890a9218 make sure clang-tidy is diffing against the right thing (#28788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28788

Okay, my last fix was wrong because it turns out that the base SHA is
computed at PR time using the actual repo's view of the base ref, not
the user's. So if the user doesn't rebase on top of the latest master
before putting up the PR, the computed diff is against the wrong base anyway.

This PR fixes the issue by not relying on any of these API details and
just getting the merge-base of the base and head refs, which should
guarantee we are diffing against the right thing.

This solution taken from https://github.com/github/VisualStudio/pull/1008

Test Plan: Imported from OSS

Differential Revision: D18172391

Pulled By: suo

fbshipit-source-id: 491a50119194508b2eefa5bd39fe813ca85f27b1
2019-10-28 13:47:51 -07:00
5804e54c81 Deprecate torch::nn::modules_ordered_dict API (#28774)
Summary:
I finally found a way to get the following API to work for constructing a list of named submodules for `Sequential`:
```cpp
Sequential sequential({
  {"m1", MyModule(1)},
  {"m2", MyModule(2)}
})
```
```
which was actually our original proposed design and much simpler than our current API:
```cpp
Sequential sequential(modules_ordered_dict({
  {"m1", MyModule(1)},
  {"m2", MyModule(2)}
}));
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28774

Differential Revision: D18174013

Pulled By: yf225

fbshipit-source-id: 3a18c2d36b6a65a07bee7346a7516780567c7774
2019-10-28 13:01:13 -07:00
87c98acf5d Back out "Add memory format support to full_like operator" (#28803)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28803

Original commit changeset: 1761a9939aa7
ghstack-source-id: 92748946

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175282

fbshipit-source-id: d3f537bbed50a4524797edd96b210b8455ef1bcc
2019-10-28 12:44:53 -07:00
d828fef8ac Back out "Add memory format support to ones_like operator" (#28802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28802

Original commit changeset: 5da9530f6b23
ghstack-source-id: 92748794

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175303

fbshipit-source-id: ac36c7d345cba901bc2b64dc22661b8d0f179f13
2019-10-28 12:44:49 -07:00
266c1652e6 Back out "Add memory format support to rand_like operator" (#28801)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28801

Original commit changeset: 2a1d47571268
ghstack-source-id: 92748792

Test Plan: buck test language_technology/neural_mt/os/pytorch_translate/test:test_onnx

Reviewed By: ifedan

Differential Revision: D18175304

fbshipit-source-id: ffd61f6e42f256b39b80a6b42d989c238228f25d
2019-10-28 12:44:45 -07:00
648749b203 C++ API: torch::nn::LPPool2d (#28492)
Summary:
Add torch::nn::LPPool2d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883 #27800

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28492

Differential Revision: D18109401

Pulled By: yf225

fbshipit-source-id: 5cedecb895d9d44c2167cdb3f6f758f3426b3497
2019-10-28 12:28:25 -07:00
052046b18e Enabling intra-op parallelism for dynamic quantized Linear operator (#28477)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28477

Similar to https://github.com/pytorch/pytorch/pull/26692, we would like to enable the intra-op parallelism for dynamic Linear op.
ghstack-source-id: 92419573

Test Plan:
CI

Test Benchmark:
```
import time
import torch

K, N = 1024, 1024

print('M', 'nthread=1', 'nthread=2', 'nthread=4', 'nthread=8', 'nthread=16', sep=', ')

for M in range(512, 2049, 512):
    print(M, sep=',', end=', ')
    for num_threads in (1, 2, 4, 8, 16,):

        torch.set_num_threads(num_threads)

        x = torch.rand(M, K)
        w = torch.rand(K, N)

        NITER = 20

        # Test dynamic quantized
        q_w = torch.quantize_per_tensor(w, 0.01, 0, dtype=torch.qint8)
        packed_w = torch.ops.quantized.linear_prepack(q_w, None)

        s = time.time()
        for i in range(NITER):
            torch.ops.quantized.linear_dynamic(x, packed_w)
        elapsed_per_iter_dyn_quant = (time.time() - s) / NITER

        print("{:0.2f}".format(2.0*M*N*K/elapsed_per_iter_dyn_quant/1E9), end=', ')
    print("\n", end='')
```
Before this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 119.28, 139.50, 141.66, 141.58, 141.42,
1024, 122.42, 141.21, 123.09, 141.85, 123.03,
1536, 122.80, 122.18, 141.39, 123.25, 141.35,
2048, 123.41, 141.34, 123.62, 140.55, 123.76,
```

After this Diff:
```
(base) [root@rtptest10054.frc2 ~/jhuang_test/dynamic_quant]# python benchmark_quantize_dynamic.py
M, nthread=1, nthread=2, nthread=4, nthread=8, nthread=16
512, 123.29, 271.99, 508.66, 882.83, 1295.07,
1024, 126.05, 273.15, 515.42, 914.11, 877.63,
1536, 142.48, 236.85, 524.10, 481.32, 970.81,
2048, 124.76, 279.03, 433.73, 958.67, 1045.82,
```

Differential Revision: D18074757

fbshipit-source-id: ad5b43477d2187c818c137093c6d6af02d5ca1d5
2019-10-28 12:13:35 -07:00
9f44a04613 separate PT and C2 to reduce build time (#28731)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28731

as title

Test Plan:
```
Before:
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Invalidating internal cached state: Buck configuration options changed between invocations. This may cause slower builds.
  Changed value project.buck_out='buck-out/opt' (was 'buck-out/dev')
  ... and 69 more. See logs for all changes
Parsing buck files: finished in 7.2 sec
Creating action graph: finished in 10.0 sec
Building: finished in 06:38.4 min (100%) 29890/29890 jobs, 29890 updated
  Total time: 06:55.7 min
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid

With this diff
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid
Parsing buck files: finished in 6.4 sec
Creating action graph: finished in 9.8 sec
Building: finished in 06:35.9 min (100%) 29892/29892 jobs, 29892 updated
  Total time: 06:52.1 min
```

Reviewed By: hl475

Differential Revision: D18152071

fbshipit-source-id: 80c29570581bbd2f0e78e2df32734c17a2b036ee
2019-10-28 11:10:47 -07:00
0c7537c409 Fix obviously-broken .clang-tidy files (#28547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28547

Pull Request resolved: https://github.com/pytorch/glow/pull/3672

See D18090864 for more background.  The issue i addressed there is more widespread, so i'm fixing all the other `.clang-tidy` files clearly not working as intended.

Perhaps this means it's time to lint the linter config :-)

Test Plan:
Here's the resulting output for `~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy` related to each file touched:

`fbcode/admarket/intent/.clang-tidy`: P119723794
`fbcode/caffe2/.clang-tidy`: P119723978
`fbcode/glow/glow/.clang-tidy`: P119724081
`fbcode/ice_palace/.clang-tidy`: P119724774
`fbcode/unified_graph/aggregator/.clang-tidy`: P119724375
`xplat/caffe2/.clang-tidy`: P119724464
`xplat/mcfcpp/.clang-tidy`:
```
[billfarner@devvm2187.ftw3 ~/fbsource/xplat/mcfcpp]  ~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy -explain-config
'readability-identifier-naming' is enabled in the /home/billfarner/fbsource/xplat/mcfcpp/.clang-tidy.
```

`xplat/wa-msys/mcfcpp/.clang-tidy`:
```
[billfarner@devvm2187.ftw3 ~/fbsource/xplat/wa-msys/mcfcpp]  ~/fbsource/fbcode/third-party-buck/platform007/build/llvm-fb/bin/clang-tidy -explain-config
'readability-identifier-naming' is enabled in the /home/billfarner/fbsource/xplat/wa-msys/mcfcpp/.clang-tidy.
```

Reviewed By: soumith

Differential Revision: D18092684

fbshipit-source-id: 951307d125c0346322cb2c636c0300004a48d7a9
2019-10-28 09:54:34 -07:00
f5ea2ca34a Reduce logging frequency for empty range tolarence
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28704

Reviewed By: xianjiec

Differential Revision: D18138828

fbshipit-source-id: 4f3c376502cb6e30b931217702c4ca537c9eb644
2019-10-28 09:52:17 -07:00
7ed9a3ec48 Change reorder_dimensions behavior to favor output writting sequence (#28615)
Summary:
reorder_dimensions() currently iterates over all the operands when determining the dimension order in the TensorIterator: it tries to move a dimension to the front if any operand has a larger stride for that dimension.

reorder_dimensions() does respect the case where a stride is zero, but I did not see a reason why it needs to keep probing each operand in regular cases.

This changes the behavior a little bit. Since the operands are ordered with output tensors first, followed by input tensors, I favor making writes to the outputs as sequential as possible. This could make copies between tensors with different memory formats faster.

Please correct me if this change is wrong, thanks.
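
A small illustration of the kind of copy this aims to speed up (shapes are illustrative, not from the PR):

```python
import torch

src = torch.randn(8, 3, 32, 32).contiguous(memory_format=torch.channels_last)
dst = torch.empty(8, 3, 32, 32)  # default contiguous layout

# With this change, the iterator's dimension order favors sequential
# writes into dst even though src uses a different memory format.
dst.copy_(src)
```
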
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28615

Reviewed By: VitalyFedyunin

Differential Revision: D18122474

Pulled By: glaringlee

fbshipit-source-id: f36467489fe6c6514b14ce9dcc439628d5d5ad0e
2019-10-28 08:50:03 -07:00
82f31e02a3 Remove the redundant calculation of derivative of power function (#28651)
Summary:
Hi, I noticed that PyTorch has the same issue as HIPS/autograd#541.
I tried to solve it; I hope it can help.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28651

Reviewed By: gchanan

Differential Revision: D18137163

Pulled By: albanD

fbshipit-source-id: 888bef65c72c4c15c2acdd4b13d5041008b1354e
2019-10-28 08:37:04 -07:00
4230132baf Added docs for context method mixins. Fixes issue #27365 (#28643)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27365 .

This PR:
1. Makes Context method docs available.
2. Links [Extending torch autograd](https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd) notes to Context method docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28643

Differential Revision: D18170089

Pulled By: albanD

fbshipit-source-id: a1119ea8e2f8a71f0d1aadf416f2f98343aa9b7b
2019-10-28 08:31:35 -07:00
0e86c99bfb Updating submodules
Summary:
GitHub commits:

a3277b4e50
9ac4d71072
98141ffe1b

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: e1d9ec467e9d72774fda12ca1e8ca2e740fbe5c7
2019-10-28 08:29:12 -07:00
5835ad07cb provide memory format as Contiguous explicitly when calling to clone() (#28029)
Summary:
provide the memory format explicitly when calling clone():
```
clone(MemoryFormat::Contiguous); // instead of clone()
```

This change is based on https://github.com/pytorch/pytorch/pull/27106
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28029

Differential Revision: D17937468

Pulled By: ifedan

fbshipit-source-id: 0a6a600af76fc616f88893e5db16aabd7981ce14
2019-10-28 08:21:39 -07:00
6eaea39867 Kill _th_zero binding, just use a simple native function instead.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28597

Test Plan: Imported from OSS

Differential Revision: D18116721

Pulled By: gchanan

fbshipit-source-id: f93b968333042700c31e37f434080b200754dddc
2019-10-28 08:17:46 -07:00
8e67b78d9b Kill THTensor_(match), which isn't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28596

Test Plan: Imported from OSS

Differential Revision: D18116718

Pulled By: gchanan

fbshipit-source-id: a038eaad0f6cf951a5d412078cfcba3ae534ea95
2019-10-28 08:17:43 -07:00
1e5b2559ac Write out some set_ overloads instead of relying on code binding generation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28595

Test Plan: Imported from OSS

Differential Revision: D18116720

Pulled By: gchanan

fbshipit-source-id: a917e03aeb8d5513ad3882163642b800ae35dabe
2019-10-28 08:17:39 -07:00
45dab56153 adding python all_gather coalesced functionality and testing. (#28634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28634

caveat 1: this only works in sync mode.
caveat 2: this is going to go away and be replaced by a C++ implementation

Test Plan: buck test caffe2/test:distributed_gloo -- test_all_gather_coalesced

Reviewed By: mrshenli

Differential Revision: D18123422

fbshipit-source-id: cfb9950d5d54c6181a5240e7cc9fed88ed47f5d9
2019-10-28 08:12:36 -07:00
aea94de067 Exclude more files in torch/csrc/distributed when USE_DISTRIBUTED=0 (#28621)
Summary:
Changelog:
- Guard inclusion of certain files in torch/csrc/distributed included in caffe2/CMakeLists.txt when USE_DISTRIBUTED=0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28621

Test Plan:
- Builds should be successful
- Tests should pass

Differential Revision: D18145330

Pulled By: ezyang

fbshipit-source-id: 7167a356b03ae783e6b0120f2ad3552db2b3ed86
2019-10-28 08:03:30 -07:00
4cf7277d62 Explain how to specify library location for MKL (#28779)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24334.

I'm still kind of confused why `FindMKL.cmake` was unable to locate my MKL libraries. They are in the standard `/opt/intel/mkl` installation prefix on macOS. But at least with this more detailed error message, it will be easier for people to figure out how to fix the problem.

zhangguanheng66 xkszltl soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28779

Differential Revision: D18170998

Pulled By: soumith

fbshipit-source-id: 47e61baadd84c758267dca566eb1fb8a081de92f
2019-10-28 08:00:54 -07:00
5da932ad72 Return None correctly from Tensor.names (#28659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28659

Previously, we would return None from `Tensor.names` without bumping the
refcount. This is a bug; the Python API requires the developer to
increment the refcount on new references to None. This is because None
is a singleton object and does not automatically have its reference
count bumped when one uses Py_None (which is a pointer to the actual
None singleton object).

See the following for Python documentation on this:
- https://docs.python.org/3/c-api/none.html#c.Py_RETURN_NONE
- https://docs.python.org/3/extending/extending.html#back-to-the-example

Fixes https://github.com/pytorch/pytorch/issues/28646

Test Plan: - New test.

Differential Revision: D18140593

Pulled By: zou3519

fbshipit-source-id: 302a09021b68229e2e7b1b584b3549b30506bdab
2019-10-28 07:01:22 -07:00
c60dee271d addmm: Fix handling of case with empty tensor (#28613)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28613

addmm: Fix handling of the case with an empty tensor.
Currently these cases cause an error.
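
A hypothetical repro (the commit does not give the exact shapes):

```python
import torch

bias = torch.randn(0, 3)
mat1, mat2 = torch.randn(0, 2), torch.randn(2, 3)
torch.addmm(bias, mat1, mat2)  # now returns an empty (0, 3) tensor instead of erroring
```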

Recreation of D18085389 without stacked diffs

Test Plan: test included

Differential Revision: D18122004

fbshipit-source-id: 71513c02ace691902553bea5ce9dc2538cca4c99
2019-10-28 05:52:50 -07:00
c89340f068 Extend HasElements to support multiple inputs (#28717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28717

Make HasElements support multiple inputs: if any input has elements, return true.

Test Plan: to be added

Reviewed By: BIT-silence

Differential Revision: D17972759

fbshipit-source-id: 3ecdea74a30fcfaaa6490fef1debc6cde68db922
2019-10-27 23:00:07 -07:00
7df3366f8d Eliminate some unnecessary tensor ref count bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28695

Differential Revision: D18144971

fbshipit-source-id: 3d6ee1343a458a8363707ce468f4d9eab2784ebb
2019-10-27 20:02:34 -07:00
f43194ed9e Move mode_t declaration in PadOptions (#28760)
Summary:
Based on the discussion in https://github.com/pytorch/pytorch/pull/28413#discussion_r338839489, putting anything that's not tagged as `public:` under a `TORCH_ARG` line would hide it under `private:`. To get around this problem, we should move the `mode_t` declaration to the top of the PadOptions declaration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28760

Differential Revision: D18165117

Pulled By: yf225

fbshipit-source-id: cf39c0a893822264cd6a64cd887729afcd84dbd0
2019-10-27 15:51:39 -07:00
d5afd97569 Refactor qconv_prepack and qconv_unpack to support conv3d (#28481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28481

Refactor qconv_prepack and qconv_unpack to support conv3d

Test Plan: buck test mode/dev-nosan caffe2/test:quantized -- "Conv"

Reviewed By: dskhudia

Differential Revision: D18023651

fbshipit-source-id: 8cbc9fe68f93bc4b247a4f41423c6d8c30a5ef90
2019-10-27 14:43:16 -07:00
764e0ee882 Improve Tensor type hints (#28578)
Summary:
I've typed some attributes from ee920b92c4/torch/csrc/autograd/python_variable.cpp (L490) that were not included in the stubs so that MyPy will be aware of them. I made sure to only add those attributes that are mentioned somewhere in the documentation. If there are attributes mentioned in the documentation that are not meant to be part of the public API (or the opposite), please let me know. I've also made sure that attributes that can't be set are typed as read-only properties. If setting `dtype`, `shape`, `device` or `names` directly is not part of the public API, let me know and I'll make them properties as well.

I've also added `__len__`, `__iter__` and `__contains__`, which means MyPy will no longer complain about `len(t)`, `t1 in t2` and `for t1 in t2`.
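
A quick illustration of the behavior these stubs now describe:

```python
import torch

t = torch.arange(6).reshape(2, 3)
len(t)                     # 2: the size of the first dimension
rows = [row for row in t]  # __iter__ yields sub-tensors along dim 0
4 in t                     # True: __contains__ membership test
```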

Shameless plug: I have another typing-related PR here that needs review: https://github.com/pytorch/pytorch/pull/27445

Fixes https://github.com/pytorch/pytorch/issues/28457
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28578

Reviewed By: lerks

Differential Revision: D18113954

Pulled By: fmassa

fbshipit-source-id: 0b69a2966d22054d8d87392f19ec5aa3918773bc
2019-10-27 04:43:51 -07:00
440b192078 Type hints: Return Iterator instead of Iterable from __iter__ (#27445)
Summary:
`__iter__` methods are supposed to return iterators (https://docs.python.org/3/reference/datamodel.html#object.__iter__), but some of them are typed to return iterables, which is too general. This results in error messages such as `"Iterable[Module[Any]]" has no attribute "__next__"` from Mypy. Technically this should also have caused a type error [here](8f7020bbdb/torch/nn/modules/container.py (L115)), but due to a bug in Mypy, type checking isn't working correctly in untyped methods (this will be fixed in the next release though: https://github.com/python/mypy/pull/7530).
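
A minimal sketch of the distinction (the class is illustrative):

```python
from typing import Iterator, List

class Container:
    def __init__(self, items: List[int]) -> None:
        self._items = items

    # Annotating the return type as Iterator (not Iterable) is what lets
    # Mypy accept __next__ calls on the result of iter(...).
    def __iter__(self) -> Iterator[int]:
        return iter(self._items)

it = iter(Container([1, 2, 3]))
next(it)  # fine for Mypy: Iterator has __next__, Iterable does not
```
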
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27445

Reviewed By: lerks

Differential Revision: D18113966

Pulled By: fmassa

fbshipit-source-id: c6261ac866f86df4328e6d2fdfca0625aa2d2492
2019-10-27 04:40:55 -07:00
f782500ee0 Abstract tracer::enter and tracer::exit into a function
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28473

Test Plan: Imported from OSS

Differential Revision: D18121007

Pulled By: jamesr66a

fbshipit-source-id: 4c4a4344ad9bcc4630b945d2a645a0b05928933c
2019-10-26 18:41:14 -07:00
7ff272c6da Back out D17980308-D17980313 (#28748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28748

Found D17980313 to break unit tests, backed out descendants too to avoid conflicts.

Test Plan:
Failed on master:

   buck test mode/dev-nosan language_technology/neural_mt/fb/pytorch_translate/test:test_onnx

Passes with this diff.

Differential Revision: D18157588

fbshipit-source-id: e2b56eac8c5bfccf3ce9a3a2993f6332ab1471e7
2019-10-26 13:08:49 -07:00
e96ea288a8 Automation scripts for perf testing (#28622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28622

### Summary

As discussed in #28405, this is the third PR. The `bootstrap.sh` script is mainly for those who want to do perf testing on iOS but don't want to touch Xcode or any iOS code. It does require valid iOS dev credentials installed on your machine. (You can easily get these from any experienced iOS developer; it takes only 5 minutes to set up.)

 All you need to do is run

```shell
./bootstrap -t ${TEAM_ID} -p ${PROFILE}
```

The testing app will be automatically installed on your device. The log of the benchmark function will be displayed on the screen.

### Test plan

Don't break any CI jobs unless they're flaky.

Test Plan: Imported from OSS

Differential Revision: D18156178

Pulled By: xta0

fbshipit-source-id: cd7ba8d87bf26db885262888b9d6a5fd072309d1
2019-10-25 19:50:24 -07:00
dbf1996f79 Support MultiheadedAttention module (#28555)
Summary:
This makes MultiheadAttention TorchScript compatible.

It also breaks BC-compatibility for old models that do not have `_qkv_same_embed_dim` as an attribute.
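
A minimal sketch of the newly supported usage (dimensions are illustrative):

```python
import torch
import torch.nn as nn

mha = nn.MultiheadAttention(embed_dim=16, num_heads=4)
scripted = torch.jit.script(mha)  # previously failed to compile

q = k = v = torch.randn(5, 2, 16)  # (seq_len, batch, embed_dim)
attn_out, attn_weights = scripted(q, k, v)
```
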
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28555

Pulled By: driazati

Differential Revision: D18124746

fbshipit-source-id: 5c5042fc6fc0e557db859a8ae05174cba5fce6a9
2019-10-25 17:28:53 -07:00
e886450863 report p50 time instead of avg (#28722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28722

as title

Test Plan:
```
buck run mode/opt caffe2/benchmarks/operator_benchmark:benchmark_all_test -- --operator sigmoid

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: sigmoid
iters: 200, 462.6029555220157
iters: 400, 441.04792759753764
iters: 800, 441.81562116136774
iters: 1600, 440.79964311094955
iters: 3200, 436.3108493271284
iters: 6400, 440.87966314691585
iters: 12800, 452.29464218209614
# Mode: Eager
# Name: sigmoid_M512_N512
# Input: M: 512, N: 512
Forward Execution Time (us) : 441.048
```

Reviewed By: hl475

Differential Revision: D18149525

fbshipit-source-id: 5fe70a35b790ee7ad3ff57c0cb0b1c29cb609b83
2019-10-25 17:22:27 -07:00
60d606094c Export Meshgrid (#26037)
Summary:
Exporting meshgrid op in opset 9 symbolics
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26037

Reviewed By: hl475

Differential Revision: D17452325

Pulled By: houseroad

fbshipit-source-id: d556b78e46594a232cdefd8c257cccd8b98221d6
2019-10-25 16:59:22 -07:00
0c48092b22 Resets rnn _flat_weights on _apply (#28562)
Summary:
Currently when _apply() is called on RNNBase (or one of its children, like LSTM), the _flat_weights attribute may or may not be updated. In particular, when using .to() and sending a module like LSTM to XLA, a third party device type, the tensors in _flat_weights will not be updated and will remain on CPU. This causes the LSTM forward to fail since the forward call receives a mix of XLA and CPU tensors.

This occurs because third-party device types, like XLA, may not be shallow-copy compatible with native tensors. When this is the case and _apply is called, Module parameters are replaced, not updated in place. RNNBase would not sync _flat_weights with its params in this case, so the references in _flat_weights did not reflect the module's current params.

This small change forces a resync of _flat_weights and the actual params on each _apply. This lets .to('xla') work for LSTMs, for example. A test will be added to PyTorch/XLA (which runs in our CI) to validate this behavior after the change appears in PyTorch.
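
A sketch of the pattern this fixes, with CUDA standing in for a third-party device such as XLA (where the stale references actually caused failures):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=8)
lstm = lstm.to('cuda')  # _apply now re-syncs _flat_weights with the moved params
x = torch.randn(4, 2, 8, device='cuda')
out, (h, c) = lstm(x)   # forward sees a consistent set of device tensors
```
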
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28562

Differential Revision: D18138863

Pulled By: mruberry

fbshipit-source-id: 284092cbe4ecff9dd334a9413c330cacdd5e04fd
2019-10-25 16:02:19 -07:00
0eeda56632 Add nn.ReLU6 to default mapping (#28516)
Summary:
https://discuss.pytorch.org/t/quantized-hard-sigmoid/59013
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28516

Differential Revision: D18128717

Pulled By: jerryzh168

fbshipit-source-id: 4d06d1b54cf9f84a610d79fbadde2c8ef38c33f8
2019-10-25 14:52:44 -07:00
2049e45999 Kill zero_dim_tensor_only codegen, it's not used anymore. (#28514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28514

Verified that there are no generated code changes after applying the diff.

Test Plan: Imported from OSS

Differential Revision: D18086966

Pulled By: gchanan

fbshipit-source-id: 86c660ca78dfeeda2c888947d557cee2c4df08aa
2019-10-25 14:24:16 -07:00
24f0bca8e2 Remove zero_dim_tensors only from _th_masked_fill_. (#28513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28513

This just ensures that (since we only have a Scalar implementation), if you pass in a Tensor that's not zero-dim you get a nice error message.

Instead of doing this with codegen, we do this in code at the ATen level.

Test Plan: Imported from OSS

Differential Revision: D18086969

Pulled By: gchanan

fbshipit-source-id: 83fe2c16046e243d573e033d033aa3844b03930a
2019-10-25 14:24:12 -07:00
b0b852459e Remove zero_dim_tensors only from _th_index_fill_. (#28512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28512

This just ensures that (since we only have a Scalar implementation), if you pass in a Tensor that's not zero-dim you get a nice error message.

Instead of doing this with codegen, we do this in code at the ATen level.

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D18086965

Pulled By: gchanan

fbshipit-source-id: f3853bbbb0cf5816803a00877a2e94aa89e32c3b
2019-10-25 14:24:08 -07:00
d37c2d7c8d Revert D17495965: TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test
Test Plan: revert-hammer

Differential Revision:
D17495965

Original commit changeset: 3e8dbe8943f5

fbshipit-source-id: d47fcbec22b0d61df41d7dbf15cfdde196ac818f
2019-10-25 13:58:16 -07:00
110a931752 Change from HTTP to HTTPS
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28333

Differential Revision: D18143824

Pulled By: soumith

fbshipit-source-id: 613fd2219814addc850c3b9fe7ebfd8510a5e5c8
2019-10-25 13:13:30 -07:00
4996e3aca2 TensorRT 6.0 support and PyTorch->ONNX->TRT6 unit test (#26426)
Summary:
This PR makes Caffe2 compatible with TensorRT 6. To make sure it works well, a new unit test is added. This test checks the PyTorch->ONNX->TRT6 inference flow for all classification models from the TorchVision Zoo.
Note on the CMake changes: they are needed in order to import the onnx-tensorrt project. See https://github.com/pytorch/pytorch/issues/18524 for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26426

Reviewed By: hl475

Differential Revision: D17495965

Pulled By: houseroad

fbshipit-source-id: 3e8dbe8943f5a28a51368fd5686c8d6e86e7f693
2019-10-25 13:01:57 -07:00
b19bbde561 Migrate all the Caffe2 Centos builds to explicity use devltoolset (#28465)
Summary:
Continues https://github.com/pytorch/pytorch/pull/28431 with a new branch name that can trigger all the CI

https://github.com/pytorch/pytorch/issues/28059

pytorch/ossci-job-dsl@b2c823a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28465

Differential Revision: D18104647

Pulled By: bddppq

fbshipit-source-id: 24decf44bdf73bd8a9c64d5fcaf34eec7a356f6e
2019-10-25 12:35:26 -07:00
0253e23d3f Remove unused USE_ROCM environment variable (#28641)
Summary:
All USE_ROCM logic has been moved to cmake now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28641

Differential Revision: D18139209

Pulled By: bddppq

fbshipit-source-id: bbf0931aa6a3be963b7e0d09b6f99f088c92c94d
2019-10-25 12:33:06 -07:00
1322daa506 Improve error handling for distributed autograd engine. (#27940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27940

1) If we receive an error for outstanding rpcs, we enqueue an appropriate error
on the local autograd engine.
2) Add an `exit_on_error` mode for the local autograd engine, where the
computation stops if we see an error.
ghstack-source-id: 92603377

Test Plan: Added unit tests to test failures.

Differential Revision: D17916844

fbshipit-source-id: 199a7832f1033c36a9bbcc1e80d86576c04965d0
2019-10-25 12:07:27 -07:00
dc17a2ecc5 Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28433
Differential Revision: D18138240

Pulled By: anjali411

fbshipit-source-id: 314e5902f103be1feb4cacde47c90204b3d353cc
2019-10-25 11:44:28 -07:00
3f119a5f52 Port of multilabel_margin_loss from TH to ATen (CPU) [2nd try] (#28504)
Summary:
This is a port of the CPU version of the TH MultiLabelMarginCriterion to ATen.

This reverts the revert of the previous PR https://github.com/pytorch/pytorch/issues/28205, which caused a Windows build to fail; please see the comments in the original PR. I refactored the code so that the lambda bodies of the forward & backward in the AT_DISPATCH macro were extracted into separate functions; similar code can be found in several places in the ATen code base. Since I was not yet able to successfully compile PyTorch on Windows (due to another compile error), it would be great if somebody could launch a Windows test build for this PR to see if it now compiles successfully. Thanks in advance!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28504

Differential Revision: D18115598

Pulled By: ezyang

fbshipit-source-id: b62b6367966e0f6786794213b94eb0820092e572
2019-10-25 10:17:39 -07:00
68ab162099 Don't clobber pytorch image with libtorch build. (#28581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28581

Fixes #28305

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D18124450

Pulled By: ezyang

fbshipit-source-id: 0d4bb99a6bdff9ddbfb4d25cc0f67cc261ed26ba
2019-10-25 10:13:57 -07:00
42423854f0 add test to ensure that dist autograd contexts are cleaned up incase of nested rpcs (#28485)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28485

This diff adds a test to ensure that when we have multiple nested RPCs
inside a dist autograd context, the context that is created as a result of a
nested rpc is cleaned up after the node creating the context exits the context
manager. For example, worker 0 might send an rpc to worker 1 that results in an
rpc to worker 2, so worker 2 will have 0's context, even though worker 0 never
directly talked to 2. This test ensures that the context on 2 would also be
cleaned up.
ghstack-source-id: 92611018

Test Plan: Ran the unit test.

Differential Revision: D18079212

fbshipit-source-id: d49f0cda0bf2908747546e5c8a967256c848c685
2019-10-25 10:10:02 -07:00
aac3998c27 msvc error C4805 fix (#28156)
Summary:
Fixes MSVC error message
```
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(173): error C4805: '|=': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(173): error C4805: '|': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(186): error C4805: '|=': unsafe mix of type 'bool' and type 'int' in operation
15>d:\pytorch-scripts\caffe2_builders\v141\pytorch\torch\csrc\jit\register_string_ops.cpp(186): error C4805: '|': unsafe mix of type 'bool' and type 'int' in operation
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28156

Differential Revision: D18115151

Pulled By: ezyang

fbshipit-source-id: ed67a2b1330dfd4c12858ae9ca181163c0c72e51
2019-10-25 09:24:25 -07:00
e212543681 Improve float pickling speed. (#28553)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28553

This change improves double pickling in a 1M-double-list
microbenchmark by roughly 40% (33 msec -> 20 msec).

The main benefit is avoiding per-byte bounds checks, so
we only bounds-check 2 times rather than 9 times.

Unpickle is already doing something reasonable, so no need to change.

fwiw, putting the swapping logic in a separate func/lambda provided
roughly 20% better results, consistently when microbenchmarking.
Looking at the objdump disassembly, gcc somehow generates better code
when it's separated.
ghstack-source-id: 92585739

Test Plan:
Benchmarks: buck build mode/opt experimental/jeremyl/c2:SerializationBench
               buck-out/opt/gen/experimental/jeremyl/c2/SerializationBench --bm_regex=.*Float.*
   Correctness: buck build mode/dev-nosan caffe2/test/...

Differential Revision: D18089481

fbshipit-source-id: a5f39e5d38c432893844241a7cce244831037e1f
2019-10-25 08:14:07 -07:00
9732c81da4 Cleanup testing of _like operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27891

Test Plan: Imported from OSS

Differential Revision: D17980308

Pulled By: VitalyFedyunin

fbshipit-source-id: 268b6a0875c8970885604498eb0991a8cd410b21
2019-10-25 07:29:28 -07:00
69b0e06a49 Add memory format support to randn_like operator (#27890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27890

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.
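
A short illustration of these rules (shapes are illustrative):

```python
import torch

x = torch.empty(8, 3, 32, 32).contiguous(memory_format=torch.channels_last)

y = torch.randn_like(x, memory_format=torch.preserve_format)
assert y.stride() == x.stride()  # the channels-last strides are kept

z = torch.randn_like(x, memory_format=torch.contiguous_format)
assert z.is_contiguous()         # an explicit format overrides 'preserve'
```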

Test Plan: Imported from OSS

Differential Revision: D17980314

Pulled By: VitalyFedyunin

fbshipit-source-id: a2cf3b1b2df1a4956da971fd47ce69487b2c09e9
2019-10-25 07:29:24 -07:00
02917dd1f4 Add memory format support to randint_like operator (#27889)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27889

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980307

Pulled By: VitalyFedyunin

fbshipit-source-id: f1766c2bcb015ef870bfb92c16b4cd363b3cbc14
2019-10-25 07:29:20 -07:00
c258cd039a Add memory format support to zeros_like operator (#27562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27562

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980313

Pulled By: VitalyFedyunin

fbshipit-source-id: 9ca8453dc1a554ceea93c6949e01263cc576384b
2019-10-25 07:29:16 -07:00
04f5325583 Add memory format support to rand_like operator (#27561)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27561

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980316

Pulled By: VitalyFedyunin

fbshipit-source-id: 2a1d47571268673de0c6f5ae1b6d4f9110962ab0
2019-10-25 07:29:12 -07:00
2c339a24ec Add memory format support to ones_like operator (#27270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27270

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980312

Pulled By: VitalyFedyunin

fbshipit-source-id: 5da9530f6b239306dbb66d1dfeefe88237f13bbd
2019-10-25 07:29:08 -07:00
85d5aee863 Add memory format support to full_like operator (#27262)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27262

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Differential Revision: D17980309

Pulled By: VitalyFedyunin

fbshipit-source-id: 1761a9939aa7c5ab23e927b897e25e225089a8e7
2019-10-25 07:29:04 -07:00
baf8488dbd Add memory format support to empty_like operator (#27244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27244

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor is going to have the channels-last format.
3) The output tensor is going to be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D17980310

Pulled By: VitalyFedyunin

fbshipit-source-id: 00a39b40daa4b8ee63c32e60d920222f8be2d6a1
2019-10-25 07:29:00 -07:00
bfbb3e0579 Kill _th_fill binding, which isn't used anymore. (#28511)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28511

We still keep the function in TH, since it's called from within TH.

Test Plan: Imported from OSS

Differential Revision: D18086967

Pulled By: gchanan

fbshipit-source-id: de026fbb076c8bf9d054ed4cf93eba9c7bcfb161
2019-10-25 07:12:32 -07:00
c6628b29a7 unfold: turn off device_guard (#28510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28510

It was off in TH, it can be off in ATen.

Test Plan: Imported from OSS

Differential Revision: D18086968

Pulled By: gchanan

fbshipit-source-id: 9be9a61da1dc82224f04a22008629db982f65230
2019-10-25 07:12:27 -07:00
7ab0a28b21 Port TH/THC implementation of unfold to ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28475

Test Plan: Imported from OSS

Differential Revision: D18074672

Pulled By: gchanan

fbshipit-source-id: 32e44330bf67728af47a6652b1fb70733a06ba20
2019-10-25 07:12:23 -07:00
2793d41a9c Fix scalar handling of unfold. (#28462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28462

Unfold is implemented in TH (as _th_unfold) and uses the standard scalar checks. That means that even though torch.tensor(5).unfold(dim=0, size=1, step=1) should produce torch.tensor([5]), it actually produces torch.tensor(5) because the scalar_check infers it's a scalar.

We can fix this by just turning off the scalar_check.
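
In short, the behavior after the fix:

```python
import torch

t = torch.tensor(5)  # zero-dim tensor
t.unfold(0, 1, 1)    # tensor([5]) after the fix; previously tensor(5)
```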

Test Plan: Imported from OSS

Differential Revision: D18074671

Pulled By: gchanan

fbshipit-source-id: 5db09d614692830d66d6e6d8aba799ebe8144cf5
2019-10-25 07:12:18 -07:00
1a5d32d894 Updating submodules
Summary:
GitHub commits:

59613a5631
ff6fbc6607
4dd4f00512

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 5637007829adcab490bef01db9b1bd60fd856405
2019-10-24 22:28:09 -07:00
2181dd516e fix handling of function attributes. (#28569)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28569

Previously, the inclusion of function attributes would "poison" a
ConcreteModuleType, because we did not have a way of checking whether
they are actually the same function. This PR uses the Python function
object to perform that check. This improves our ability to reuse JIT
types between modules.

Also this PR fixes a bug where we weren't properly adding modules as
attributes when converting from ConcreteType -> JIT type (we were adding
them after the fact--another reason to switch from using `register_x` to
`set_x` during module construction, which is on my to-do list after
this).

Fixes https://github.com/pytorch/pytorch/issues/28559

Test Plan: Imported from OSS

Differential Revision: D18111331

Pulled By: suo

fbshipit-source-id: ec2cccf832d3ddd4cd4d28fe19cb265f1275325a
2019-10-24 22:23:37 -07:00
01aea1f268 Delete ATenDispatch (#28468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28468

We don't need this anymore.
ghstack-source-id: 92595388

Test Plan: unit tests

Differential Revision: D18073339

fbshipit-source-id: d0ef1332c83e47117fe0a5eadc8faedb259cfba0
2019-10-24 22:15:00 -07:00
ed503596ce Remove c10->ATen registration forwarding (#28186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28186

Since now all ops are on c10, we don't need to forward any registrations to globalATenDispatch anymore.
ghstack-source-id: 92586962

Test Plan: waitforsandcastle

Differential Revision: D17969011

fbshipit-source-id: 30e6cb072c934b3d24089055754ed3695f8ea693
2019-10-24 22:14:56 -07:00
d04973beda Use c10::variant-based enums for EmbeddingBag mode (#28330)
Summary:
This PR is BC-breaking in the following way:

Previously, we required the use of `std::string` to specify the mode for `EmbeddingBag`. After this PR, we use variant-based enums such as `torch::kSum` / `torch::kMean` / `torch::kMax` to specify the mode for `EmbeddingBag`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28330

Differential Revision: D18127116

Pulled By: yf225

fbshipit-source-id: 15cd86c764777f4d399587be92cda15b6ce8524b
2019-10-24 17:47:42 -07:00
60a1efe138 Eliminate some unnecessary tensor refcount bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28355

Differential Revision: D18129161

fbshipit-source-id: 493cf0c1d754a375ec6c73dd57cd985639c849b7
2019-10-24 17:19:07 -07:00
4182c1183b Add custom op documentation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28557

Differential Revision: D18127241

Pulled By: smessmer

fbshipit-source-id: 684e1dde15520d08aeab603623614dedd1e0cbfc
2019-10-24 16:18:14 -07:00
1dfb8752a6 Define std::strtoll for older Android (#28603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28603

This symbol isn't available in older Android configs, so import it
from the global namespace in the same file as the rest of our
Android string compatibility hacks.

Test Plan: Internal android build.

Reviewed By: jerryzh168

Differential Revision: D18099515

fbshipit-source-id: f8b0c80ea7344e05975a695afb359b339b6d9404
2019-10-24 15:52:09 -07:00
da6b8a905a Use c10::to_string in more places (#28605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28605

This was added because std::to_string isn't available in libstdc++
on Android.  Use it in more places to get the PyTorch Android
build working with libstdc++.

Test Plan: Internal android build.

Reviewed By: jerryzh168

Differential Revision: D18099520

fbshipit-source-id: 17a2b617c2d21deadd0fdac1db849823637981fc
2019-10-24 15:52:05 -07:00
df81cb22b8 Delete move constructor from TaggedStringStream (#28604)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28604

This isn't used anywhere, and it doesn't work with older libstdc++
because std::ostringstream is not copyable or movable.

Test Plan: Internal android build.

Reviewed By: jamesr66a

Differential Revision: D18099511

fbshipit-source-id: 1ffb49303aa5d7890ca7f057b21886f88c04ce20
2019-10-24 15:52:01 -07:00
52e0a94661 Fix spelling in some comments
Test Plan: CI

Reviewed By: xcheng16, linbinyu

Differential Revision: D18099518

fbshipit-source-id: 3fbf654dc30261eb27b923db0974d8088a3a5783
2019-10-24 15:51:56 -07:00
261a13a84b Enable dist autograd tests (#28606)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28606

Without passing setup_model_parallel=True to dist_init, the
decorator actually takes the function object as the value for the
flag.

Test Plan: Imported from OSS

Differential Revision: D18120507

Pulled By: mrshenli

fbshipit-source-id: afbaa381647e8f284e28fa9dbdd2a7c411073b3f
2019-10-24 15:30:27 -07:00
70e4548fd7 Compute correct strides after type promotion (#28253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28253

Instead of trying to fix strides after changing dtypes, wait until after
promotion to set them.

fixes: https://github.com/pytorch/pytorch/issues/27824
fixes: https://github.com/pytorch/pytorch/issues/28502
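
A minimal sketch of the kind of case this affects (an assumed example, not taken from the PR's test suite):

```python
import torch

a = torch.randn(4, 3).t()                    # non-contiguous float32 view
b = torch.randn(3, 4, dtype=torch.float64)   # contiguous float64
out = a + b                                  # type-promotes to float64
print(out.dtype, out.stride())               # strides chosen after promotion
```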

Test Plan: Imported from OSS

Differential Revision: D18124950

Pulled By: nairbv

fbshipit-source-id: e4db90b2a6bb0f5d49cb388e0cd1971303c6badd
2019-10-24 15:18:01 -07:00
e885ce6130 C++ parity, grid_sample functional (#28354)
Summary:
https://github.com/pytorch/pytorch/issues/25883
I put grid_sample in vision.h with affine grid.

I have a question about the string arguments (interpolation mode, padding mode):
I reuse torch::native::detail::GridSamplerInterpolation in GridSampler.h instead of using strings.
This follows the way reduction enums are used in loss functions.
I am not sure this is right.

yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28354

Differential Revision: D18109333

Pulled By: yf225

fbshipit-source-id: 1bf972b671b107464f73b937bbe0de76fb259fbf
2019-10-24 15:14:37 -07:00
92b39434a2 C++ nn::ConstantPad{1,2,3}d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28541

Test Plan: Imported from OSS

Differential Revision: D18115607

Pulled By: yf225

fbshipit-source-id: 736df791ddc3cd30ad9af89eacfb4a0c6b53f2cd
2019-10-24 15:10:27 -07:00
5cf644157c Speed up fill for half and bfloat16 on CPU. (#28397)
Summary:
This is done by replacing Vec<uint16_t> with Vec<int16_t>, which has all
sorts of AVX optimizations available.

Benchmark (Debian 10, Release build, gcc 8.3, no turbo, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.bfloat16', 'torch.half'):
    for n, t in [(40_000, 600000),
                (400_000, 60000)]:
        print(f'a.fill_(10) for {t} times, a=torch.empty({n}, dtype={dtype})')
        print(timeit.timeit(f'a.fill_(10)', setup=f'import torch; a=torch.empty({n}, dtype={dtype})', number=t))
```

Before:

```
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.bfloat16)
11.064065577999827
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.bfloat16)
10.618151295000189
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.half)
10.989039544000207
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.half)
10.602233665999847
```

After:

```
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.bfloat16)
1.530125006000162
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.bfloat16)
1.4807136570002513
a.fill_(10) for 600000 times, a=torch.empty(40000, dtype=torch.half)
1.3946152990001792
a.fill_(10) for 60000 times, a=torch.empty(400000, dtype=torch.half)
1.457788402999995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28397

Differential Revision: D18125171

Pulled By: ezyang

fbshipit-source-id: bfb2da13f10bc582e9848073e428af9e36656b13
2019-10-24 15:03:11 -07:00
7f9941c4ea C++ nn::ZeroPad2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28540

Test Plan: Imported from OSS

Differential Revision: D18115610

Pulled By: yf225

fbshipit-source-id: ced7c0917f4712838e753cd2e9fc4fa79fd5d310
2019-10-24 14:23:57 -07:00
d762ad09df Enable Interpolate Tests for ONNX Opset 11 (#28560)
Summary:
- Enable tests for Interpolate in opset 11 for nearest and linear2d modes (linear1d/3d not implemented yet)
- Fix bugs found after enabling tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28560

Reviewed By: hl475

Differential Revision: D18110680

Pulled By: houseroad

fbshipit-source-id: 7f8811e40dc5cedaba6389460dcca52daa048f5f
2019-10-24 14:21:13 -07:00
a783563738 Skip ProcessGroupNCCLTest if CUDA is not available (#28393)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28393

We should skip this test if CUDA is not available and alert the user.
Previously, if this test was run on CPU it would fail with:
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (3) : This binary is linked with CUDA lazy stubs and underlying .so files were not loaded. CUDA functionality is disabled. Set env variable CUDA_LAZY_DEBUG to get messages during startup
```

Test Plan:
Build on CPU and verify that there are no errors when running; we should get the message:
`CUDA not available, skipping test`. Previously, we would get an error:
```
terminate called after throwing an instance of 'std::runtime_error'
  what():  cuda runtime error (3) : This binary is linked with CUDA lazy stubs and underlying .so files were not loaded. CUDA functionality is disabled. Set env variable CUDA_LAZY_DEBUG to get messages during startup. at caffe2/aten/src/THC/THCGeneral.cpp:54
```

Differential Revision: D18054369

fbshipit-source-id: f1d06af88b780a24ca3373a7a133047a2cfe366e
2019-10-24 14:02:09 -07:00
46f96d1538 C++ API parity: at::Tensor::requires_grad_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26332

Test Plan: Imported from OSS

Differential Revision: D17427575

Pulled By: pbelevich

fbshipit-source-id: 5500169a4fa0ef9cc2a7272e13b6e2d89df09260
2019-10-24 13:24:18 -07:00
78039627ae Minor followup on stringstream cleanups (#28300)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28300

  - Remove trivial stringstream from ScriptModuleSerializer::writeCode;
    I didn't include this in earlier changes to avoid a merge conflict
    with an earlier change.
  - Remove underscore from QualifiedName var ref; no difference in
    current use, but more correct.
ghstack-source-id: 92206909

Test Plan:
Benchmark: buck build mode/opt experimental/jeremyl/c2:
   Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18012511

fbshipit-source-id: 7db057d77741cf69c4f2fed560771c3201da19ed
2019-10-24 13:05:46 -07:00
303527d733 C++ nn::ReplicationPad{1,2,3}d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28539

Test Plan: Imported from OSS

Differential Revision: D18115609

Pulled By: yf225

fbshipit-source-id: 15f4ab6a114279bb06bf62f1265b62aa12f8700f
2019-10-24 12:49:41 -07:00
78375c02b8 C++ nn::ReflectionPad1d and nn::ReflectionPad2d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28538

Test Plan: Imported from OSS

Differential Revision: D18115608

Pulled By: yf225

fbshipit-source-id: 3a48d8c11721f013076db2965f5f75b71662c78e
2019-10-24 12:02:51 -07:00
Jie
e263dd3853 (#24396)
Summary:
Initial kernel support added for optimized NHWC tensor.

TODO: currently backwards kernel spits out tensor with NHWC stride.
Unfortunately autograd restores grad to contiguous (in either copy or add). This
makes real perf tuning annoying to do. (since I cannot easily measure end-to-end
time in my python script)

My current kernel is blazing fast compared to the original NCHW kernel in fp16,
since I avoided atomicAdd. I'll finish perf tuning after we merged some future
PR expanding NHWC support in the core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24396

Differential Revision: D18115941

Pulled By: VitalyFedyunin

fbshipit-source-id: 57b4922b7bf308430ffe1406681f68629baf8834
2019-10-24 11:57:15 -07:00
2020cc0cd1 Fix compute_non_overlapping_and_dense() (#28551)
Summary:
There are some cases where compute_non_overlapping_and_dense() doesn't work properly:
Example:
```
Tensor t = at::tensor(1).expand({1, 3, 2});
EXPECT_FALSE(t.is_contiguous());
EXPECT_FALSE(t.is_non_overlapping_and_dense()); //FAIL!!!
```
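
For reference, a roughly equivalent check from Python (a sketch, not part of the PR's tests):

```python
import torch

t = torch.tensor(1.).expand(1, 3, 2)  # all strides are 0, so elements overlap
print(t.is_contiguous())              # False: an expanded view is not dense
```
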
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28551

Differential Revision: D18115570

Pulled By: ifedan

fbshipit-source-id: 35b1a9473a28037d41f7177a8de23ffefa7faa13
2019-10-24 11:53:52 -07:00
8de8cab247 Migrate remaining ops to the c10 dispatcher (#27978)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27978

-
ghstack-source-id: 92469187

Test Plan: waitforsandcastle

Differential Revision: D17929697

fbshipit-source-id: 01f4f67cd676c719d9d1fb13bdd43aca3dfa1c8a
2019-10-24 11:40:57 -07:00
d8c66c1576 autograd/profiler: make python record_function use JIT methods
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28264

Test Plan: buck test caffe2/test:autograd caffe2/test/cpp/jit:jit

Reviewed By: bddppq

Differential Revision: D17997612

fbshipit-source-id: 8a29ae50c28ce905f63c732fe0aa49edfc9d99e3
2019-10-24 10:28:32 -07:00
f8b758b141 CPU-Strided-Complex Support for reduce ops and linpack ops (#27653)
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: [pytorch-cpu-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cpu-strided-complex)

Changes so far:

- [x]  Renamed references to the variable "I" that may be confused with "I" defined in complex.h. I did this to avoid crazy CI failure messages, as complex.h is included by more source files.
     - aten/src/ATen/native/cpu/Loops.h (Renamed I to INDEX)
     - aten/src/ATen/native/cuda/Loops.cuh (Renamed I to INDEX)
     - aten/src/ATen/core/ivalue_inl.h (Renamed I to INDEX)
     - c10/util/Array.h (Renamed I to INDEX)
     - c10/util/C++17.h (Renamed I to INDEX)
    - c10/util/Metaprogramming.h (Renamed I to INDEX)
    - c10/util/SmallVector.h (custom renaming)
- [x]  Added complex support of Linear Algebra Ops.
     - SVD needed to be modified to support mixed data types
     - Example: U(std::complex<double>), S(double), V(std::complex<double>)
     - See before and after benchmark below (No observable change in performance).
- [x]  Added complex support of Reduce Ops.
     - var/std computations could have been faster if it was possible to interpret std::complex<double> Tensor as a double Tensor.
- [x]  Added complex derivative support for autograd functionality.
     - derivatives are the same as defined by numpy autograd library for real(), imag(), conj(), angle(). These functions only affect complex numbers.
     - derivative of abs() has not been modified to not interfere with existing code.
     - Autograd defines abs() for complex numbers and fabs() for real numbers. I will look into this further down the road.

 ----------------------------------------
 PyTorch/Caffe2 Operator Micro-benchmarks Before Changes
----------------------------------------
Tag : short

Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162339.425
Forward Execution Time (us) : 162517.479
Forward Execution Time (us) : 162847.775

----------------------------------------
PyTorch/Caffe2 Operator Micro-benchmarks After Changes
----------------------------------------
Tag : short

Benchmarking PyTorch: svd
Mode: Eager
Name: svd_M512_N512
Input: M: 512, N: 512
Forward Execution Time (us) : 162032.117
Forward Execution Time (us) : 161943.484
Forward Execution Time (us) : 162513.786
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27653

Differential Revision: D17907886

Pulled By: ezyang

fbshipit-source-id: a88b6d0427591ec1fba09e97c880f535c5d0e513
2019-10-24 09:31:06 -07:00
136bb07a93 torch.histc: added a finite range check to resolve segfaults if the tensor has inf; also added checks for nan values and min>max (#27712)
Summary:
https://github.com/pytorch/pytorch/issues/27464
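
A minimal sketch of the fixed behavior (the exact error wording is an assumption, not quoted from the PR):

```python
import torch

x = torch.tensor([1.0, 2.0, float('inf')])
try:
    torch.histc(x, bins=4)  # used to segfault; now range validation kicks in
except RuntimeError as err:
    print(err)  # a RuntimeError about the non-finite range, not a crash
```
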
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27712

Differential Revision: D18064544

Pulled By: anjali411

fbshipit-source-id: c9c6d8eb4d55f2b5320409ba238bf44b0be8902e
2019-10-24 09:28:45 -07:00
ae05e48fe8 Kill TH(C)Tensor_squeeze which isn't used anymore.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28435

Test Plan: Imported from OSS

Differential Revision: D18066779

Pulled By: gchanan

fbshipit-source-id: b58180151a92999386085618ff00b56b993b41bb
2019-10-24 09:15:06 -07:00
4f0a3504e1 Port is_set_to from TH/THC to ATen.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28425

Test Plan: Imported from OSS

Differential Revision: D18063328

Pulled By: gchanan

fbshipit-source-id: 86af01a630d88c30947b8c85d1fac86dd7b40585
2019-10-24 09:15:03 -07:00
139fec2d14 remove type information from docstrings of quantization functions (#28556)
Summary:
Following from https://github.com/pytorch/pytorch/issues/28479 let's remove the type information from the docstrings of these functions as well, making them valid python signatures matching the other signatures in the docstrings for the torch API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28556

Differential Revision: D18115641

Pulled By: ezyang

fbshipit-source-id: e4c3d56981b16f5acabe8be7bfbe6ae506972d7f
2019-10-24 08:13:48 -07:00
dd277e9086 C++ API parity: Linear
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27382

Test Plan: Imported from OSS

Differential Revision: D17766735

Pulled By: pbelevich

fbshipit-source-id: c7a66daeb17550eb9a5d26944427723d4ebdc6c8
2019-10-24 07:11:51 -07:00
59402f51cf Make init_method url appending step re-usable by both init_process_group and init_model_parallel(init_rpc) (#28226)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28226

# Goal

The rendezvous step should be the first step not only for `init_process_group` but also for `init_model_parallel`.

The roadblock is that there is a special step in `init_process_group` where the arguments `rank` and `world_size` passed to `init_process_group(..)` are appended to the `init_method` URL string.

We need to make this argument appending step common and re-usable for both `init_process_group` and `init_model_parallel`.

# Solution

- Put the argument-appending step inside the `rendezvous` function (as sketched below).
- Remove manual `init_method` url construction. Delegate the responsibility to the `rendezvous` function.
- Use the `rendezvous` function for any `RpcAgent`.
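
A minimal sketch of the user-facing call this affects (the query-string form shown in the comment is an assumption):

```python
import torch.distributed as dist

# rank and world_size passed here are appended to the init_method URL by the
# shared rendezvous() helper, e.g. something like
# "tcp://127.0.0.1:23456?rank=0&world_size=1", for both init_process_group
# and the RpcAgent initialization path.
dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
)
```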

Test Plan:
```
buck test mode/dev-nosan caffe2/test:c10d
```

```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_invalid_names

buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_worker_id
```

```
buck test mode/dev-nosan caffe2/torch/fb/distributed/pytorch/tests:test_rpc -- test_sync_rpc
```

```
buck test mode/dev-nosan caffe2/torch/fb/rendezvous:zeus_test
```

```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling -- test_single_trainer_multiple_pss
```

Differential Revision: D5524494

fbshipit-source-id: 50be58ec3c928621b0874b044ef4a1640534d8ef
2019-10-23 21:51:08 -07:00
e31adeb4f3 Make RRef::LocalValue return Future (#28025)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28025

Add a PyFuture type which is a wrapper of either an OwnerRRef or a
jit::Future. The difference between PyFuture and jit::Future is that
PyFuture can return a custom py::object type.

Test Plan: Imported from OSS

Differential Revision: D17936746

Pulled By: mrshenli

fbshipit-source-id: a7451af3993d98aeab462ffd5318fc6d28f915c8
2019-10-23 17:07:16 -07:00
58873776ff Make RRef::toHere() return a jit::Future (#27943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27943

This is step 1 to make PyRRef::toHere() non-blocking on caller.

Test Plan: Imported from OSS

Differential Revision: D17936747

Pulled By: mrshenli

fbshipit-source-id: 7cf60e5804e72bdc28f0135fed4d7fdce05ea38a
2019-10-23 17:07:11 -07:00
9c345473d8 Updating submodules
Summary:
GitHub commits:

8ac79dbfad
f97c8b2a91
686dbde63b
6a32e3b562
9e79c99421

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: e50b9ecdf91e98932bd82faa210c012cc8b9d48f
2019-10-23 16:52:47 -07:00
61d40b80d3 static initialization order with mutex (#28243)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28243

When building the static-libs version of pytorch 1.3 on Windows (MSVC v141), the program crashes with a bad memory reference because `fusion_backends_lock_` has not been initialized yet.

Test Plan:
sandcastle green,
tested locally on MSVC static builds that this fixes initialization.

Differential Revision: D17985919

fbshipit-source-id: ebd6178dedf5147d01c2c1754a0942a1bbbc7e34
2019-10-23 16:30:19 -07:00
8008322336 workaround for raw string bug in VS2019 (#28349)
Summary:
reported the problem to microsoft [Developer Community](https://developercommunity.visualstudio.com/content/problem/782476/e-preprocess-to-stdout-cant-handle-raw-string-corr.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28349

Differential Revision: D18074620

Pulled By: mingbowan

fbshipit-source-id: 89c2583a0301b1e3055b1f8cd9d493fdb2567b42
2019-10-23 16:30:15 -07:00
896b5d9113 Scripts for setting up benchmark projects (#28469)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28469

### Summary

As described [here](https://github.com/pytorch/pytorch/pull/28405), this PR is the second one; it contains the scripts for setting up the benchmark projects.

### Test Plan

Don't break CI jobs unless they are flaky.

Test Plan: Imported from OSS

Differential Revision: D18097248

Pulled By: xta0

fbshipit-source-id: 6f9d1275a07aecae21afd81d5e90a89a75d0270f
2019-10-23 16:16:57 -07:00
d83389d327 Ignore F401 in all __init__.py without putting noqa (#25823)
Summary:
By adding `per-file-ignores = __init__.py: F401` into `.flake8` with `flake8>=3.7`, we can ignore F401 in all `__init__.py` without putting `# noqa: F401` line by line.

http://flake8.pycqa.org/en/latest/user/options.html?highlight=per-file-ignores#cmdoption-flake8-per-file-ignores
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25823

Differential Revision: D17252182

Pulled By: soumith

fbshipit-source-id: 87b174075b79e4078953a7521bd1a8f82405646b
2019-10-23 15:28:13 -07:00
76d262d4b7 export group_norm (#27071)
Summary:
Updated group_norm symbolic
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27071

Reviewed By: hl475

Differential Revision: D17792249

Pulled By: houseroad

fbshipit-source-id: 08be6071952ca2c256d2c6a0a6bbc19a8442f1fe
2019-10-23 15:14:31 -07:00
d081de67cf fix the document of kaiming initialization (#25638)
Summary:
Based on https://github.com/pytorch/pytorch/issues/25549, I modified the comments for kaiming initialization in torch.nn.init.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25638

Differential Revision: D17915392

Pulled By: vincentqb

fbshipit-source-id: 40f60c65d14790696ec03d7d91c764875efd6cf1
2019-10-23 14:19:38 -07:00
cbddc77ac5 fix docs for lr (#28026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28026

Documentation for learning rate does not render well. #27730.

Test Plan: Imported from OSS

Differential Revision: D17953395

Pulled By: vincentqb

fbshipit-source-id: 9e84df3e7de43f11399a67bc99c76ef241b1120f
2019-10-23 13:49:34 -07:00
bee4aca259 is_set_to: unify TH/THC implmentation and genericize test_is_set_to. (#28422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28422

The TH implementation had two differences:
1) It explicitly checked for null storages; this isn't supported anymore so can be removed.
2) It collapsed all empty tensors to the same shape for the purpose of checking.  This was introduced to keep BC when we introduced N-dimensional empty tensors,
but since it's been quite a long time since we've had N-dimensional empty tensors and the CUDA implementation didn't support this, we should get rid of it.

Test Plan: Imported from OSS

Differential Revision: D18061916

Pulled By: gchanan

fbshipit-source-id: 1a54cf9ea4fcb35b358a9ab57f84eff059ff1e7b
2019-10-23 13:46:52 -07:00
09ad464d68 Change activation modules in C++ from using Tensor& to Tensor (#28501)
Summary:
Sequential does not like modules added to it that take Tensor&
(const Tensor& and Tensor are both OK).
Functional and others use Tensor when they want to potentially
change things in-place.
This changes ReLU and friends to also do that.

Unfortunately, this seems to be BC breaking on the ABI level.
On the other hand, use of the module ReLU seems rare enough outside
Sequential (in particular in C++ models, the standard seems to be
to use torch::relu instead).

Is the BC breaking OK here? (yf225 or anyone else)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28501

Differential Revision: D18089978

Pulled By: yf225

fbshipit-source-id: ac9aba6dc2081117dece57cd8a15bafe14ec8f51
2019-10-23 13:42:22 -07:00
1c53a74e26 Fixed behavior of div_factor parameter in optim.lr_scheduler.OneCycleLR (#28217)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28216
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28217

Differential Revision: D18070759

Pulled By: vincentqb

fbshipit-source-id: ed032190c0e3eab834fc9a8f408b75b56f0f35ec
2019-10-23 13:39:05 -07:00
76c70559c9 Updating submodules
Summary:
GitHub commits:

3d32597779
2c45426e8b
db7733cb24
9df8a63117
61a3c68470
596b1a7c1c
06751015b3

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 4a48ce9ed3124fc5a37e02c1eb3081a358bb1fb6
2019-10-23 12:42:01 -07:00
657430e1f0 Return 0-numel empty tensor from symeig when eigenvectors=False (#28338)
Summary:
Changelog:
- Changes the behavior to return a 0-numel empty tensor (instead of an all-zeros tensor) when eigenvectors=False, matching the behavior of torch.eig
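
A minimal sketch of the new behavior (an assumed example, not from the PR's test):

```python
import torch

a = torch.randn(4, 4)
a = a + a.t()                                  # make the input symmetric
e, v = torch.symeig(a, eigenvectors=False)
print(e.shape)    # torch.Size([4]): eigenvalues are still returned
print(v.numel())  # 0: an empty tensor instead of an all-zeros placeholder
```
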
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28338

Test Plan: - test_symeig has been modified appropriately for this change

Differential Revision: D18085280

Pulled By: ezyang

fbshipit-source-id: 43129a96dd01743997157974100e5a7270742b46
2019-10-23 11:44:57 -07:00
e4f40bf3b2 Add multiplicative lr. (#27254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27254

`MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.
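
A minimal usage sketch (the concrete factor 0.95 is just an illustration):

```python
import torch
from torch.optim.lr_scheduler import MultiplicativeLR

model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Like LambdaLR, but the lambda returns a factor multiplied into the
# *current* lr each step, rather than a factor of the initial lr.
sched = MultiplicativeLR(opt, lr_lambda=lambda epoch: 0.95)
for _ in range(3):
    opt.step()
    sched.step()
    print(opt.param_groups[0]['lr'])  # 0.095, 0.09025, 0.0857375
```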

Test Plan: Imported from OSS

Differential Revision: D17728088

Pulled By: vincentqb

fbshipit-source-id: 1c4a8e19a4f24c87b5efccda01630c8a970dc5c9
2019-10-23 11:38:45 -07:00
d1d2358d31 Correct math formatting for lr scheduler (#28467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28467

Correcting a formatting error from #27874. Also making the size of the parentheses more natural.

![Screen Shot 2019-10-22 at 5 38 22 PM](https://user-images.githubusercontent.com/3047868/67336492-76ddfa00-f4f3-11e9-9d79-70a0aa4f6d29.png)

Closes #27874

Test Plan: Imported from OSS

Differential Revision: D18076085

Pulled By: vincentqb

fbshipit-source-id: cb7c52b347d6d11ea4a2d3c94d00a42f849c0a83
2019-10-23 11:11:25 -07:00
9d767db493 remove extraneous type information from torch.matrix_rank documentation (#28479)
Summary:
The types don't appear in the docstrings for other functions in the `torch` namespace so I think this was included here because of a copy/paste error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28479

Differential Revision: D18086150

Pulled By: ezyang

fbshipit-source-id: 2481bccba6df36b12779a330f8c43d4aea68495f
2019-10-23 11:08:30 -07:00
e80e42cb2c Updating submodules
Summary:
GitHub commits:

c535a02822
cf7a2fb510

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: f74d6f4de7a2a4ffe6d9f3689a7e08a429e79ae7
2019-10-23 09:42:44 -07:00
2f16284231 change empty range tolerance logging
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28489

Differential Revision: D18067322

fbshipit-source-id: 2096d1cce820f4ebe28db0045a2ddacc022e07da
2019-10-23 09:39:39 -07:00
e9336b04fc Update Dockerfile
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28358

Differential Revision: D18041193

Pulled By: ngimel

fbshipit-source-id: d96ce2a01af9c06bd831ddb85fe8807fabacb8a3
2019-10-23 09:29:55 -07:00
e28e38e851 Update C++ torch::nn parity table for LayerNorm (#28484)
Summary:
Now that we have merged https://github.com/pytorch/pytorch/pull/28032 (thanks anjali411!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28484

Differential Revision: D18085844

Pulled By: yf225

fbshipit-source-id: 4be972687addea8f57f48dfe9707837196593062
2019-10-23 09:25:41 -07:00
e280f93e31 Prepack folding for conv2d (#27119)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27119

att

Test Plan:
python test/test_jit.py 'TestJit.test_fold_prepack'

Imported from OSS

Differential Revision: D17717636

fbshipit-source-id: 97e9f8d927f7eacedf09f47b8ae1bf8216b8cad4
2019-10-23 09:03:14 -07:00
be3808d3b1 Migrate smooth_l1_loss from the TH to Aten (CPU & CUDA) (#27962)
Summary:
This is a port of the TH `SmoothL1Criterion` to ATen using TensorIterator. The forward implementation has been placed in BinaryOpsKernel.cpp/.cu while the backward version was added to PointwiseOpsKernel.cpp/.cu. CPU performance has improved for both the forward & backward paths. With CUDA, the performance of the forward pass has slightly degraded compared to the TH
implementation (see benchmark results).

### Questions:
1. Is the storage location of the implementation ok (I followed https://github.com/pytorch/pytorch/pull/26529) or should we create a separate .cpp/.h file pair for each operator implementation (e.g. to keep things together)?
2. The GPU forward-pass now seems to take consistently longer than the old version. Any ideas what we could try to bring it on par with the old impl?

## WITH patch benchmark result:
```
CPU warmup 1000 took 0.00018124299822375178
CPU warmup 10000 took 0.00021713999740313739
CPU warmup 100000 took 0.0016273759974865243
CPU warmup TOTAL time 0.0020758909959113225
CPU forward 1000 took 6.229899736354128e-05
CPU forward 10000 took 0.00013340599980438128
CPU forward 100000 took 0.0008730469999136403
CPU forward 1000000 took 0.011010036003426649
CPU forward 10000000 took 0.11133221499767387
CPU forward 100000000 took 1.0425375220002024
CPU forward TOTAL time 1.1660894790038583
CPU for- & backward 1000 took 0.0002662249971763231
CPU for- & backward 10000 took 0.00023712700203759596
CPU for- & backward 100000 took 0.002531945996452123
CPU for- & backward 1000000 took 0.010394354998425115
CPU for- & backward 10000000 took 0.23814761800167616
CPU for- & backward 100000000 took 1.2651235049997922
CPU for- & backward TOTAL time 1.516897434994462

GPU warmup 1000 took 0.00020941899856552482
GPU warmup 10000 took 8.128300396492705e-05
GPU warmup 100000 took 8.551499922759831e-05
GPU warmup TOTAL time 0.0004199420000077225
GPU forward 1000 took 7.060499774524942e-05
GPU forward 10000 took 7.116600318113342e-05
GPU forward 100000 took 9.825800225371495e-05
GPU forward 1000000 took 0.000499356996442657
GPU forward 10000000 took 0.002032470001722686
GPU forward 100000000 took 0.018638986002770253
GPU forward TOTAL time 0.02148268099699635
GPU for- & backward 1000 took 0.00035967300209449604
GPU for- & backward 10000 took 0.00032710300001781434
GPU for- & backward 100000 took 0.0003689270015456714
GPU for- & backward 1000000 took 0.0007732619997113943
GPU for- & backward 10000000 took 0.02127284000016516
GPU for- & backward 100000000 took 0.2022330649997457
GPU for- & backward TOTAL time 0.2254496300010942
```

## WITHOUT patch benchmark result:
```
CPU warmup 1000 took 0.00011545199959073216
CPU warmup 10000 took 0.00016227000014623627
CPU warmup 100000 took 0.0013456509987008758
CPU warmup TOTAL time 0.001648657998885028
CPU forward 1000 took 2.627600042615086e-05
CPU forward 10000 took 0.00015939700097078457
CPU forward 100000 took 0.001139313004387077
CPU forward 1000000 took 0.013769682998827193
CPU forward 10000000 took 0.13163026500114938
CPU forward 100000000 took 1.321879123999679
CPU forward TOTAL time 1.4687001089987461
CPU for- & backward 1000 took 0.0002569290008977987
CPU for- & backward 10000 took 0.00033315900509478524
CPU for- & backward 100000 took 0.0016096779945655726
CPU for- & backward 1000000 took 0.014474845003860537
CPU for- & backward 10000000 took 0.1564881520025665
CPU for- & backward 100000000 took 1.5787935900007142
CPU for- & backward TOTAL time 1.7521004869995522

GPU warmup 1000 took 0.00025611399905756116
GPU warmup 10000 took 0.00014123699656920508
GPU warmup 100000 took 0.00012580600014189258
GPU warmup TOTAL time 0.0005591579974861816
GPU forward 1000 took 0.00031183200189843774
GPU forward 10000 took 0.00011483799607958645
GPU forward 100000 took 0.00010807999933604151
GPU forward 1000000 took 0.0007842139966669492
GPU forward 10000000 took 0.0017624700049054809
GPU forward 100000000 took 0.01519905700115487
GPU forward TOTAL time 0.018341148999752477
GPU for- & backward 1000 took 0.00047569099842803553
GPU for- & backward 10000 took 0.0003539700046530925
GPU for- & backward 100000 took 0.000808880002296064
GPU for- & backward 1000000 took 0.001639469999645371
GPU for- & backward 10000000 took 0.021154599002329633
GPU for- & backward 100000000 took 0.19268552300491137
GPU for- & backward TOTAL time 0.2172460189976846
```

### Code used for performance testing
```
import torch
import torch.nn.functional as F
import torch.nn as nn

from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')

loss_fn = F.smooth_l1_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    y = None
    a = None
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        y = fn(a, b)
        y.cpu() # get result (potentially wait for gpu)
        if a.grad is not None:
            a.grad.cpu()
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)
    return out

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()
    return out

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.randn(16, requires_grad=True, device=device)
    b = torch.randn(16, device=device) * 2
    out = loss_fn(a, b)
    print('out', out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
if torch.cuda.is_available():
    sanity_check('GPU', gpu)
print()

run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()

if torch.cuda.is_available():
    run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
    run_benchmark('GPU forward', 6, False, gpu, fwd_only)
    run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27962

Differential Revision: D18061942

Pulled By: ezyang

fbshipit-source-id: 0d1fc528b59d47d4773b03240c3368db021cb9db
2019-10-23 07:56:57 -07:00
ee920b92c4 Move complex extension test to c10 (#28208)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28208

Backend extensions should call torch::RegisterOperators, not globalATenDispatch().
If the op is still on globalATenDispatch, then torch::RegisterOperators will do the right thing and forward it to globalATenDispatch.
ghstack-source-id: 92436988

Test Plan: waitforsandcastle

Differential Revision: D17975369

fbshipit-source-id: 0d4bd5e4e5b86e6dcfba527a7d11c25508896ac1
2019-10-23 01:33:47 -07:00
0f556b62e0 Fix codegen for out operators (#28184)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28184

Out overloads of operators have a different `name` and `operator_name`. Fix the codegen for them.
ghstack-source-id: 92436987

Test Plan: A diff stacked on top enables `use_c10_dispatcher` for out operators. Doesn't work without but works with this diff.

Differential Revision: D17969013

fbshipit-source-id: 7b1118c9a4a36997e7375fac8d870ff08e7ff453
2019-10-23 01:33:43 -07:00
b47d658d04 Allow migrating factory methods to c10 (#28183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28183

-
ghstack-source-id: 92436986

Test Plan: waitforsandcastle

Differential Revision: D17969015

fbshipit-source-id: 0e2eac09c9622fc6c6e90bb80d2a250f37bbd148
2019-10-23 01:33:39 -07:00
005d6ea495 Fix overload names (#28182)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28182

They haven't been unique. Fixing it...
ghstack-source-id: 92436985

Test Plan: waitforsandcastle

Differential Revision: D17969010

fbshipit-source-id: 1aacbfb3c18a75ca6743b03cc2eea5fc4d3685c9
2019-10-23 01:33:35 -07:00
a94bf1d326 Add unsupported types to schema type parser (#28181)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28181

These types are needed to parse the schemas from native_functions.yaml.

Note: This doesn't actually add the functionality to JIT, it only makes the parser pass.
ghstack-source-id: 92436989

Test Plan: waitforsandcastle

Differential Revision: D17969014

fbshipit-source-id: 41ebe256baec81ed8fb165e7b7cffa5160d285c3
2019-10-23 01:33:31 -07:00
b05d0fa671 Updating submodules
Summary:
GitHub commits:

4341008007
d3174ece89
88db55e055
d32d4344ec
00dfc2c82e

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 5d1b8300a428c65bc35f222ae19c656585ba897b
2019-10-22 23:32:58 -07:00
4beaf1cf1c add typing runtime dependency for py2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28442

Test Plan: Imported from OSS

Differential Revision: D18075498

fbshipit-source-id: 075f63b1ed2c83d9a64eb81224e0d67c6a63b22c
2019-10-22 22:02:08 -07:00
0d4009d777 Fix avx for c++14 (#28207)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28207

Enabling C++14 causes these lines to fail with the error "error: the last argument must be an 8-bit immediate".
So let's make them an 8-bit immediate before we enable C++14.
ghstack-source-id: 92419812

Test Plan: Enabling C++14 before this PR shows the error, after this PR does not.

Differential Revision: D17975236

fbshipit-source-id: aa53cdb2d38d89ede2212ed7374fedeb5896f254
2019-10-22 21:49:07 -07:00
0c4878d550 Update index.rst 2019-10-22 21:43:58 -07:00
d2eb08d17b Fix tracing slice/select with dynamic inputs (#26549)
Summary:
Fix Slice/Select trace arguments. This PR stashes arguments to functions in order to avoid tracing them as constants.
This PR depends on a fix for select op in PR:
https://github.com/pytorch/pytorch/pull/25273
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26549

Reviewed By: hl475

Differential Revision: D17623851

Pulled By: houseroad

fbshipit-source-id: ae314004266688d2c25c5bada2dcedbfc4f39c5b
2019-10-22 17:09:40 -07:00
9705d60a2f get rid of deprecated thread.isAlive() to use py2.6 modern form is_alive()
Summary:
Codemod to remove all thread.isAlive() calls, since it throws a warning that is breaking some tests that monitor the output of their CLIs.

is_alive() was added in Python 2.6, so this is super safe.

This is a codemod; I don't care whether the code supports Python 3, just that it's Python code.
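
For illustration, a minimal sketch of the modern spelling (the thread target here is just an example):

```python
import threading
import time

t = threading.Thread(target=time.sleep, args=(0.1,))
t.start()
print(t.is_alive())  # the modern spelling, available since Python 2.6
t.join()
# t.isAlive() is the deprecated camelCase alias this codemod removes.
```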

Test Plan: unittests

Reviewed By: cooperlees

Differential Revision: D18069520

fbshipit-source-id: 4ca4dcb541c0b0debeb194aba5d060152ad0ef0e
2019-10-22 15:37:31 -07:00
177c95e9bc Migrate return type void to () for native functions. (#28290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28290

ghstack-source-id: 92368250

Test Plan:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28290
ghstack-source-id: 92368250

Differential Revision: D17565528

fbshipit-source-id: f4870bb9ee4f4e7c48df4d68508b512d25ed277c
2019-10-22 15:23:20 -07:00
f94b6cef43 Use FunctionSchema instead of char* for dispatch (#28295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28295

Previous PR was landed in a broken state

Test Plan: Imported from OSS

Differential Revision: D18066217

Pulled By: bwasti

fbshipit-source-id: 665de7b28145885d6b01f5f212897ac3f8f6270f
2019-10-22 14:38:43 -07:00
2cc0f1bbc6 Run pytorch mobile benchmark in PEP (#28437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28437

Add target to build speed_benchmark_torch for PEP.
Added a new argument `--report_pep` to print total runtime information for PEP. Can add per-op stats under this later.

Test Plan: https://our.intern.facebook.com/intern/aibench/details/664440309179004

Reviewed By: hl475

Differential Revision: D18062059

fbshipit-source-id: ca80e980ce8e48604782a15ac44dd8d403832817
2019-10-22 14:21:49 -07:00
5f1563296b remove AutoNonVariableTypeMode from jit-op-registry (#28402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28402

Revert PR #27274 as it's absorbed by PR #28398.

Test Plan: - make sure all mobile models can load and run

Differential Revision: D18055993

Pulled By: ljk53

fbshipit-source-id: 0d0ffdf2cfae18577189d3b69de15fa892210916
2019-10-22 14:08:58 -07:00
d0d8b8c31c change detach() & detach_() to no-op for USE_STATIC_DISPATCH mode (#28400)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28400

This is yet-another fix to issue #26764.

Some mobile models call tensor.detach() which won't work with
static-dispatch mode. We disable autograd for static-dispatch / mobile
build anyway so it seems fine to make these op-ops.

Test Plan: - With stacked PRs, confirmed it can run failed models now.

Differential Revision: D18055852

Pulled By: ljk53

fbshipit-source-id: bff3a55fee2ca68ac5333fb4978c11fd18dfcc91
2019-10-22 14:08:54 -07:00
04bfc213ab remove AutoNonVariableTypeMode guard around forward() call (#28399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28399

This is also to address issue #26764

Turns out it's incorrect to wrap the entire forward() call with the
NonVariableTypeMode guard, as some JIT passes have an is_variable() check and
can be triggered within the forward() call, e.g.:
jit/passes/constant_propagation.cpp

Since we now toggle NonVariableTypeMode per method/op call, we can
remove the guard around forward().

Test Plan: - With stacked PRs, verified it can load and run previously failed models.

Differential Revision: D18055850

Pulled By: ljk53

fbshipit-source-id: 3074d0ed3c6e05dbfceef6959874e5916aea316c
2019-10-22 14:08:49 -07:00
38433e33a1 Make static dispatch turn off variable before entering the kernel (#28398)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28398

Redo PR #26908 for issue #26764

Test Plan: - make sure quantized mobilenetv2 no longer suffers from perf regression

Differential Revision: D18055851

Pulled By: ljk53

fbshipit-source-id: d533bc8979b1d2892adfb39924678a3f9b591855
2019-10-22 14:08:45 -07:00
a5354adb08 Eliminate the use of CUDA_HOME in setup.py. (#28373)
Summary:
Variables read from CMakeCache.txt are more reliable.

Close https://github.com/pytorch/pytorch/issues/28365
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28373

Differential Revision: D18061855

Pulled By: ezyang

fbshipit-source-id: c550a365e23464411d75eca167f7e6e053f94872
2019-10-22 14:04:54 -07:00
30712f6e30 Move the CUDA implementation of sqrt to ATen. (#27372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27372

Fix #24638

Test Plan: Imported from OSS

Differential Revision: D18037944

Pulled By: VitalyFedyunin

fbshipit-source-id: d3dbbc167954c7bbee25be13b5b669433bca6ee5
2019-10-22 14:01:07 -07:00
19aeb472aa Move the CUDA implementation of log1p to ATen. (#26923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26923

Fix #24588

Test Plan: Imported from OSS

Differential Revision: D17984184

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bc2be4f08e800b1de274940f2bd3d5b418b45ee
2019-10-22 14:00:59 -07:00
4f70b5a4de Export det (#26958)
Summary:
Added symbolic to export det in opset 11
Updating ONNX submodule is required for det export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26958

Reviewed By: hl475

Differential Revision: D17844887

Pulled By: houseroad

fbshipit-source-id: 224ae3ff82939dc7ae8584c5a30a31fe6afa05f6
2019-10-22 13:33:15 -07:00
456d9a0dbe Enable Scatter/Gather ORT Test for opset 11 (#27876)
Summary:
Enable ONNX Runtime Test for scatter in opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27876

Reviewed By: hl475

Differential Revision: D18063347

Pulled By: houseroad

fbshipit-source-id: f26104770b9c0d0dfe6a4111189436bea13e9460
2019-10-22 13:27:00 -07:00
2a2cdc8aeb Revert D18001407: Port of multilabel_margin_loss from TH to ATen (CPU)
Test Plan: revert-hammer

Differential Revision:
D18001407

Original commit changeset: 68cbd9ce0aac

fbshipit-source-id: b43a83bfa087ea017b2b8bd09050c78c725ecd9e
2019-10-22 13:26:56 -07:00
636fbcdd0a add benchmark code to iOS TestApp (#28405)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28405

### Summary

As discussed with AshkanAliabadi  and ljk53, the iOS TestApp will share the same benchmark code with Android's speed_benchmark_torch.cpp. This PR is the first part which contains the Objective-C++ code.

The second PR will include the scripts to setup and run the benchmark project. The third PR will include scripts that can automate the whole "build - test - install" process.

There are many ways to run the benchmark project. The easiest way is to use CocoaPods: simply run `pod install`. However, that will pull the 1.3 binary, which is not what we want, but we can still use this approach to test the benchmark code. The second PR will contain scripts to run custom builds that we can tweak.

### Test Plan
- Don't break any existing CI jobs  (except for those flaky ones)

Test Plan: Imported from OSS

Differential Revision: D18064187

Pulled By: xta0

fbshipit-source-id: 4cfbb83c045803d8b24bf6d2c110a55871d22962
2019-10-22 12:52:30 -07:00
7b59174882 torch::nn::LayerNorm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28032

Differential Revision: D18047371

Pulled By: anjali411

fbshipit-source-id: fb61aea52d6622a67ec1d84950e17e85686461ae
2019-10-22 12:50:22 -07:00
3fce612cb1 preserve original tensoriterator behavior when not explicitly promoting (#28231)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28231

Fix: https://github.com/pytorch/pytorch/issues/28010

A mixed-type index assignment that would have been an error in 1.2 was unintentionally made possible (with incorrect results) in 1.3. This PR restores the original behavior.

This is BC-breaking because:
```
        a = torch.ones(5, 2, dtype=torch.double)
        b = torch.zeros(5, dtype=torch.int)
        a[:, [1]] = b.unsqueeze(-1)
```
now raises an error (as in 1.2) whereas it did not in 1.3.
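
One way to keep such an assignment working under the restored rules is an explicit cast (a sketch):

```python
import torch

a = torch.ones(5, 2, dtype=torch.double)
b = torch.zeros(5, dtype=torch.int)
a[:, [1]] = b.unsqueeze(-1).to(a.dtype)  # explicit cast: allowed and correct
```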

Test Plan: Imported from OSS

Differential Revision: D18049637

Pulled By: nairbv

fbshipit-source-id: 11a37dc98364ae70aac0e9dbc090d2a500aa7ccc
2019-10-22 10:38:27 -07:00
6d689e27c7 clean up NamedTuple creation API (#28189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28189

This makes it a separate createNamed function. The existing API resulted
in poor usage in fbcode, which in turn caused bugs in TorchScript programs.

Test Plan: Imported from OSS

Differential Revision: D17970220

Pulled By: zdevito

fbshipit-source-id: 59b082a726f56bec1c8d10d410db829f4aa271ea
2019-10-22 10:18:07 -07:00
03d24dba6c Fix static linking cuDNN without static CUDA (#28378)
Summary:
Fixes https://github.com/pytorch/pytorch/pull/27887#issuecomment-544649765

The logs show that `USE_STATIC_CUDNN` is used but not `CAFFE2_STATIC_LINK_CUDA`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28378

Differential Revision: D18061841

Pulled By: ezyang

fbshipit-source-id: 3b9b49953094e02f808ff12107ba4226688d9986
2019-10-22 10:08:09 -07:00
682da8eb43 Port of multilabel_margin_loss from TH to ATen (CPU) (#28205)
Summary:
This is a port of the CPU version of TH MultiLabelMarginCriterion to ATen.

Benchmark results ([source of script used](https://gist.github.com/andreaskoepf/ce96eedb09e9480ae2263d31822ef26e)):

Slightly slower forward (probably acceptable), slightly faster forward & backward combination.

###  WITH patch:
```
CPU forward 1000 took 0.0002544010058045387
CPU forward 10000 took 0.0022866200015414506
CPU forward 100000 took 0.02240650000749156
CPU forward 1000000 took 0.22985397902084514
CPU forward 10000000 took 2.227811124001164
CPU forward TOTAL time 4.282580643019173
CPU for- & backward 1000 took 0.0006969539972487837
CPU for- & backward 10000 took 0.004804529016837478
CPU for- & backward 100000 took 0.07736711099278182
CPU for- & backward 1000000 took 0.5985556179948617
CPU for- & backward 10000000 took 4.761040163983125
CPU for- & backward TOTAL time 7.318476865999401
```

### WITHOUT patch:
```
CPU forward 1000 took 0.00026982801500707865
CPU forward 10000 took 0.002569925010902807
CPU forward 100000 took 0.024335263995453715
CPU forward 1000000 took 0.2151200629887171
CPU forward 10000000 took 2.114590842014877
CPU forward TOTAL time 4.184845258976566
CPU for- & backward 1000 took 0.0007158009975682944
CPU for- & backward 10000 took 0.005468863993883133
CPU for- & backward 100000 took 0.05931608600076288
CPU for- & backward 1000000 took 0.5732014369859826
CPU for- & backward 10000000 took 5.2500802429858595
CPU for- & backward TOTAL time 7.7646528169861995
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28205

Differential Revision: D18001407

Pulled By: ezyang

fbshipit-source-id: 68cbd9ce0aacf99dd8c44fb4da9c09b3ffc1e59a
2019-10-22 09:37:59 -07:00
c1bb2676f3 Update C++ torch::nn parity table (#28419)
Summary:
This PR updates `test/cpp_api_parity/parity-tracker.md` to reflect changes in https://github.com/pytorch/pytorch/issues/25883.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28419

Differential Revision: D18061479

Pulled By: yf225

fbshipit-source-id: dbdc2e44e835f6125a42cf11e59723ef61903cff
2019-10-22 09:34:10 -07:00
30d6cf7bc1 Updating submodules
Summary:
GitHub commits:

913ad446c7

Test Plan: n/a

Reviewed By: cdelahousse

fbshipit-source-id: 3eba9fdc3c588489b516e6f87bee4954f4295da6
2019-10-22 09:14:24 -07:00
5e73e1fff8 Enabled torch.unique for bool tensors (#28374)
Summary:
Enabled torch.unique for bool tensors.
Tested via unit tests

[issue](https://github.com/pytorch/pytorch/issues/27691)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28374

Differential Revision: D18043413

Pulled By: izdeby

fbshipit-source-id: 295ff03b9b61d33bbd2e05e6211c4f35a0ee23ea
2019-10-22 09:09:46 -07:00
373e9096c2 Revert D18012804: Use FunctionSchema instead of char* for dispatch
Test Plan: revert-hammer

Differential Revision:
D18012804

Original commit changeset: 9b6acdeb0656

fbshipit-source-id: ca2c89c87dc3757083bae8466e6c9ab17266f07f
2019-10-22 08:18:18 -07:00
73c1030328 Support logging tensorboard embedding visualizations to generic filesystem (#27716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27716

This uses the gfile filesystem abstraction that allows for writing to any filesystem that satisfies the interface (including S3).
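
A minimal usage sketch; the `s3://` path is hypothetical and assumes the underlying gfile implementation supports that scheme:

```python
import torch
from torch.utils.tensorboard import SummaryWriter

# With the gfile abstraction the log_dir may live on any filesystem that
# implements the interface, not only the local disk.
writer = SummaryWriter(log_dir='s3://my-bucket/runs/exp1')
writer.add_embedding(torch.randn(10, 5), global_step=0)
writer.close()
```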

Test Plan: Tested with local files and using internal S3 equivalent.

Reviewed By: natalialunova

Differential Revision: D17530694

fbshipit-source-id: c1f88c035fc03d91186b39092e42489f1c03d2cd
2019-10-22 08:12:25 -07:00
95650b152a remove deprecated torch.Tensor in test_distributed.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28316

Test Plan: Imported from OSS

Differential Revision: D18019147

Pulled By: mrshenli

fbshipit-source-id: eb0fb08031d810ea85fb6ea54b1b25791178131b
2019-10-22 07:47:36 -07:00
db298732c1 remove deprecated torch.Tensor in test_c10d.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28315

Test Plan: Imported from OSS

Differential Revision: D18019148

Pulled By: mrshenli

fbshipit-source-id: 9aff891c6df0b1cfa5ff01e7551973a16d512909
2019-10-22 07:47:33 -07:00
079b3cc02c Add C++ nn::functional pad
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26601

Test Plan: Imported from OSS

Differential Revision: D17517468

Pulled By: yf225

fbshipit-source-id: 9ee8b93b88a60f91f2ae78c242f9eaa246b3293c
2019-10-21 22:20:38 -07:00
94757e035d Do not insert observers for empty sequential modules (#28384)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28384

ghstack-source-id: 92340259

Test Plan:
buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details

 buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details

Differential Revision: D18047293

fbshipit-source-id: 7e18b1aa76cc0fd26e8ee48a70c3a45688e73549
2019-10-21 20:32:13 -07:00
d403410e0d Fastlane update (#28356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28356

### Summary

I'm working on setting up a benchmark test project for iOS, which will reuse this Fastlane file. This PR moves the "cert install" code from "before_all" to a standalone lane target.

### Test Plan

- don't break any existing CI jobs

Test Plan: Imported from OSS

Differential Revision: D18053675

Pulled By: xta0

fbshipit-source-id: e4760a8494916c410af19ca43f040fc463551d11
2019-10-21 19:31:55 -07:00
783c9c8445 Adding docstring to the observers (#27791)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27791

This is the first part of the change. The next ones will amend more :)

Test Plan: Imported from OSS

Differential Revision: D17889913

Pulled By: z-a-f

fbshipit-source-id: ff74007903dd789d4c68684e83b50c0c86a25149
2019-10-21 19:09:50 -07:00
0ddb50010e enable test_invalid_names test in rpc_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28376

Test Plan: Imported from OSS

Differential Revision: D18045158

Pulled By: mrshenli

fbshipit-source-id: 42821ef40afbdff8662abacd447e307ccf4853d3
2019-10-21 18:43:37 -07:00
d9bca33d2c Use FunctionSchema instead of char* for dispatch (#28295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28295

Previous PR was landed in a broken state

Test Plan: Imported from OSS

Differential Revision: D18012804

Pulled By: bwasti

fbshipit-source-id: 9b6acdeb0656d2d7911b0ed63f4d47ecca5473b9
2019-10-21 18:24:52 -07:00
6335d91c38 Disable tsan for test_c10d multiprocess test cases. (#28385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28385

TSAN doesn't work with multiprocessing with fork() since we end up
forking in a multithreaded environment which is dangerous. As a result, I'm
disabling TSAN in this change.

Similar to https://github.com/pytorch/pytorch/pull/27410 and
https://github.com/pytorch/pytorch/pull/25005
ghstack-source-id: 92319347

Test Plan: waitforbuildbot

Differential Revision: D18047778

fbshipit-source-id: 6c4e251639f74f4c772bd09bc6f2dfa83cf18fad
2019-10-21 18:14:38 -07:00
07a181da1d Add more logging in net modifier
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28327

Test Plan:
Failed as expected and the full protobuf is logged
f145060005

Reviewed By: ffjiang, wx1988

Differential Revision: D17975560

fbshipit-source-id: 5375acffc1f9dede16622b06eb58b6c3a26ebe5a
2019-10-21 17:53:00 -07:00
4e033b0040 split TestLogging, TestDict, TestList
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28038

Test Plan: Imported from OSS

Differential Revision: D17954441

Pulled By: suo

fbshipit-source-id: 4703fb577adea3aa00fabb13c577b055e9ab4d7c
2019-10-21 17:15:15 -07:00
f36497e687 split test_type_sharing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28037

Test Plan: Imported from OSS

Differential Revision: D17954442

Pulled By: suo

fbshipit-source-id: 6edee4d7dee0e52b58e71d3b520c0503fb7bd0ed
2019-10-21 17:15:11 -07:00
0a364108d2 use base sha in clang-tidy instead of base ref (#28388)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28388

The clang-tidy script diffs the PR head ref against the base ref so that
it works only on changed lines. If the base ref is a stale `master`,
then the script will fetch upstream `master` and potentially report
unrelated changes in the diff.

Use the base sha instead of the ref so that the revision that the script
diffs against is stable.

Test Plan: Imported from OSS

Differential Revision: D18051363

Pulled By: suo

fbshipit-source-id: 80ead2f837e2d6244245ed7b576e84a99f0ea035
2019-10-21 17:07:57 -07:00
06bb74ce96 Tolerate small amount of embedding corruptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28371

Reviewed By: xianjiec

Differential Revision: D18031155

fbshipit-source-id: a51d2a62a919f032dc04372b30cf9071aa2dd629
2019-10-21 16:23:25 -07:00
70e9ef518f c10::string_view (#26616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26616

Implement C++17 std::string_view for C++11.

This is useful for compile-time type name retrieval, which I'm going to stack on top of this.
It is also useful for replacing `const std::string&` with it throughout our codebase.
ghstack-source-id: 92100314

Test Plan: unit tests

Differential Revision: D17518992

fbshipit-source-id: 48e31c677d51b0041f4b37e89a92bd176d4a0b08
2019-10-21 16:10:40 -07:00
9ea42f8d7c C++ API: torch::nn::LPPool1d (#27800)
Summary:
Add torch::nn::LPPool1d module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27800

Differential Revision: D18045040

Pulled By: yf225

fbshipit-source-id: e61fefe9efec3423f7a93dd1e946f3e380122927
2019-10-21 15:33:51 -07:00
a3902c901a Revert "Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)" (#28310)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28310

This reverts commit 3d3bff5ff1bc277306d15a3caa96c2a6fdb924bb.

Test Plan: Imported from OSS

Differential Revision: D18042859

Pulled By: ezyang

fbshipit-source-id: cded781dda6fcc04199af6abd07ac09fdc0405de
2019-10-21 14:45:17 -07:00
ba59d720cd Change error message for torch.linspace(). (#28274)
Summary:
Fix for https://github.com/pytorch/pytorch/issues/25810

Basically moves the error checking from the device-specific function to the native function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28274

Differential Revision: D18032189

Pulled By: ezyang

fbshipit-source-id: 9072b5980aa2057274e79bc7241db853bfc36f11
2019-10-21 13:03:02 -07:00
bc57967e07 max_pool2d cuda should have channel last optimized kernels[Performance improvement] (#24872)
Summary:
max_pool2d_with_indices_cuda and max_pool2d_with_indices_backward_cuda should have channel last optimized kernels(https://github.com/pytorch/pytorch/issues/23815)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24872

Differential Revision: D16964577

Pulled By: ifedan

fbshipit-source-id: 296dfef8e511a7ae2ed423e34e902d5401b3becb
2019-10-21 11:28:12 -07:00
4d9c017dee Fix the padding issue of quantized average pool operator (#28260)
Summary:
This is actually a bug in both testing and the average pool implementation.
In testing, we used the quantized value as the float input and failed to pad the value with zero_point.
In the op implementation, the size used for averaging is not correct in the padding case when count_include_pad is true.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28260

Differential Revision: D18039960

Pulled By: lly-zero-one

fbshipit-source-id: 7b5d34498b60f5d574a276a22798c9f576944734
2019-10-21 11:06:31 -07:00
d9b4788e5d cleanup dist autograd context on other nodes when it is released on one node (#27951)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27951

We want to clean up the distributed autograd context across the other nodes when a single node is done (here "done" means it has exited the context manager `with dist_autograd.context() as context_id: ...`; see the sketch after the list below).

This PR does a few things to implement the above:
1) Add classes to encapsulate messages for requesting this context release and the response
2) Handling of this request in `request_callback_impl.cpp`. When we receive this request, we get the context from a given context_id and release it.
3) RPC call in `DistAutogradContainer::releaseContext` to send this command. This currently does not wait for an ack or implement any sort of retrying. We send the RPC to all the workerIds we have come into contact with (implemented in https://github.com/pytorch/pytorch/pull/26324)
4) Relevant unit tests

In follow-up PRs, we will add error checking and retries for this call.
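For reference, a minimal sketch of the context manager whose exit triggers this release (assumes the RPC framework has already been initialized on the worker):

```python
import torch.distributed.autograd as dist_autograd

# Sketch only: assumes torch.distributed.rpc.init_rpc() has already run.
with dist_autograd.context() as context_id:
    # RPCs issued here carry autograd metadata tied to context_id.
    pass
# Exiting the block releases the context; with this PR, the release is
# also propagated to every other worker this node contacted.
```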

ghstack-source-id: 92269279

Test Plan: Added/modified unit tests in `test/dist_autograd_test.py`

Differential Revision: D17920137

fbshipit-source-id: 7403512ab5fcbc28d21c548b2e45319dd472e26a
2019-10-21 07:34:08 -07:00
f6c0a89acc Updating submodules
Summary:
GitHub commits:

c8a45b6945

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: ad3b35d4ac5a168a316ee60450d1e825760e1433
2019-10-20 18:55:40 -07:00
e8165f4b00 Updating submodules
Summary:
GitHub commits:

c2ee2f1935

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: b8c9176484ed9670583574e9465b5517cef1b71b
2019-10-20 18:55:36 -07:00
6301d62e0b Updating submodules
Summary:
GitHub commits:

a797bf1e3d
963d2bf4c4

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: f17da4f8eb54b7f317714139770ccd08fdb4dab6
2019-10-20 11:48:28 -07:00
15be189f0d Add quantized torch mean implementation (#27675)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27675

This leverages QNNPACK global average pooling to perform torch.mean on input feature maps.
Currently this can only support mean along the HxW plane of an NCHW tensor.
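A minimal usage sketch (assumes a build with the QNNPACK engine available):

```python
import torch

torch.backends.quantized.engine = "qnnpack"  # assumes QNNPACK is built in
x = torch.rand(1, 3, 4, 4)
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
qmean = torch.mean(qx, dim=(2, 3))  # mean over the HxW plane
print(qmean.dequantize())
```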

Test Plan:
python test/test_quantized.py TestQuantizedOps.test_mean

Imported from OSS

Differential Revision: D17989336

fbshipit-source-id: 8d4cbcbed5f146290b1580d26e5b45359d293761
2019-10-19 19:20:59 -07:00
29f56eb920 Revert D17937850: Tolerate small amount of embedding corruptions
Test Plan: revert-hammer

Differential Revision:
D17937850

Original commit changeset: e9c633768d98

fbshipit-source-id: 5c2c837c7867504392b19965d91a60cadd3b8101
2019-10-19 14:17:01 -07:00
56eb4f7daa Add autograd hook for python rpc call (#28312)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28312

1. Currently, if the autograd context is valid, an RPC is still sent with autograd metadata even when the tensors do not require grad and no grad functions are attached. This is not ideal.
This diff makes sure that an RPC with autograd metadata is sent only if the autograd context is valid and the tensors require grad.

2. Meanwhile, create a utility to attach autograd info and functions as needed.

3. Add autograd send/recv functions for python rpc calls.

4. Make changes to support nested python rpc calls.

5. Disallow nested dist autograd contexts (was landed in #27022)
ghstack-source-id: 92240367

Test Plan: unit tests

Differential Revision: D18017554

fbshipit-source-id: dbe79a5171063901a78a9b3322b9b31c159d098d
2019-10-19 07:38:14 -07:00
6fcefc917e Minor tweaks to rpc message api (#28326)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28326

 - Message::type() should return a MessageType, not const MessageType&,
   since MessageType is just an enum.
 - Add moveTensors() method for parallelism with movePayload().
ghstack-source-id: 92236443

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D18021692

fbshipit-source-id: 5b2f5806f104a221de8df0282f3e395d15e5bfe4
2019-10-18 23:18:26 -07:00
99271ad411 Split out data_parallel tests from test_nn.py into a separate (#28297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28297

Splitting data parallel tests out of test_nn.py since it's easier to
manage and track these tests separately, and failures can be routed to
the appropriate POCs.

Test Plan: waitforbuildbot

Differential Revision: D18011663

fbshipit-source-id: 17ebf7c04e7dc7ff4c8d38458daab5b911bed75d
2019-10-18 17:48:40 -07:00
eb4bb00a9c Use c10::variant-based enums for Nonlinearity and FanMode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27933

Test Plan: Imported from OSS

Differential Revision: D18009044

Pulled By: yf225

fbshipit-source-id: e88229ee30badf7a699f62af61d1e88debc0dc7d
2019-10-18 17:48:34 -07:00
a1e14a6626 PixelShuffle module and functional (#28140)
Summary:
Added `PixelShuffle` module and functional https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28140

Differential Revision: D18008474

Pulled By: yf225

fbshipit-source-id: f482495bb56998701c79a61ef065a121bf5a5154
2019-10-18 15:54:14 -07:00
ca6ba06f95 Tolerate small amount of embedding corruptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28299

Reviewed By: Wakeupbuddy

Differential Revision: D17937850

fbshipit-source-id: e9c633768d9819fd734ddd59017c33688ebbdcca
2019-10-18 14:59:06 -07:00
9cb003a94f Add typing check of alpha for torch.sub and code clean up.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28298
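A sketch of the kind of typing check involved (`alpha` scales the second operand before subtraction; the exact error wording is an assumption):

```python
import torch

a = torch.tensor([5, 7])
b = torch.tensor([1, 2])
print(torch.sub(a, b, alpha=2))  # tensor([3, 3])

# With integral tensors, a floating-point alpha should be rejected.
try:
    torch.sub(a, b, alpha=0.5)
except RuntimeError as err:
    print("rejected:", err)
```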

Differential Revision: D18017923

Pulled By: nairbv

fbshipit-source-id: 2c4b3f96eb005dfb70e1b7ff87d28eb79b9300dd
2019-10-18 14:49:42 -07:00
b4db590e3b Fix type promotion of complex32 and complex32 (#27929)
Summary:
torch.promote_types(torch.complex32, torch.complex32) reports
RuntimeError.
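A minimal verification sketch (assumes a build exposing the complex32 dtype):

```python
import torch

# Before this fix the call below raised a RuntimeError; after it,
# promoting complex32 with itself simply returns complex32.
print(torch.promote_types(torch.complex32, torch.complex32))
```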
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27929

Differential Revision: D18013017

Pulled By: nairbv

fbshipit-source-id: 14de1adb7e81694d0f1463b11f8d4c284b25502b
2019-10-18 14:45:25 -07:00
0aa694ebe5 Move Method::lowered_graph to a separate pass out of the Method class. (#28242)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28242

There is no reason to have it in a general API of Module/Method - it's
just another graph pass. It was there because some time ago modules were
not first class and all graphs were lowered. After that changed, this
API was added for easier transition, but now we don't need it anymore.

Test Plan: Imported from OSS

Differential Revision: D17986724

Pulled By: ZolotukhinM

fbshipit-source-id: 279a1ec450cd8fac8164ee581515b09f1d755630
2019-10-18 12:48:40 -07:00
c813503f05 Update hyperlink syntax for XLA, torchaudio, torchtext, and C++ API (#28019)
Summary:
Tested locally. Should render as such:

![image](https://user-images.githubusercontent.com/8042156/66861657-4373fc00-ef44-11e9-8a5b-52abc3ddcd51.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28019

Differential Revision: D18012303

Pulled By: brianjo

fbshipit-source-id: 4b3bd9f63f5d94d474ab13bb06220a112185e924
2019-10-18 12:15:17 -07:00
af88537483 Back out "Add autograd hook for python rpc call"
Summary: Original commit changeset: 070324c57312

Test Plan: revert

Reviewed By: pritamdamania87

Differential Revision: D18011308

fbshipit-source-id: 4185e4c6f51c1d11b23b8ab44e6e958b09f27c53
2019-10-18 11:53:39 -07:00
243298668c Remove confusing torch::jit::RegisterOperators for custom ops (#28229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28229

We have `torch::RegisterOperators` for custom ops. `torch::jit::RegisterOperators` had a dual state of being able to register custom ops if called one way and being able to register pure JIT ops if called another way.
This is confusing because you end up in different operator libraries depending on which API exactly you're using.

This PR removes the ability for torch::jit::RegisterOperators to register custom ops and forces people to use the new torch::RegisterOperators.

This was already deprecated before but we now remove it.
ghstack-source-id: 92137305

Test Plan: unit tests

Differential Revision: D17981895

fbshipit-source-id: 0af267dfdc3c6a2736740091cf841bac40deff40
2019-10-18 10:46:31 -07:00
d2eceee54b Fix hub when branch name contains slash. (#27960)
Summary:
fixes https://github.com/pytorch/pytorch/issues/27844
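A sketch of the call shape this fixes (the repo name is real; the branch name after the colon is hypothetical and may now contain slashes):

```python
import torch

# Everything after the colon is the git ref; slashes are now handled.
model = torch.hub.load('pytorch/vision:release/0.5', 'resnet18', pretrained=False)
```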
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27960

Differential Revision: D17964360

Pulled By: ailzhang

fbshipit-source-id: f5054fc251d2ebbf09ea4ea9fa4d1ce87db5fc52
2019-10-18 10:18:12 -07:00
109c467559 Add generate-wrapper.py with its generated wrapper files. (#28285)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28285

1. Add generate-wrapper.py to route different code paths based on the platform.
2. Add all the wrapper files generated by running generate-wrapper.py; they will be used in the next diff for buck build targets.
ghstack-source-id: 92071247

Test Plan: Will be tested in the next diff when these files are linked.

Reviewed By: dreiss

Differential Revision: D17967339

fbshipit-source-id: 8af88af9e8d2e4640bcf9d29c4daf10666aa88dc
2019-10-18 10:13:54 -07:00
56c4215fcc Add autograd hook for python rpc call (#27576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27576

1. Currently, if the autograd context is valid, an RPC is still sent with autograd metadata even when the tensors do not require grad and no grad functions are attached. This is not ideal.
This diff makes sure that an RPC with autograd metadata is sent only if the autograd context is valid and the tensors require grad.

2. Meanwhile, create a utility to attach autograd info and functions as needed.

3. Add autograd send/recv functions for python rpc calls.

4. Make changes to support nested python rpc calls.

5. Disallow nested dist autograd contexts (was landed in #27022)
ghstack-source-id: 92154535

Test Plan: unit tests

Differential Revision: D17819153

fbshipit-source-id: 37d8a85855bf591f2f2da48d475a06e870a30ea1
2019-10-18 10:11:45 -07:00
46fefc98e2 Change dper3 loss module to match dper2 (#28265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28265

Fix the difference between dper3 and dper2 when regressionLoss is used.

Test Plan:
test using dper2 model id f134632386
Comparison tool output before change:
```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['supervision:label']
OP outputs ['sparse_nn/regression_loss/mean_squared_error_loss/ExpandDims:0']
===============================
Finished all dper3 ops, number of good ops 11, bad ops 1, skipped 26
run_comparison for dper2 / dper3 nets running time: 0.0020143985748291016
result type: <class 'NoneType'> result: None
```

After change:

```
FOUND OP DIFFERENT WITH DPER2!!!
OP is of type ExpandDims
OP inputs ['sparse_nn_2/regression_loss_2/mean_squared_error_loss_8/Squeeze:0_grad']
OP outputs ['sparse_nn_2/over_arch_2/linear_2/FC_grad']
===============================
Finished all dper3 ops, number of good ops 19, bad ops 1, skipped 16
run_comparison for dper2 / dper3 nets running time: 0.0017991065979003906
result type: <class 'NoneType'> result: None
```

dper2  label part of net P111794577
dper3  label part of net after change P116817194

Reviewed By: kennyhorror

Differential Revision: D17795740

fbshipit-source-id: 9faf96f5140f5a1efdf2985820bda3ca400f61fa
2019-10-18 10:08:38 -07:00
bd6f9e1d6c torch.nn.functional.gumbel_softmax #27078 (#28121)
Summary:
**Comments:**
* Grad check from 848d1ba13a/test/test_nn.py (L8898) not added
* Double data type as seen in 848d1ba13a/test/test_nn.py (L8916) not tested

**Issue:**
https://github.com/pytorch/pytorch/issues/27078
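A minimal usage sketch of the functional under test:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
soft = F.gumbel_softmax(logits, tau=1.0, hard=False)  # differentiable relaxation
hard = F.gumbel_softmax(logits, tau=1.0, hard=True)   # one-hot via straight-through
print(soft.sum(dim=-1))  # rows sum to 1
print(hard.sum(dim=-1))  # exactly one 1 per row
```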
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28121

Differential Revision: D18008515

Pulled By: yf225

fbshipit-source-id: 9363fe9430df0f2bfd337cc788b11ac93adaa360
2019-10-18 09:41:40 -07:00
3629974c1e Fix quantized avg_pool2d test to support non-zero padding (#28246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28246

Updated the reference fp32 implementation to use the dequantized input tensor to correctly take padded values into account

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17989334

fbshipit-source-id: 848ce78713280f529f71ff48e930db8de18abc62
2019-10-18 09:14:54 -07:00
4b64ada531 Fix typo (#28281)
Summary:
I know this is really a minor one and the list of people to mention will be significantly larger in the future. Nevertheless I would love to see my name written in correct international spelling (the strange German o-umlaut in my name becomes oe).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28281

Differential Revision: D18007518

Pulled By: ezyang

fbshipit-source-id: 1d03065636d7f65ac6b376690256c0d021482958
2019-10-18 08:51:12 -07:00
3d745508eb String optimizations related to serialization. (#28230)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28230

This change improves the pickling small data benchmark by roughly 30%.
(25.8usec -> 18.05usec).

One of the main issues was that we were spending 25%+ of the cpu profile
time in std::[o]stringstream constructors alone.

Two main parts
 - Change some std::stringstream to std::ostringstream, when they
   showed up on hot-ish paths, and it was trivial to convert them.
   Roughly 27% of the std::stringstream constructor time is spent
   building the constituent std::basic_istream. If the istream isn't
   needed, don't construct it.

 - For a couple of very hot paths (e.g. Pickler::pushGlobal), just
   convert to traditional string::append(). std::ostringstream is
   convenient, but not particularly efficient.
ghstack-source-id: 92153103

Test Plan:
Benchmarking: buck build mode/opt experimental/jeremyl/c2:SerializationBench
  Correctness: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17982181

fbshipit-source-id: 7fd4d267293231244c10c1e5b8f4951a7a3d852f
2019-10-18 07:39:30 -07:00
ac61adb5ef String opts related to deserialization. (#28263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28263

When looking at profiles of deserializing small data from torch::load(),
we found some straightforward string-related changes that in aggregate
improve the base time by 25%.

One of the main problems was over-use of std::stringstream - the
constructors alone were 18%+ of the time spent. This change improves
unpickling/deserializing by converting a handful of the hottest
usecases from the profiles:

 - unpickler's readString() goes from 10.3% of time to mostly out of the picture
 - QualifiedName constructor (particularly the Join call) was 8.9% of time,
   but afterwards disappears from the profiles.
 - getRecordID/hasRecord were ~5% each, but also get somewhat smaller.
ghstack-source-id: 92158727

Test Plan:
Benchmark in buck build mode/opt experimental/jeremyl/c2:SerializationBench
  Correctness in buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17997056

fbshipit-source-id: fc6d6c7da7557ff23c8e8c7dbe4c060abf860018
2019-10-18 07:36:17 -07:00
a1ac15081e Implement lerp's derivative w.r.t. weight (#28219)
Summary:
Closes https://github.com/pytorch/pytorch/issues/22444.
It seemed low priority, but the necessary change seems trivial, so I made this PR anyway.
Thanks in advance for reviewing this.
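A quick sketch verifying the new derivative, since d/dw lerp(s, e, w) = e - s:

```python
import torch

start = torch.randn(3)
end = torch.randn(3)
weight = torch.rand(3, requires_grad=True)  # previously this errored in backward

torch.lerp(start, end, weight).sum().backward()
print(torch.allclose(weight.grad, end - start))  # True
```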
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28219

Differential Revision: D17989123

Pulled By: ezyang

fbshipit-source-id: d122b50e90b63dc5d2eeb7689b5ea29d973424ed
2019-10-18 07:18:07 -07:00
91a260cef9 Adding MSELoss, KLDivLoss and BCELoss to C++ front-end (#27156)
Summary:
This PR adds ```MSELoss```, ```KLDivLoss``` and ```BCELoss```. The tests for ```BCELoss``` fail with the following error:
```
unknown file: Failure
C++ exception with description "autograd_meta() INTERNAL ASSERT FAILED at /home/shahriar/Contrib/pytorch/c10/core/TensorImpl.h:533, please report a bug to PyTorch. set_requires_grad is not implemented for Tensor (set_requires_grad at /home/shahriar/Contrib/pytorch/c10/core/TensorImpl.h:533)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27156

Differential Revision: D17960323

Pulled By: yf225

fbshipit-source-id: 84b8431064f2f573679c03a8d7994e3e2f81a4d1
2019-10-17 22:07:01 -07:00
9c41b61e3f Disable blobs_queue_db_test in ROCm CI (#28268)
Summary:
Flaky failures on master:

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/41550/
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-test/41512/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28268

Differential Revision: D18000538

Pulled By: bddppq

fbshipit-source-id: 23a13724eeafb915d6f1e1f2da9bd87be0c498b2
2019-10-17 21:41:53 -07:00
53d9456adf Clean up the stale item in bc white list (#28269)
Summary:
Remove one stale item
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28269

Reviewed By: hl475, BIT-silence

Differential Revision: D18000957

Pulled By: houseroad

fbshipit-source-id: bc50f80453ce9c675928e6db784d5ebe05861f2a
2019-10-17 21:35:30 -07:00
5c768ec380 Minor: add static_assert to Pickler buffering. (#28114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28114

This is followup on the pickler buffering change.
ghstack-source-id: 92019521

Test Plan: This just adds an static assert, hence if it builds, we're good.

Differential Revision: D17955006

fbshipit-source-id: d7fd69935d23f39db18029703f63c8f18d23047a
2019-10-17 21:16:48 -07:00
d7ff34c0f8 In torch::save() avoid zip compressing small header records. (#28180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28180

ScriptModuleSerializer::writeCode() is the only place during torch::save()
serialization where we attempt to zip compress records.

This change avoids compressing these string records if they are
sufficiently small - e.g. in the example I looked at:
  - the strings were 123 and 28 bytes, respectively.
  - the cost in the compression routines was 16.5% of the torch::save() cost.
    (we're building a huffman table for a 28 byte string).

We'd save time and not significantly affect the space if we add these
1-line conditional compressions, rather than making it unconditional.
ghstack-source-id: 92104517

Test Plan:
Benchmark: experimental/jeremyl/c2:SerializationBench
  Correctness: normal buck mode/dev-nosan caffe2/test/...

Differential Revision: D17967995

fbshipit-source-id: 7ff934388533645dc987e105c814ffe6324f4596
2019-10-17 21:10:07 -07:00
5498a15d10 Add tests for libtorch macOS binary (#25208)
Summary:
This PR adds basic and dependency tests for libtorch macOS binary, so that we don't have issues like https://github.com/pytorch/pytorch/issues/14727 in the future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25208

Differential Revision: D18001189

Pulled By: yf225

fbshipit-source-id: 89be1947b5bc094fcc02b0f268b9d8ebaf0f6700
2019-10-17 20:39:09 -07:00
2e7dd54796 Fix RNN nonlinearity (#28058)
Summary:
This was referenced in the `RNN` docs but wasn't actually assigned.
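A sketch of the fixed behavior:

```python
import torch.nn as nn

# The constructor argument is now actually stored on the module.
rnn = nn.RNN(input_size=10, hidden_size=20, nonlinearity="relu")
print(rnn.nonlinearity)  # "relu"
```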
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28058

Pulled By: driazati

Differential Revision: D17945867

fbshipit-source-id: 0f0dc2633183a7e67a12352a2a7ac0545284666a
2019-10-17 16:46:09 -07:00
0b243e9c4c Disable c10d test_sync_params_with_buffers on ROCm (#28190)
Summary:
Failed runs on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2097/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2144/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2154/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/2167/

```
19:59:03 ======================================================================
19:59:03 FAIL: test_sync_params_with_buffers (__main__.DistributedDataParallelTest)
19:59:03 ----------------------------------------------------------------------
19:59:03 Traceback (most recent call last):
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 130, in wrapper
19:59:03     self._join_processes(fn)
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 211, in _join_processes
19:59:03     self._check_return_codes(elapsed_time)
19:59:03   File "/var/lib/jenkins/workspace/test/common_distributed.py", line 235, in _check_return_codes
19:59:03     self.assertEqual(first_process.exitcode, 0)
19:59:03   File "/var/lib/jenkins/workspace/test/common_utils.py", line 748, in assertEqual
19:59:03     super(TestCase, self).assertLessEqual(abs(x - y), prec, message)
19:59:03 AssertionError: 10 not less than or equal to 1e-05 :
19:59:03
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28190

Differential Revision: D17971146

Pulled By: bddppq

fbshipit-source-id: d3f527c14ca81073c1c236d5b3bb07a6ef1dde51
2019-10-17 15:09:50 -07:00
12dde7f58a cdist performance improvement for euclidean distance (#25799)
Summary:
jacobrgardner in https://github.com/pytorch/pytorch/issues/15253#issuecomment-491467128 proposed a way to speed up the Euclidean distance calculation. This PR implements that solution for both the normal and batch versions.

Also simonepri provided performance metrics https://github.com/pytorch/pytorch/issues/15253#issuecomment-502363581
![image](https://user-images.githubusercontent.com/12058312/64460756-44a24580-d0c9-11e9-9f7f-a5942f4c832d.png)

The current implementation has a further speedup compared to jacobrgardner's approach:
![image](https://user-images.githubusercontent.com/12058312/64461495-5553bb00-d0cb-11e9-87e6-302b8cc7e12b.png)
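For reference, a sketch of the matmul-based identity behind this kind of speedup, ||x - y||^2 = ||x||^2 + ||y||^2 - 2*x.y (this illustrates the math, not the internal kernel):

```python
import torch

x = torch.randn(128, 64)
y = torch.randn(256, 64)
x2 = (x * x).sum(-1, keepdim=True)             # (128, 1)
y2 = (y * y).sum(-1)                           # (256,)
d2 = (x2 + y2 - 2.0 * x @ y.t()).clamp_min(0)  # guard tiny negatives
print(torch.allclose(d2.sqrt(), torch.cdist(x, y), atol=1e-4))
```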
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25799

Differential Revision: D17964982

Pulled By: ifedan

fbshipit-source-id: bf7bd0dbfca51fd39e667da55139347480f30a2f
2019-10-17 14:56:54 -07:00
7c1df06efa default caffe2_tvm_min_ops to 10 (#28250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28250

We've been running canaries with this setting for a while

Test Plan: build, sanity canary

Reviewed By: yinghai

Differential Revision: D17872108

fbshipit-source-id: fb7f0373eac1c8aaae007a17f6ffb91482952813
2019-10-17 14:35:22 -07:00
07b5666a87 Add default arg to prepare_qat mapping. (#28193)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28193

Fixes #28015

Test Plan: Imported from OSS

Differential Revision: D17973121

Pulled By: z-a-f

fbshipit-source-id: 03b3f70c70b89060c1f03d7ed8ab6002fe60bd49
2019-10-17 14:11:54 -07:00
7ebe8328e1 Address review comments on #28011.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28109

Differential Revision: D17966067

fbshipit-source-id: 9e4a03a1813835b67cd614d8fac18524f5b36cc5
2019-10-17 14:07:58 -07:00
95922c90b5 Export update for arange and _dim_arange (#26875)
Summary:
Export arange and _dim_arange using onnx::range in opset 11
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26875

Reviewed By: hl475

Differential Revision: D17623848

Pulled By: houseroad

fbshipit-source-id: 41f0066ca1c42882ccc051a3ee5448dca25ee5d2
2019-10-17 13:55:45 -07:00
a5ac7f6387 Changing observer name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27779

Test Plan: Imported from OSS

Differential Revision: D17886605

Pulled By: z-a-f

fbshipit-source-id: 68c50b482e65015336ff27171fd730da493525b6
2019-10-17 11:36:03 -07:00
86e7e872bf Port of multi_margin_loss from TH to ATen (CPU) (#28062)
Summary:
This is a port of the existing TH CPU C MultiMarginCriterion to function multi_margin_loss for ATen. ~~The ATen/C++ version is unfortunately significantly slower than the original. It is currently unclear to me what causes the performance degradation since the Tensor access is raw-pointer based similar to the original C implementation. (A first implementation I had created using TensorAccessor was even about 2x slower than the one in this PR).~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28062

Differential Revision: D17980636

Pulled By: ezyang

fbshipit-source-id: bba27a13436adff5e687d95cc984ec2386ce7a73
2019-10-17 11:16:51 -07:00
618cb40e30 Add doc copy-edits from review (#26322)
Summary:
Add edits from doc review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26322

Pulled By: driazati

Differential Revision: D17859654

fbshipit-source-id: f3a116cddb5393bdfbef670c56efb2ee62ccf252
2019-10-17 11:12:35 -07:00
5c2bf8abe5 change linear benchmark shapes (#28228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28228

as title

Test Plan:
```
buck run //caffe2/benchmarks/operator_benchmark/pt:linear_test
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N32_IN1024_OUT256
# Input: N: 32, IN: 1024, OUT: 256
Forward Execution Time (us) : 1501.918

# Benchmarking PyTorch: linear
# Mode: Eager
# Name: linear_N64_IN256_OUT100
# Input: N: 64, IN: 256, OUT: 100
Forward Execution Time (us) : 1175.672
```

Reviewed By: hl475

Differential Revision: D17980463

fbshipit-source-id: c8aaf6fa4d847037accb1e5b9ee04900690fd6ae
2019-10-17 11:09:10 -07:00
21c3997974 Disable schema inference for unboxedOnly kernels (#27977)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27977

The only remaining reason why we couldn't move some ops from globalATenDispatch to the c10 dispatcher was that schema inference didn't support some use cases.
But actually, we don't need schema inference for these ops. By disabling it, we can move the remaining ops from globalATenDispatch to the c10 dispatcher.
ghstack-source-id: 92104807

Test Plan: waitforsandcastle

Differential Revision: D17929696

fbshipit-source-id: 05ec65b615487fde784293e3b533fa3ec09cf234
2019-10-17 10:49:56 -07:00
8fff54ec39 Enables non-default CUDA stream in test_nn (#28192)
Summary:
Per title. Several stream fixes have gone in that may make this pass in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28192

Differential Revision: D17974219

Pulled By: mruberry

fbshipit-source-id: 543d000789c83711a8b4bef169a87635fda7508b
2019-10-17 10:19:49 -07:00
951dd03037 Add memory format support to typecasting shortcuts byte,char,double,bool,half,int,long,short,float,bfloat16 (#27228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory.
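A usage sketch of the new keyword on the dtype shortcut methods:

```python
import torch

x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

y = x.half(memory_format=torch.preserve_format)        # keeps channels-last strides
print(y.is_contiguous(memory_format=torch.channels_last))  # True

z = x.double(memory_format=torch.contiguous_format)    # forces standard contiguous
print(z.is_contiguous())  # True
```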

Test Plan: Imported from OSS

Differential Revision: D17980315

Pulled By: VitalyFedyunin

fbshipit-source-id: fd5615621bc4968aa4ef2a26430c492c552ed671
2019-10-17 09:16:25 -07:00
15df371934 Add memory format support to typecasting shortcuts byte,char,double,bool,half,int,long,short,float,bfloat16 (#27228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27228

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) Otherwise, if the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory.

Test Plan: Imported from OSS

Differential Revision: D17980128

Pulled By: VitalyFedyunin

fbshipit-source-id: b2646bab72c4475b7a82bb271d204a9d96d28bd4
2019-10-17 09:16:21 -07:00
c36552c4cb Fixing dispatch error in windows debug builds (#24360)
Summary:
nullptr initialization values for dispatch pointers were overwriting values set using the REGISTER_DISPATCH macro.

Relevant issue: https://github.com/pytorch/pytorch/issues/22681
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24360

Differential Revision: D17952241

Pulled By: ezyang

fbshipit-source-id: 4bf86dc24153e504bbeacb526c58fd8230bb972a
2019-10-17 09:13:19 -07:00
e1be08fcf5 out-variant for torch.batch_norm_elemt (#27621)
Summary:
Following dicussion with ezyang in https://github.com/pytorch/pytorch/issues/26288
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27621

Differential Revision: D17978858

Pulled By: ezyang

fbshipit-source-id: f843b691a67f1dc48b87ed6a633007d193150cf7
2019-10-17 09:09:46 -07:00
4e71be449e Remove tools/setup_helpers/nvtoolext.py (do not seem to be used) (#28125)
Summary:
`git grep nvtoolext` shows nothing (meaning that it is never imported).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28125

Differential Revision: D17979164

Pulled By: ezyang

fbshipit-source-id: 7cfe770c9f7140c8ad58676f912037e6226647d3
2019-10-17 09:07:09 -07:00
4cc368e3a6 Declare the LAPACK and MAGMA dispatchers instead of defining them with a default error (#28133)
Summary:
This clears a lot of dead code that isn't reachable due to `AT_DISPATCH`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28133

Test Plan: - All existing tests should pass to ensure that the change is valid.

Differential Revision: D17978803

Pulled By: ezyang

fbshipit-source-id: 8fdaa74f9addb1d7987c5d625557b8a463a25500
2019-10-17 09:04:56 -07:00
076b116a41 In ProcessGroupAgent, use non-iostream torch::load()/save(). (#28063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28063

Avoid using the iostream versions of torch::load()/torch::save(), which
incur at least one additional full data copy.
ghstack-source-id: 92059608

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17945206

fbshipit-source-id: ba24376c13762a28e569530e3b1a939ac6f72f43
2019-10-17 07:39:30 -07:00
4a69d048e0 Move the CUDA implementation of log2 to ATen. (#26769)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26769

Fix #24589

Test Plan: Imported from OSS

Differential Revision: D17960122

Pulled By: VitalyFedyunin

fbshipit-source-id: 58dff236886bbf3a0a152d7422aa8a5c478ee1de
2019-10-17 07:27:55 -07:00
6923b93ebc Revert D17972725: [pytorch][PR] Update onnx-tensorrt
Test Plan: revert-hammer

Differential Revision:
D17972725

Original commit changeset: 01933b3f9e2b

fbshipit-source-id: 43f3560a7a3922dd676678b61d6cce7f2006b3f1
2019-10-17 07:07:04 -07:00
bb0e46b65a Remove preallocation of type ids (#28024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28024

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 92051872

Test Plan: unit tests

Differential Revision: D17936165

fbshipit-source-id: 2c9df2b9b3f35b3e319641c96638321ac3433d5c
2019-10-16 23:08:11 -07:00
58ed8ca9e1 clean up exported source format (#28129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28129

The previous PR in the stack removed the need to order classes/functions
or have correct import statements. This resolved circular dependency issues
that can arise when class constructors like ModuleList put new instances
of themselves in a common namespace.

This PR changes our export format to no longer produce this information.
By doing so we can make the logic significantly simpler, since we just
keep track of an individual PythonPrint object per file.

Notes:
* PythonPrint was changed to manage its own stream/list of ranges. It
was doing this anyway internally; this just makes the API clearer.
* Since we are changing the serialization format, I also removed op_version_set.
It is now replaced with the VERSION number that is written in the zip archive.
This further simplifies the code emission process.
* A test of op_version_set was removed since there is no longer any behavior
to test.

Test Plan: Imported from OSS

Differential Revision: D17961610

Pulled By: zdevito

fbshipit-source-id: ada362c4ca34d05393a1a7e799c94785ab9d9825
2019-10-16 22:47:24 -07:00
aad5071206 Use torch::variant for enums in C++ API
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26837

Test Plan: Imported from OSS

Differential Revision: D17579438

Pulled By: yf225

fbshipit-source-id: 9ac59df28a317fdb3be2cc02c65962ad99117127
2019-10-16 22:40:57 -07:00
de0f9567a3 Add quantized avg_pool2d for pytorch mobile (#27631)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27631

Add support for performing avg_pool2d on mobile; tested using the existing avg_pool2d python tests.
Uses the QNNPACK backend, which currently only supports 4-dim inputs.
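A minimal usage sketch (assumes a build with the QNNPACK engine available):

```python
import torch
import torch.nn.functional as F

torch.backends.quantized.engine = "qnnpack"  # assumes QNNPACK is built in
x = torch.rand(1, 2, 8, 8)  # 4-dim input, as required
qx = torch.quantize_per_tensor(x, scale=0.05, zero_point=0, dtype=torch.quint8)
qy = F.avg_pool2d(qx, kernel_size=2, stride=2)
print(qy.dequantize().shape)  # torch.Size([1, 2, 4, 4])
```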

Test Plan:
python test/test_quantized.py TestQNNPackOps.test_avg_pool2d

Imported from OSS

Differential Revision: D17973792

fbshipit-source-id: 95ffffb2da656ed911a618b9cb68d6b728c16c74
2019-10-16 22:02:23 -07:00
62e281fbcf Add CI builds (#27925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27925

Add extra CI builds for TBB and native builds

Test Plan: check CI

Differential Revision: D17914952

Pulled By: ilia-cher

fbshipit-source-id: 16995038909d17eb6f9c69b9bddd8f12981ad36b
2019-10-16 21:53:40 -07:00
19956b200d Relax set_num_threads restriction in parallel native case (#27947)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27947

Don't throw an exception if the requested size is the same as the
currently used one.

Test Plan:
ATEN_THREADING=NATIVE python setup.py develop --cmake

Imported from OSS

Differential Revision: D17919416

fbshipit-source-id: 411f7c9bd6a46e7a003b43a200c2ce3b76453a2e
2019-10-16 21:53:36 -07:00
2265cddbd2 Cleanup torch::jit::script::Module API for accessing attributes/parameters/submodules. (#27260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27260

This PR has the following changes:
- Slot class is removed. In all use cases except `lower_graph` we really
just needed the attribute name and thus having an extra layer of
abstraction through Slot only made the code harder to understand.
- get_parameters, get_attributes, get_modules, and get_slots now return
a list of <name, item> pairs instead of a list of Slots.

Differential Revision: D17728910

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: 94781611752dd88e7fddfe8b8e0252d6ec32ba68
2019-10-16 21:32:08 -07:00
d083b443b4 Fix LayerNorm Bug (#28196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28196

Fix LayerNorm Bug

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: okhonko, houseroad

Differential Revision: D17973451

fbshipit-source-id: 865e4f295b8d6c0438ec8872da0b43d3c5d3d3c6
2019-10-16 20:38:46 -07:00
edc28676ef Adds @overridePrecision decorator (#28131)
Summary:
Adds the overridePrecision decorator, which allows device-generic tests to specify per-dtype precision overrides.

Precision is overridden on the test class instance itself, and so is thread-local (so that running multiple tests in parallel will not conflict). It can be accessed directly from a test with self.precision, as before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28131

Differential Revision: D17969774

Pulled By: mruberry

fbshipit-source-id: c4e0b71afac6bdc7cbf4e799f3054922de764820
2019-10-16 19:47:55 -07:00
35a5df8c94 Update onnx-tensorrt (#28158)
Summary:
We need https://github.com/onnx/onnx-tensorrt/pull/290 to be able to switch PyTorch to C++14. This PR updates the onnx-tensorrt dependency so we have that fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28158

Differential Revision: D17972725

Pulled By: smessmer

fbshipit-source-id: 01933b3f9e2b6f79a00ef919ab1633a8c63571dd
2019-10-16 19:20:29 -07:00
f279b68a48 Update gloo (#28174)
Summary:
We need https://github.com/facebookincubator/gloo/pull/225 to be able to switch PyTorch to C++14. This PR updates the gloo dependency so we have that fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28174

Differential Revision: D17973079

Pulled By: smessmer

fbshipit-source-id: 887996d1c2850bb97bf2eb081544b67ca5c9ae5f
2019-10-16 19:02:05 -07:00
86e93bde90 Back out "Use FunctionSchema instead of char* for dispatch"
Summary: Original commit changeset: cb8e21d4b8d2

Test Plan: revert

Reviewed By: jerryzh168

Differential Revision: D17971815

fbshipit-source-id: 92ca62b4ca20c3d083d1fc87e0080b988a981cc8
2019-10-16 18:57:30 -07:00
d9de2e0ba9 Back out "Revert D17936166: [wip] Constexpr type ids" (#28155)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28155

Original commit changeset: 92c63a96dedd
ghstack-source-id: 92051874

Test Plan: unit tests

Differential Revision: D17964410

fbshipit-source-id: 1d989d28b3e1de6d43c915f122f2b65a77a332eb
2019-10-16 18:24:04 -07:00
ff00e8c9eb Fix pushLong() issue in pickler. (#28057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28057

For pushLong() in Pickler, it looks like we only use it for a single use case, with a 10-byte value.

We were handling > 256 bytes incorrectly, by using a LONG4 opcode (which expects a 4-byte length) but pushing 8 bytes. We could harden this handling, but rather than improving codepaths that we never expect to use, this change simply removes the incorrect codepath and adds an assert.

ghstack-source-id: 92048325

Test Plan: buck test mode/dev-nosan caffe2/test/...

Differential Revision: D17934174

fbshipit-source-id: ecc1ca37dbcc87151fc5bf2ffb6b05dff91d3667
2019-10-16 18:07:26 -07:00
aa6c394e39 Use FunctionSchema instead of char* for dispatch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27159

Test Plan: Imported from OSS

Differential Revision: D17693481

Pulled By: bwasti

fbshipit-source-id: cb8e21d4b8d29dcc1cd75cb6b681986679b835fe
2019-10-16 17:14:28 -07:00
3214f134b6 fix python rpc handler exit crash (#27251)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27251

Explicitly clean up py::objects to avoid segmentation faults when py::objects are cleaned up by CPython later, at program exit.

See similar issues reported https://github.com/pybind/pybind11/issues/1598
and https://github.com/pybind/pybind11/issues/1493.

Our local tests also caught these segmentation faults when py::objects are cleaned
up at program exit. The explanation is: CPython cleans up most critical
utilities before cleaning up the PythonRpcHandler singleton, so when the
PythonRpcHandler singleton cleans up py::objects and calls dec_ref(), it
crashes.

The solution is to clean up py::objects earlier, when the RPC agent joins.
Note that py::objects cannot be cleaned up when the RPC agent is destroyed
either, as the RPC agent is a global variable and would hit the same issue as
PythonRpcHandler.

close #27182
ghstack-source-id: 92035069

Test Plan: unit tests on python 3.6 and python 3.5

Differential Revision: D17727362

fbshipit-source-id: c254023f6a85acce35528ba756a4efabba9a519f
2019-10-16 16:57:38 -07:00
7d277b0670 Multi Label Margin loss (#27659)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `MultiLabelMarginLoss` module and `multilabel_margin_loss` functional.
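For reference, the Python module with the same semantics:

```python
import torch
import torch.nn as nn

loss = nn.MultiLabelMarginLoss()
x = torch.tensor([[0.1, 0.2, 0.4, 0.8]])
# Targets are class indices, terminated by -1: classes 3 and 0 here.
y = torch.tensor([[3, 0, -1, 1]])
print(loss(x, y))  # tensor(0.8500)
```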
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27659

Differential Revision: D17931905

Pulled By: yf225

fbshipit-source-id: 3642f75c79843dda55ac38de9f6f970f3e237847
2019-10-16 15:44:38 -07:00
cbcb70f84c print last 50 runs when using ai_pep_format (#28128)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28128

as title

Test Plan:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.169559478759766"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.206514358520508"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.4950008392334"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.172897338867188"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.27255630493164"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.549837112426758"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "ms", "value": "29.63113784790039"}
...
```

Reviewed By: hl475

Differential Revision: D17957611

fbshipit-source-id: 4e70ba2070b97fbbca0d6d4295abbead2ac356d4
2019-10-16 15:22:23 -07:00
97257e257e clean up test_cat_empty (#28115)
Summary:
Remove spurious parts from test_cat_empty
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28115

Test Plan: no additional tests needed.

Differential Revision: D17956669

Pulled By: ngimel

fbshipit-source-id: cffcfa9e5b50afba62c6dbc8ca5d9de95d0c020e
2019-10-16 14:42:14 -07:00
cbb4c87d43 Improve the doc and test of logical_xor (#28031)
Summary:
Following up on https://github.com/pytorch/pytorch/issues/27248, per a suggestion by gchanan.
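A behavior sketch matching the updated docs:

```python
import torch

print(torch.logical_xor(torch.tensor([True, False, True]),
                        torch.tensor([True, True, False])))
# tensor([False,  True,  True])

# Non-bool inputs are treated as zero / non-zero.
print(torch.logical_xor(torch.tensor([0, 1, 2]), torch.tensor([1, 0, 0])))
# tensor([True, True, True])
```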
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28031

Differential Revision: D17962226

Pulled By: gchanan

fbshipit-source-id: 788e4e1fc78b1cfc7915aedaa10c8656b19edc4d
2019-10-16 13:57:53 -07:00
3523e5427a Add master to OSS RPC test (#27776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27776

I think it's not worth equipping the other `RPCAgent`s with collective communication capability, i.e. either 1) embedding Gloo in `RPCAgent`, or 2) implementing ::barrier() and ::drain() on top of RPC messaging.

The only use case that does not have a master is the OSS unit test suite, caffe2/test/rpc_test.py.

I think having those unit tests use a master is simpler than equipping `RPCAgent` with collective communication capability.

Differential Revision: D5445858

fbshipit-source-id: 56ee24703abd8c5b366829430bef657e0f1dfeba
2019-10-16 13:45:45 -07:00
174e1ba3b8 Small fixes to improve TensorIterator overhead for the common case of inputs and outputs of the same type (#27457)
Summary:
1) Short-circuits computing common type and type promotion logic for the common case of operands and result of the same type
2) Improves performance of checking memory overlap by returning MemoryOverlap::FULL if tensors are the same, skips the call
from TensorIterator when tensors are the same
3) Changes the default size of DimVector from 5 to 6, allowing it to avoid being resized in the common case of a binary operation. The `strides`
DimVector is forced to have at least 2*num_tensors elements, which for an operation with 2 inputs and one output is 6.
4) If `offset` is 0 (the common non-broadcasting case), don't fill the `strides` vector with 0s, because all the values will subsequently be overwritten.

These changes combined improve the overhead from 1.02 us to .74 us for a simple in-place operation.
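A rough way to observe this per-call overhead from Python (a sketch; absolute numbers are machine-dependent):

```python
import timeit
import torch

x = torch.ones(1)
per_call = timeit.timeit(lambda: x.add_(1.0), number=100_000) / 100_000
print(f"{per_call * 1e6:.2f} us per in-place add")
```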
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27457

Test Plan: should be covered by existing tests

Differential Revision: D17784532

Pulled By: ngimel

fbshipit-source-id: e6a8ee58be5de14461bdbc2e2b0b6d16a96c309f
2019-10-16 13:06:20 -07:00
3ac4267763 Force building with GCC 5 (#28098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28098

Make sure that we're building with GCC 5 everywhere
ghstack-source-id: 92013998

Test Plan: waitforsandcastle

Differential Revision: D17953640

fbshipit-source-id: 26d978c60fc973c787383297d730b45d40fa300b
2019-10-16 12:49:59 -07:00
dc8785a022 Refactoring names for consistency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27670

Test Plan: Imported from OSS

Differential Revision: D17846269

Pulled By: z-a-f

fbshipit-source-id: ed3c7441c185bf11b2e62879aa3ecbc654aa2d4e
2019-10-16 12:18:26 -07:00
9540f6c3fe Soft Margin loss (#27660)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `SoftMarginLoss` module and `soft_margin_loss` functional.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27660

Differential Revision: D17958325

Pulled By: yf225

fbshipit-source-id: c14422765e6e1fdabf6c9687080e6d5ff490d300
2019-10-16 12:04:08 -07:00
c67d3533a7 Update C++ torch::nn parity table, and temporarily disable C++ API parity test (#28117)
Summary:
This PR updates `test/cpp_api_parity/parity-tracker.md` to reflect our progress on C++ `torch::nn` parity. It also disables the C++ API parity test temporarily, and as the next step I will refactor the parity test to make it simpler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28117

Differential Revision: D17957948

Pulled By: yf225

fbshipit-source-id: 1dd836c25665f57ba8efc6d1abf671a95c03eff7
2019-10-16 11:54:13 -07:00
735463f210 ONNX Export Scripted Interpolate Op (#27566)
Summary:
We currently support exporting traced interpolate ops to ONNX.

Scripting interpolate op invokes aten::__interpolate in the Torch IR (instead of aten::upsample_[mode][dim]d), which we do not support yet.
This PR implements the ONNX symbolic for __interpolate() to support exporting interpolate in scripting scenarios.

Related open issue: https://github.com/pytorch/pytorch/issues/25807
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27566

Reviewed By: hl475

Differential Revision: D17817731

Pulled By: houseroad

fbshipit-source-id: e091793df503e2497f24821cf2954ff157492c75
2019-10-16 11:22:22 -07:00
5136ed0e44 Remove attempToRecoverType (#26767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26767

Now that we have tagged IValues, we can accurately recover the type with
`ivalue.type()`. This removes the other half-implemented pathways that
were created because we didn't have tags.

Test Plan: Imported from OSS

Differential Revision: D17561191

Pulled By: zdevito

fbshipit-source-id: 26aaa134099e75659a230d8a5a34a86dc39a3c5c
2019-10-16 11:07:13 -07:00
fb4517132f Allow 'Any' to appear as a type argument. (#26572)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26572

Combined with isinstance specialization, this allows a degree of polymorphism
in functions without needing to use our weirder overload hacks.

We do not define any operators on Any, so the only thing you can do with it
is to put it in containers or type refine it using an isinstance check.
Any is restricted from appearing in non-argument position because we
cannot restore type tags if it ends up as a field in a class.
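A sketch of the kind of polymorphic function this enables (the function name is illustrative):

```python
import torch
from typing import Any

@torch.jit.script
def describe(x: Any) -> int:
    # `Any` is only usable in argument position; isinstance refines it.
    if isinstance(x, int):
        return x + 1
    elif isinstance(x, torch.Tensor):
        return x.numel()
    return 0

print(describe(41), describe(torch.zeros(2, 3)))  # 42 6
```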

Test Plan: Imported from OSS

Differential Revision: D17530643

Pulled By: zdevito

fbshipit-source-id: f06f78ce84819f7773953a492f3d4c49219ee94c
2019-10-16 11:07:08 -07:00
97b39a296f Fix error report highlight for unmatched type annotation (#27195)
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/25801 (see there for my verbose analysis).

As an example, for the following code:

```
import torch

torch.jit.script
def f1(x):
    # type: (int, int) -> None
    pass
```

this PR will change error message from this:

```
RuntimeError:
Number of type annotations (2) did not match the number of function parameters (1):
# type: (int, int) -> None
```

to this:

```
RuntimeError:
Number of type annotations (2) did not match the number of function parameters (1):
at __scratch__/example.py:4:0
torch.jit.script
def f1(x):
~~~~~~~~ <--- HERE
    # type: (int, int) -> None
    pass
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27195

Differential Revision: D17910902

Pulled By: driazati

fbshipit-source-id: af5c6353069d005752d6c7f0bd6a0c6db8437e55
2019-10-16 10:39:36 -07:00
8cdc262063 Add support for @staticmethod (#27163)
Summary:
Resolve static methods as functions

Fixes #26792
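A minimal sketch of the now-supported pattern (`MyModule` is hypothetical, and this assumes the static method is called through `self`, as in the linked issue):

```python
import torch

class MyModule(torch.nn.Module):
    @staticmethod
    def double(x: torch.Tensor) -> torch.Tensor:
        return x * 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.double(x)

scripted = torch.jit.script(MyModule())
print(scripted(torch.ones(3)))  # tensor([2., 2., 2.])
```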
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27163

Pulled By: driazati

Differential Revision: D17695094

fbshipit-source-id: 4671cae1a92526a35c83b8d9c12a50aa5442412b
2019-10-16 10:36:38 -07:00
e3e54282cd Updating submodules
Summary:
GitHub commits:

509dd6da09
6b95a33c60
90debac03b

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 20fbd2548722418516e602c9a538d0a541a19fee
2019-10-16 10:22:22 -07:00
2d2fe14a60 Install CUDA for clang-tidy (#27967)
Summary:
fixes: https://github.com/pytorch/pytorch/issues/28009

clang-tidy is reporting `'cuda_runtime_api.h' file not found` when a PR modifying some file including this header.

Installation script take from official site:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=debnetwork
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27967

Differential Revision: D17952383

Pulled By: ezyang

fbshipit-source-id: 85807d93bd46eb902a84b2126784349ce3a01cfa
2019-10-16 10:02:19 -07:00
94c1ff4388 Devirtualize allow_tensor_metadata_change() getter/setter. (#27667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27667

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886548

Pulled By: ezyang

fbshipit-source-id: b99db2e163e5621920f12b150709f0defbce13da
2019-10-16 09:57:31 -07:00
4f4c69b1de Make set_grad_accumulator private (friend class SavedVariable) (#27666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27666

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886544

Pulled By: ezyang

fbshipit-source-id: b9ff845cb1e5ec6f7cb4f2fa171403d555014248
2019-10-16 09:57:27 -07:00
e1f58b7c4c Make AutogradMeta a private struct in Variable. (#27654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27654

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886547

Pulled By: ezyang

fbshipit-source-id: ea0c5b40a5f34bc37657ed5d3bce9140063ddcbb
2019-10-16 09:57:23 -07:00
34522c212a Add trailing underscore to member variable. (#27651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27651

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886546

Pulled By: ezyang

fbshipit-source-id: b8f7c74b1004d35690a815b0c7671a07ca612e94
2019-10-16 09:57:19 -07:00
f38beff800 Add nn.Bilinear to C++ Frontend (#26082)
Summary:
Adds support for the Bilinear layer to the C++ frontend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26082

Differential Revision: D17954148

Pulled By: yf225

fbshipit-source-id: 5e746bdea29b00e25969cd7a22044b8059b53687
2019-10-16 09:54:01 -07:00
3ed9a6e2ab Buffer in Pickler to improve performance. (#27720)
Summary:
This change adds a small fixed-size buffer to Pickler to
avoid calling writer_() and the associated downstream checks
on a per-opcode/per-byte basis.

We end up still doing a bounds check in the common case,
but the memcpy() is a fixed size. And we reduce the number
of backend calls.

In practice, this change speeds up the Pickle1MInts benchmark
for me locally from roughly 56msec to 22msec.

Additionally, in this change we convert a few pushIValue() calls on
typed lists, where we know the type to be double/int/bool, into
pushInt() to bypass a bit of logic.

We should additionally change the Unpickler, though we're keeping
that separate, since the std::function<> prototype needs to be
changed for this to work (i.e. return size_t rather than bool).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27720

Test Plan:
buck test mode/dev-nosan caffe2/test:...
  Benchmark in experimental/jeremyl/c2/SerializationBench.cpp (run in mode/opt)

Differential Revision: D17847174

Pulled By: jjlilley

fbshipit-source-id: 22e5e5fd33f1a369c124ea5aac7880538e2bf6a0
2019-10-16 09:37:15 -07:00
3d3bff5ff1 Fix early expansion of CUDA_TOOLKIT_ROOT_DIR in libtorch builds (#27887)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/15476, supersedes https://github.com/pytorch/pytorch/issues/23496, supersedes and closes https://github.com/pytorch/pytorch/issues/27607

As explained by rgommers in https://github.com/pytorch/pytorch/issues/23496, linking against the expanded library path for `libculibos` in `cmake/Dependencies.cmake` hard codes the path into the distributed cmake files.

Instead, I only link against the targets (e.g. `caffe2::cudnn`) and move the  dependency on `libculibos` into the cuda import targets declared in `cmake/public/cuda.cmake`. That file is distributed with the other cmake files and so the variable is expanded on the user's machine. I am now also using `CMAKE_STATIC_LIBRARY_SUFFIX` instead of `.a` to fix the windows issue from https://github.com/pytorch/pytorch/issues/15828.  I don't have a windows setup to confirm though.

Finally, to get pytorch to compile with the extra libraries enabled, I also had to link `__caffe2_nccl` to `torch_python`; otherwise I was getting include errors as the hard coded include directory was wrong. `nccl` is built into `build` not `third_party/build`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27887

Differential Revision: D17929440

Pulled By: ezyang

fbshipit-source-id: 3db6bd94d758fca2e1d6a64f4f5eea03cc07cf64
2019-10-16 09:21:47 -07:00
4f1f084d22 Make layer_norm dispatch from yaml file to fix XLA test (#28051)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28051

Make layer_norm dispatch from yaml file to fix XLA test

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: houseroad

Differential Revision: D17939919

fbshipit-source-id: 384b6a8008dabfc1aaeb0357c1bd195be68f1edb
2019-10-16 07:29:38 -07:00
5c153de26b Nicer promotion error message when pr. (#27941)
Summary:
Instead of an abstruse "unsupported scalarType", we print more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27941

Differential Revision: D17933972

Pulled By: ezyang

fbshipit-source-id: 51e0e1c11e530606612482e24ff28898323e54fc
2019-10-16 07:04:13 -07:00
1819fade35 Revert D17936166: [wip] Constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17936166

Original commit changeset: 68cfa926c721

fbshipit-source-id: 92c63a96dedd8764e342c6437c6ea308d93d29b2
2019-10-16 06:47:10 -07:00
054239dc0e Updating submodules
Summary:
GitHub commits:

4727542db2

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 02d786c67323b3b9aa5822ebcd7c497798424ef7
2019-10-15 23:40:10 -07:00
08f4a244d3 Eliminate unnecessary Tensor refcount bump.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28011

Differential Revision: D17936915

fbshipit-source-id: 457ecd09bbe9af4f1fa8ede66ba1265763dc70dd
2019-10-15 22:50:00 -07:00
2e0294cb39 Make JIT Serialization support arbitrary std::function<> IO (#28039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28039

Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. Similar for torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream apis, gives users the
flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.

We also provide a parallel api for the load(), though this one is
slightly more complex due to the need to do arbitrary position reads.

Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17939034

fbshipit-source-id: 344cce46f74b6438cb638a8cfbeccf4e1aa882d7
2019-10-15 22:12:04 -07:00
9cc4405dc9 Constexpr type ids (#28023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28023

ghstack-source-id: 91987335

Test Plan: waitforsandcastle

Differential Revision: D17936166

fbshipit-source-id: 68cfa926c721e5fbc96e083eb47e784bf34a9df4
2019-10-15 21:21:20 -07:00
e9a91756cd Back out "[pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)"
Summary: Original commit changeset: 9ddffe4dbbfa

Test Plan: ci

Reviewed By: yf225

Differential Revision: D17939581

fbshipit-source-id: 44a3b843bf1e7059fec57b9e3d12ed4886816145
2019-10-15 21:12:10 -07:00
ab50abca5c Export masked_select and masked_scatter in opset 11 (#25949)
Summary:
- masked_select is exported as ONNX::GatherND
- masked_scatter is exported as ONNX::ScatterND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25949

Reviewed By: hl475

Differential Revision: D17465489

Pulled By: houseroad

fbshipit-source-id: 4c3732617733ca2024a5e306ffa9f6bfcf9725d5
2019-10-15 21:09:37 -07:00
705958be5b Update GCC for CentOS build (#28059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28059

ghstack-source-id: 91987332

Test Plan: waitforsandcastle

Differential Revision: D17945780

fbshipit-source-id: 044a0d24837545eab6d637d6cbe644bb694f318f
2019-10-15 19:04:02 -07:00
d2c2501eb3 Minor improvements in RPC api docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/28030

Test Plan: Imported from OSS

Differential Revision: D17937426

Pulled By: mrshenli

fbshipit-source-id: 74e03542ab40abcd71441a188215cb1562b558df
2019-10-15 19:00:46 -07:00
e4f5224ebd Revert D17935286: Update GCC for centos CI builds
Test Plan: revert-hammer

Differential Revision:
D17935286

Original commit changeset: 12f584d4a240

fbshipit-source-id: ecc49bbf1d6f78752bdb834b8a1b145a359c8240
2019-10-15 17:51:08 -07:00
59cd0faeff Defer pg agent listener thread until contexts are initialized (#28013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28013

ProcessGroupAgent currently kicks off the listener thread in its
constructor. However, serving requests requires contexts to be
initialized, e.g., RRefContext and agent_ global var in api.py,
which might not be done yet when the first request arrives.
ProcessGroupAgent does not know what would be the appropriate time
to start the listener thread, hence exposing an API for higher
layer code to explicitly start listeners.

Test Plan: Imported from OSS

Differential Revision: D17932271

Pulled By: mrshenli

fbshipit-source-id: 3b408477594d4d19319e7cd08dd6f383a7ed7670
2019-10-15 17:45:43 -07:00
00a2b36188 improve error handling in getNCCLVersion in NCCLUtils (#27883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27883

Returns early if the NCCL version code returned to us is < 100, to prevent
division errors. This shouldn't actually happen, since the NVIDIA NCCL version is well past 0.1.0, but it is a nice safeguard to have.
ghstack-source-id: 91861083

Test Plan: Follow same process as https://github.com/pytorch/pytorch/pull/27068. Also force version to be < 100 and ensure that "Unknown NCCL Version" is returned.

Differential Revision: D17903234

fbshipit-source-id: c4df63bb1c18f1b2ef9e4cd434d4ca6c5ac556df
2019-10-15 17:33:09 -07:00
871b1419de Test graceful termination of RPCAgent with asymmetric load (#27761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27761

# Problem

`rpc_test` currently only has test cases that put an equal amount of work on every worker node.
As a result, even if `RpcAgent::sync` is implemented as an empty method, no termination misbehavior is detected.

# Solution

Add at least one test with an imbalanced load.
ghstack-source-id: 91785984

Differential Revision: D5361435

fbshipit-source-id: 92d1f7cad61b27cdeadc2825ceab6e88d5e4b459
2019-10-15 16:45:21 -07:00
7b06f958cf Update GCC for centos CI builds (#28018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28018

We need a newer GCC; GCC 4 is discontinued.
ghstack-source-id: 91953133

Test Plan: waitforsandcastle

Differential Revision: D17935286

fbshipit-source-id: 12f584d4a240453c62a854438b8579c1cbfd1e94
2019-10-15 16:37:56 -07:00
cf43aa3e16 add type refinements for isinstance checks (#27772)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27772

This replaces unchecked_unwrap_optional with unchecked_cast. This
enables the generalization of type refinement so that it works for
isinstance checks as well. This also removes unchecked_unwrap_optional from
code we generate, which is good because it is a hard op to serialize well
since it doesn't directly encode the Optional[T] being unwrapped. In contrast,
unchecked_cast always explicitly lists the type.
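
A minimal sketch of the kind of refinement this enables (hedged; exact supported patterns are per this change):

```python
import torch
from typing import Optional

# Inside the isinstance branch, `x` is refined from Optional[int] to int,
# so integer arithmetic type-checks without an explicit unwrap.
@torch.jit.script
def bump(x: Optional[int]) -> int:
    if isinstance(x, int):
        return x + 1
    return 0
```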

Test Plan: Imported from OSS

Differential Revision: D17885424

Pulled By: zdevito

fbshipit-source-id: ce81077d6fbeaf2a802a2e0b17349aca85670466
2019-10-15 16:00:42 -07:00
5d26ba08b7 Remove unnecessary Node* closures in operator registration
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27007

Test Plan: Imported from OSS

Differential Revision: D17696525

Pulled By: zdevito

fbshipit-source-id: b329b77afa0e6dbe9cb920a98cf07bb329d01023
2019-10-15 16:00:38 -07:00
3de34744b3 Make PythonPrint a class (#26787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26787

A follow up PR will remove the need to issue import statements,
or write classes in order since they are no longer needed.
 This change allows the same PythonPrint class
to be used for an entire file which will be needed in that patch.

Test Plan: Imported from OSS

Differential Revision: D17566440

Pulled By: zdevito

fbshipit-source-id: 1ee896da0cdfe6a003298e1d4b0238403b9ed6dd
2019-10-15 16:00:34 -07:00
f62c8f48e8 remove dead LEGACY_PythonPrint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26786

Test Plan: Imported from OSS

Differential Revision: D17566439

Pulled By: zdevito

fbshipit-source-id: ae42b67fc00f9b1bb6ceb81bf278d213636c7f07
2019-10-15 16:00:30 -07:00
2aa84d927b Revert D17939700: Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17939700

Original commit changeset: 4fc6156ba388

fbshipit-source-id: dded0a2140d2c14cd2f2a574987ecc164b0e5bfe
2019-10-15 15:24:36 -07:00
c44e33b578 Revert D17889288: [pytorch][PR] Migrate soft_margin_loss from the TH to Aten (CUDA+CPU)
Test Plan: revert-hammer

Differential Revision:
D17889288

Original commit changeset: 9ddffe4dbbfa

fbshipit-source-id: 4fc6156ba38834512b2f735ac0d03e34e69b7286
2019-10-15 14:35:28 -07:00
5797f5dd27 Update 'einsum' docstring to conform to PEP-484 (#27563)
Summary:
[PEP-484](https://www.python.org/dev/peps/pep-0484/#arbitrary-argument-lists-and-default-argument-values) specifies that arbitrary argument lists, here `*operands`, should be annotated with the type of the single arguments, i.e. not indicating that the whole thing is wrapped into a `list` (which is a Python internal anyway). The previous docstring caused problems with type checkers for IDEs such as PyCharm ([see here](https://youtrack.jetbrains.com/issue/PY-38035)).
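
For illustration, the annotation style PEP-484 prescribes (a hypothetical wrapper, not the actual stub):

```python
import torch
from torch import Tensor

# Per PEP 484, *operands is annotated with the element type (Tensor),
# not with List[Tensor].
def einsum_wrapper(equation: str, *operands: Tensor) -> Tensor:
    return torch.einsum(equation, *operands)
```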
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27563

Differential Revision: D17904748

Pulled By: soumith

fbshipit-source-id: 0a7fcbbb12e388e6fc40d48bf533652a96024757
2019-10-15 14:35:24 -07:00
a6cbbd2196 Revert D17843468: Save Docker image to workspace instead of pushing to ECR.
Test Plan: revert-hammer

Differential Revision:
D17843468

Original commit changeset: c3f549e562c6

fbshipit-source-id: abb61692238c8b3ad54d31ef6bffe42ecc2f090e
2019-10-15 14:32:22 -07:00
964d3d8b38 Revert D17822962: [pytorch][PR] Make JIT Serialization support arbitrary std::function<> IO
Test Plan: revert-hammer

Differential Revision:
D17822962

Original commit changeset: d344a7e59707

fbshipit-source-id: ba153a2110faf91d103bd0f8dea4e9613bd6b0da
2019-10-15 13:55:11 -07:00
fd3d6587e6 Make TripletMarginLossImpl subclass from Cloneable (#27956)
Summary:
Continuing from https://github.com/pytorch/pytorch/pull/27770 to make all `torch::nn` layers subclass from `torch::nn::Cloneable`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27956

Differential Revision: D17936555

Pulled By: yf225

fbshipit-source-id: 75f7982e7893675cf6da0f5419224b92af579818
2019-10-15 13:38:39 -07:00
d39ab0312a Add memory_format support to and type operators (#27107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27107

Adds memory_format keyword argument (positional for cpp).

'Preserve' behavior now follows these rules:
1) If the tensor is non-overlapping and dense, the output tensor will have the same strides as the input tensor.
2) If not (1) and the tensor is stored in the channels-last format, the output tensor will have the channels-last format.
3) The output tensor will be contiguous in all other cases.

 ---
A dense tensor is a tensor that stores its values in a contiguous block of memory.
A non-overlapping tensor is a tensor in which each element occupies its own, non-repeated memory location.
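
A minimal sketch of rule (1), assuming the keyword names used in this PR:

```python
import torch

# A channels-last tensor is non-overlapping and dense, so 'preserve'
# keeps its strides through a dtype conversion:
x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)
y = x.to(torch.float64, memory_format=torch.preserve_format)
assert y.stride() == x.stride()
```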

Test Plan: Imported from OSS

Differential Revision: D17931062

Pulled By: VitalyFedyunin

fbshipit-source-id: 2c5dd3dd05bf58a9a29f25562cd45190b009c3f9
2019-10-15 12:55:56 -07:00
cbe5ab1109 Make JIT Serialization support arbitrary std::function<> IO (#27586)
Summary:
Right now, torch::save() uses std::ostream, which results in unnecessary
data copies in practice. The same applies to torch::load().

Adding a std::function<size_t(const void*, size_t)> as an output option,
parallel to the existing filename and std::ostream APIs, gives users the
flexibility to emit directly to a backing store.

For a simple case of appending the output to a std::string, we observe
significant benchmark savings (on order of -50%), even with the
minor std::function<> dispatch overhead. The main reason is that
std::ostringstream effectively requires 2 extra copies of the data
beyond a simple string.append lambda.

We also provide a parallel API for load(), though this one is
slightly more complex due to the need to do arbitrary-position reads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27586

Test Plan:
buck test mode/dev-nosan caffe2/test/...
      (Basic serialization test in caffe2/test/cpp/api/serialize.cpp)
      Benchmark in experimental/jeremyl/c2/SerializationBench.cpp, with D17823443
        (1M time goes from 90ms -> 40ms, albeit with crc patch applied)

Differential Revision: D17822962

Pulled By: jjlilley

fbshipit-source-id: d344a7e59707f3b30d42280fbab78f87399e4d10
2019-10-15 12:39:58 -07:00
d482ed44f5 Fix test_docs_coverage.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27888

Test Plan: unit tests.

Reviewed By: ezyang

Differential Revision: D17911956

fbshipit-source-id: 141e2f883176a2c743514b9b3ab5272e5ea230e4
2019-10-15 12:20:13 -07:00
182abb2580 accept -1 in iterations and warmup iterations (#28014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28014

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations -1 --warmup_iterations -1

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 30827.046
...
```

Reviewed By: hl475

Differential Revision: D17932071

fbshipit-source-id: e4d9d256a0a4958110f61af13afdde70fc0f746c
2019-10-15 11:55:37 -07:00
f461184505 Use grad_out for cudnn CTC loss (#27039)
Summary:
Using grad_out for CuDNN CTC loss fixes: https://github.com/pytorch/pytorch/issues/26797, https://github.com/pytorch/pytorch/issues/25833.

We also fix a CuDNN-incompatible change that surfaced during testing: as of CuDNN 7.6, the semantics of the CTC loss gradients are different.
This leads us to disable CuDNN CTC for CuDNN < 7.6. To mitigate the impact on users, we fall back to the native implementation (converting the parameters) when CuDNN isn't applicable; previously this would give an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27039

Differential Revision: D17910815

Pulled By: ngimel

fbshipit-source-id: 465b33612d3402f10c355aa7026a7e1ffaef3073
2019-10-15 11:36:37 -07:00
7e8420b7f6 Buffer to speed Unpickler (#27727)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27727

This change uses a small buffer in the Unpickler to avoid
calling reader_() byte-by-byte. Particularly, the unpickler has a
tight loop reading 1-byte opcodes.

This can be more efficient because we avoid the variable-sized
memcpy (due to templating) and std::function indirection for the
common fast path.

This improves the unpickle-1m-ints benchmark by ~20%.

This change requires changing the std::function<> interface
to Unpickler to return size_t rather than bool, but there are
only a few uses of this api.

Test Plan:
buck test caffe2/test/...
benchmark in experimental/jeremyl/c2/SerializationBench

Differential Revision: D17869980

fbshipit-source-id: 37e752744d19e12b7282252c8963355970bd4feb
2019-10-15 11:32:53 -07:00
b65540cc27 Remove named tensor builds from CI (#27762)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27762

These are now unnecessary because all of the named tensor tests run on
regular CI.

Test Plan: - verify that there are no named tensor builds on this PR.

Differential Revision: D17915432

Pulled By: zou3519

fbshipit-source-id: 64b0c0bc41af65762fa953b273c64f1b338b80ca
2019-10-15 11:24:27 -07:00
1054ab213d improve error message for scatter in processGroupGloo (#27458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27458

Same as the previous diff - improve error message by passing back the
size discrepancy.
ghstack-source-id: 91864213

Test Plan: `python test/test_c10d.py`

Differential Revision: D17785296

fbshipit-source-id: f939b8091aede768ea215f69df2c83e438c430cf
2019-10-15 11:09:47 -07:00
3397d41b8a Wrapping namespace Reduction in namespace at (#26606) (#27422)
Summary:
1) Wrapped namespace `Reduction` in namespace `at`
2) Prefixed `at::` wherever `Reduction::` is used
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27422

Differential Revision: D17913759

Pulled By: yf225

fbshipit-source-id: 8f00ca01cad2e7f673d316b128abf59c026e216c
2019-10-15 11:05:40 -07:00
e6a71405a0 Let logical_xor support non-bool tensors (again) (#27248)
Summary:
f362a5a04b3708355b08e5c1285e46ca1b537ad6 reverted
5ca612b55ec1205f98e6bc5d5e64b1bf35f3b3cd due to build-time concerns (also
see https://github.com/pytorch/pytorch/issues/25254). Now we come back to this by reusing the underlying code in
comparison operators: Logical operators on non-bool variables are
essentially comparison operators that semantically output bool
values. Compared with the previous implementation, we compromise by
always applying XOR on the same input type, while output can be either
the input type or the bool type.
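
A minimal sketch of the resulting behavior (assuming the semantics described above):

```python
import torch

a = torch.tensor([0, 1, 10, 0])
b = torch.tensor([4, 0, 1, 0])

# Default output is bool:
print(torch.logical_xor(a, b))  # tensor([ True,  True, False, False])

# The output can also be the input type, via an explicit `out` tensor:
out = torch.empty(4, dtype=torch.int64)
torch.logical_xor(a, b, out=out)
print(out)  # tensor([1, 1, 0, 0])
```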
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27248

Differential Revision: D17929356

Pulled By: ezyang

fbshipit-source-id: dbac08c7614b36f05d24c69104fee9df9ca523d5
2019-10-15 10:56:32 -07:00
76bf8f62f7 fix loss_weight for self_supervision
Summary: previously loss_weight is not used correctly for self-supervision branch

Test Plan: buck test mode/dev-nosan //caffe2/caffe2/fb/dper/layer_models/models/experimental/tests:tum_test

Reviewed By: xianjiec

Differential Revision: D17862312

fbshipit-source-id: 554b793a5caa3886946c54333c81a0d8a10230d9
2019-10-15 10:40:48 -07:00
801b6cd0bd Allow passing undefined Tensor to Module::register_parameter (#27948)
Summary:
C++ API `Module::register_parameter` should accept undefined Tensor as parameter, which is equivalent to `module.register_parameter("param", None)` in Python API.

This unblocks https://github.com/pytorch/pytorch/pull/26082 and https://github.com/pytorch/pytorch/pull/27156.
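
For reference, a sketch of the Python behavior being matched:

```python
import torch

m = torch.nn.Module()
m.register_parameter('bias', None)   # explicitly registered as absent
print(m.bias)                        # None
print(list(m.named_parameters()))    # [] -- None parameters are skipped
```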
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27948

Differential Revision: D17931739

Pulled By: yf225

fbshipit-source-id: 21bdfc88e66e3dc39f3caf608a6a3de48c510fa9
2019-10-15 10:10:42 -07:00
70838ad08b Fix typo in TransformerEncoder and TransformerEncoderLayer documentation (#26230)
Summary:
Fixes a few small typos in the documentation, changing "endocder" to "encoder" and "sequnce" to "sequence"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26230

Differential Revision: D17910820

Pulled By: vincentqb

fbshipit-source-id: 58c63f8dbbd8e2079201d4485a0d4ef323ecfb49
2019-10-15 10:07:22 -07:00
ef8bcfe2c7 Revert D17488861: constexpr type ids
Test Plan: revert-hammer

Differential Revision:
D17488861

Original commit changeset: ce7b059d7c86

fbshipit-source-id: 426fca9abe7122190fc17ac6976bc6bcbd5718df
2019-10-15 09:59:21 -07:00
1865f31efa Revert D17490109: Remove preallocation of type ids
Test Plan: revert-hammer

Differential Revision:
D17490109

Original commit changeset: 800c340d9d35

fbshipit-source-id: a3e39bbce53c828fe553379d9f2b66dc8a07c982
2019-10-15 09:59:17 -07:00
cf01f53b5a Remove preallocation of type ids (#26509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26509

We preallocated type ids to align them with ScalarType. At that point, the maximum type id was 10 and we used 11 to specify undefined type id, see https://github.com/pytorch/pytorch/pull/10139.
However, since then, ScalarType got more additions, 11 isn't undefined anymore, and numbers 11-15 have meaning.
caffe2::TypeIdentifier also got its separate additions, 12 and upwards have meaning that differs from ScalarType.

I'm going with the (CI-tested) assumption that caffe2::TypeIdentifier and ScalarType actually don't need to be aligned
and remove the functionality for preallocated type ids. This simplifies our type ids.
ghstack-source-id: 91896918

Test Plan: unit tests

Differential Revision: D17490109

fbshipit-source-id: 800c340d9d3556a99f6e3ffc33af14ad68d7cc59
2019-10-15 08:47:13 -07:00
6f865c1e37 constexpr type ids (#26502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26502

Create type ids at compile time instead of incrementing a counter at runtime. This is done by computing a compile time crc64 on the type name. We couldn't do this before, because we still used GCC4 and that compiler didn't support the use of `__PRETTY_FUNCTION__` in a constexpr context. However, since GCC5 this is possible and we can use this trick.

This does not change the semantics of preallocated type ids. I actually think we don't need to preallocate anymore, but I split the removal of preallocation into a separate diff to be able to test it separately.

ghstack-source-id: 91896920

Test Plan: unit tests

Differential Revision: D17488861

fbshipit-source-id: ce7b059d7c8686b69cb091a4a8beaf4b96391343
2019-10-15 08:47:09 -07:00
f1d4e887e0 Updating submodules
Summary:
GitHub commits:

62569d9749
bc04fdfec2
6e9db9ddcf

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 77f4dda3bef09cfb6049a3bf5715821390cdecc1
2019-10-15 08:45:02 -07:00
9033ace9c4 Migrate soft_margin_loss from the TH to Aten (CUDA+CPU) (#27673)
Summary:
Replaces fused TH kernels with a 2-liner of regular Tensor functions.
Benchmarking revealed that performance improves compared to PyTorch 1.2.

Refs: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
VitalyFedyunin

### Benchmarking results on my laptop:

## 1.4.0a0+f63c9e8 output
```
PyTorch version: 1.4.0a0+f63c9e8
CPU Operator sanity check:
tensor(0.5926, grad_fn=<MeanBackward0>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<MeanBackward0>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 9.025700273923576e-05
CPU warmup 10000 took 0.0009383050055475906
CPU warmup 100000 took 0.0015631120040779933
CPU warmup TOTAL time 0.0026368020044174045
CPU forward 1000 took 6.919399311300367e-05
CPU forward 10000 took 0.00014462800754699856
CPU forward 100000 took 0.0011234670091653243
CPU forward 1000000 took 0.014555767003912479
CPU forward 10000000 took 0.13409724000666756
CPU forward 100000000 took 1.246048310000333
CPU forward TOTAL time 1.3961777170043206
CPU for- & backward 1000 took 0.0003219560021534562
CPU for- & backward 10000 took 0.00037290599721018225
CPU for- & backward 100000 took 0.001975035003852099
CPU for- & backward 1000000 took 0.02621342398924753
CPU for- & backward 10000000 took 0.2944270490115741
CPU for- & backward 100000000 took 1.6856628700043075
CPU for- & backward TOTAL time 2.0091958299890393

GPU warmup 1000 took 0.0002462909906171262
GPU warmup 10000 took 9.991199476644397e-05
GPU warmup 100000 took 0.00034347400651313365
GPU warmup TOTAL time 0.0007382350013358518
GPU forward 1000 took 9.67290106927976e-05
GPU forward 10000 took 9.349700121674687e-05
GPU forward 100000 took 9.384499571751803e-05
GPU forward 1000000 took 0.0004975290066795424
GPU forward 10000000 took 0.0017606960027478635
GPU forward 100000000 took 0.003572814996005036
GPU forward TOTAL time 0.006185991995153017
GPU for- & backward 1000 took 0.00035818999458570033
GPU for- & backward 10000 took 0.0003240450023440644
GPU for- & backward 100000 took 0.0003223370003979653
GPU for- & backward 1000000 took 0.00036740700306836516
GPU for- & backward 10000000 took 0.0003690610028570518
GPU for- & backward 100000000 took 0.0003672500024549663
GPU for- & backward TOTAL time 0.002197896988946013
```

## 1.2 output
```
PyTorch version: 1.2.0
CPU Operator sanity check:
tensor(0.5926, grad_fn=<SoftMarginLossBackward>)
tensor([-0.0159, -0.0170, -0.0011, -0.0083, -0.0140, -0.0217, -0.0290, -0.0262,
        -0.0078, -0.0129])
double backward
tensor(-0.1540, grad_fn=<SumBackward0>)
ok

GPU Operator sanity check:
tensor(0.5601, device='cuda:0', grad_fn=<SoftMarginLossBackward>)
tensor([-0.0393, -0.0316, -0.0233, -0.0140, -0.0141, -0.0161, -0.0322, -0.0238,
        -0.0054, -0.0151], device='cuda:0')
double backward
tensor(-0.2148, device='cuda:0', grad_fn=<SumBackward0>)
ok

CPU warmup 1000 took 8.422900282312185e-05
CPU warmup 10000 took 0.00036992700188420713
CPU warmup 100000 took 0.003682684007799253
CPU warmup TOTAL time 0.004169487991021015
CPU forward 1000 took 5.521099956240505e-05
CPU forward 10000 took 0.00036948200431652367
CPU forward 100000 took 0.003762389998883009
CPU forward 1000000 took 0.03725024699815549
CPU forward 10000000 took 0.3614480490068672
CPU forward 100000000 took 3.6139175269927364
CPU forward TOTAL time 4.016912263003178
CPU for- & backward 1000 took 0.0002734809968387708
CPU for- & backward 10000 took 0.0006605249946005642
CPU for- & backward 100000 took 0.005437346000690013
CPU for- & backward 1000000 took 0.051245586000732146
CPU for- & backward 10000000 took 0.5291594529990107
CPU for- & backward 100000000 took 5.23841712900321
CPU for- & backward TOTAL time 5.8253340990049765

GPU warmup 1000 took 0.0005757809994975105
GPU warmup 10000 took 0.0004058420017827302
GPU warmup 100000 took 0.0003764610009966418
GPU warmup TOTAL time 0.0013992580061312765
GPU forward 1000 took 0.0003543390048434958
GPU forward 10000 took 0.0003633670130511746
GPU forward 100000 took 0.0004807310033356771
GPU forward 1000000 took 0.0005875999922864139
GPU forward 10000000 took 0.0016903509967960417
GPU forward 100000000 took 0.014400018990272656
GPU forward TOTAL time 0.0179396449966589
GPU for- & backward 1000 took 0.0006167769897729158
GPU for- & backward 10000 took 0.0006845899915788323
GPU for- & backward 100000 took 0.000631830989732407
GPU for- & backward 1000000 took 0.0010741150035755709
GPU for- & backward 10000000 took 0.0017265130009036511
GPU for- & backward 100000000 took 0.014847910992102697
GPU for- & backward TOTAL time 0.01965981800458394
```

### Code used for performance test
```
import torch
import torch.nn.functional as F
import torch.nn as nn

from timeit import default_timer

torch.manual_seed(0)
cpu = torch.device('cpu')
gpu = torch.device('cuda')

loss_fn = F.soft_margin_loss

def run_benchmark(name, depth, require_grad, device, fn):
    total_start = default_timer()
    for i in range(3, 3 + depth):
        start = default_timer()
        n = 10 ** i
        a = torch.rand(n, requires_grad=require_grad, device=device)
        b = torch.rand(n, device=device)
        fn(a, b)
        end = default_timer()
        print('{} {} took {}'.format(name, n, end-start))
    total_end = default_timer()
    print('{} TOTAL time {}'.format(name, total_end-total_start))

def fwd_only(a, b):
    out = loss_fn(a, b)

def fwd_bck(a, b):
    out = loss_fn(a, b)
    out.backward()

def sanity_check(name, device):
    print('{} Operator sanity check:'.format(name))
    a = torch.rand(10, requires_grad=True, device=device)
    b = torch.rand(10, device=device)
    out = loss_fn(a,b)
    print(out)
    out.backward()
    print(a.grad)
    print('double backward')
    loss = loss_fn(a, b)
    loss2 = torch.autograd.grad(loss, a, create_graph=True)
    z = loss2[0].sum()
    print(z)
    z.backward()
    print('ok')
    print()

print('PyTorch version:', torch.__version__)
sanity_check('CPU', cpu)
sanity_check('GPU', gpu)
print()

run_benchmark('CPU warmup', 3, False, cpu, fwd_only)
run_benchmark('CPU forward', 6, False, cpu, fwd_only)
run_benchmark('CPU for- & backward', 6, True, cpu, fwd_bck)
print()

run_benchmark('GPU warmup', 3, False, gpu, fwd_only)
run_benchmark('GPU forward', 6, False, gpu, fwd_only)
run_benchmark('GPU for- & backward', 6, True, gpu, fwd_bck)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27673

Differential Revision: D17889288

Pulled By: ezyang

fbshipit-source-id: 9ddffe4dbbfab6180847a8fec32443910f18f0a9
2019-10-15 08:44:57 -07:00
498ca083a6 adding IterableDataset to dataset.pyi (#27966)
Summary:
this shall fix https://github.com/pytorch/pytorch/issues/27820
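
A minimal sketch of the class the stub now covers:

```python
from torch.utils.data import DataLoader, IterableDataset

class Counter(IterableDataset):
    """Streams 0..n-1 instead of supporting random access."""
    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

print([batch.tolist() for batch in DataLoader(Counter(5), batch_size=2)])
# [[0, 1], [2, 3], [4]]
```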
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27966

Differential Revision: D17929633

Pulled By: ezyang

fbshipit-source-id: ff3e0fb7f998b0771183288200c0859eb5f381dd
2019-10-15 08:41:59 -07:00
ba7919601f Save Docker image to workspace instead of pushing to ECR. (#26720)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26720

I'm trying to get rid of the need for CI jobs to have write access to ECR. Right now, they require write access because they push intermediate build results, which then get sucked down by downstream jobs. Instead of pushing back to ECR, we could just save them to CircleCI workspace. (There are some downsides to this approach: in particular, we save ALL layers to the workspace, not the new layers.) My original idea was to save to `~/built_image.tgz` and then load it.

Unfortunately, the Android tests have a substantially more complicated Docker structure which means my simple idea doesn't work. The current structure is that there are instantiations of `pytorch_linux_build` per configuration (e.g., `x86_32`, `x86_64`, ...). Then `gradle_build` collates these Docker images together and combines them to publish. To handle this case, the upstream jobs have to save Docker images to distinct filenames in the workspace for the load to work correctly. This is achieved by adding a new parameter to `pytorch_linux_build`, `saved_docker_filename`, which specifies where to put the image. Additionally, to pass this parameter to the jobs, I stopped using configuration generation for this case, as I couldn't figure out how to get the generator to conditionally add another line to the YAML for this case.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17843468

Pulled By: ezyang

fbshipit-source-id: c3f549e562c691b8f3f447705d4717c1fbb64040
2019-10-15 08:39:05 -07:00
817cb4182e Fix Sphinx warning about '_images' not existing (#27927)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27927

This fixes
`WARNING: html_static_path entry '_images' does not exist`
by removing '_images' from conf.py. As far as I can tell, '_images' in
`html_static_path` is only necessary if images already exist in the
`_images` folder; otherwise, sphinx is able to auto-generate _images
into the build directory and populate it correctly.

Test Plan: - build and view the docs locally.

Differential Revision: D17915109

Pulled By: zou3519

fbshipit-source-id: ebcc1f331475f52c0ceadd3e97c3a4a0d606e14b
2019-10-15 07:50:26 -07:00
e5d6b75319 Bag of documentation fixes; fix more sphinx warnings (#27850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27850

Many of these are real problems in the documentation (i.e., link or
bullet point doesn't display correctly).

Test Plan: - built and viewed the documentation for each change locally.

Differential Revision: D17908123

Pulled By: zou3519

fbshipit-source-id: 65c92a352c89b90fb6b508c388b0874233a3817a
2019-10-15 07:31:14 -07:00
ad47788647 Add Polygamma to the docs (#27696)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25347
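
For reference, a quick sketch of what the newly documented function computes:

```python
import math
import torch

# polygamma(n, x) is the n-th derivative of the digamma function;
# polygamma(1, 1) equals pi**2 / 6.
print(torch.polygamma(1, torch.tensor([1.0])))  # tensor([1.6449])
print(math.pi ** 2 / 6)                         # 1.6449...
```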
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27696

Differential Revision: D17916790

Pulled By: ezyang

fbshipit-source-id: ac2635a300b1ef0ab437e3ffac152239754fe828
2019-10-15 07:00:57 -07:00
f10ea7a2e1 Add test for requires_process_group_agent decorator
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27879

Test Plan: Imported from OSS

Differential Revision: D17924096

Pulled By: mrshenli

fbshipit-source-id: 91aaad12daf985768dfb05fb9630cee21a81a366
2019-10-15 06:57:34 -07:00
19d83ab800 Updating submodules
Summary:
GitHub commits:

f40e2d0d42

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: a36a52442fb8a578d89e298bf1059ace42d7959a
2019-10-14 21:29:45 -07:00
8b87f9a510 Add fused layer norm impl on CUDA in PyTorch (#27634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27634

Add fused layer norm impl on CUDA in PyTorch

Performance benchmark compare to apex.FusedLayerNorm on a V100 machine.

**************************************
Shape = (128, 2097152)
  curr LayerNorm forward: 7.252584544941783ms
  apex LayerNorm forward: 10.366813436849043ms
  curr LayerNorm backward: 15.568048988003284ms
  apex LayerNorm backward: 20.869979876093566ms
**************************************
Shape = (256, 1048576)
  curr LayerNorm forward: 5.185673736967146ms
  apex LayerNorm forward: 6.3868385690730065ms
  curr LayerNorm backward: 13.942008479032665ms
  apex LayerNorm backward: 15.469660016940907ms
**************************************
Shape = (512, 524288)
  curr LayerNorm forward: 4.672068868065253ms
  apex LayerNorm forward: 4.717993081081659ms
  curr LayerNorm backward: 13.46354596503079ms
  apex LayerNorm backward: 14.04774487693794ms
**************************************
Shape = (1024, 262144)
  curr LayerNorm forward: 4.547273400006816ms
  apex LayerNorm forward: 5.378365494078025ms
  curr LayerNorm backward: 13.425063178874552ms
  apex LayerNorm backward: 14.235145597020164ms
**************************************
Shape = (2048, 131072)
  curr LayerNorm forward: 4.526399010093883ms
  apex LayerNorm forward: 4.775081946980208ms
  curr LayerNorm backward: 13.222738380078226ms
  apex LayerNorm backward: 13.59594238596037ms
**************************************
Shape = (4096, 65536)
  curr LayerNorm forward: 4.28789056581445ms
  apex LayerNorm forward: 4.48913648002781ms
  curr LayerNorm backward: 13.026655421825126ms
  apex LayerNorm backward: 13.57052089786157ms
**************************************
Shape = (8192, 32768)
  curr LayerNorm forward: 4.243518367875367ms
  apex LayerNorm forward: 4.34588153520599ms
  curr LayerNorm backward: 13.140627697808668ms
  apex LayerNorm backward: 13.49891544203274ms
**************************************
Shape = (16384, 16384)
  curr LayerNorm forward: 4.181216162163764ms
  apex LayerNorm forward: 4.268723972840235ms
  curr LayerNorm backward: 13.035593512002379ms
  apex LayerNorm backward: 13.463351831072941ms
**************************************
Shape = (32768, 8192)
  curr LayerNorm forward: 4.097899778978899ms
  apex LayerNorm forward: 4.109480210812762ms
  curr LayerNorm backward: 13.041268918896094ms
  apex LayerNorm backward: 13.586135944118723ms

Test Plan: buck test mode/dev-nosan caffe2/test:nn -- "LayerNorm"

Reviewed By: houseroad

Differential Revision: D17462420

fbshipit-source-id: d4a67d160bb4eff73ffac64af46c56c3845cf211
2019-10-14 21:26:33 -07:00
30d9316f35 refactor tryMatchSchema (#26499) (#27773)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27773

We've changed how these functions are used over time, so I cleaned up
the header file API to match. In particular:

* tryMatchSchemas was added since the overload logic got copy/pasted
into three separate locations.
* With this change, tryMatchSchema is no longer public, as it is not needed
  outside of tryMatchSchemas
* emitBuiltinFunction no longer needs a requires argument (it was always true)

* Argument order for all the schema matching stuff now puts the 'self'
builtin override last. This is only rarely used and was inconsistent with
matchSchema

Test Plan: Imported from OSS

Differential Revision: D17885425

Pulled By: zdevito

fbshipit-source-id: 064bc9fa4bd57b2e5366fff9f3c6ab9b9945e08b
2019-10-14 20:45:25 -07:00
09464a7bf5 cleanup lint scripts a bit (#27805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27805

The expressions syntax for actions is pretty cool! Using it to clean up
some of my convoluted checks from before

Test Plan: Imported from OSS

Differential Revision: D17909353

Pulled By: suo

fbshipit-source-id: 8b9a85476ba19452f48c532a2daed830f074088a
2019-10-14 20:19:48 -07:00
11172c19be codemod at::ArrayRef and torch::IntArrayRef to std::vector in C++ API tests (#27884)
Summary:
`at::ArrayRef` / `torch::IntArrayRef` should be discouraged in user code, because users might not be aware of the fact that it doesn't own the underlying data, which already leads to memory access bugs when they try to write the following:
```cpp
auto expected_sizes = torch::IntArrayRef({2, 16, 6});  // The memory that represents `{2, 16, 6}` is released after this line
ASSERT_EQ(output.sizes(), expected_sizes);  // `expected_sizes` is pointing to invalid memory region
```
This PR changes all usage of `at::ArrayRef` and `torch::IntArrayRef` to the corresponding `std::vector` version, so that users won't pick up the habit of using `ArrayRef` by looking at the test code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27884

Differential Revision: D17921646

Pulled By: yf225

fbshipit-source-id: 461e79fc22b598aac230d36cc028085ce6cbe937
2019-10-14 18:00:30 -07:00
a4a5b6fcaa Revert D17913708: [pytorch][PR] [JIT] throw on custom forward for module containers
Test Plan: revert-hammer

Differential Revision:
D17913708

Original commit changeset: 1cc2a8a4b573

fbshipit-source-id: 19ad68a1b0fd8e0f17e1b7ab92879106517e13d2
2019-10-14 17:48:31 -07:00
0af60a5c06 (#27299)
Summary:
Removing in-place operator for num_batches_tracked increment. The in-place
operator used here turns out to block many optimization opportunities due to
alias assumption for inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27299

Differential Revision: D17909341

Pulled By: ngimel

fbshipit-source-id: 7d635be94dfd2002af435acf6ea71995adaa40f6
2019-10-14 17:48:27 -07:00
937e3f1db4 Enable RRef tests for other RPCAgent
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27789

Differential Revision: D5444828

fbshipit-source-id: a2fa5a603e4b2970755bc5d16f6b2c84d65b0811
2019-10-14 17:42:23 -07:00
66f74783c3 Eliminate unnecessary Tensor refcount bumps.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27886

Differential Revision: D17915160

fbshipit-source-id: 3b6a6d89b71cb576f0bd6d330b884926d1ce659f
2019-10-14 16:31:02 -07:00
4b1096c652 Fix predict net issue with LRU hash eviction
Summary:
We are seeing error "[enforce fail at BlackBoxPredictor.cpp:134] ! !parameter_workspace->HasBlob(out). Net REMOTE of type predict_net writes to blob cat/NGRAM_QRT_VERSIONS_x_EVENT_TYPE_AUTO_FIRST_X/Pool_Option_0/Repeat_0/sparse_lookup/w which exists in the parameter workspace" in online testing for calibration models.
I suspect it's due to the op CopyRowsToTensorOp being used in prediction.

Test Plan:
f143080108 offline predict net does not contain CopyRowsToTensorNet, which looks right.
Waiting for Olga to test online behavior
dper2 canary:
https://fburl.com/fblearner/sv3o3yj1

Differential Revision: D17741823

fbshipit-source-id: 19721b632b5ea9ebfa1ef9ae0e99d3a10c926287
2019-10-14 16:08:14 -07:00
aaedf1b38b break out test_recursive_script (#27819)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27819

The idea here is to preserve the fact that `test_jit.py` contains all the JIT tests. So we import `JitTestCase`s from `jit/` into `test_jit.py` so that the test loader will find and run them when you do `python test_jit.py`. This also means that things like `-k` will work as expected.

The individual test files in `jit/` will throw if run directly, to prevent cases where the CI accidentally runs multiple versions of the same test.

Differential Revision: D17898105

Test Plan: Imported from OSS

Pulled By: suo

fbshipit-source-id: 0cd6f8421c86c90a6e1bae33a3fdbe998f570e07
2019-10-14 16:00:35 -07:00
151483e25d move import_class_test files around (#26722)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26722

Put them in a directory under jit/ to prep for test splitting

Test Plan: Imported from OSS

Differential Revision: D17550582

Pulled By: suo

fbshipit-source-id: a592b671ffe808f02d0a597d441bd98a18c9109e
2019-10-14 16:00:31 -07:00
382917bbd1 report per iteration execution time (#27923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27923

As title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3 --ai_pep_format true

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.027768373489379883"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.02661752700805664"}
PyTorchObserver {"type": "add_M64_N64_K64_cpu", "metric": "latency", "unit": "us", "value": "0.026746749877929688"}
...
```

Reviewed By: hl475

Differential Revision: D17911718

fbshipit-source-id: 6fe28f2ab9ce1e0feabb5b822f04ff32dac977a9
2019-10-14 15:44:42 -07:00
7929a4157a Fix TBB builds (#27937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27937

Fix TBB builds.

Test Plan: ATEN_THREADING=TBB USE_TBB=1 python setup.py develop --cmake

Differential Revision: D17916565

Pulled By: ilia-cher

fbshipit-source-id: 292f07bcff63ae611299383d16527e8e24412102
2019-10-14 15:41:30 -07:00
104bb57c43 Run all docker images with --cap-add=SYS_PTRACE --security-opt seccomp=unconfined (#27787)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27787

This makes it possible to directly run gdb after 'docker exec'ing into a
Docker image run from CircleCI (useful if you're doing the rerun with
SSH workflow).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17889312

Pulled By: ezyang

fbshipit-source-id: 522a75be18be69ff6ad83d47185ae3068bf725d4
2019-10-14 14:02:28 -07:00
93030f68be Changing the hypothesis dev verbosity to 'normal'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27781

Test Plan: Imported from OSS

Differential Revision: D17887043

Pulled By: zafartahirov

fbshipit-source-id: be22c417cef5a00b702e2e54e065ea0449208fc0
2019-10-14 13:44:34 -07:00
2cae3928b0 Multi-Label Soft Margin loss (#27669)
Summary:
In accordance with https://github.com/pytorch/pytorch/issues/25883, I added the `MultiLabelSoftMarginLoss` module and `multilabel_soft_margin_loss` functional.

It looks like there isn't a C++ ATen implementation of `multilabel_soft_margin_loss`, so I translated the Python version, which does not rely on a C/C++ backend either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27669

Differential Revision: D17907608

Pulled By: yf225

fbshipit-source-id: ccb02951e009973c2adbe604593ce929f10c39eb
2019-10-14 13:29:45 -07:00
0003771423 C++ API parity: Unfold (#27809)
Summary:
Adds `unfold` functional and module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27809

Differential Revision: D17901792

Pulled By: yf225

fbshipit-source-id: ff58a1866bf240f37ebc589463c60593b8931f51
2019-10-14 13:21:59 -07:00
fdea0cbe40 s/TestEndToEndHybridFrontendModels/TestModels/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27877

Test Plan: Imported from OSS

Differential Revision: D17909137

Pulled By: jamesr66a

fbshipit-source-id: d8d730eed562b0f08caed7be302dd122af61e877
2019-10-14 13:13:30 -07:00
cd6b37afa7 throw on custom forward for module containers (#27763)
Summary:
Custom forwards of containers would silently not be compiled previously. Throw an error now instead.

Fix for https://github.com/pytorch/pytorch/issues/26671
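
A minimal sketch of the pattern that now errors (hedged; the exact error message is per this change):

```python
import torch
import torch.nn as nn

class MySeq(nn.Sequential):
    # A custom forward on a container was previously silently not compiled;
    # scripting it now raises instead.
    def forward(self, x):
        return super().forward(x) * 2

torch.jit.script(MySeq(nn.Linear(2, 2)))  # raises after this change
```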
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27763

Differential Revision: D17913708

Pulled By: eellison

fbshipit-source-id: 1cc2a8a4b57356ba7f007a95ede0a31e5d61aa82
2019-10-14 13:08:10 -07:00
169327f557 Add note that cuda quantization is not supported (#27829)
Summary:
People get confused with partial support otherwise: https://github.com/pytorch/pytorch/issues/27811 #27729

Suggestions on where else to put warnings are welcome (probably in the tutorials; cc SethHWeidman).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27829

Differential Revision: D17910931

Pulled By: dzhulgakov

fbshipit-source-id: 37a169a4bef01b94be59fe62a8f641c3ec5e9b7c
2019-10-14 11:25:51 -07:00
4f6b567245 Remove sharding code from tests (#27818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27818

This has been turned off since january. Might as well clean it up. I want to do a bit of refactoring in this area.
ghstack-source-id: 91827750

Test Plan: sandcastle

Differential Revision: D17898077

fbshipit-source-id: e70bf8ee72b4703767d4e38f8c346a7849a866f5
2019-10-14 11:04:44 -07:00
d23d62cb1e Fix unaries to export fp16 instead of fp32 when rest of the model export to int8
Summary: Currently, accelerators do not have a concept of fp32; they only understand fp16 and int8 data inputs. To fix the issue here, we make sure unaries are turned into fp16 when the int8 exporter is turned on.

Reviewed By: kennyhorror

Differential Revision: D17743791

fbshipit-source-id: 7322d23eb12ac3f813b525fc0ddd066f95c8ca85
2019-10-14 10:51:17 -07:00
b5e0fd4c56 add known worker ids to distributed autograd context (#26324)
Summary:
Per https://github.com/pytorch/pytorch/issues/25525 we want to clean up distributed autograd context on all nodes, in addition to the local one. To do this, we want to send async RPCs to the other nodes telling them to clean up the context.

The first step for this is for a node's context to know about the other workers. This PR does two things:

1) Adds the necessary data structures and getter functions to `DistAutogradContext`
2) Refactors calls to `addSendRpcBackward` to take in the `worker_id` as an additional argument
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26324

Differential Revision: D17769411

Pulled By: rohan-varma

fbshipit-source-id: b7327d1209a574e2e88cb197edff3103024d51ad
2019-10-14 10:43:09 -07:00
5321f4553f Remove GCC4 from CI (#26522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26522

Our binaries are already built using GCC5, so there is no reason to test for GCC4 anymore.

This is an important prerequisite for switching to C++14, but even without the C++14 switch, this enables a gcc feature that I need for constexpr type ids.
ghstack-source-id: 91851144

Test Plan: unit tests

Differential Revision: D17494507

fbshipit-source-id: 7c0beb5e532ad9caa5cb02c1af26341c1017ff57
2019-10-14 09:51:50 -07:00
524d9003f3 Kill unused THNN operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26972

Test Plan: Imported from OSS

Differential Revision: D17628457

Pulled By: gchanan

fbshipit-source-id: 009e2847b8ab6724f066a6f5a95b3324eceb3f30
2019-10-14 09:38:03 -07:00
3714ca58d9 Kill more unused THCUNN operators. (#26971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26971

I believe this is currently exhaustive of the unused operators in THCUNN:
LookupTable, SpatialSubSampling, Sqrt, Square, TemporalConvolution, TemporalMaxPooling.

Test Plan: Imported from OSS

Differential Revision: D17628380

Pulled By: gchanan

fbshipit-source-id: a3ebd24765d00073e60212f6f664ec4a6d8c1d1b
2019-10-14 09:37:59 -07:00
7583f87fa6 Kill a number of unused THCUNN operators. (#26970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26970

I believe these were only being (self)-referenced by direct THCUNN bindings, which were killed in the https://github.com/pytorch/pytorch/pull/25358 stack.

This list is NOT exhaustive of what can be removed, and notably doesn't include THNN:
Abs, DistKLDivCriterion, FeatureLPPooling, IndexLinear, L1Cost, LookupTableBag, MarginCriterion, SpatialConvolutionLocal, SpatialCrossMapLRn.

Test Plan: Imported from OSS

Differential Revision: D17628216

Pulled By: gchanan

fbshipit-source-id: 0a0b17b446cf8ec9adef631f6f5c515182b560bb
2019-10-14 09:37:54 -07:00
a23edd6b9c Fix Type Errors in Examples about Named Tensor (#27828)
Summary:
`names` should be a `tuple` of strings, not a single string.
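
For instance, a sketch of the corrected call:

```python
import torch

# names must be a tuple of strings, not a bare string:
x = torch.zeros(2, 3, names=('N', 'C'))
print(x.names)  # ('N', 'C')
```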
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27828

Differential Revision: D17908112

Pulled By: zou3519

fbshipit-source-id: bd1454c5d6e6b690955f49380e34c4b0ddaf879b
2019-10-14 09:24:45 -07:00
82a69a690f Add documentation for torch.lgamma (#27812)
Summary:
Changelog:
- Add doc string in _torch_docs.py, _tensor_docs.py
- Expose in docs/source/torch.rst, docs/source/tensors.rst
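
As a sanity check of the newly documented function (a sketch):

```python
import math
import torch

# lgamma computes the log of the absolute value of the gamma function;
# Gamma(0.5) = sqrt(pi), so lgamma(0.5) = log(sqrt(pi)).
print(torch.lgamma(torch.tensor([0.5])))  # tensor([0.5724])
print(math.log(math.sqrt(math.pi)))       # 0.5723649...
```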
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27812

Test Plan:
- Remove `lgamma`, `lgamma_` from the blacklist

Fixes https://github.com/pytorch/pytorch/issues/27783

Differential Revision: D17907630

Pulled By: ezyang

fbshipit-source-id: 14e662a4e5262126889a437e5c4bfb21936730e8
2019-10-14 08:47:04 -07:00
cc5c34a0d0 Add nn::functional::normalize() to C++ Frontend (#27280)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/27048

PR Summary:

Files Added:

_torch/csrc/api/include/torch/nn/options/normalization.h
torch/csrc/api/include/torch/nn/functional/normalization.h_

Files Modified:

_test/cpp/api/functional.cpp
torch/csrc/api/include/torch/nn/functional.h_

 ---

yf225: I couldn't find a C++ equivalent of gradcheck(); is there such a function, or is it sufficient to call .backward() in the test body? I don't think any solutions are checked for in the Python tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27280

Differential Revision: D17902109

Pulled By: yf225

fbshipit-source-id: 1bce1a88103d0f1848633fec90fde95ea8f3d1ed
2019-10-14 08:39:02 -07:00
32c56747f7 Mention C++14 in the README (#26670)
Summary:
Technically, we don't need a C++14 compiler yet, but we will soon stop supporting GCC 4. Requiring a "C++14" compiler excludes GCC 4, so it is a defensive statement. Some time later, we will actually require a C++14 compiler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26670

Differential Revision: D17907257

Pulled By: smessmer

fbshipit-source-id: 5363d714f8d93597db008135f681b2e14d052fa0
2019-10-14 08:12:42 -07:00
0e8d4836e4 add feature name into module and update position weighted to match dper2
Test Plan:
The notebook showed no diff for id score list
https://our.intern.facebook.com/intern/anp/view/?id=154764

Reviewed By: alyssawangqq

Differential Revision: D17649974

fbshipit-source-id: 84cb4ae372fc215295c2d0b139d65f4eacafae4a
2019-10-14 08:06:19 -07:00
b7b73e43c0 Delete TEST_NAMEDTENSOR; run named tensor tests on all CIs (#27760)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27760

There's nothing special about the named tensor tests that requires that
they be run in their own CI job. In this PR we delete the
TEST_NAMEDTENSOR flag that hides named tensor tests from regular jobs.
In the future, we'll delete the named tensor CI job so that we do not
duplicate signals.

Test Plan: - wait for CI

Differential Revision: D17882262

Pulled By: zou3519

fbshipit-source-id: f90c71cb939e53b8ea23f7e2ab95a5c41b8be0e3
2019-10-14 08:01:41 -07:00
73521a0316 Roll more version numbers to 1.4.0 (#27751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27751

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17886488

Pulled By: ezyang

fbshipit-source-id: 1c8d98b6f7ee3127ebec9a0b03132c38c97523c3
2019-10-14 07:16:27 -07:00
4bcedb6670 Mark sampler and batch_sampler arguments as optional in the DataLoader interface (#27821)
Summary:
Changelog:

- DataLoader argument `sampler` is now of type `Optional[Sampler[int]]` instead of `Sampler[int]`
- DataLoader argument `batch_sampler` is now of type `Optional[Sampler[Sequence[int]]]` instead of `Sampler[Sequence[int]]`

Fixes https://github.com/pytorch/pytorch/issues/27737
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27821

Differential Revision: D17906623

Pulled By: ezyang

fbshipit-source-id: 088cacbb7e9f7988995f40b71adc3e719815f5ad
2019-10-14 06:57:27 -07:00
19df7e7e84 Fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27831

Differential Revision: D17904698

Pulled By: soumith

fbshipit-source-id: 3923dd36bc29f0f6e814d299afd8eef224035ccd
2019-10-14 01:32:23 -07:00
848d1ba13a Fix padding_idx in the new embedding cuda kernel. (#27731)
Summary:
The current embedding backwards CUDA kernel is somewhat broken. It effectively ignores padding_idx and also incorrectly drops an index from the input.

This commit fixes that bug and fixes the unit test so that this behavior won't break in the future.

This fixes https://github.com/pytorch/pytorch/issues/26302.
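
A minimal sketch of the behavior the fixed kernel must preserve (assumes a CUDA device):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 3, padding_idx=0).cuda()
idx = torch.tensor([[0, 2, 0, 5]], device='cuda')
emb(idx).sum().backward()

# The row for padding_idx must receive no gradient:
print(emb.weight.grad[0])  # tensor([0., 0., 0.], device='cuda:0')
```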
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27731

Differential Revision: D17893803

Pulled By: ngimel

fbshipit-source-id: 4ba02a17ec0e29a7016d65480d4ff0c276550616
2019-10-13 21:18:49 -07:00
1c2cb6d523 Edits to ReadMe file (#27808)
Summary:
Grammar edits to the Readme file to make it read better in English
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27808

Differential Revision: D17901414

Pulled By: soumith

fbshipit-source-id: 02e67289dafaf9280cb1c3bb2f37087cd134cc23
2019-10-13 17:09:02 -07:00
07d4374239 C++ API: torch::nn::Softmax2d (#27509)
Summary:
Add torch::nn::Softmax2d module support for the C++ API.
Softmax2d is only available as a module in the Python API (there is no functional form), so this PR adds only module support as well.

This PR is WIP because it uses the function in https://github.com/pytorch/pytorch/issues/27446.
After https://github.com/pytorch/pytorch/issues/27446 is merged, I will remove WIP.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27509

Differential Revision: D17899715

Pulled By: yf225

fbshipit-source-id: bd891bc995f5a92bf4f5405f8bf07d1bd5de2479
2019-10-13 11:00:56 -07:00
52528c041a - TripletMarginLoss (#27713)
Summary:
Hi yf225, I had to create a new branch to resolve a merge conflict, since I am working on a cloud machine due to some limitations on my PC and don't have full control of the environment there.

Also, I have incorporated the changes you previously made here:
https://github.com/pytorch/pytorch/pull/27613

Also, it would be great if you could recommend some resources for working smoothly on GCP. :-D

Thank you
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27713

Differential Revision: D17899695

Pulled By: yf225

fbshipit-source-id: eb6643223148774a5cbbd093bdcc5623872e5bba
2019-10-13 10:57:37 -07:00
23bffc4f14 Fix most documentation warnings (#27782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27782

Warnings show up when running `make html` to build documentation. All of
the warnings are very reasonable and point to bugs in our docs. This PR
attempts to fix most of those warnings.

In the future we will add something to the CI that asserts that there
are no warnings in our docs.

Test Plan: - build and view changes locally

Differential Revision: D17887067

Pulled By: zou3519

fbshipit-source-id: 6bf4d08764759133b20983d6cd7f5d27e5ee3166
2019-10-13 10:34:01 -07:00
446a79b959 C++ API parity: Threshold
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27538

Test Plan: Imported from OSS

Differential Revision: D17835415

Pulled By: pbelevich

fbshipit-source-id: 2a887704655be79ee458081c46a7eea31eca51dc
2019-10-13 09:38:31 -07:00
cbdd55c669 C++ API parity: Tanhshrink
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27537

Test Plan: Imported from OSS

Differential Revision: D17835409

Pulled By: pbelevich

fbshipit-source-id: ad4120cfe01ea2508bf3ce1054022a2da649ac74
2019-10-13 08:12:13 -07:00
2750ea25b2 C++ API parity: Tanh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27536

Test Plan: Imported from OSS

Differential Revision: D17835411

Pulled By: pbelevich

fbshipit-source-id: c8984aec2f4bae48ff901fafc8c53a4122192ac5
2019-10-13 06:34:18 -07:00
27027a4804 Fix torch::nn layers to always subclass from torch::nn::Cloneable (#27770)
Summary:
The impl class of `torch::nn` layers must always subclass from `torch::nn::Cloneable`; otherwise `module->clone()` doesn't work on them. This PR fixes layers that don't conform to this rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27770

Differential Revision: D17893051

Pulled By: yf225

fbshipit-source-id: 37cdf8c09e22f0f164cbd0e8700965a1778ec4c1
2019-10-12 16:23:46 -07:00
aa73701f03 Disable pytorch_short_perf_test_gpu CI job (#27797)
Summary:
The `pytorch_short_perf_test_gpu` CI job hasn't been giving useful signal compared to https://apaszke.github.io/pytorch-perf-hud/ or the FAI-PEP effort. This PR disables it to reduce maintenance workload for CI admins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27797

Differential Revision: D17897180

Pulled By: yf225

fbshipit-source-id: 91a66ebac3d15a44094a669da38c43e3ea9c20d2
2019-10-12 16:19:43 -07:00
f6bda1e07b Removes @default_floating_dtype decorator (#27628)
Summary:
One fewer legacy decorator cluttering the test suite.

Functions relying on this decorator were updated or, in the case of test_sparse, the test suite was put back on double by default.

Note: this PR is blocked on https://github.com/pytorch/pytorch/issues/27599.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27628

Differential Revision: D17896254

Pulled By: mruberry

fbshipit-source-id: 13d460301f50ef4af7a660372432108164c0de1f
2019-10-12 12:39:34 -07:00
341262754f module dedupe (#26666)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26666

Changes:
- Introduce a `ConcreteModuleType` concept. This acts both as the key into the type
  cache, and as the source of truth for `ModuleValue::attr` queries. It needs
  to do both jobs because that's how we ensure correctness (if the types are
  different, it's because `ModuleValue::attr` would return different things).
- Now `recursive_script` will first construct a `ConcreteModuleType` and search for a
  pre-existing type before starting compilation.
- All previous paths to creating a `ScriptModule` (including inheriting from
  `ScriptModule`) are now rewritten to go through `create_script_module`, so
  that we have only a single place where construction happens.

Behavioral changes:
- Big change to `torch.jit.ScriptModule` inheritance: all attributes are now
  recursively scripted if possible, matching recursive scripting semantics.
  This makes it hard to keep something from being scripted (for example, a
  Python submodule). Possibly we'll need an `ignore()` type thing for
  attributes. In particular, this adds `self.training` to *every* ScriptModule, since
  it's present on every `nn.Module`.
- I believe this change to be transparent to existing users of the inheritance API, since if you had an attribute that is unscriptable that you never used, there is no error. In some cases, we will create new attributes (even if they are unused), which will increase serialized model size from before.
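
A minimal sketch of the type-cache idea described above (illustrative Python only; `compile_new_type` is a hypothetical stand-in, and the real logic lives in the C++ recursive-scripting code):

```python
# Illustrative sketch only; the real cache lives in the TorchScript C++ code.
_type_cache = {}

def get_or_create_script_type(concrete_type):
    # Equal keys must imply interchangeable compiled types, which is why the
    # key (the ConcreteModuleType) encodes everything ModuleValue::attr sees.
    if concrete_type not in _type_cache:
        _type_cache[concrete_type] = compile_new_type(concrete_type)  # hypothetical
    return _type_cache[concrete_type]
```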

Test Plan: Imported from OSS

Differential Revision: D17551196

Pulled By: suo

fbshipit-source-id: b476d1c9feb3ddfd63406d90989aaf9dfe890591
2019-10-12 09:51:57 -07:00
ffa422a8b3 kill _parameter_list (#27399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27399

This was devised in a time when we didn't have module attributes. They
are essentially just tensor lists, so represent them that way. This has
the additional benefit of making the RNN forward pass faster because we
effectively cache the flattened weights.

The only complicated part is that someone may come along and do:
```
my_rnn_mod.w_ih_l0 = torch.nn.Parameter(...)
```

This means we need to override setattr to keep the flattened weights
cache up to date.
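
A minimal sketch of that `__setattr__` override, assuming a `_flat_weights` cache attribute (names are illustrative, not the actual RNN code):

```python
import torch

class RNNLike(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.w_ih_l0 = torch.nn.Parameter(torch.randn(4, 4))
        self._flat_weights = [self.w_ih_l0]  # cached flattened weights

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Reassigning a weight parameter must rebuild the cache.
        if name.startswith("w_") and hasattr(self, "_flat_weights"):
            self._flat_weights = [self.w_ih_l0]
```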

Test Plan: Imported from OSS

Differential Revision: D17785658

Pulled By: suo

fbshipit-source-id: 7789cd1d0d4922bfd5eba1716976442fbf150766
2019-10-12 09:51:53 -07:00
759c99c2e3 [jit] Python None should have its type inferred as NoneType (#26665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26665

This is actually useful. For example: in batchnorm.py, all the tracked
stats are either `nn.Parameter` or `None`. We should register them as
params if they are set, or attributes with type NoneType if they are
not.
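
A small sketch of the behavior this enables (batchnorm.py is the motivating case):

```python
import torch

class M(torch.nn.Module):
    def __init__(self, track_stats: bool):
        super().__init__()
        # With this change, a plain None is registered as an attribute of
        # type NoneType instead of failing to script.
        self.running_mean = torch.zeros(4) if track_stats else None

    def forward(self, x):
        return x

scripted = torch.jit.script(M(track_stats=False))
assert scripted.running_mean is None
```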

Test Plan: Imported from OSS

Reviewed By: shannonzhu

Differential Revision: D17551197

Pulled By: suo

fbshipit-source-id: 8d6f6d76d4dab0d524c4ffdfe0c1dd465771cd00
2019-10-12 09:51:49 -07:00
3bccd3fc0d Distributed Autograd - FAST mode backward pass implementation. (#27022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27022

This change implements the "FAST" mode distributed autograd backward
pass as described in https://github.com/pytorch/pytorch/issues/23110.

At a high level the backward pass works as follows:
1. We start by computing dependencies on the node that calls
`torch.distributed.backward`.
2. This node computes the dependencies starting from the root nodes provided in
the backward call and all the 'send' functions present in the current autograd
context. The "FAST" mode assumes all 'send' functions are part of the autograd
computation.
3. Once the dependency computation is done, the distributed autograd engine
calls the local autograd engine to execute the autograd graph. Note that the
autograd graph on a single node is not necessarily connected because of
inter-node communication. As a result, we have special handling to ensure that
the local autograd engine executes the entire graph starting from the
provided roots and all 'send' functions on the node.
4. When the local autograd engine hits a 'recv' function, it performs an async
RPC to send the gradients over to the appropriate node and stores a future in
the autograd context to keep track of this RPC.
5. On the destination node, the appropriate 'send' function is looked up and
enqueued on the local autograd engine. If this is the first time the node is
hearing about this autograd context id on the backward pass, then the node
computes dependencies for the local autograd engine.
6. As part of computing dependencies, the distributed autograd engine discovers
all leaf nodes and ensures those are passed as 'outputs' to the local autograd
engine. This avoids running the 'AccumulateGrad' function.
7. The gradients computed for the leaf nodes are then actually accumulated in
`DistAutogradContext` for the appropriate autograd context id.
8. The distributed autograd engine waits for the local autograd engine
to complete and also waits for all the 'Futures' (stored in 4.) for respective
RPCs to finish.

We have made the following changes to the local autograd engine for this
purpose:

1. Expose GraphTask and NodeTask so that the distributed autograd engine can
use them.
2. Expose an `execute_with_graph_task` API which allows the distributed engine
to build a GraphTask and pass it to the local autograd engine.
3. Expose an `enqueue_on_cpu` API, which allows the distributed engine to build
a `NodeTask` for a 'send' function and enqueue it on the local autograd engine.

In addition to this a few general improvements:
1. Added a `PropagateGradients` RPC call for the 'recv' function to pass
gradients to the appropriate node during the backward pass.
2. Use IValues as much as possible in serialization for RpcWithAutograd.
3. If Future.wait() receives a message of type EXCEPTION, we throw an appropriate
exception instead of just returning the message. This is in line with what most
Future.wait() APIs do.
4. Added a `get_gradients(context_id)` API which allows users to retrieve a map
from Tensor to its gradient for the provided context_id on the local
node (a usage sketch follows).
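
A hedged usage sketch of the API surface described above; the module path, the exact `backward` signature, and `some_rpc_based_forward` are assumptions drawn from this description rather than verified code:

```python
import torch.distributed.autograd as dist_autograd

# Inside one autograd context, the forward pass may hop across workers via
# RPC; the 'send'/'recv' functions are recorded in the context.
with dist_autograd.context() as context_id:
    loss = some_rpc_based_forward().sum()  # hypothetical forward helper
    # Kick off the FAST-mode backward pass from the root.
    dist_autograd.backward([loss])
    # Leaf gradients accumulate per-context, not in .grad fields.
    grads = dist_autograd.get_gradients(context_id)
```
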
ghstack-source-id: 91794926

Test Plan: unit tests.

Differential Revision: D17652615

fbshipit-source-id: 96f65c52adb2706ee29f4b49e1655afaa0a3bec3
2019-10-12 09:47:49 -07:00
96aafc3cdc C++ API parity: Softsign
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27535

Test Plan: Imported from OSS

Differential Revision: D17835408

Pulled By: pbelevich

fbshipit-source-id: 8548deab91f6fe0f7285fdd919c25129ed042181
2019-10-12 08:30:10 -07:00
fcb6dd079e C++ API parity: Softshrink
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27534

Test Plan: Imported from OSS

Differential Revision: D17835404

Pulled By: pbelevich

fbshipit-source-id: 7b9f3d3ea793f82840496912f248b0c48bb7463e
2019-10-12 06:36:20 -07:00
c3c0dcf6e3 Upgrade MKL-DNN to v0.21.1 (#27597)
Summary:
1. Upgrade MKL-DNN to v0.21.1
2. Fix runtime error on legacy hardware with gcc8
3. Remove workaround for issue https://github.com/pytorch/pytorch/issues/21597
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27597

Differential Revision: D17891492

Pulled By: bddppq

fbshipit-source-id: ab390a655f7ab7fb7144e2c333f25af85a0f5183
2019-10-12 02:40:43 -07:00
039acbea90 Revert D17757197: Add CI builds
Test Plan: revert-hammer

Differential Revision:
D17757197

Original commit changeset: e0522e159387

fbshipit-source-id: 10c20ff703676635afcb17ea36b0b48cd3688b7c
2019-10-11 23:15:51 -07:00
abaa44122d C++ API: torch::nn::Softmin (#27459)
Summary:
Add torch::nn::Softmin module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27459

Differential Revision: D17892852

Pulled By: yf225

fbshipit-source-id: db15b06e8ad33947e7d65995df700f5e90c3b6a8
2019-10-11 23:03:55 -07:00
86fb63f4a0 add testing code to iOS nightly jobs (#27784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27784

## Summary

Since the nightly jobs are running without any testing code, we don't really have a way to verify the binary before uploading it to AWS. To make the work more solid, I came up with an approach to test our builds.

## How it works

The Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. So the approach is to link our binaries to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261) adds a dummy testing app
- [#26632](https://github.com/pytorch/pytorch/pull/26632) adds a ruby script that does all the XCode configuration.

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode (done in #27591)
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane (done in #27593)
4. Add the testing code to PR jobs and verify the result (done in #27594)
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17893271

Pulled By: xta0

fbshipit-source-id: cb7679224e062a4884615f625a2933cad8bd4c11
2019-10-11 21:49:30 -07:00
907ce80321 Update onnx landing page for 1.3 (#27581)
Summary:
* Update supported operator list.
* Update FAQ on implicit scalar casting. Traced models are now more robust.

cc spandantiwari lara-hdr neginraoof Please feel free to add any missing points. Thank you!

cc houseroad for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27581

Reviewed By: hl475

Differential Revision: D17882147

Pulled By: houseroad

fbshipit-source-id: c1d745ca647fce2daf897bbb6d1ff8c283f18839
2019-10-11 20:53:50 -07:00
130127ca59 Rename BACKEND to RPC_BACKEND to separate it from COMMUNICATION_BACKEND (like gloo, nccl) in rpc_test.py (#27792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27792

Close https://github.com/pytorch/pytorch/issues/27232
ghstack-source-id: 91807741

Differential Revision: D5474297

fbshipit-source-id: 5b230a6857813ec981e5056880abb5859655daa2
2019-10-11 19:49:46 -07:00
ccd460d415 use gloo enum instead of hardcoding string (#27652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27652

Changes "gloo" to dist.backend.GLOO in rpc_test.py.
ghstack-source-id: 91764460

Test Plan: python test/test_rpc_fork.py && python test/test_rpc_spawn.py

Differential Revision: D17845067

fbshipit-source-id: b220d3672d1e0b237da474276663d157230a4fdb
2019-10-11 19:06:23 -07:00
5b88dd6a29 fix checkout for clang-tidy (#27796)
Summary:
whoops, this got left in by accident
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27796

Differential Revision: D17892482

Pulled By: suo

fbshipit-source-id: f92255d78fe70d3c22c4422b6333ac288cb330d6
2019-10-11 18:43:25 -07:00
e8c23c9f85 Add various flags for fakefp16 conversion
Summary: ATT

Test Plan: manually tested

Reviewed By: hyuen

Differential Revision: D17849416

fbshipit-source-id: 85ae8fb9c31a0f0139a3c61d5a164b342851d847
2019-10-11 18:06:18 -07:00
6e3a53e774 Sanitize module names on legacy import
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27764

Test Plan: Imported from OSS

Differential Revision: D17882924

Pulled By: jamesr66a

fbshipit-source-id: 89809798d29b971ffb7898188a94667c08641801
2019-10-11 17:43:06 -07:00
2a23654880 Switch to official releases of katex and update doc for installing katex. (#27758)
Summary:
katex is a deprecated package in Ubuntu and has been removed in recent
releases of Debian. Use npm instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27758

Differential Revision: D17891039

Pulled By: ezyang

fbshipit-source-id: 53de6e14b2638298e5b61996dcd7ba8de02420a3
2019-10-11 17:20:06 -07:00
fab48eb200 Makes some CPU-only tests in test_torch generic (#27688)
Summary:
Per title. Also testing putting test_advancedindex back on the default stream.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27688

Differential Revision: D17888351

Pulled By: mruberry

fbshipit-source-id: af8adeca89f575fc276921b39049b07135ed9776
2019-10-11 17:13:41 -07:00
57d608d1f9 Suppress info messages in qnnpack (#27774)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27774

Printing messages with warning and above severity only

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D17886364

fbshipit-source-id: 62a1009f63b049f78b5e13990f758f0fdb0cbc4d
2019-10-11 17:10:01 -07:00
ba20ad999c port the rest of the linters over to github actions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27768

Test Plan: Imported from OSS

Differential Revision: D17888973

Pulled By: suo

fbshipit-source-id: 635bef7854084404d08673d99b1bae502e0dc833
2019-10-11 17:01:59 -07:00
57d4f8e3d7 kill azure pipelines flake8 (#27767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27767

Note that this kills flake8 for py2.7. I think it's reasonable given the
impending removal of py2 support entirely, but someone sanity-check me
on this.

Test Plan: Imported from OSS

Differential Revision: D17888975

Pulled By: suo

fbshipit-source-id: 87559f9e18d39e035e0c781c67025b194a593bc6
2019-10-11 17:01:54 -07:00
640b486339 add clang-tidy to github actions (#27755)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27755

This gives us nice annotations. See
https://github.com/suo/pytorch/pull/22/files for an approximation of
what it will look like (ignore the warnings on the lint.yml file).

I deleted the old azure pipelines one since making the code work for
both was annoying, and unlike flake8 this one does not affect master

Test Plan: Imported from OSS

Differential Revision: D17888974

Pulled By: suo

fbshipit-source-id: d8928a1451b6ef500dc1889284cab2845ecdeeea
2019-10-11 17:01:50 -07:00
3d2c90131a opset 11 updates (#27578)
Summary:
Opset 11 updates:
- Enabled ORT tests for updated ops in opset 11
- Updated index_copy and index_fill symbolics for opset 11 to modify onnx::Scatter -> onnx::ScatterElements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27578

Reviewed By: hl475

Differential Revision: D17852462

Pulled By: houseroad

fbshipit-source-id: c88747804054d0f3455f2c58fd1d8725e0b2f803
2019-10-11 16:18:40 -07:00
4da68227e9 Clarify that when the divisor in div is zero and the dividend is integral, the behavior is undefined. (#25968)
Summary:
Currently, when an integral tensor is divided by zero, it emits a
"floating point exception" (which can differ from system to
system). Clarify in the documentation that nothing is guaranteed under
this circumstance.
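
A small illustration of the distinction (floating-point division follows IEEE 754; the integral case is the one documented as undefined):

```python
import torch

torch.tensor([1.0]) / torch.tensor([0.0])  # tensor([inf]): well-defined
# torch.tensor([1]) / torch.tensor([0])    # undefined: may raise SIGFPE,
#                                          # and behavior varies by system
```
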
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25968

Differential Revision: D17888097

Pulled By: ezyang

fbshipit-source-id: 7c3ce3ac4080479d637cc2710b6aa3ae7e42431d
2019-10-11 15:37:09 -07:00
a710a8b758 Makes CUDA tests in test_autograd generic (#27709)
Summary:
Per title.

test_autograd.py no longer needs to import common_cuda as a result of this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27709

Differential Revision: D17881298

Pulled By: mruberry

fbshipit-source-id: 8b0351b65a49a072ce5ed7e7099b712847983877
2019-10-11 14:43:00 -07:00
6eef469074 Enable mgpu unit tests for rocm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27518

Differential Revision: D17880153

Pulled By: bddppq

fbshipit-source-id: 5b6210104ec66747558a08f97dda1e7796f681df
2019-10-11 14:35:36 -07:00
eb5222397e Better hashing for constant pool (#27733)
Summary:
Some models may contain thousands of constants (like lists of ints), and the Constant Pooling and CSE passes move constants around and update the constant pool.

However, our existing hash function only considers the node type + input type + output node (https://bddppq.github.io/codebrowser/pytorch/pytorch/torch/csrc/jit/node_hashing.cpp.html#_ZNK5torch3jit8HashNodeclEPKNS0_4NodeE), which produces many collisions. Profiling shows a single insert can take about 0.2 seconds, and loading such a model can take 200 seconds, which is insane.

So we should fix this performance issue by also considering the constant value in the hash to avoid the collisions.
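
An illustrative Python sketch of the fix (the real change is to the C++ hash in node_hashing.cpp; the field names here are assumptions):

```python
def hash_constant_node(node):
    # Old scheme: structure only, so every int-list constant of the same
    # type hashed to the same bucket, degrading lookups to a linear scan.
    structural = (node.kind, tuple(node.input_types), tuple(node.output_types))
    # Fix: mix the constant's payload into the hash as well.
    return hash((structural, str(node.value)))
```
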
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27733

Reviewed By: bddppq

Differential Revision: D17873733

Pulled By: houseroad

fbshipit-source-id: 2338d7bf67174a8e56caa19a30401199f68b592a
2019-10-11 14:30:13 -07:00
a22e8f90cd Add CI builds (#27357)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27357

Add extra CI builds for TBB and native builds

Test Plan: check CI

Differential Revision: D17757197

Pulled By: ilia-cher

fbshipit-source-id: e0522e15938710fbf6404478725620282d1287ec
2019-10-11 14:18:25 -07:00
977445b635 Disable TSAN test for LiteInterpreterConv (#27748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27748

There's TSAN test failure. From stack it's likely related to mkldnn (https://github.com/pytorch/pytorch/issues/27497). Before the issue is resolved, disable TSAN test.
ghstack-source-id: 91761706

Test Plan: buck test mode/dev-tsan caffe2/test/cpp/jit:jit -- 'JitTest\.LiteInterpreterConv' --run-disabled

Reviewed By: bddppq

Differential Revision: D17880082

fbshipit-source-id: 251d9b9577838146231c8e122f755936edd1c281
2019-10-11 14:05:33 -07:00
7135f7c263 Revert D17412856: [JIT] add type refinements for isinstance checks
Test Plan: revert-hammer

Differential Revision:
D17412856

Original commit changeset: ded47eb086c4

fbshipit-source-id: 854a6c8f322435c3f3416dbedcb642cb2d2902b1
2019-10-11 13:02:30 -07:00
f35d7d4614 Pr v130 doc changes oct10 take2 (#27721)
Summary:
resolves issues:
https://github.com/pytorch/pytorch/issues/27703

Updates to index for v1.3.0
* add javasphinx to the required sphinx plugins
* Update "Package Reference" to "Python API"
* Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
* Add "Other Languages" section, add in C++ docs, add in Javadocs
* Add link to XLA docs under Notes: http://pytorch.org/xla/

this includes changes to:
docs/source/conf.py
docs/source/index.rst
docs/source/nn.rst
docs/requirements.txt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27721

Differential Revision: D17881973

Pulled By: jlin27

fbshipit-source-id: ccc1e9e4da17837ad99d25df997772613f76aea8
2019-10-11 11:49:14 -07:00
275dfa3485 Initial commit for L0 norm approx (#27756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27756

Implement approximate L0 norm for use in the dense feature regularizer that will be used for feature importance. The formula is as follows:
{F212246801}

Reviewed By: wx1988

Differential Revision: D17432708

fbshipit-source-id: 57d6c9c3dd1b4e210b9f10264075c57dbc9c8cb6
2019-10-11 11:24:34 -07:00
c5ec0a7ede Don't run dist_autograd_fork on Python 2 (#27612)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27612

The file imports from torch.distributed.rpc, which won't be
initialized when running on Python 2.

Test Plan: Imported from OSS

Differential Revision: D17855033

Pulled By: pietern

fbshipit-source-id: 6e6b0ca248d0512dac5a44e10e153c710cefe02c
2019-10-11 11:18:46 -07:00
f36345eb0b improve error message on incorrect inputs into gather (#27439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27439

When users call dist.gather, they have to pass in a `gather_list` to
the function on the destination worker, and this list needs to have the same
size as the number of processes in the group. When the user initializes this
list incorrectly, the current error message is not very helpful.

This changes the error message so that the incorrect gather_list size is
pointed out and the correct one is given.
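
For reference, a minimal sketch of correct usage, where the destination rank's `gather_list` has one slot per process in the group:

```python
import torch
import torch.distributed as dist

def gather_to_rank0(tensor):
    world_size = dist.get_world_size()
    if dist.get_rank() == 0:
        # One slot per process, including rank 0 itself.
        gather_list = [torch.empty_like(tensor) for _ in range(world_size)]
        dist.gather(tensor, gather_list=gather_list, dst=0)
        return gather_list
    dist.gather(tensor, dst=0)  # non-destination ranks pass no gather_list
    return None
```
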
ghstack-source-id: 91413442

Test Plan: Added a unit test and tested with an incorrect gather_list size.

Differential Revision: D17781370

fbshipit-source-id: b49aad1b1197daf77daa10911296664e6340e2fa
2019-10-11 11:00:42 -07:00
726bbfffb9 Add possibility for miniz to use an external crc definition. (#27558)
Summary:
We add an #ifdef check for USE_EXTERNAL_MZCRC, in which case miniz
will look for an external mz_crc32 definition. The default behavior
is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27558

Test Plan: Unchanged default behavior, but buck test caffe2/test/...

Differential Revision: D17814440

Pulled By: jjlilley

fbshipit-source-id: e4ecbe37ee2f9eec176093372f21b3b8e52a5f81
2019-10-11 10:16:01 -07:00
15f9fe1d92 Add missing Optional annotation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27564

Differential Revision: D17816121

Pulled By: ailzhang

fbshipit-source-id: 5a4ac12ed81bf5d900ec3e7ab616082cb98d832d
2019-10-11 09:04:29 -07:00
c79d3a4a98 C++ API parity: Softplus
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27489

Test Plan: Imported from OSS

Differential Revision: D17835410

Pulled By: pbelevich

fbshipit-source-id: 51a8c4ab2ff4b860c96eda1ed8f073017b8cf9ae
2019-10-11 09:00:32 -07:00
9d448099fd C++ API parity: Sigmoid
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27488

Test Plan: Imported from OSS

Differential Revision: D17835405

Pulled By: pbelevich

fbshipit-source-id: 78e13047a2a1f2776c59e778db7ba120716e93d3
2019-10-11 07:45:31 -07:00
795c913636 C++ API parity: CELU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27487

Test Plan: Imported from OSS

Differential Revision: D17835406

Pulled By: pbelevich

fbshipit-source-id: a8282ae65d8996efcc8b8d846cfa637c3f89eda6
2019-10-11 06:23:57 -07:00
cddc147267 Back out "Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup" (#27728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27728

Original commit changeset: 15ad64e49f92

Test Plan: same as previous one.

Reviewed By: dreamingleo

Differential Revision: D17872553

fbshipit-source-id: fd9d180d5e02e2c17285898c79cdd9509ffb8bbf
2019-10-10 23:52:43 -07:00
6294a9a877 C++ API parity: RReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27437

Test Plan: Imported from OSS

Differential Revision: D17835413

Pulled By: pbelevich

fbshipit-source-id: 5d943fdac4fd2633e7f7ca13db1a7fed5636ca50
2019-10-10 19:14:48 -07:00
07fc7d05ce Revert D17488297: [jit] refactor tryMatchSchema
Test Plan: revert-hammer

Differential Revision:
D17488297

Original commit changeset: a32d838ce355

fbshipit-source-id: 2bd319d9554d81d09231bf1e34c8417bff468940
2019-10-10 17:39:48 -07:00
6385a39eec add testing code to PR jobs (#27594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27594

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17850703

Pulled By: xta0

fbshipit-source-id: ab220061c6e2ec75cae23684ad999c4f9c276820
2019-10-10 17:36:12 -07:00
352092ca95 C++ API parity: ReLU6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27436

Test Plan: Imported from OSS

Differential Revision: D17835414

Pulled By: pbelevich

fbshipit-source-id: 77e743d2f6b71fb3ba5643f9d676f2bb8f236cfa
2019-10-10 17:12:17 -07:00
5d495a11cb add unused and is_scripting to docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27630

Differential Revision: D17868856

Pulled By: eellison

fbshipit-source-id: 7cf183d5c0d5436fbaa549a02e6b8fd47fa15b67
2019-10-10 17:02:17 -07:00
2488c29129 Revert D17846079: [TSAN unittest] Disable TSAN test in LiteInterpreterConv
Test Plan: revert-hammer

Differential Revision:
D17846079

Original commit changeset: 669d63856902

fbshipit-source-id: 996d64f12efab52d571fc81a7c602d7f18da7255
2019-10-10 16:29:16 -07:00
6711969dd8 C++ API: torch::nn::LogSoftmax (#27462)
Summary:
Add torch::nn::LogSoftmax module and functional support for the C++ API.

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27462

Differential Revision: D17867121

Pulled By: yf225

fbshipit-source-id: dae8ac981c1c6ccdef013cd2d886ad4a043f6243
2019-10-10 16:18:15 -07:00
b3cb072de7 Revert D17826873: Adding support to offsets based Fused8BitRowwiseEmbeddingLookup
Test Plan: revert-hammer

Differential Revision:
D17826873

Original commit changeset: 23c4a96d9252

fbshipit-source-id: 15ad64e49f922a859abc574b261ac0f857682ff4
2019-10-10 16:16:06 -07:00
d8df8aa842 Remove deprecated script_rref_proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27697

Test Plan: Imported from OSS

Differential Revision: D17855448

Pulled By: mrshenli

fbshipit-source-id: b3d39e79dfc1f8745ac9617ca618df3ea38b1b86
2019-10-10 16:05:46 -07:00
f7d7c4b72f Fix a bug of C++ L-BFGS optimizer (#27606)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27605: The C++ L-BFGS Optimizer will not work properly if there are one or more registered tensors with no grad in the model:
```
terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::view.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CUDATensorId, QuantizedCPUTensorId, VariableTensorId, CPUTensorId, MkldnnCPUTensorId] (lookup_ at /pytorch/aten/src/ATen/core/dispatch/DispatchTable.h:245)
```

Add some `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
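
A Python analogue of the guard, as a sketch of the pattern rather than the actual C++ change:

```python
import torch

def gather_flat_grad(params):
    views = []
    for p in params:
        if p.grad is None:  # mirrors `!parameter.grad().defined()` in C++
            views.append(p.new_zeros(p.numel()))
        else:
            views.append(p.grad.reshape(-1))
    return torch.cat(views, 0)
```
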
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27606

Differential Revision: D17866550

Pulled By: yf225

fbshipit-source-id: bcaf0bf75b93c57304856b03d8984c1617ebbfef
2019-10-10 15:38:05 -07:00
415b17e81c Fix for flaky caffe2 dataio test (test_time_limit_reader_with_short_limit) (#27592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27592

The caffe2 data reader test `test_time_limit_reader_with_short_limit` is flaky as-written because it places an upper bound on how much can be read, but under stress it is possible for fewer records to be read. The fix is to make the assertion check a fuzzy/range check rather than exact equality, since there's not a straightforward way to precisely test a timer-based feature.
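
A sketch of the relaxed assertion (names illustrative):

```python
def check_reader_progress(num_read, upper_bound):
    # A timer-bounded reader may legitimately read fewer records under load,
    # so assert a range instead of exact equality.
    assert 0 < num_read <= upper_bound, (
        "expected between 1 and %d records, read %d" % (upper_bound, num_read))
```
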
ghstack-source-id: 91543898

Test Plan:
`buck test mode/dev-tsan //caffe2/caffe2/python:dataio_test-2.7 -- --stress-runs 20` -> P117156924 (with fix, 100% pass)

P117158750 - without fix, lots of failures in this test

Reviewed By: boryiingsu

Differential Revision: D17816775

fbshipit-source-id: 2ab0d3304fbd9c9806d37a4fe2912c840616db61
2019-10-10 13:53:58 -07:00
8515650c2b C++ API parity: ReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27435

Test Plan: Imported from OSS

Differential Revision: D17835407

Pulled By: pbelevich

fbshipit-source-id: b8ee86c7a76674bc88d8e995424dad22d3caab59
2019-10-10 13:34:38 -07:00
ce6287f675 Adding support to offsets based Fused8BitRowwiseEmbeddingLookup (#27635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27635

PyTorch uses `offsets` instead of `lengths` for embedding table lookup. Add support for that in the fused quantized version.
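
For context, a small sketch of the relationship between the two layouts; offsets are the exclusive prefix sum of lengths:

```python
import torch

lengths = torch.tensor([2, 0, 3])
offsets = torch.cat([torch.zeros(1, dtype=torch.long),
                     lengths.cumsum(0)[:-1]])
# lengths [2, 0, 3] -> offsets [0, 2, 2]
```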

AVX2 version is generated with
```
python caffe2/caffe2/perfkernels/hp_emblookup_codegen.py --fused --use-offsets
```

Test Plan:
```
buck test caffe2/torch/fb/sparsenn:test
```

Reviewed By: jianyuh

Differential Revision: D17826873

fbshipit-source-id: 23c4a96d92521deaebc02b688ad735d76a4476df
2019-10-10 10:50:44 -07:00
e8087a3060 Change C++ API test files to only include torch/torch.h (#27067)
Summary:
One of the purposes of the C++ API tests in `test/cpp/api/` should be to check that including `torch/torch.h` is a sufficient prerequisite for using all C++ frontend features. This PR ensures that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27067

Differential Revision: D17856815

Pulled By: yf225

fbshipit-source-id: 49c057bd807b003e4a00f6ba73131d763a0f277a
2019-10-10 09:46:29 -07:00
9bc8fb8dfd Revert D17850696: [pytorch][PR] Updates to quantization related files, index.rst, and javadocs
Test Plan: revert-hammer

Differential Revision:
D17850696

Original commit changeset: 3de146f06522

fbshipit-source-id: 565fef87fcf6021362ec3e540be78641d47ef9a7
2019-10-10 09:23:33 -07:00
829a5c8584 Disable TSAN test in LiteInterpreterConv
Summary: There's a TSAN test failure. From the stack it's likely related to mkldnn (https://github.com/pytorch/pytorch/issues/27497). Disable the TSAN test until the issue is resolved.

Test Plan: buck test mode/dev-tsan caffe2/test/cpp/jit:jit -- 'JitTest\.LiteInterpreterConv' --run-disabled

Reviewed By: bddppq

Differential Revision: D17846079

fbshipit-source-id: 669d6385690223d83996fb14051c39df0c521dfa
2019-10-10 08:50:59 -07:00
38a3eabd3e remove cuda from add_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27698

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K64_cpu
# Input: M: 64, N: 64, K: 64, device: cpu
Forward Execution Time (us) : 29691.940

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_cpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 60820.813
```

Reviewed By: hl475

Differential Revision: D17855731

fbshipit-source-id: c64c530f4dbcb5b4132a88894b24e5658aa49d66
2019-10-10 08:32:04 -07:00
9d925c1d6f Revert D17851047: [pytorch][PR] Add javasphinx extension
Test Plan: revert-hammer

Differential Revision:
D17851047

Original commit changeset: 8ed7e3c44f20

fbshipit-source-id: 9021436a7c84f7582c3d4d3e29fb5f7b0887e88c
2019-10-10 07:36:42 -07:00
d931c8bf75 substantially restructure all quantized docs to group logically (#27677)
Summary:
Make everything clickable
Organize APIs logically in subsections
Fix many typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27677

Differential Revision: D17850650

Pulled By: dzhulgakov

fbshipit-source-id: 060f6ed988d1c4beecba6bc8daf55626961fac98
2019-10-10 00:50:02 -07:00
91959aa3d3 Add javasphinx extension
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27681

Differential Revision: D17851047

Pulled By: brianjo

fbshipit-source-id: 8ed7e3c44f2055d2b8577686aff1d13548f45688
2019-10-09 23:20:33 -07:00
f3df6b8ede Add C++ torch::nn::functional::affine_grid (#27263)
Summary:
Adds`torch::nn::functional::affine_grid` functional support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883, https://github.com/pytorch/pytorch/issues/27196

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27263

Differential Revision: D17802350

Pulled By: yf225

fbshipit-source-id: e823ee53da4a4cc6a1650d2dfc09b0ef6a74e249
2019-10-09 23:17:49 -07:00
1118ea5866 Updates to quantization related files, index.rst, and javadocs (#27676)
Summary:
- Update torch.rst to remove certain autofunction calls
- Add reference to Quantization Functions section in nn.rst
- Update javadocs for v1.3.0
- Update index.rst:
  - Update "Package Reference" to "Python API"
  - Add in torchaudio and torchtext reference links so they show up across all docs not just the main page
  - Add "Other Languages" section, add in C++ docs, add in Javadocs
  - Add link to XLA docs under Notes: http://pytorch.org/xla/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27676

Differential Revision: D17850696

Pulled By: brianjo

fbshipit-source-id: 3de146f065222d1acd9a33aae3b543927a63532a
2019-10-09 22:52:19 -07:00
51656eefb0 refactor tryMatchSchema (#26499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26499

We've changed how these functions are used over time, so I cleaned up
the header file API to match. In particular:

* tryMatchSchemas was added since the overload logic got copy/pasted
into three separate locations.
* With this change, tryMatchSchema is no longer public, as it is not needed
  outside of tryMatchSchemas
* emitBuiltinFunction no longer needs a requires argument (it was always true)

* Argument order for all the schema matching stuff now puts the 'self'
builtin override last. This is only rarely used and was inconsistent with
matchSchema.

Test Plan: Imported from OSS

Differential Revision: D17488297

Pulled By: zdevito

fbshipit-source-id: a32d838ce35544972fa8767557acc22149081b55
2019-10-09 22:11:24 -07:00
d44b9cd4bb add type refinements for isinstance checks (#26271)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26271

This replaces unchecked_unwrap_optional with unchecked_cast. This
enables the generalization of type refinement so that it works for
isinstance checks as well. This also removes unchecked_unwrap_optional from
code we generate, which is good because it is a hard op to serialize well
since it doesn't directly encode the Optional[T] being unwrapped. In contrast,
unchecked_cast always explicitly lists the type.
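
An example of the refinement this generalizes (a sketch of the scripted behavior):

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: Optional[torch.Tensor]) -> torch.Tensor:
    if isinstance(x, torch.Tensor):
        # x is refined from Optional[Tensor] to Tensor here, compiled as an
        # unchecked_cast rather than unchecked_unwrap_optional.
        return x + 1
    return torch.zeros(1)
```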

Test Plan: Imported from OSS

Differential Revision: D17412856

Pulled By: zdevito

fbshipit-source-id: ded47eb086c4610998ad92bb1174225af00220f7
2019-10-09 22:11:19 -07:00
52985a3501 Install developer certificate for code signing (#27593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27593

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17848814

Pulled By: xta0

fbshipit-source-id: 48353f001c38e61eed13a43943253cae30d8831a
2019-10-09 20:07:30 -07:00
e66e00cd17 Fix native ctc_loss gradient indexing bug for large target sizes (#27460)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/27442

Thank you Mohamed Yousef (ASDen) for the report with minimal
reproducing example and detailed analysis!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27460

Differential Revision: D17789378

Pulled By: soumith

fbshipit-source-id: dc01a31b998cced4462e933d4b32e09b331f7e41
2019-10-09 19:26:47 -07:00
17a54e1b3d Revert D17840343: [pytorch][PR] changes to the documentation in support of quantization
Test Plan: revert-hammer

Differential Revision:
D17840343

Original commit changeset: 06bf3da6012b

fbshipit-source-id: 35f96fac299a0f9dd8ad864f475f606317c46823
2019-10-09 19:20:44 -07:00
971f773886 Revert D17750005: [jit] Add doc copy-edits from review
Test Plan: revert-hammer

Differential Revision:
D17750005

Original commit changeset: 230d1d33efb0

fbshipit-source-id: 12d22567b99286a8c4f719c3a384cb3665f7ba54
2019-10-09 19:12:58 -07:00
ba792335fc Export traced aten::unbind (#27247)
Summary:
This PR enables exporting aten::unbind created by the tracer. The traced IR will always have the pattern `aten::unbind -> prim::ListUnpack`.
Another PR supporting scripted aten::unbind will be submitted separately later.
```
// Unbind is being converted to ONNX as Split + Squeeze.
// Example IR
// graph(%0 : Float(3, 4, 5)):
//   %7 : Long() = prim::Constant[value={0}]()
//   %3 : Tensor[] = aten::unbind(%0, %7)
//   %4 : Float(4, 5), %5 : Float(4, 5), %6 : Float(4, 5) = prim::ListUnpack(%3)
//   return (%4, %5, %6)
//
// Translates to ONNX:
// graph(%0 : Float(3, 4, 5)):
//   %1 : Tensor, %2 : Tensor, %3 : Tensor = onnx::Split[axis=0](%0)
//   %4 : Float(4, 5) = onnx::Squeeze[axes=[0]](%3)
//   %5 : Float(4, 5) = onnx::Squeeze[axes=[0]](%2)
//   %6 : Float(4, 5) = onnx::Squeeze[axes=[0]](%1)
//   return (%6, %5, %4)
```
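
A hedged end-to-end sketch of exporting the traced pattern above (standard torch.onnx.export usage):

```python
import torch

class Unbind(torch.nn.Module):
    def forward(self, x):
        # Traces to aten::unbind followed by prim::ListUnpack.
        a, b, c = x.unbind(0)
        return a, b, c

# Tracing happens inside torch.onnx.export; unbind becomes Split + Squeeze.
torch.onnx.export(Unbind(), torch.randn(3, 4, 5), "unbind.onnx")
```
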
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27247

Reviewed By: hl475

Differential Revision: D17791095

Pulled By: houseroad

fbshipit-source-id: 83b724275124dd1dedb272583a2fefbdf7035d4c
2019-10-09 18:20:03 -07:00
9e9713f071 Register operators of CV models in PyTorch mobile (#27609)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27609

This is a fix to PR #27379, which failed in Windows CI.

Currently the operators need to be registered manually through c10 registration.

Test Plan:
The operators should be covered by existing operator tests.
A few ops (add, conv) are covered in test_lite_interpreter.cpp for demonstration.
CV models may be too large to include in unit tests, but simple local loaders can be built. Follow a similar pattern as in test_lite_interpreter to:

1. load the TorchScript model
2. run the model to get reference results
3. save and load the mobile module using torch::jit::module._save_for_mobile() and torch::jit::_load_for_mobile()
4. run the mobile module via run_method() and compare the results to the reference results

Tested models:

- Lenet
- XrayMobileV3

Differential Revision: D17832709

fbshipit-source-id: 51e44fa95240b241da85cb67dc2302878742903c
2019-10-09 17:30:10 -07:00
18d5210de9 changes to the documentation in support of quantization (#27603)
Summary:
this includes changes to

docs/source/conf.py
docs/source/index.rst
docs/source/nn.rst
docs/source/torch.rst
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27603

Differential Revision: D17840343

Pulled By: gottbrath

fbshipit-source-id: 06bf3da6012b334e3246a6a2cad42358462e2630
2019-10-09 17:13:34 -07:00
2093fac4ee ONNX Export ConstantOfShape with default dtype (#27577)
Summary:
Exporting a scripted module to ONNX with ops like torch.zeros() fails when the dtype is not specified.
This PR adds support for exporting scripted torch.zeros() ops (and similar ops) without specifying the dtype (the dtype will default to float).
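
A minimal sketch of the newly supported case:

```python
import torch

@torch.jit.script
def make_zeros(n: int):
    # No dtype specified; the ONNX export now defaults this to float
    # instead of failing.
    return torch.zeros(n, n)
```
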
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27577

Reviewed By: hl475

Differential Revision: D17822318

Pulled By: houseroad

fbshipit-source-id: b2d4300b869e782a9b72534fea1263eb83744953
2019-10-09 17:05:35 -07:00
e049e0b027 adding quantization.rst file for quantization feature (#27559)
Summary:
This was written by Raghu, Jessica, Dmytro and myself.

This PR will accumulate additional changes (there are a few more things we need to add to this actual rst file). I'll probably add the related image files to this PR as well.

I'm breaking draft PR https://github.com/pytorch/pytorch/pull/27553 into more easily digestible pieces.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27559

Differential Revision: D17843414

Pulled By: gottbrath

fbshipit-source-id: 434689f255ac1449884acf81f10e0148d0d8d302
2019-10-09 16:45:09 -07:00
0eccd05ab4 Add javadoc rst files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27646

Differential Revision: D17844860

Pulled By: brianjo

fbshipit-source-id: 9b3ddf8dab2f63345b73436aeb245eea1686c350
2019-10-09 16:40:02 -07:00
85f33a4738 Fix install location for ATen_CORE_HEADERS by avoiding relative paths (#27449)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/20046

While installing, `aten/src/ATen` is shortened to just `ATen` so these relative paths become `/usr/local/include/ATen/core/../../../../torch` or simply `/usr/torch`.
Note that in cmake, `Caffe2` is the name for the root `pytorch` project so `Caffe2_SOURCE_DIR` gives the `pytorch` directory; *not* the `caffe2` directory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27449

Differential Revision: D17844763

Pulled By: ezyang

fbshipit-source-id: fcd964ef1b891972f18155eb72732e90f0d50b8b
2019-10-09 16:37:42 -07:00
1fec1441a1 C++ API parity: PReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27429

Test Plan: Imported from OSS

Differential Revision: D17835412

Pulled By: pbelevich

fbshipit-source-id: e678d5920dad1293bb0ba3de28e2da3087d19bde
2019-10-09 16:31:54 -07:00
0fbbc7acb4 Allow align_to to take in partially named tensors (#27308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27308

Currently, `tensor.align_to(*names)` has the restriction that the
`tensor` must be fully named. This doesn't need to be the case, when
using Ellipsis, we "expand the ellipsis to all unmentioned dimensions,
in the order which they appear in the original tensor".

For example, consider `tensor: Tensor[None, None, C]`.

`tensor.align_to(C, None, None)` is ambiguous because the user might
have wanted to switch the order of the None dimensions and there is no
way to specify that using this API. However, `tensor.align_to('C', ...)`
isn't ambiguous: we can select the two unnamed dimensions in the order
in which they appear.

To actually implement this, we write a brand-new `align_to(names,
ellipsis_idx)` function in c++ that is separate from the regular
`align_to(names)` implementation. Ideally we would support "..." as a
special name in c++ and combine the two implementations; we'll need to
support "..." in c++ in the future but that requires a bit of extra work.
In this PR, Python processees the ellipsis and then calls the correct
overload.
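
A usage sketch of the new behavior (named-tensor API; shapes are illustrative):

```python
import torch

t = torch.zeros(2, 3, 4, names=(None, None, 'C'))
# Fully-named ordering would be ambiguous here, but the ellipsis expands to
# the unmentioned dims in their original order:
u = t.align_to('C', ...)  # names ('C', None, None), shape (4, 2, 3)
```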

Test Plan: - run tests

Differential Revision: D17745179

Pulled By: zou3519

fbshipit-source-id: 9fed06d224215cfb7efecd8c002604baab3c45e6
2019-10-09 16:28:45 -07:00
7591010077 Disable automatically code signing for TestApp (#27591)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27591

## Summary

Since the nightly jobs lack testing phases, we don't really have a way to test the binary before uploading it to AWS. To make the work more solid, we need to figure out a way to verify the binary.

Fortunately, the Xcode toolchain offers a way to build your app without the Xcode app: the [xcodebuild](https://developer.apple.com/library/archive/technotes/tn2339/_index.html) command. Now we can link our binary to a testing app and run `xcodebuild` to see if there are any linking errors. The PRs below have already done some of the preparation work:

- [#26261](https://github.com/pytorch/pytorch/pull/26261)
- [#26632](https://github.com/pytorch/pytorch/pull/26632)

The challenge comes when testing the arm64 build, as we don't have a way to code-sign our TestApp. Circle CI has a [tutorial](https://circleci.com/docs/2.0/ios-codesigning/) but it is too complicated to implement. Anyway, I figured out an easier way to do it:

1. Disable automatic code signing in Xcode
2. Export the encoded developer certificate and provisioning profile to the org-context in Circle CI (done)
3. Install the developer certificate into the keychain store on CI machines via Fastlane.
4. Add the testing code to PR jobs and verify the result.
5. Add the testing code to nightly jobs and verify the result.

## Test Plan

- Both PR jobs and nightly jobs can finish successfully.
- `xcodebuild` can finish successfully

Test Plan: Imported from OSS

Differential Revision: D17844036

Pulled By: xta0

fbshipit-source-id: 741f0442a718c9bda706107a2c4c3baed4c37137
2019-10-09 16:23:15 -07:00
b6fea4f77f Removes floating_dtype decorator from test_torch and test_cuda (#27599)
Summary:
Per title. Also makes a few test_torch tests generic.

This PR removes ~half the floating_dtype decorators. Follow-up will remove the rest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27599

Differential Revision: D17840056

Pulled By: mruberry

fbshipit-source-id: 428bb5498c452083e3608325e0b548b1d75baf2d
2019-10-09 16:10:26 -07:00
aeae5d6020 add dim to the cat benchmark (#27620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27620

as title

Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:cat_test -- --iterations 3

# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim0
# Input: M: 256, N: 512, K: 1, dim: 0
Forward Execution Time (us) : 775.348

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim1
# Input: M: 256, N: 512, K: 1, dim: 1
Forward Execution Time (us) : 3612.599

# Benchmarking PyTorch: cat
# Mode: Eager
# Name: cat_M256_N512_K1_dim2
# Input: M: 256, N: 512, K: 1, dim: 2
Forward Execution Time (us) : 91416.224
...
```

Reviewed By: hl475

Differential Revision: D17835348

fbshipit-source-id: 94e02e328c4ea61b2e210d860ccdd377ef2b97f8
2019-10-09 16:03:07 -07:00
abcd221f19 add as_strided operator to the benchmark (#27632)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27632

Support as_strided operator in the benchmark suite.

Test Plan:
buck run caffe2/benchmarks/operator_benchmark/pt:as_strided_test -- --iterations 3
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset0
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 0
Forward Execution Time (us) : 92.008

# Benchmarking PyTorch: as_strided
# Mode: Eager
# Name: as_strided_M256_N256_size(32,32)_stride(1,1)_storage_offset1
# Input: M: 256, N: 256, size: (32, 32), stride: (1, 1), storage_offset: 1
Forward Execution Time (us) : 91.029
...
```

Reviewed By: hl475

Differential Revision: D17840076

fbshipit-source-id: 6585feb51ebfaca40032ffa0a61d5f76c25a2599
2019-10-09 15:42:05 -07:00
283f4814d3 Modify PyTorch's integration of NNPACK to use a unified underlying thread pool implementation. (#27341)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27341

Multi-threaded:

```
Pixel 2:
Before:    362.716
PR-27402:  185.799
PR-27341:  142.011

Pixel 3:
Before:    246.755
PR-27402:  160.045
PR-27341:  115.437

```

Single-threaded:

```
Pixel 2:
Before:    308.084
PR-27340:  303.539
PR-27341:  313.558

Pixel 3:
Before:    234.272
PR-27340:  227.158
PR-27341:  232.787

```

Test Plan: Imported from OSS

Differential Revision: D17835333

Pulled By: AshkanAliabadi

fbshipit-source-id: 9502c230d8567b141ae93f611ac524d855ed9bdf
2019-10-09 15:00:29 -07:00
3246fddfd6 Implement C++ API torch::nn::MultiMarginLoss. (#27424)
Summary:
Hi yf225, here is the C++ frontend API MultiMarginLoss implementation and tests for https://github.com/pytorch/pytorch/issues/27198. Could you review it and tell me if it is okay?

I am not entirely sure I used `c10::optional` correctly, but `options.weight()` resulted in a compilation error, so I went with `options.weight().value()` instead of `value_or()` to follow the logic in `torch.nn._WeightedLoss.register_buffer` (where one can pass a `None` value).

Oh, and are the tests supposed to be skipped or did I do something wrong? I ran `pytest test/test_cpp_api_parity.py -k Loss -v` , and the `L1Loss` test passed but the others were skipped...

Thank you for the review in any case!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27424

Differential Revision: D17839963

Pulled By: yf225

fbshipit-source-id: f4b6012590cf22d56d42751c214df80cce717cb8
2019-10-09 14:44:41 -07:00
0fed4756d0 C++ API parity: SELU (#27434)
Summary:
Adds `SELU` functional and module support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27434

Differential Revision: D17782762

Pulled By: yf225

fbshipit-source-id: 96c7ce84b9baf9e219a63e631929b8997ba6f3f0
2019-10-09 14:39:28 -07:00
28a1806cbc C++ API: torch::nn::Softmax (#27446)
Summary:
Add torch::nn::Softmax module support for the C++ API

Related Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27446

Differential Revision: D17839546

Pulled By: yf225

fbshipit-source-id: 7c7fb55111b261614de7c3a75fa1019fbde93c67
2019-10-09 14:19:47 -07:00
e7c9c8098a Add doc copy-edits from review (#26322)
Summary:
Add edits from doc review
](https://our.intern.facebook.com/intern/diff/17750005/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26322

Pulled By: driazati

Differential Revision: D17750005

fbshipit-source-id: 230d1d33efb015e40327373a05a1d3eced7c5c00
2019-10-09 14:16:48 -07:00
9084fcba46 test_equal in test_quantized.py (#27616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27616

Fix a problem in reference implementation of equal

Test Plan:
pytho test/test_quantized.py

Imported from OSS

Differential Revision: D17837055

fbshipit-source-id: 1e4bc32f4334c0352468a61fa4316a1c0ff76485
2019-10-09 14:13:56 -07:00
fbba4edd1d C++ API parity: ELU, Hardshrink, Hardtanh, LeakyReLU, LogSigmoid minor fixes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27565

Test Plan: Imported from OSS

Differential Revision: D17835416

Pulled By: pbelevich

fbshipit-source-id: 9e83bdb4bf44cbc2ef09e2088df4bf0694c235f0
2019-10-09 13:23:49 -07:00
7c472ec597 Vectorized complex unary and binary op support. (#26500)
Summary:
Added Complex support with AVX to unary ops and binary ops.

I need to add nan propagation to minimum() and maximum() in the future.
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for complex numbers is here: pytorch-cpu-strided-complex extension

Preliminary Benchmarks are here.

I tried rrii and riri and found that riri is better in most situations.
Divide is very slow because you can't reduce 1/(x+y).
Sqrt is also very slow.
Reciprocal could be sped up after I add conj().
Everything else is typically within 20% of the real-number performance.
Questions:

Why does macOS not support VML (`#if AT_MKL_ENABLED() && !defined(__APPLE__)` in vml.h)? MKL does support some complex operations like Abs, so I was curious about trying it.
Is MKL just calling AVX?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26500

Differential Revision: D17835431

Pulled By: ezyang

fbshipit-source-id: 6746209168fbeb567af340c22bf34af28286bd54
2019-10-09 12:49:21 -07:00
d70f8dd964 Tests for fallback boxed dispatch (including TLS mode) (#26719)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26719

This PR adds a pair of tests for fallback boxed dispatch, exercising two different ways you might use it: (1) to implement a "wrapper" tensor type (e.g., LazyTensor, NestedTensor), and (2) to implement a toggleable "mode" (e.g., Profiling, Tracing). Both implement the most trivial possible implementations of their type: they "wrap" a real tensor and simply forward along to the real implementation. This PR also adds the necessary feature support for the toggleable mode, which is in the original generic dispatch abstraction design but was not previously implemented. I had not originally intended to add this, but it turns out writing a new "mode" is a lot simpler than writing a "wrapper" type, so I ended up writing the mode version first.

General structure of the PR:

* Add two new testing tensor type ids, `TESTING_ONLY_GenericWrapperTensorId` and `TESTING_ONLY_GenericModeTensorId`, which our tests use. They might find other use in other tests if necessary.
* Add support for toggling the availability of `TESTING_ONLY_GenericModeTensorId`. Introduces a new thread local variable accessible by `tls_local_tensor_type_set()` which is considered as part of dispatch.
* The mode fallback is very simple: it increments a counter and then passes on the call to the underlying kernel by invoking the JIT.
* The wrapper fallback is more complex: it parses the arguments, unwrapping any wrapped tensor arguments, then invokes the JIT, and then rewraps the outputs.

The examples here are somewhat simplistic; there are a number of engineering improvements that could be applied. We could save these for later (landing this patch to get immediate testing), or incorporate them into this patch:

* `getOperator` is horrible. Bram Wasti and I discussed a plan for how to make this easier, by simply refactoring the JIT interface.
* `GenericWrapperTensorImpl` doesn't populate all of its fields accurately. Most notably, size is not setup correctly.
* `generic_wrapper_fallback` should handle tensor lists in arguments and returns properly.

One pitfall: fallback dispatch only works with non-c10 code. That's why I test using `batch_norm`.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D17549624

Test Plan: Imported from OSS

Pulled By: ezyang

fbshipit-source-id: 57dbdd8d6812a66082aa6db2934c8edcda340ea6
2019-10-09 12:20:29 -07:00
eb9000be4e always use the closure to resolve variable names (#27515)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27515

Resolving variable names using the local activation frames does not work
when using recursive scripting, but our current code tries to do it
(incorrectly) anyway. It only works today because the script
call is in the same local frame as the definition. This will not be
true in practice and makes it seem like the API works in more cases
than it really does. This change forces us to always use closure-based
annotations, documents that behavior, and fixes the tests so that they still pass.
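
An example of the rule (a sketch): names used in annotations must be resolvable from the function's closure/globals, not from the frame that happens to call `torch.jit.script`:

```python
import torch
from typing import List

MyInts = List[int]  # module-level, so visible via f's closure/globals

def f(x: MyInts) -> int:
    return len(x)

scripted = torch.jit.script(f)  # MyInts is resolved from f's closure,
                                # not from the caller's local frame
```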

Test Plan: Imported from OSS

Differential Revision: D17803403

Pulled By: zdevito

fbshipit-source-id: e172559c655b05f0acf96c34f5bdc849f4e09ce2
2019-10-09 12:16:15 -07:00
1b385e7e5f Add std::variant backport (mpark) as c10::variant, with gcc 7.3.1 fix (#27575)
Summary:
This is the same as https://github.com/pytorch/pytorch/pull/26836 with workarounds for a gcc 7.3.1 bug, in light of https://github.com/pytorch/pytorch/pull/27277#issue-324044466. The workaround also limits the use cases of `c10::variant`, but it is sufficient for our (simple) use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27575

Differential Revision: D17834410

Pulled By: yf225

fbshipit-source-id: e8f3c0be2904ec3d2975cbb80af237a5c9d0cb92
2019-10-09 12:10:39 -07:00
013ca32730 Devirtualize numel() (#27294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27294

Fixes #27291

I'm a little annoyed that I have to reintroduce manual binding code.  But it's
probably not a good idea to teach the codegen how to do fastpath functions
(is it?)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17763486

Pulled By: ezyang

fbshipit-source-id: 5793b53e2db80b044e57faae325a95c649d9d459
2019-10-09 11:43:50 -07:00
ab15584dce add random sample function to generate list of inputs (#23174)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23174

This diff introduces a new function to randomly generate inputs based on the weights.
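
A rough sketch of the underlying idea, weighted random sampling over candidate input shapes (the actual op_bench API may differ):

```python
import random

# Hypothetical candidate shapes and their sampling weights.
shapes = [(1, 5, 7), (1, 6, 8), (2, 6, 7)]
weights = [0.5, 0.3, 0.2]

def sample_inputs(n):
    # Draw n input configurations, biased by the given weights.
    return random.choices(shapes, weights=weights, k=n)

print(sample_inputs(3))
```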

Test Plan:
buck run mode/dev-nosan //caffe2/benchmarks/operator_benchmark/common/tests:random_sample_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N5_K7
# Input: M: 1, N: 5, K: 7
Forward Execution Time (us) : 82.923

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K8
# Input: M: 1, N: 6, K: 8
Forward Execution Time (us) : 79.535

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M2_N6_K7
# Input: M: 2, N: 6, K: 7
Forward Execution Time (us) : 83.471

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N4_K7
# Input: M: 1, N: 4, K: 7
Forward Execution Time (us) : 84.410

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N6_K7
# Input: M: 1, N: 6, K: 7
Forward Execution Time (us) : 82.399
```

Reviewed By: zheng-xq

Differential Revision: D15791723

fbshipit-source-id: 730e34d455e962ddf594a491d7c81c3f99fafa86
2019-10-09 11:24:14 -07:00
c1ed0150c5 canonical example of torch.add benchmark (#23402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23402

This diff makes torch.add a canonical example for op benchmarks. Once it lands, we will also modify all other op benchmarks to be uniform with this example. With that, when people add new ops, they can copy-paste any existing code.
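
A sketch of what such a canonical benchmark looks like; the names follow the operator_benchmark conventions but may not match the landed file exactly:

```python
import operator_benchmark as op_bench
import torch

add_short_configs = op_bench.config_list(
    attr_names=["M", "N", "K"],
    attrs=[[8, 16, 32], [16, 16, 64], [64, 64, 128]],
    tags=["short"],
)

class AddBenchmark(op_bench.TorchBenchmarkBase):
    def init(self, M, N, K):
        # Inputs are created once per config; forward() is what gets timed.
        self.input_one = torch.rand(M, N, K)
        self.input_two = torch.rand(M, N, K)
        self.set_module_name("add")

    def forward(self):
        return torch.add(self.input_one, self.input_two)

op_bench.generate_pt_test(add_short_configs, AddBenchmark)

if __name__ == "__main__":
    op_bench.benchmark_runner.main()
```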

Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu
# Input: M: 8, N: 16, K: 32, device: cpu
Forward Execution Time (us) : 146.586

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda
# Input: M: 8, N: 16, K: 32, device: cuda
Forward Execution Time (us) : 92.151

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu
# Input: M: 16, N: 16, K: 64, device: cpu
Forward Execution Time (us) : 428.421

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecuda
# Input: M: 16, N: 16, K: 64, device: cuda
Forward Execution Time (us) : 89.811

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecpu
# Input: M: 64, N: 64, K: 128, device: cpu
Forward Execution Time (us) : 11857.012

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M64_N64_K128_devicecuda
# Input: M: 64, N: 64, K: 128, device: cuda
Forward Execution Time (us) : 93.918

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwdall
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 990.125

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd1
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 781.217

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_bwd2
# Input: M: 8, N: 16, K: 32, device: cpu
Backward Execution Time (us) : 777.307
```

Reviewed By: zheng-xq

Differential Revision: D16501974

fbshipit-source-id: f1eec010eabf11ce4fcf6cfe6f85cd5241a7022d
2019-10-09 11:24:10 -07:00
a750a1a2b4 modify config_list to support cross product of attributes (#23399)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23399

This diff enables the config_list function to support a cross product of inputs beyond the shapes.

The following is an example using the updated interface. The same input shapes can run on different devices and dtypes.
```
add_short_configs = op_bench.config_list(
    attr_names=['M', 'N', 'K'],
    attrs=[
        [8, 16, 32],
        [16, 16, 64],
        [64, 64, 128],
    ],
    cross_product_configs={
        'device': ['cpu', 'cuda'],
        'dtype': [torch.float, torch.float64],
    },
    tags=['short'],
)
```

Test Plan:
buck run mode/dev-nosan caffe2/benchmarks/operator_benchmark/common/tests:pt_configs_list_test -- --iterations 3

```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 164.489

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecpu_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cpu, dtype: torch.float64
Forward Execution Time (us) : 158.677

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float32
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float32
Forward Execution Time (us) : 103.866

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M8_N16_K32_devicecuda_dtypetorch.float64
# Input: M: 8, N: 16, K: 32, device: cuda, dtype: torch.float64
Forward Execution Time (us) : 106.027

# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M16_N16_K64_devicecpu_dtypetorch.float32
# Input: M: 16, N: 16, K: 64, device: cpu, dtype: torch.float32
Forward Execution Time (us) : 451.016
...
```

buck test caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test

```
Building: finished in 2.4 sec (100%) 6882/6882 jobs, 2 updated
  Total time: 2.8 sec
Trace available for this run at /tmp/testpilot.20190730-160519.3952794.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision 203f0104fbfcec4128be2c482c64736309ae39c9 fbpkg a4b2a9897a0c45069bd07d83e5981052 at Sun Jul 28 01:22:13 2019 by twsvcscm from /data/fbprojects/packages/testinfra.testpilot/667/t.par
Discovering tests
Running 3 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_config_list_impl (operator_benchmark_test.TestConsumeOp) 0.011 1/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_list_of_ops (operator_benchmark_test.TestConsumeOp) 19.920 2/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - test_single_op (operator_benchmark_test.TestConsumeOp) 23.418 3/3 (passed)
      ✓ caffe2/benchmarks/operator_benchmark/fb:operator_benchmark_test - main 0.000 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/5910974514382830
Summary (total time 29.90s):
  PASS: 4
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```

Reviewed By: zheng-xq

Differential Revision: D16501272

fbshipit-source-id: d92b5cf50b0f37d5b3a79d423acb521366b4e8db
2019-10-09 11:24:06 -07:00
b9b9fd4fad Fix the arithmetic overflow issue for MSVC (#27596)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27568.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27596

Differential Revision: D17831612

Pulled By: ezyang

fbshipit-source-id: eff18095a74b6b82f70ed3f11d201483097205c5
2019-10-09 09:31:23 -07:00
987e37b9c2 Enable EXE001 flake8 check. (#27560)
Summary:
According to https://github.com/pytorch/pytorch/issues/27285, it seems we do not intend to use the shebang as an indication of Python version, thus
we enable the EXE001 flake8 check.
For violations, we either remove the shebang from non-executable Python scripts or grant them executable permission.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27560

Differential Revision: D17831782

Pulled By: ezyang

fbshipit-source-id: 6282fd3617b25676a6d959af0d318faf05c09b26
2019-10-09 09:15:29 -07:00
65cdc8db5d Remove GEN_TO_SOURCE from CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27570

Differential Revision: D17831571

Pulled By: ezyang

fbshipit-source-id: d4f98dab64c892886cc4fd4128428a677edfd7a8
2019-10-09 08:58:42 -07:00
eb8fe883d8 Revert D17599915: [pytorch][PR] Support 0-batch size for nn.Linear.
Test Plan: revert-hammer

Differential Revision:
D17599915

Original commit changeset: 78894ce602d9

fbshipit-source-id: 3afd3621e85e5aa8b186d3542f71cef441f3d1bb
2019-10-09 08:58:38 -07:00
47e6d40b9c Revert D17810912: Register operators of CV models in PyTorch mobile
Test Plan: revert-hammer

Differential Revision:
D17810912

Original commit changeset: 2cc25dbe81a3

fbshipit-source-id: 3b020f8eee2064f8f5df939b689332c9cab320d5
2019-10-09 08:56:08 -07:00
15bec0970c Add instructions for setting up ccache from conda (#27481)
Summary:
I was unable to use the existing instructions since I don't have sudo privileges on my GPU development machine and couldn't easily install `ccache` or the build dependencies for `ccache`.

However, I was able to get it working by installing `ccache` with `conda` and then creating symlinks to shadow my compilers as in the build-from-source installation instructions. I figure this might be generally useful as others might not have sudo privileges on their pytorch development machine.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27481

Differential Revision: D17831556

Pulled By: ezyang

fbshipit-source-id: c5373d8739ad910015e677e7ad48bd91b770f842
2019-10-09 08:49:51 -07:00
59b14a7620 Documentation for named tensors (#27173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27173

`docs/source/named_tensor.rst` is the entry point; most users will land
either here or the named tensor tutorial when looking to use named
tensors. We should strive to make this as readable, concise, and understandable
as possible.

`docs/source/name_inference.rst` lists all of the name inference rules.
It should be clear but it's hard to make it concise.

Please let me know if anything doesn't make sense and please propose
alternative wordings and/or restructuring to improve the documentation.
This should ultimately get cherry-picked into the 1.3 branch as one
monolithic commit so it would be good to get all necessary changes made
in this PR and not have any follow ups.
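
For context, a tiny example of the API these docs cover:

```python
import torch

t = torch.zeros(2, 3, names=("N", "C"))  # dimensions addressed by name
print(t.names)           # ('N', 'C')
print(t.sum("C").shape)  # torch.Size([2])
```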

Test Plan: - built and reviewed locally with `cd docs/ && make html`.

Differential Revision: D17763046

Pulled By: zou3519

fbshipit-source-id: c7872184fc4b189d405b18dad77cad6899ae1522
2019-10-08 22:22:30 -07:00
a37be201c1 Implement torch.nn.Embedding / EmbeddingBag in PyTorch C++ API (#26358)
Summary:
Added more variables to EmbeddingOptions and updated the EmbeddingImpl reset and forward functions. Also added EmbeddingBag.

-----

This PR is BC-breaking in the following way:

Previously, `EmbeddingOptions` supports `count` and `dimension` as options arguments. After this PR, they are renamed to `num_embeddings` and `embedding_dim` respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26358

Differential Revision: D17714337

Pulled By: yf225

fbshipit-source-id: f9f969c68e4bece106b92f8e2e02ac39c8455fb7
2019-10-08 22:13:39 -07:00
b96f49885f caffe2 python ideep conv_op test_int8_convolution skip for python 3
Summary: This test was failing in 3.7; it turns out it was omitted by the test director in 3.6, so I added a skip for both versions.

Test Plan: The unit test is skipped in 3.7 and 3.6; all other tests pass.

Reviewed By: tomdz

Differential Revision: D17820967

fbshipit-source-id: 571f0ec7fe1b0cb50ead4e0d18c00151a701f36a
2019-10-08 21:31:11 -07:00
1f158adeee Add support for attention weight in SparseLookup (#26748)
Summary:
Support attention weights input to SparseLookup. In attention sum pooling, if attention weights can be pre-calculated before embedding lookup,  they can be passed to SparseLookup and processed by SparseLengthsWeightedSum op. One example is id_score attention sum pooling.

Essentially the net is converted from:
  LengthsSum(Mul(Gather(keys, w), att_weight))
to:
  SparseLengthsWeightedSum(keys, w, att_weight)

It unblocks potential efficiency gain with distributed training.
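
To illustrate the semantics with the PyTorch analogue (not the Caffe2 ops themselves), a small check that the fused weighted sum matches the unfused gather/multiply/sum:

```python
import torch
import torch.nn.functional as F

w = torch.rand(10, 4)                  # embedding table
ids = torch.tensor([[1, 4, 7]])        # one bag of three ids
att = torch.tensor([[0.2, 0.5, 0.3]])  # per-id attention weights

# Unfused: LengthsSum(Mul(Gather(w, ids), att))
unfused = (w[ids[0]] * att[0].unsqueeze(1)).sum(dim=0)

# Fused analogue of SparseLengthsWeightedSum
fused = F.embedding_bag(ids, w, per_sample_weights=att, mode="sum")

assert torch.allclose(unfused, fused.squeeze(0))
```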

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26748

Test Plan: unit test

Reviewed By: chocjy

Differential Revision: D17553345

Pulled By: wheatkit

fbshipit-source-id: 60cc3c4b0bc1eade5459ac598e85286f3849a412
2019-10-08 20:22:25 -07:00
a891e92f89 Support 0-batch size for nn.Linear. (#27211)
Summary:
Currently, nn.Linear (and its internal functional code) will
fail in THBlas:

RuntimeError: invalid argument 8: lda should be at least max(1, 0), but have 0 at caffe2/aten/src/TH/generic/THBlas.cpp:363

This diff is trying to fix this bug.

As of now I was able to identify 2 possible places where changes need to be made based on the current dispatcher logic:
1. The file touched in this diff
2. caffe2/aten/src/THC/generic/THCTensorMathBlas.cu

At the moment I didn't find better places than injecting logic into those files:
the only non-generated function for the forward pass, plus the mm_mat2_backward function family on the backward pass.
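
The failing case, sketched in Python (after this fix the forward pass is expected to succeed):

```python
import torch

linear = torch.nn.Linear(5, 3)
x = torch.empty(0, 5)  # batch size 0
y = linear(x)
print(y.shape)         # torch.Size([0, 3])
```
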
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27211

Test Plan: New unit-tests are passing. Code that was failing earlier works. Need to test other backends.

Differential Revision: D17599915

Pulled By: kennyhorror

fbshipit-source-id: 78894ce602d96aac2d6bf8c16a3fab43973e2d53
2019-10-08 16:43:21 -07:00
c27853fbba Expose torch::jit::script::Module::dump_to_str to python as module._c.dump_to_str.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27556

Test Plan: Imported from OSS

Differential Revision: D17814331

Pulled By: ZolotukhinM

fbshipit-source-id: a25fc853897d37c6a703373838b522c64ad3aa78
2019-10-08 16:32:23 -07:00
6cf189512c Remove underscore from pybind of module._c.dump (#27555)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27555

It is already under '_c' anyway.

Test Plan: Imported from OSS

Differential Revision: D17814333

Pulled By: ZolotukhinM

fbshipit-source-id: ca21649d553f6601be12828958a8077867d0e30e
2019-10-08 16:32:19 -07:00
1610ea8ef8 Comprehensive-ish instrumentation for CUDA memory allocator (#27361)
Summary:
Adds comprehensive memory instrumentation to the CUDA caching memory allocator.

# Counters

Added comprehensive instrumentation for the following stats:
  - Allocation requests (`allocation`)
  - Allocated memory (`allocated_bytes`)
  - Reserved segments from cudaMalloc (`segment`)
  - Reserved memory (`reserved_bytes`)
  - Active memory blocks (`active`)
  - Active memory (`active_bytes`)
  - Inactive, non-releasable blocks (`inactive_split`)
  - Inactive, non-releasable memory (`inactive_split_bytes`)
  - Number of failed cudaMalloc calls that result in a cache flush and retry (`cuda_malloc_retries`)
  - Number of OOMs (`num_ooms`)

Except for the last two, these stats are segmented between all memory, large blocks, and small blocks. Along with the current value of each stat, historical counts of allocs/frees as well as peak usage are tracked by the allocator.

# Snapshots

Added the capability to get a "memory snapshot" – that is, to generate a complete dump of the allocator block/segment state.

# Implementation: major changes

- Added `torch.cuda.memory_stats()` (and associated C++ changes) which returns all instrumented stats as a dictionary.
- Added `torch.cuda.snapshot()` (and associated C++ changes) which returns a complete dump of the allocator block/segment state as a list of segments.
- Added memory summary generator in `torch.cuda.memory_summary()` for ease of client access to the instrumentation stats. Potentially useful to dump when catching OOMs. Sample output here: https://pastebin.com/uKZjtupq

# Implementation: minor changes

- Add error-checking helper functions for Python dicts and lists in `torch/csrc/utils/`.
- Existing memory management functions in `torch.cuda` moved from `__init__.py` to `memory.py` and star-imported to the main CUDA module.
- Add various helper functions to `torch.cuda` to return individual items from `torch.cuda.memory_stats()`.
- `torch.cuda.reset_max_memory_cached()` and `torch.cuda.reset_max_memory_allocated()` are deprecated in favor of `reset_peak_stats`. It's a bit difficult to think of a case where only one of those stats should be reset, and IMO this makes the peak stats collectively more consistent.
- `torch.cuda.memory_cached()` and `torch.cuda.max_memory_cached()` are deprecated in favor of `*memory_reserved()`.
- Style (add access modifiers in the allocator class, random nit fixes, etc.)

# Testing

- Added consistency check for stats in `test_cuda.py`. This verifies that the data from `memory_stats()` is faithful to the data from `snapshot()`.
- Ran on various basic workflows (toy example, CIFAR)

# Performance

Running the following speed benchmark: https://pastebin.com/UNndQg50

- Before this PR: 45.98 microseconds per tensor creation
- After this PR: 46.65 microseconds per tensor creation
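
For reference, a minimal sketch of querying the new counters (key names follow the stat/pool/field scheme described above, but may differ slightly from what landed):

```python
import torch

x = torch.empty(1024, 1024, device="cuda")  # trigger an allocation

stats = torch.cuda.memory_stats()
print(stats["allocated_bytes.all.current"])
print(stats["num_ooms"])

print(torch.cuda.memory_summary())  # human-readable dump of all stats
```
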
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27361

Differential Revision: D17758747

Pulled By: jma127

fbshipit-source-id: 5a84e82d696c40c505646b9a1b4e0c3bba38aeb6
2019-10-08 15:42:48 -07:00
04cd777ed4 Create BUCK build for lite-interpreter (#27546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27546

Add files in the csrc/jit/mobile folder to torch_core, as a first step toward having the lite interpreter built in BUCK. Next the files will be made independent of torch_core (T54912812)
ghstack-source-id: 91523987

Test Plan:
buck build -c pytorch.enable_rtti=1 -c project.ignore= -c ndk.app_platform=android-23 -c user.libcxx_cflags=-DFOLLY_USE_LIBCPP=1 -c user.libcxx_cxxflags=-DFOLLY_USE_LIBCPP=1 -c ndk.cxx_runtime=libcxx -c user.ndk_cxxflags=-g0 //xplat/experimental/pytorch/mobile:lite_predictorAndroid#android-armv7 && adb push buck-out/gen/xplat/experimental/pytorch/mobile/lite_predictorAndroid#android-armv7 /data/local/tmp/
In adb shell:
data/local/tmp/lite_predictorAndroid\#android-armv7 add_it.bc

buck build -c project.ignore= @//fbcode/mode/dev-asan //xplat/experimental/pytorch/mobile:lite_predictor

Reviewed By: ljk53

Differential Revision: D17717547

fbshipit-source-id: 4c00a35eb231968d05d0d7b56bcfd5dc0258d4bb
2019-10-08 15:20:30 -07:00
ff03f9bc94 Remove CPU_tensor_apply* from Normalization.cpp (#27327)
Summary:
https://github.com/pytorch/pytorch/issues/24486
https://github.com/pytorch/pytorch/issues/24485
https://github.com/pytorch/pytorch/issues/24484
https://github.com/pytorch/pytorch/issues/24483
https://github.com/pytorch/pytorch/issues/24482
https://github.com/pytorch/pytorch/issues/24481
https://github.com/pytorch/pytorch/issues/24480
https://github.com/pytorch/pytorch/issues/24479
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27327

Differential Revision: D17811268

Pulled By: ifedan

fbshipit-source-id: 7ce54d8e87752e9ea34b12b1415e1398017070cd
2019-10-08 14:49:59 -07:00
e16868ab29 Register operators of CV models in PyTorch mobile (#27379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27379

Currently the operators need to be registered manually through c10 registration.

Test Plan:
The operators should be covered by tests on operators.
A few ops (add, conv) are covered in test_lite_interpreter.cpp for demonstration.
CV models may be too large to include in unittests.
Simple local loaders can be built. Follow similar pattern as in test_lite_interpreter to
1. load the torch script model
2. run the model to get reference results
3. save and load the mobile module using torch::jit::module._save_for_mobile() and torch::jit::_load_for_mobile().
4. run the mobile module by run_method() and compare the results to reference results.

Tested models:
* Lenet
* XrayMobileV3

Differential Revision: D17810912

fbshipit-source-id: 2cc25dbe81a3c9a85108b3efe6a8e957028fc622
2019-10-08 14:05:26 -07:00
3f660cdf0f Remove CUDA_tensor_apply1 (#27313)
Summary:
CUDA_tensor_apply1 is unused, so it will be removed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27313

Differential Revision: D17746076

Pulled By: ifedan

fbshipit-source-id: 99120a5f1f0f716b4dc19b6ffe931071cbcdaea2
2019-10-08 13:23:00 -07:00
e7b6ea5535 Move the CUDA implementation of atan2 (which was partially implemented in ATen) to ATen. (#26178)
Summary:
std::atan2 is not used because it does not work with HIP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26178

Differential Revision: D17747897

Pulled By: VitalyFedyunin

fbshipit-source-id: b300f0573c431e1425644c9c1899d0b024c6a57c
2019-10-08 13:15:51 -07:00
c1c176d91b record_stream() for shifted view tensors (#27371)
Summary:
Issue: https://github.com/pytorch/pytorch/issues/27366

The address of a view tensor might be shifted from the head of the storage.

```python
>>> x = torch.rand(10, 10, device=0, requires_grad=True)
>>> y = x[2:]
>>> hex(x.data_ptr())
'0x7f1b15c00000'
>>> hex(y.data_ptr())
'0x7f1b15c00050'
```

Currently, `Tensor.record_stream()` silently ignores shifted view tensors, because `CUDACachingAllocator` cannot find the block from the shifted address.

```c++
void recordStream(void* ptr, cuda::CUDAStream stream)
{
  if (ptr) {
    std::lock_guard<std::recursive_mutex> lock(mutex);
    Block* block = find_allocated_block(ptr);
    if (block) {
      ...
    }
    // 'block' is nullptr if 'ptr' is shifted.
  }
}
```

So we cannot protect a shifted view tensor that is used for compute or copy in an arbitrary stream against unexpected reallocation. Once we call `record_stream()` on a tensor, our intention is to protect the storage behind the tensor against reallocation until all work in the stream finishes. This rule should be consistent regardless of the type of tensor, including views.

We can retrieve the head of the address from any type of tensor via `tensor.storage().data_ptr()`. Hence, I think it's better to pass that to `recordStream()` rather than `tensor.data_ptr()` for consistent behavior.
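
A minimal Python sketch of the situation this fixes:

```python
import torch

s = torch.cuda.Stream()
x = torch.rand(10, 10, device="cuda")
y = x[2:]  # y.data_ptr() is shifted from the head of x's storage

with torch.cuda.stream(s):
    z = y * 2  # y is consumed on the side stream s

# Protect the storage behind y until s finishes. With this fix the lookup
# uses the storage base address, so the shifted view is no longer ignored.
y.record_stream(s)
```
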
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27371

Reviewed By: ezyang

Differential Revision: D17768558

Pulled By: albanD

fbshipit-source-id: 7705f52b0177625168edb6f71c07a029df471bc5
2019-10-08 12:31:26 -07:00
6e59fb6a97 .gitignore for the docs folder
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27491

Test Plan: Imported from OSS

Differential Revision: D17796152

Pulled By: zafartahirov

fbshipit-source-id: d1aaf27b4ea1fb683cd889e5a935b4ca275de3ad
2019-10-08 12:18:30 -07:00
eb93200321 Fix DDP incompatibility issue with nn.MultiheadAttention. (#26826)
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/26698.

With different query/key/value dimensions, `nn.MultiheadAttention` has a DDP incompatibility issue because in that case the `in_proj_weight` attribute is created but not used. Fix it and add a distributed unit test.
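
The configuration that used to break under DDP, sketched in Python:

```python
import torch
import torch.nn as nn

# query/key/value dims differ, so separate projection weights are used and
# the unused in_proj_weight parameter previously confused DDP
mha = nn.MultiheadAttention(embed_dim=8, num_heads=2, kdim=16, vdim=32)
q = torch.rand(4, 2, 8)   # (seq, batch, embed_dim)
k = torch.rand(5, 2, 16)  # (seq, batch, kdim)
v = torch.rand(5, 2, 32)  # (seq, batch, vdim)
out, attn = mha(q, k, v)
print(out.shape)          # torch.Size([4, 2, 8])
```
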
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26826

Differential Revision: D17583807

Pulled By: zhangguanheng66

fbshipit-source-id: c393584c331ed4f57ebaf2d4015ef04589c973f6
2019-10-08 12:13:34 -07:00
f522bde121 Replace references to _DataLoaderIter with _BaseDataLoaderIter (#27105)
Summary:
Back in April, malmaud added type annotations for `dataloader.py`. However, at about the same time, SsnL in https://github.com/pytorch/pytorch/issues/19228 replaced `_DataLoaderIter` with `_BaseDataLoaderIter` and two subclasses, `_SingleProcessDataLoaderIter` and `_MultiProcessingDataLoaderIter`. Probably because these changes happened in parallel at roughly the same time, the type stubs and several other references in the codebase were never updated to match this refactoring.

I've gone ahead and made the updates to reflect the refactoring in https://github.com/pytorch/pytorch/issues/19228, which fixes the specific type stub/implementation mismatch pointed out in https://github.com/pytorch/pytorch/issues/26673, although not the broader problem that pytorch doesn't have a test to make sure that the `.pyi` type stub files match the real API defined in `.py` files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27105

Differential Revision: D17813641

Pulled By: ezyang

fbshipit-source-id: ed7ac025c8d6ad3f298dd073347ec83bb4b6600c
2019-10-08 12:09:02 -07:00
d57124823b Regenerate aten_op.h when native_functions.yaml changes. (#27253)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10127.

This ensures that aten_op.h is regenerated whenever a native kernel
is removed. Previously it was only being regenerated when new native
kernels were added, because that generates new source files, which this
cmake target depended on. However, if a native kernel is removed, then
there is no dependent target and the header is never regenerated.

Explicitly depending on native_functions.yaml ensures that the header
is regenerated even if a kernel is removed.

I'm no cmake expert so alternative approaches or reasons why this is
obviously incorrect are very appreciated!

EDIT: reflecting comments below we now depend on `Dependencies.yaml` instead of `native_functions.yaml`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27253

Differential Revision: D17813659

Pulled By: ezyang

fbshipit-source-id: 2c754a88ba62495c14de8a9649f6675d2dad0b7d
2019-10-08 11:54:51 -07:00
31a6ff46c1 change input shape to reduce variation (#27548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27548

as title

Test Plan: i_dont_want_it

Reviewed By: hl475

Differential Revision: D17811295

fbshipit-source-id: 3be957f6f3eaa464ebf4f5bd7c07d096ae4eae8c
2019-10-08 11:45:06 -07:00
b4ce922b58 Move RPC API to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27290

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D17808212

Pulled By: pietern

fbshipit-source-id: c79907940fe4888b2ceaaa1cda0078e39c89b454
2019-10-08 11:31:25 -07:00
a6d26ce135 Move internal functions to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27289

Test Plan: Imported from OSS

Differential Revision: D17808214

Pulled By: pietern

fbshipit-source-id: 4c453028e431c3e951d439784017ef07037ba1a9
2019-10-08 11:31:20 -07:00
14f1629c4d Move RPC backend registry to torch.distributed.rpc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27288

Test Plan: Imported from OSS

Differential Revision: D17808215

Pulled By: pietern

fbshipit-source-id: 489c031e02cd3141a861cf7ec2273aaa4c55b7d7
2019-10-08 11:31:16 -07:00
1fd14c5822 Remove torch.distributed.rpc function (#27287)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27287

This is replaced by calls to `dist.rpc_sync` and `dist.rpc_async`.

Test Plan: Imported from OSS

Differential Revision: D17808210

Pulled By: pietern

fbshipit-source-id: 3103a615fa8b08224780387a3ea4ac6b1c73badb
2019-10-08 11:31:12 -07:00
48a571b29c Rename variables and add comments (#27286)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27286

The name `runUDFFunction` stutters because the F in UDF also stands
for function. Renamed these variables to be identical to their Python
equivalents. Renamed those to share a prefix and drop `internal`,
because internal functions can use an underscore prefix.

Test Plan: Imported from OSS

Differential Revision: D17808208

Pulled By: pietern

fbshipit-source-id: 7619f07fc8215203dfb1da1eb281845edcd2bb99
2019-10-08 11:31:08 -07:00
f597926fe0 Remove shebang from non-executable files in torch.distributed
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27285

Test Plan: Imported from OSS

Differential Revision: D17808207

Pulled By: pietern

fbshipit-source-id: 6141c1783e3a6f448a298275120db1f254b42b2a
2019-10-08 11:31:03 -07:00
c742918854 Fix pybind11 warnings in python_rpc_handler.cpp (#27284)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27284

The warnings relate to usage of the deprecated != operator. Instead
of checking the member field on every function call, we can check it
once, on construction of PythonRpcHandler.

Test Plan: Imported from OSS

Differential Revision: D17808213

Pulled By: pietern

fbshipit-source-id: 022c8f77f266942c49c55b1729e62dbb06262d77
2019-10-08 11:30:59 -07:00
0d22f3b170 Emergency split CUDA libtorch build/test into separate job (#26859)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26859

CUDA builds are intermittently taking greater than five hours,
hitting CircleCI's timeout limit, and also all around making
developers unhappy.  Part of the reason for this is because
they build PyTorch twice: once as normal, and once as libtorch.
This diff splits libtorch into a new job to parallelize this
and get us below the patch.  It's an emergency diff because
I did the minimum possible work to make this work, including
grody hacks to make sure macos libtorch builds still work
(without adding a separate job there).

- Add a new libtorch config, to cuda9 (same as before).  Disable
  generation of the other test variants.
- Adjust common.sh to NO LONGER set BUILD_TEST_LIBTORCH for
  pytorch-linux-trusty-py3.6-gcc7; we will test for *libtorch*
  in the job name for this case.  (I noticed a bug while
  looking at this.)
- Adjust build.sh and test.sh.  The eventual logic is that if you are a
  *libtorch* build, ONLY build libtorch; otherwise do the same
  thing you used to do (including respecting BUILD_TEST_LIBTORCH)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17810592

Pulled By: ezyang

fbshipit-source-id: 8dcdb8f7424ddda293500d9fc90097a54dca28b9
2019-10-08 11:24:21 -07:00
660264e173 fix documentation for add_hparams (#27521)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27521

adding new lines to add_hparams description

Test Plan: sphinx-autobuild

Reviewed By: orionr

Differential Revision: D17800387

fbshipit-source-id: 4a09a86a9d35c6c2d3a7e2857027f9d053851585
2019-10-08 10:56:44 -07:00
3b5d40c339 Add C++ torch::nn::CosineEmbeddingLoss (#27345)
Summary:
Adds `torch::nn::CosineEmbeddingLoss`  module and functional support for the C++ API.

Issue: https://github.com/pytorch/pytorch/issues/25883

Reviewer: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27345

Differential Revision: D17801402

Pulled By: yf225

fbshipit-source-id: 0eabe80d7d36397e6667b331c3fa2f56d7a15962
2019-10-08 10:52:05 -07:00
e63bfb7877 Use orig source range in Node::print
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27524

Test Plan: Imported from OSS

Differential Revision: D17806454

Pulled By: jamesr66a

fbshipit-source-id: 5e3edb87fc79ad8dd1aed0b7d4a2153e7e0429ab
2019-10-08 10:30:56 -07:00
e2143fdeb8 Updating submodules
Summary:
GitHub commits:

fdc5edee63
266b453eb0

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: bd5b2e9b7bd31d8995e75124b61e29423b624265
2019-10-08 10:20:06 -07:00
725810f42c Set existing attributes under recursive script (#27514)
Summary:
This is related to #27109: `training` was being skipped since modules
have it as an attribute by default, but it should be copied anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27514

Pulled By: driazati

Differential Revision: D17802544

fbshipit-source-id: 9e8f068903b67073c509c2c598b27622fcada2d7
2019-10-08 10:12:04 -07:00
7f183a978f Stops common_utils.py from setting the default tensor type (to torch.DoubleTensor) (#27444)
Summary:
This PR stops common_utils.py from setting the default tensor type when it is imported. See issue https://github.com/pytorch/pytorch/issues/27355. This is a frequent source of confusion for test writers.

Many tests relied on this setting (whether they knew it or not), and this PR also updates the test suite to pass without common_utils.py setting the default tensor type. Some larger test files now set the default floating dtype themselves, however. These test files are:

- test_autograd.py
- test_distributions.py
- test_jit.py
- test_nn.py

This is still a significant improvement from today, however. First, these files set the default floating dtype much more clearly than importing it from common_utils. Second, the rest of the test suite no longer sets this globally. Third, this PR is a springboard to updating those tests, too. In particular, as tests are made generic they can be moved aways from relying on this global setting.

Notable technical changes in this PR are:

- Significant updates to test_torch.py to make it pass without setting the default floating dtype globally.
- The default_floating_dtype decorator is now defined in common_utils; a couple of versions of this decorator were previously defined in test files.
- test_torch-specific parts of common_utils were refactored into test_torch.
- tensor creation methods in common_utils were updated to accept an optional dtype and device.
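
For reference, a minimal sketch of the explicit opt-in that replaces the old global:

```python
import torch

# What the removed global used to do implicitly; test files that still
# rely on double-precision defaults now set it themselves.
torch.set_default_dtype(torch.double)
print(torch.tensor([1.0]).dtype)  # torch.float64
```
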
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27444

Differential Revision: D17795235

Pulled By: mruberry

fbshipit-source-id: 7f77271c0c836e69f183ad9057a2c4b29f09d2e1
2019-10-08 09:52:44 -07:00
16ece1c9da Fixed typos and grammatical errors (#27465)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27443
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27465

Differential Revision: D17810732

Pulled By: pietern

fbshipit-source-id: b8a62dd086a4f4a61c9aa6acfa495cf822995604
2019-10-08 09:31:45 -07:00
6e0312a9c5 Revert "Make static dispatch turn off variable before entering the kernel. (#26908)" (#27283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27283

This reverts commit 9159a601ca4953ecf0d3dc568cd0b966de2d4686.

Test Plan: Imported from OSS

Differential Revision: D17738167

Pulled By: ezyang

fbshipit-source-id: cc4048d553017409279603590833d1529f59048c
2019-10-08 09:21:07 -07:00
a96b003b39 docstring only formatting changes: quantize.py, fake_quantize.py, observer.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27415

Reviewed By: zafartahirov

Differential Revision: D17783101

Pulled By: gottbrath

fbshipit-source-id: a7acbc55edfaa75fdbd17fd30d530710a401b22f
2019-10-08 09:21:03 -07:00
e63addfff6 Exponential decay of the weight of task loss (#27508)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27508

Implemented a simple exponential decay of the weight of the task loss function, with a lower bound.
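
A minimal sketch of the scheme, with hypothetical parameter names:

```python
import math

def task_loss_weight(step, init_weight, decay_rate, lower_bound):
    # Exponentially decay the task-loss weight, clamped at a floor.
    return max(lower_bound, init_weight * math.exp(-decay_rate * step))

print(task_loss_weight(0, 1.0, 0.01, 0.1))    # 1.0
print(task_loss_weight(500, 1.0, 0.01, 0.1))  # hits the floor: 0.1
```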

Test Plan:
buck test //caffe2/caffe2/fb/dper/layer_models/tests:mtml_test -- test_task_weight_decay
https://our.intern.facebook.com/intern/testinfra/testrun/3377699729136308

canary: f140103452

Reviewed By: chenshouyuan

Differential Revision: D17524101

fbshipit-source-id: 9a653e21a4ecb74dfc4ac949c9e3388f36ef3a20
2019-10-08 09:15:41 -07:00
2c51e0659b Roll master to 1.4.0 (#27374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27374

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17809770

Pulled By: ezyang

fbshipit-source-id: 75bd97426494a7bbbf08f9bce7563d35871443d8
2019-10-08 08:58:53 -07:00
34662f77c6 Revert D17159707: [pytorch][PR] [ONNX] Fixed Select symbolic to export slice when index = negative one
Test Plan: revert-hammer

Differential Revision:
D17159707

Original commit changeset: 2c3b27542108

fbshipit-source-id: accce910abdbe13270d0f592810a48b1dabe4b01
2019-10-08 01:59:10 -07:00
1b5df37441 Updating submodules
Summary:
GitHub commits:

e80ecd1d63
6c7a36b1b3
8750462043
442d7def67
c138dc3d2c
3833f10989
6fc473d530
82d259dade

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 7834a4a8620d0ab9b60060e0abadfba457fb2890
2019-10-08 01:08:45 -07:00
84e2dc692a Fix broken name mangling
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27511

Test Plan: Imported from OSS

Differential Revision: D17801185

Pulled By: jamesr66a

fbshipit-source-id: 3eaa9542a445c9401f3f96e11138ec09b0d8350a
2019-10-07 20:05:32 -07:00
23f2fb0aec #include <stdexcept> into flat_hash_map.h (#27478)
Summary:
Fixing https://github.com/pytorch/pytorch/issues/27266

In general we should not rely on transitively included headers; we should explicitly include all headers whose members are used in the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27478

Differential Revision: D17799522

Pulled By: pbelevich

fbshipit-source-id: 5818394a212c947cfac3a6cf042af9ebb8b9d9a0
2019-10-07 19:24:07 -07:00
24242e86fa Ensure NCCL error handling code is disabled for NCCL versions < 2.4 (#27124)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27124

ncclCommAbort() and ncclGetAsyncError() were two APIs added in NCCL
2.4 to detect errors in NCCL communicators. These were used as part of
ProcessGroupNCCL, and we also enforced that only NCCL versions 2.4+ were
supported. However, there is still legitimate use for older NCCL versions, and
hence we should still support those.

For that purpose, in this change I've ensured we disable NCCL error checking
for versions < 2.4.
ghstack-source-id: 91452959

Test Plan:
1) Test with 2.4.8
2) Test with 2.2.13
3) unit tests.

Differential Revision: D17178988

fbshipit-source-id: 5dc44b5f7b4b00466c67fd452315f1d4f5c47698
2019-10-07 17:39:32 -07:00
4bd8ae13c6 Move hipify to torch/utils to bundle them into torch package (#27425)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/27418 but try to put it under "torch" namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27425

Differential Revision: D17779490

Pulled By: bddppq

fbshipit-source-id: 688338d143509b37dfc110df17af3331db48a42b
2019-10-07 17:25:45 -07:00
ce16d689b3 FunctionEventAvg implements __iadd__ interface (#27498)
Summary:
Resolving issue https://github.com/pytorch/pytorch/issues/26433 by making FunctionEventAvg implement the `__iadd__` interface again, like it used to.
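
For context, a generic sketch of the `__iadd__` protocol being restored (the fields here are hypothetical, not FunctionEventAvg's actual attributes):

```python
class Avg:
    def __init__(self, count=0, total=0.0):
        self.count, self.total = count, total

    def __iadd__(self, other):
        # `a += b` mutates `a` in place and must return self
        self.count += other.count
        self.total += other.total
        return self

a, b = Avg(1, 2.0), Avg(3, 4.0)
a += b
print(a.count, a.total)  # 4 6.0
```
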
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27498

Differential Revision: D17801918

Pulled By: ezyang

fbshipit-source-id: 0597059c903ac168ed64a05ac1decff3ffd14f06
2019-10-07 17:14:27 -07:00
4a28ab95d0 Clean up JavaDoc comments in pytorch_android
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27455

Test Plan: Imported from OSS

Differential Revision: D17800658

Pulled By: dreiss

fbshipit-source-id: dbd01d9fa5ac82c50daf54c2869dc18be233d8dd
2019-10-07 17:01:30 -07:00
1ffa81d772 Various cleanups to pytorch_android API (#27454)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27454

See detailed discussion at
https://github.com/pytorch/pytorch/issues/27350

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800480

Pulled By: dreiss

fbshipit-source-id: bf174e8b16231b89be771de0fa54c41e864a3eb0
2019-10-07 17:01:26 -07:00
b66df47a11 Refactor python_android test to separate Android-specific components (#27453)
Summary:
All of the test cases move into a base class that is extended by the
instrumentation test and a new "HostTests" class that can be run in
normal Java.  (Some changes to the build script and dependencies are
required before the host test can actually run.)

ghstack-source-id: fe1165b513241b92c5f4a81447f5e184b3bfc75e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27453

Test Plan: Imported from OSS

Reviewed By: IvanKobzarev

Differential Revision: D17800410

fbshipit-source-id: 1184f0caebdfa219f4ccd1464c67826ac0220181
2019-10-07 17:01:22 -07:00
aab9673e8d Avoid variable shadowing in `::at::philox_engine::single_round()` (#27486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27486

Rename `key` argument of `single_round` method to `in_key`

Test Plan: CI

Reviewed By: stepancheg, soumith

Differential Revision: D17782904

fbshipit-source-id: 6feae55c407f39d41db099b013dcbd3990768603
2019-10-07 16:34:22 -07:00
16454095e0 Fixed Select symbolic to export slice when index = negative one (#25273)
Summary:
Exporting torch.select when index = negative one (x[:,-1]) was broken. This PR has the fix in the symbolic function for select.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25273

Reviewed By: hl475

Differential Revision: D17159707

Pulled By: houseroad

fbshipit-source-id: 2c3b275421082758f1b63c1c9b6e578f03ca9f76
2019-10-07 14:24:34 -07:00
8cc9d27647 Automatic update of fbcode/onnx to 2891e1459745933f4bba9a8cb3371cf3c9eb1d16 (#27474)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27474

Previous import was 034921bd574cc84906b7996c07873454b7dd4135

Included changes:
- **[2891e145](https://github.com/onnx/onnx/commit/2891e145)**: Fix Unique unit test (#2381) <Scott McKay>
- **[25cf73e5](https://github.com/onnx/onnx/commit/25cf73e5)**: update shapeInference h file link (#2369) <prcvih>
- **[e3074bc0](https://github.com/onnx/onnx/commit/e3074bc0)**: modify file path (#2378) <prcvih>
- **[9058d3a4](https://github.com/onnx/onnx/commit/9058d3a4)**: Incrementing version number to 1.6.0 (#2353) (#2385) <Kevin Chen>
- **[c963586d](https://github.com/onnx/onnx/commit/c963586d)**: Remove typing packages from test requirements (#2375) <Aiken Cairncross>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17791527

fbshipit-source-id: 23ad5abe313cd4e4eedcbe7794b98450b3b7d3bc
2019-10-07 14:16:29 -07:00
a4cba50d62 Put metrics back to torch.utils.tensorboard similar we have in TensorboardX
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27252

Test Plan: Check metrics in the Scuba table: https://fburl.com/scuba/k5x8yosj

Reviewed By: sanekmelnikov

Differential Revision: D17723414

fbshipit-source-id: 64d42e0b4582f635d38f38feb2b2a6c4826f2065
2019-10-07 14:10:38 -07:00
0046092178 Reduce special casing around 'training' (#27109)
Summary:
Most of this was old cruft left over from special handling of `training` before we had a `bool` type. This makes all modules have a `training` attribute that is true by default and removes all other special handling.

Fixes #26884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27109

Pulled By: driazati

Differential Revision: D17728129

fbshipit-source-id: 8ddc9fbb07a953dd05529538bfdd01ed88b5cb57
2019-10-07 13:52:59 -07:00
a24291a554 Unfold export (#24970)
Summary:
ONNX export for Unfold in symbolic opset9 + op and ORT tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24970

Reviewed By: hl475

Differential Revision: D17495106

Pulled By: houseroad

fbshipit-source-id: fcd179a1213c0f219628f25c09e66fcfe4c5df50
2019-10-07 13:06:37 -07:00
1250acef90 Disable tsan for test_multiprocessing. (#27410)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27410

Similar to https://github.com/pytorch/pytorch/pull/25005, TSAN is not
safe to use in a multi-threaded program with fork and can cause deadlocks. As a
result, disabling this test for TSAN.
ghstack-source-id: 91393545

Test Plan: buildbot

Differential Revision: D17775141

fbshipit-source-id: 109b8095240ad43ee4a6380f70b9efca863c0a4a
2019-10-07 11:29:04 -07:00
0222eceaaa Remove outdated note in cholesky_solve and triangular_solve doc strings (#26989)
Summary:
We do support inputs with dim > 2 in _out variants
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26989

Differential Revision: D17785632

Pulled By: soumith

fbshipit-source-id: d42ba7ca9c225ad1a26ff3b410d0c5c08eaed001
2019-10-06 23:28:48 -07:00
0b6186d778 Remove Tensor.h, TensorMethods.h from src/core. (#27086)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27086

This is a major source of merge conflicts, and AFAICT isn't necessary anymore (it may have been necessary for some mobile build stuff in the past).

This is a commandeer of #25031

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D17687345

Pulled By: ezyang

fbshipit-source-id: bf6131af835ed1f9e3c10699c81d4454a240445f
2019-10-06 09:37:50 -07:00
2cc1e69cc9 C++ API parity: LogSigmoid
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27060

Test Plan: Imported from OSS

Differential Revision: D17682404

Pulled By: pbelevich

fbshipit-source-id: d60d64cd4caf1f56a2e05c516f91321d46ec9624
2019-10-05 06:18:25 -07:00
17c672e704 enable rocTX API (#27416)
Summary:
ROCm 2.9 brings support for the rocTX API through rocTracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27416

Differential Revision: D17777480

Pulled By: bddppq

fbshipit-source-id: 6bce9b54c94e5b4c5787570d2b85736882bd23a7
2019-10-05 01:55:00 -07:00
04436f6c60 Upgrade to ROCm 2.9 (#27417)
Summary:
New docker images built with tag 325: https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/325

Related ossci-job-dsl commits:
a00a76f927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27417

Differential Revision: D17777517

Pulled By: bddppq

fbshipit-source-id: a6b8cb86b37f537d402f6d2c7d28ad28a6a5a317
2019-10-05 00:36:34 -07:00
bac11d1002 Tweak docs on building docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27364

Differential Revision: D17777402

Pulled By: dzhulgakov

fbshipit-source-id: 304c678e5c80d7f8c779d65c11f9bf1b0facdb52
2019-10-04 22:14:37 -07:00
e0ae3ce5e4 Docstring fix (#27225)
Summary:
Correcting docstring for `add_image_with_boxes` method. Fixed spelling mistake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27225

Differential Revision: D17776604

Pulled By: jerryzh168

fbshipit-source-id: 45f69643ec3b58c46b9fb67411c42a6d09b7290e
2019-10-04 21:29:36 -07:00
7a2e61c28e Remove dependency on six from dist_autograd_test.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27369

Test Plan: Imported from OSS

Differential Revision: D17763104

Pulled By: mrshenli

fbshipit-source-id: dd146809686e7720f2b77012eebb6aed72851556
2019-10-04 21:24:25 -07:00
1741adfd3e Use deepcopy inputs for ONNX ort test cases (#27186)
Summary:
Running models with inplace operators will change values of input tensors.
Deepcopy input tensors each time to keep the original input tensors intact.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27186

Differential Revision: D17776598

Pulled By: jerryzh168

fbshipit-source-id: d4808a11185a9ab0d782a62d7d708dfe7e94559c
2019-10-04 19:01:59 -07:00
1f0328c6d4 Add randomFill to test_utils.h
Summary: Add helper function randomFill to test_utils.h so we can use it in benchmark scripts as well as tests.

Test Plan:
```
buck run mode/opt //tvm/sparse:cblas_bench
```

Reviewed By: yinghai

Differential Revision: D17759193

fbshipit-source-id: e4909b04e83ca9382ab4718855fb63743d028de1
2019-10-04 18:29:22 -07:00
f4d0d0a811 Enable RCCL in ROCm build (#27383)
Summary:
continues https://github.com/pytorch/pytorch/pull/23884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27383

Differential Revision: D17767248

Pulled By: bddppq

fbshipit-source-id: 3a506844ca6f01d7bbe8be5bde0976999e3a2b90
2019-10-04 17:41:41 -07:00
7b3881f68c Adding docstrings for nnq.functional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27363

Test Plan: Imported from OSS

Differential Revision: D17758907

Pulled By: zafartahirov

fbshipit-source-id: f560f2726cf51ceebdbf22ebef2d067422340cf2
2019-10-04 17:19:47 -07:00
b05ec828ad Add interface/object serialization as module attribute (#26770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26770

This PR added interface/object serialization as a module attribute, to
allow initializing an object as an interface type during Python
initialization. Because an interface type can be backed by any class object
that implements that interface, if we declare it in
python/module.__init__, we will need to collect the runtime types of the
value and serialize them to ensure complete code information.

Test Plan: Imported from OSS

Differential Revision: D17742707

fbshipit-source-id: 7f614ad4f982996d320a0e2dd3515bf47370e730
2019-10-04 17:12:08 -07:00
381cf2bd24 add warning to dnnlowp fc if quantization kind is not min_max
Summary:
Print a warning when using DNNLOWP dynamic int8 quant for FC and activation_quantization_kind != min_max.

The warning will display in the console but not in Bento; we would have to use CAFFE_ENFORCE to alert in Bento.

Test Plan: Ran the unit test forcing DNNLOWP FC with activation_quantization_kind = "l2" and saw the warning printed in the console.

Reviewed By: csummersea

Differential Revision: D17770921

fbshipit-source-id: b6532e4c9a86d74e3db4cb432735505d378a366e
2019-10-04 17:03:19 -07:00
afbbe16f49 Add methods to write image tensor content to buffer (#27359)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27359

Adding methods  to TensorImageUtils:
```
bitmapToFloatBuffer(..., FloatBuffer outBuffer, int outBufferOffset)
imageYUV420CenterCropToFloat32Tensor(..., FloatBuffer outBuffer, int outBufferOffset)
```
To be able to
 - reuse FloatBuffer for inference
 - to create batch-Tensor (contains several images/bitmaps)

When we reuse the FloatBuffer, for example in the demo app (image classification), the profiler shows fewer memory allocations (previously, every run created a new input tensor with a newly allocated FloatBuffer) and about -20ms on my Pixel XL.

Known open question:
At the moment every tensor element is written separately by calling `outBuffer.put()`, which is a native call crossing language boundaries.
As an alternative, we could allocate a `float[]` on the Java side, fill it, and put it into `outBuffer` with one call, reducing native calls but increasing memory allocations on the Java side.
Tested locally, just eyeballing durations; I have not noticed a big difference, so I decided to go with fewer memory allocations.

It would be good to merge this into 1.3.0, but if not, the demo app can use snapshot dependencies with this change.

PR with integration to demo app:
https://github.com/pytorch/android-demo-app/pull/6

Test Plan: Imported from OSS

Differential Revision: D17758621

Pulled By: IvanKobzarev

fbshipit-source-id: b4f1a068789279002d7ecc0bc680111f781bf980
2019-10-04 16:33:50 -07:00
ac0f18437f MovingAverage Observer (#27396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27396

Observer that estimates moving averages of the min and max values per batch. It is better suited for quantization-aware training than MinMax observers, which track extremal values across batches.
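
A minimal sketch of the update rule such an observer applies, assuming an averaging constant `c` (parameter names are hypothetical):

```python
import torch

def update(running_min, running_max, batch, c=0.01):
    # Exponential moving average of the per-batch extremal values.
    batch_min, batch_max = batch.min(), batch.max()
    if running_min is None:
        return batch_min, batch_max
    return (running_min + c * (batch_min - running_min),
            running_max + c * (batch_max - running_max))

mn = mx = None
for _ in range(10):
    mn, mx = update(mn, mx, torch.randn(64))
print(mn, mx)
```
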
ghstack-source-id: 91369018

Test Plan:
buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details

buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details

Differential Revision: D17727213

fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
2019-10-04 16:28:59 -07:00
92a2caa028 Pickup proxy parameters for publishing (#27389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27389

Pick up Gradle proxy parameters (handy for publishing from a devserver) in the Maven publishing Gradle plugin.

Test Plan: Imported from OSS

Differential Revision: D17773548

Pulled By: IvanKobzarev

fbshipit-source-id: 662c0b2835e6cf1e4009da79e27268d4a19c2ceb
2019-10-04 16:21:31 -07:00
18215337f4 Change nightly builds version to 1.4.0-SNAPSHOT (#27381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27381

Changing Android nightly builds from master to version 1.4.0-SNAPSHOT, as we also have 1.3.0-SNAPSHOT from the v1.3.0 branch.

Test Plan: Imported from OSS

Differential Revision: D17773620

Pulled By: IvanKobzarev

fbshipit-source-id: c39a1dbf5e06f79c25367c3bc602cc8ce42cd939
2019-10-04 16:14:24 -07:00
32d009a37f Add gfx908 to the list of per-default compiled architectures. (#27388)
Summary:
ROCm 2.8 added preliminary support for gfx908.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27388

Differential Revision: D17767772

Pulled By: bddppq

fbshipit-source-id: 172daf5bb66d3db86a13e287059af4b9b90a7f57
2019-10-04 14:49:33 -07:00
6db0cc472c add some support for the occupancy API on ROCm (#27390)
Summary:
Unfortunately, the HIP function takes uint32_t* instead of int*, so we still need to ifdef for the time being.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27390

Differential Revision: D17768832

Pulled By: bddppq

fbshipit-source-id: c65176660cb0783a04f0a4a064f686818d759589
2019-10-04 14:45:53 -07:00
3c2cd8cc10 Some hipify script cleanups (#27375)
Summary:
continue https://github.com/pytorch/pytorch/issues/26363
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27375

Differential Revision: D17764992

Pulled By: bddppq

fbshipit-source-id: ecc06521179677efcedb1d58ceda63df7d63627e
2019-10-04 14:43:22 -07:00
8b61a220c0 C++ API parity: LeakyReLU
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27059

Test Plan: Imported from OSS

Differential Revision: D17682407

Pulled By: pbelevich

fbshipit-source-id: 2a4f42e9438799ba8de7282ac7a6fd3ff97ee048
2019-10-04 14:18:03 -07:00
badb08d577 Add clip_grad_norm_ to c++ api (#26140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26140

Per https://github.com/pytorch/pytorch/issues/25883, we want to work
towards C++/Python API parity. This diff adds clip_grad_norm_ to the c++ API to
improve parity.

ghstack-source-id: 91334333

Test Plan: Added a unit test

Differential Revision: D17312367

fbshipit-source-id: 753ba3a4d084d01f3cc8919da3108e67c809ad65
2019-10-04 13:50:36 -07:00
646e214706 ProcessGroupNCCL should respect timeout passed in to init_process_group. (#27224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27224

As part of adding error handling to NCCL, we are now able to specify a
timeout for operations using ProcessGroupNCCL. However, this timeout had a
default of 10 seconds and didn't respect the timeout specified in
init_process_group.

In this change, I've ensured we pass the appropriate timeout to
ProcessGroupNCCL.
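
A sketch of how the timeout is now picked up (the rendezvous address below is hypothetical):

```python
from datetime import timedelta
import torch.distributed as dist

# After this change, the timeout below is forwarded to ProcessGroupNCCL
# instead of the hard-coded 10-second default.
dist.init_process_group(
    backend="nccl",
    init_method="tcp://127.0.0.1:23456",
    rank=0,
    world_size=1,
    timeout=timedelta(seconds=60),
)
```
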
ghstack-source-id: 91283548

Test Plan:
Added unit test to verify timeout passed in to init_process_group is
respected.

Differential Revision: D17717992

fbshipit-source-id: c73320187f1f3b2693ba1e177d80646e282d01a2
2019-10-04 13:28:57 -07:00
f4c37e6b32 fix OSX CI build (#27373)
Summary:
fix OSX caffe2 CI build, attempt 1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27373

Differential Revision: D17768461

Pulled By: soumith

fbshipit-source-id: b0a076c07382327730b5d86b8a00f5388c368b5e
2019-10-04 13:06:58 -07:00
192ca9730f C++ API parity: Hardtanh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27038

Test Plan: Imported from OSS

Differential Revision: D17682405

Pulled By: pbelevich

fbshipit-source-id: f65e76696e0041c3518f56da94f2e3b800305234
2019-10-04 12:53:33 -07:00
0be6641fbf add function to get nccl version for error messages (#27068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27068

Adds a function that uses ncclGetVersion from the NCCL API to retrieve the NCCL version, converts it into a readable string, and uses it in NCCL-related error messages to log the NCCL version. Hopefully this will help with debugging NCCL errors.

Test Plan:
Modify C10D_NCCL_CHECK in NCCLUtils.hpp to always error by setting ncclResult_t error = ncclSystemError
force an NCCL error with script test/simulate_nccl_errors.py:
Start master node: python test/simulate_nccl_errors.py localhost 9124 0 2
Start other node: python test/simulate_nccl_errors.py localhost 9124 1 2
On the master node, should see the following error message w/NCCL version:

```
Traceback (most recent call last):
  File "simulate_nccl_errors.py", line 29, in <module>
    process_group.allreduce(torch.rand(10).cuda(rank)).wait()
RuntimeError: NCCL error in: ../torch/lib/c10d/ProcessGroupNCCL.cpp:375, unhandled system error, NCCL version 2.4.8
```

Differential Revision: D17639476

fbshipit-source-id: a2f558ad9e883b6be173cfe758ec56cf140bc1ee
2019-10-04 12:49:45 -07:00
a33dbccf60 Fix some return std::move warnings (#27384)
Summary:
clang-tidy was complaining about these
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27384

Pulled By: driazati

Differential Revision: D17767412

fbshipit-source-id: 03e2630790edf3f6bbf9064e754156613032b464
2019-10-04 12:30:13 -07:00
a6bb8b52d4 Reduce error context from 10 -> 3 (#26765)
Summary:
10 lines of error context (on both sides) is overkill, especially now
that we have line numbers. With a compilation stack of a couple
functions, it becomes a pain to scroll to the top of the stack to see
the real error every time.

This also fixes class names in the compilation stack to use the format
`ClassName.method_name` instead of the fully qualified name.
Old output
```
clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
        top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
        batch_idx = torch.arange(num_images, device=device)[:, None]
        objectness = objectness[batch_idx, top_n_idx]
        levels = levels[batch_idx, top_n_idx]
        proposals = proposals[batch_idx, top_n_idx]

        final_boxes = []
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
            # non-maximum suppression, independently done per level
            keep = box_ops.batched_nms(boxes, scores, lvl, self.nms_thresh)
            # keep only topk scoring predictions
            keep = keep[:self.post_nms_top_n]
            boxes, scores = boxes[keep], scores[keep]
            final_boxes.append(boxes)
            final_scores.append(scores)
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
        num_images = len(anchors)
        num_anchors_per_level = [o[0].numel() for o in objectness]
        objectness, pred_bbox_deltas = \
            concat_box_prediction_layers(objectness, pred_bbox_deltas)
        # apply pred_bbox_deltas to anchors to obtain the decoded proposals
        # note that we detach the deltas because Faster R-CNN do not backprop through
        # the proposals
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
        if self.training:
            assert targets is not None
            labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
            regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
            loss_objectness, loss_rpn_box_reg = self.compute_loss(
                objectness, pred_bbox_deltas, labels, regression_targets)
            losses = {
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        original_image_sizes = [(img.shape[-2], img.shape[-3])  for img in images]

        images, targets = self.transform(images, targets)
        features = self.backbone(images.tensors)
        if isinstance(features, torch.Tensor):
            features = OrderedDict([(0, features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

        losses = {}
        losses.update(detector_losses)
        losses.update(proposal_losses)

        # TODO: multiple return types??
        # if self.training:
```

New output

```
RuntimeError:

clip_boxes_to_image(Tensor boxes, (int, int) size) -> (Tensor):
Expected a value of type 'Tuple[int, int]' for argument 'size' but instead found type 'Tuple[int, int, int]'.
:
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:365:20
        final_scores = []
        for boxes, scores, lvl, img_shape in zip(proposals, objectness, levels, image_shapes):
            boxes = box_ops.clip_boxes_to_image(boxes, img_shape)
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            keep = box_ops.remove_small_boxes(boxes, self.min_size)
            boxes, scores, lvl = boxes[keep], scores[keep], lvl[keep]
'RegionProposalNetwork.filter_proposals' is being compiled since it was called from 'RegionProposalNetwork.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/rpn.py:446:8
        proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
        proposals = proposals.view(num_images, -1, 4)
        boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE

        losses = {}
'RegionProposalNetwork.forward' is being compiled since it was called from 'MaskRCNN.forward'
at /home/davidriazati/dev/vision/torchvision/models/detection/generalized_rcnn.py:53:8
        if isinstance(features, torch.Tensor):
            features = OrderedDict([(0, features)])
        proposals, proposal_losses = self.rpn(images, features, targets)
        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
        detections = self.transform.postprocess
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26765

Pulled By: driazati

Differential Revision: D17560963

fbshipit-source-id: e463548744b505ca17f0158079b80e08fda47d49
2019-10-04 11:24:52 -07:00
9f9c6c0999 From docs of scatter_add_() removed erroneous comment on uniqueness of indices. (#27132)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/27080
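
For illustration, duplicate indices are fine precisely because `scatter_add_` accumulates:

```python
import torch

t = torch.zeros(3)
# Both source values target index 0 and are summed, not overwritten.
t.scatter_add_(0, torch.tensor([0, 0]), torch.tensor([1.0, 2.0]))
# t is now tensor([3., 0., 0.])
```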
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27132

Differential Revision: D17765307

Pulled By: soumith

fbshipit-source-id: b0892ff442f3b49f8e3cdf029e2a08b51fa88f28
2019-10-04 11:02:19 -07:00
50b3f9d815 Allow use cpu_serial_kernel with void-lambda (#27370)
Summary:
https://github.com/pytorch/pytorch/pull/27271
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27370

Differential Revision: D17763265

Pulled By: ifedan

fbshipit-source-id: d670560dfc555db529b18c01aa42f0ccb2127889
2019-10-04 10:04:44 -07:00
19ab5381c3 Add OPN instruction and vararg operator table (#27104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27104

* The use case here is to replace prim::ListConstruct, which requires Node, but Node is not available in mobile lite interpreter.
* (OPN, X, N): X is the index into the vararg operator-name and operator tables; N is the number of inputs. For the ListConstruct example, the operator name can be "aten::listconstruct" and the overloaded name is the output type ("int", "float", "bool", "tensor" or "generic").
* A vararg operator table is built with void(int input_size, Stack& stack) functions.
## Unit test
LiteInterpreterConv covers OPN instruction and conv operator.

Test Plan: Imported from OSS

Differential Revision: D17762853

fbshipit-source-id: 475aa0c6678e3760cec805862a78510913a89c83
2019-10-04 09:35:53 -07:00
e166bcbbde Make RpcTest re-usable by other RPC backends by using init_method to initialize a RPC backend (#27320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27320

https://github.com/pytorch/pytorch/pull/27208/

# Problem

Other RPC backends take init_method.

# Solution

Set up init_method in rpc tests.
ghstack-source-id: 91335127

Differential Revision: D17709219

fbshipit-source-id: 3184c6e9b922a6ff9f4d1cb9abfa118b23f43eeb
2019-10-04 09:20:05 -07:00
28b1f586f6 Change schedulers to chainable form (#26423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26423

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form, with a deprecation warning, whenever epoch is not None (until the next release)
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Issuing a deprecation warning when the undocumented get_lr function is invoked (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)), referring users to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestarts` still takes an epoch parameter, as it is the only scheduler with a mechanism relying on fractional epochs
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax (see the sketch after this list).
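
A short sketch of the `MultiplicativeLR` usage described above (the `lr_lambda`
keyword is an assumption, mirroring `LambdaLR`):

```python
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3, 3, 3)
optimizer = optim.Adam(conv.parameters())

# Each step multiplies the current learning rate by the factor the
# function returns for that epoch.
lr_scheduler = optim.lr_scheduler.MultiplicativeLR(optimizer,
                                                   lr_lambda=lambda epoch: 0.95)

for epoch in range(5):
    lr_scheduler.step()
    print(epoch, optimizer.param_groups[0]['lr'])
```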

# #20527

### Before

The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, lr_scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

# ghstack

This contains the changes from #24352. Opening again since they were reverted.

This reverts commit 1c477b7e1f378e9c1f8efed296241f68a8a4372b.

Test Plan: Imported from OSS

Differential Revision: D17460427

Pulled By: vincentqb

fbshipit-source-id: 8c10f4e7246d6756ac91df734e8bed65bdef63c9
2019-10-04 08:53:14 -07:00
da669c25ee autograd: double backwards function for binary_cross_entropy loss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26983

Reviewed By: albanD

Differential Revision: D17714357

Pulled By: anjali411

fbshipit-source-id: cebfe09a9048c4be457b7f2718bc396c06ecabee
2019-10-04 08:29:22 -07:00
c389156fc4 move new_zeros to core from THP (#26511)
Summary:
Fix for issue https://github.com/pytorch/pytorch/issues/25831

ezyang can you please have a look?
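
For context, a minimal sketch of the op being moved:

```python
import torch

x = torch.ones(2, 3, dtype=torch.float64)
# new_zeros creates a zero tensor that inherits x's dtype and device
# unless they are overridden explicitly.
y = x.new_zeros((4,))                  # dtype=torch.float64
z = x.new_zeros(4, dtype=torch.int64)  # dtype overridden
```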
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26511

Differential Revision: D17763037

Pulled By: ezyang

fbshipit-source-id: 3596c01c4ab421e7785d6055cc813806f840a5c7
2019-10-04 08:23:35 -07:00
b7fb2b8862 Implement pickle support for sparse tensors and torch.layout instances (#27062)
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/16667 and https://github.com/OpenMined/PySyft/issues/2326
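
A minimal sketch of what this enables:

```python
import pickle
import torch

indices = torch.tensor([[0, 1], [1, 0]])
values = torch.tensor([3.0, 4.0])
sparse = torch.sparse_coo_tensor(indices, values, (2, 2))

# Both the sparse tensor and the torch.layout instance now round-trip.
blob = pickle.dumps((sparse, torch.sparse_coo))
sparse2, layout = pickle.loads(blob)
assert layout is torch.sparse_coo
```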
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27062

Differential Revision: D17762932

Pulled By: ezyang

fbshipit-source-id: dd99c1f4ac8eb2286eb55aa20ce973f60ce7b7e1
2019-10-04 08:09:32 -07:00
76fc028533 Revert D17743310: [pytorch][PR] Allow use cpu_serial_kernel with void-lambda
Test Plan: revert-hammer

Differential Revision:
D17743310

Original commit changeset: a149751f2d67

fbshipit-source-id: 043240201d67966dd08b7b1bc2f9bf4897923e00
2019-10-04 08:00:49 -07:00
081069e8ca Remove CUDA_VERSION from Python script (which has already been detected in CMake) (#27316)
Summary:
(Intentionally left blank)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27316

Differential Revision: D17762715

Pulled By: ezyang

fbshipit-source-id: 044c0ea6e8c2d12912c946a9a50b934b5253d8c8
2019-10-04 07:49:57 -07:00
e29baaca3d Make align_to method-only. (#27304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27304

The ellipsis version of `align_to` only works if it is called as a
method. To prevent any confusion, this PR disables `torch.align_to` (but
keeps `Tensor.align_to`).
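
A short sketch of the resulting surface (named-tensor API):

```python
import torch

x = torch.randn(2, 3, names=("N", "C"))
y = x.align_to("C", "N")       # method form: supported
# torch.align_to(x, "C", "N")  # function form: disabled by this PR
```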

Test Plan: - [namedtensor ci]

Differential Revision: D17743809

Pulled By: zou3519

fbshipit-source-id: cf5c53dcf45ba244f61bb1e00e4853de5db6c241
2019-10-04 07:18:52 -07:00
13c39c8ecc Remove six dependency (#27282)
Summary:
https://github.com/pytorch/pytorch/pull/27136 added a dependency on `six`, which is not available by default and is not marked as a dependency of PyTorch binaries, causing torchvision CI to break; see https://circleci.com/gh/pytorch/vision/20778?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link for example.

This PR uses `torch._six` instead of `six` as a replacement.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27282

Reviewed By: lerks

Differential Revision: D17737561

Pulled By: fmassa

fbshipit-source-id: 7dcd0cc2c8bab27b8f4535f664f60388818d3497
2019-10-04 04:56:25 -07:00
a7de545c63 Makes test_cuda.py's generated tensor op tests generic (#27210)
Summary:
- The tensor op tests generated in test_cuda.py are now generic and appear in test_torch.py
- Data previously held in auxiliary data structures and files, like test_cuda_ignores.txt, is inlined

Previously the tensor op tests used several auxiliary data structures, a file, and exception handling to filter the test suite. If a function wasn't implemented, for example, that exception would be caught. This let functions like trigamma, which isn't callable, appear to be tested. See https://github.com/pytorch/pytorch/issues/27230. Filtering from additional data stores is error prone, too. It requires developers understand what data stores are used and how they're used. The existing sources are also sometimes incorrect. The txt file claims that dist_ doesn't work on half tensors, for example, but the updated tests verify it does.

In addition to making these tests generic, this PR removes those auxiliary data structures and does not catch any exceptions. Exceptions are errors. (This also means that if something implemented breaks it will now report as an error. Previously the test suite would have reported a pass.) The test infrastructure was also simplified to not perform computations with CPU half tensors since they do not support many operations. This introduces a float<->half conversion quirk but eliminates awkward functions that would first convert cpu tensors to float, perform an operation, and convert them back.

With this change test_cuda.py is almost entirely CUDA-specific.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27210

Differential Revision: D17757907

Pulled By: mruberry

fbshipit-source-id: b3c191c379667b1a7d5361087bdf82f397f77f65
2019-10-04 02:40:59 -07:00
527b10c2d1 Fixes PackedSequence.to (and unifies PackedSequence conversions) (#27245)
Summary:
PackedSequence.to(device) incorrectly places one of three tensors on the device and leaves the other two tensors where they are. If these devices are distinct then further operations on PackedSequence will fail. This behavior is inconsistent with Tensor.to and PackedSequence's behavior when .cuda() is called.

Additionally, PackedSequence defines multiple other conversion functions that were independently and inconsistently implemented.

This PR unifies all implementations and makes the PackedSequence.to behavior more consistent with Tensor.to. It is not completely consistent per comments. test_device_mask in test_nn.py is updated to validate the new functionality.
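
For example (a sketch; assumes a CUDA device is available):

```python
import torch
from torch.nn.utils.rnn import pack_sequence

packed = pack_sequence([torch.randn(3, 5), torch.randn(2, 5)])
# After this fix, .to() moves all constituent tensors consistently,
# mirroring Tensor.to.
packed_cuda = packed.to("cuda:0")
```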
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27245

Differential Revision: D17757850

Pulled By: mruberry

fbshipit-source-id: 58f0bd40f1aa300fb0a91ee743483d645f977dc5
2019-10-04 02:22:41 -07:00
76f847546b Enable Python3.6 PyTorch ROCm CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27353

Differential Revision: D17758495

Pulled By: bddppq

fbshipit-source-id: 95e329bc30f092e4093a33c408f1647b803d9983
2019-10-04 00:23:37 -07:00
d0a4b2f586 Choose num_threads in parallel_for based on GRAIN_SIZE (#26963)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/24080, Continuation of https://github.com/pytorch/pytorch/issues/26886

What soumith said in https://github.com/pytorch/pytorch/pull/26886#issuecomment-535760635 seems plausible
> I wonder if it has to do with `#pragma omp parallel num_threads(num_threads)` which has unintended consequences, where even if `num_threads=1`, entering an omp block inside an omp block results in bad behavior.

I know for a fact that gcc's openmp doesn't start the thread pool when given `num_threads(1)` but it seems clang behaves differently.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26963

Differential Revision: D17626981

Pulled By: soumith

fbshipit-source-id: 484ffe6cc172382bb5ff49ce1fceda7eba20a512
2019-10-03 23:31:39 -07:00
42e7eb0426 Minor readability fixes to C++ documentation (#27338)
Summary:
Changed `yieldings` to `yielding`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27338

Differential Revision: D17758406

Pulled By: yf225

fbshipit-source-id: 1633834a6ad80449c061ebc330ac24f3e42f5506
2019-10-03 21:45:35 -07:00
2ea1d3d01f refactor extra sugared values (#26270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26270

We've accumulated a lot of sugared values whose only purpose is
to be instance-checked against in emitApplyExpr. I need to add
another one to insert an unchecked_cast, and do not want to continue
the pattern. This creates an abstraction for this concept (SpecialFormValue),
and removes all the unneeded sugared values. There is no functionality
change here, just a bunch of code movement in compiler.cpp.

Test Plan: Imported from OSS

Differential Revision: D17412854

Pulled By: zdevito

fbshipit-source-id: 15877c91decaea5a00d1fe737ed2d0f0f8a79a28
2019-10-03 21:25:05 -07:00
9ade1e6944 improve error messages when a method or attribute is missing (#27110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27110

Previously, errors about missing methods on some types (like tensors) would
talk about 'builtins', which are only a thing inside of the compiler.
Furthermore, the error would only occur when the builtin was applied and it
was discovered that no builtin existed. This changes the error message so
that a missing method on our builtin types is discovered at attribute lookup.

Test Plan: Imported from OSS

Differential Revision: D17677616

Pulled By: zdevito

fbshipit-source-id: 2f7cf6c6093a9c832569c44f4b1044a2e56fe205
2019-10-03 21:25:01 -07:00
ef97841147 Show a warning that not all dir members of quantized work. (#27339)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27339

This PR just shows a warning message.
Eventually we will show a correct __dir__.

Test Plan: Imported from OSS

Differential Revision: D17751333

Pulled By: zafartahirov

fbshipit-source-id: e9bc62fd8dd0147979291d0aac3f1afe5b8c7a9f
2019-10-03 20:48:04 -07:00
6bb7433ad5 Replacing the skip_list with white_list in the qconfig propagation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27183

Test Plan: Imported from OSS

Differential Revision: D17700548

Pulled By: zafartahirov

fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
2019-10-03 20:40:17 -07:00
c874dd91a7 export remainder (#24410)
Summary:
Added ONNX export support for torch.remainder and torch.fmod
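
A sketch of exporting the newly supported ops (the file name is illustrative;
opset requirements may apply):

```python
import torch

class Mod(torch.nn.Module):
    def forward(self, x, y):
        return torch.remainder(x, y), torch.fmod(x, y)

dummy = (torch.randn(3), torch.randn(3))
torch.onnx.export(Mod(), dummy, "mod.onnx")
```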
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24410

Reviewed By: hl475

Differential Revision: D17466791

Pulled By: houseroad

fbshipit-source-id: afe6519e5f370824e3b4a45b69036a7260fb72cf
2019-10-03 20:15:20 -07:00
736c754739 add sdk support for xcodebuild script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27358

Test Plan: Imported from OSS

Differential Revision: D17757389

Pulled By: xta0

fbshipit-source-id: ed8e470b9c6329b96297ee7c65ba08759251baad
2019-10-03 20:11:08 -07:00
c3d97c2638 Update to ROCm 2.8 (#27337)
Summary:
New docker images built with tag 324.

Related jenkins changes:
83ec813357
aa235a14c8

Triggered CI runs:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-devtoolset7-rocmrpm-centos7.5-trigger-test/48682/
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-trigger/55638/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27337

Differential Revision: D17753827

Pulled By: bddppq

fbshipit-source-id: 2c3f77b0b7c680013c7cc6d7953fe0da4922fe48
2019-10-03 20:03:28 -07:00
86a8971ebb Add a test case to RpcTest, check src/dst (#27322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27322

# Problem

Existing test cases are too symmetric, so they didn't detect this error: a request sent to the wrong worker.

Because of a wrong `worker_names` setup, worker0 sent a request to itself, while it should have sent it to worker1.

# Solution

Add a test case letting the dst side check whether the request came from the expected src.
ghstack-source-id: 91299312

Reviewed By: satgera

Differential Revision: D17069062

fbshipit-source-id: ef7a532dd497bfc0f0ee8446fcd5d29656aaf175
2019-10-03 18:59:59 -07:00
f5df46ce39 Set MINIZ_NO_TIME to avoid computing localtime on each pickle/unpickle (#27268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27268

For small pickle/unpickle, we spend a disproportionate amount of time in
time functions - roughly 23% in __tzset() for the unpickle case.

We're not currently using .m_time, though we can add this feature
back if it's ever needed.

An alternative would be to -DMINIZ_NO_TIME in compiler_flags, but we would
need to also consistently # define MINIZ_NO_TIME in any .cpp including this .h,
since this # define modifies the struct length in an unfortunate manner.

Test Plan:
buck test mode/dev-nosan caffe2/test/...
Run benchmark:
 buck-out/opt/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ThriftRpcAgentBench

Differential Revision: D17724198

fbshipit-source-id: b44a0217b1d9f8ce6c0f24297f59045c7cadf4b1
2019-10-03 17:59:33 -07:00
2486b0ba82 Add Python RRef as args and return value (#25499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25499

See #23110 for model parallel design details, and #26759 for the RRef
protocol. This commit adds support for using RRefs as Python UDF arguments
and return values. RRefs can now be shared from owner to user, from user to
owner, or from user to user.
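
A rough sketch of the user-facing flow (API names follow the RPC package;
the exact spelling at the time of this commit may differ):

```python
import torch
import torch.distributed.rpc as rpc

# On an already-initialized RPC worker: create an RRef owned by "worker1"
# and fetch its value to the caller with to_here().
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
result = rref.to_here()
```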

Limitations:
1. No implicit type conversion yet. (#27099)
2. No failure handling and retry. (#26116)
3. UDF is not yet blocked until all RRefs are confirmed. (#27098)
4. Internal RRef control messages are not idempotent yet. (#26116)
5. Cannot delete RRefs correctly when there are circular dependencies. (#27096)

Main changes:

1. Added `SCRIPT_REMOTE_CALL` and `PYTHON_REMOTE_CALL` to `Message.h` to represent `dist.remote` invocations.
2. Added `SCRIPT_RREF_FETCH_CALL`, `PYTHON_RREF_FETCH_CALL`, `RREF_USER_ACCEPT`, `RREF_USER_DELETE`, `RREF_CHILD_ACCEPT`, and `RREF_FORK_REQUEST` to `Message.h` as internal RRef control messages.
3. New message request handling code is added to `functions.cpp`, and message format is added in `script_remote_call.h`, `python_remote_call.h`, and `rref_proto.h`.
4. Added a `PyRRef` type in `py_rref.h` and `py_rref.cpp` which holds a shared pointer to C++ `RRef` type. `PyRRef` wraps the C++ API and also implements RRef pickling and unpickling. RRef fork related control messages will be sent during RRef pickling/unpickling procedure.
5. Updated `RRef.h` and `RRef.cpp` accordingly to support `py::object` RRefs.
6. RRef context (reference count, etc.) is tracked in `rref_context.h` and `rref_context.cpp`.

Test Plan:
Imported from OSS

buck test mode/dev-nosan //caffe2/test:rpc_fork

Differential Revision: D17184146

Pulled By: mrshenli

fbshipit-source-id: a3a268efc087ac1ef489136ab957080382629265
2019-10-03 17:47:12 -07:00
8fe5dcf699 Skip tests that use numpy if it's not present
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27165
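
The general pattern is roughly the following (an illustrative sketch, not
necessarily the exact helper used in the test suite):

```python
import unittest

try:
    import numpy  # noqa: F401
    HAS_NUMPY = True
except ImportError:
    HAS_NUMPY = False

class TestFoo(unittest.TestCase):
    @unittest.skipIf(not HAS_NUMPY, "numpy is not installed")
    def test_uses_numpy(self):
        ...
```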

Pulled By: driazati

Differential Revision: D17695078

fbshipit-source-id: d25c920f4c43285028537f88761d47a2c9db7b8f
2019-10-03 17:18:41 -07:00
827a00cf63 Support interface python assignment as an attribute (#26734)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26734

This PR adds Python assignment for an interface as an attribute in the
module; it enables any object that implicitly implements the specific
interface to be assigned to the interface type in Python.

Serialization support for interface/class assignment will be done in a
follow-up PR.
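
A rough sketch of the pattern this enables (the `torch.jit.interface`
decorator spelling is an assumption for this era):

```python
import torch

@torch.jit.interface
class OneInput(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pass

class Impl(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + 1

class Container(torch.nn.Module):
    impl: OneInput  # interface-typed attribute

    def __init__(self):
        super().__init__()
        # Impl conforms to OneInput implicitly, so plain Python
        # assignment works.
        self.impl = Impl()

    def forward(self, x):
        return self.impl.forward(x)

scripted = torch.jit.script(Container())
```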

Test Plan: Imported from OSS

Differential Revision: D17742708

Pulled By: wanchaol

fbshipit-source-id: a0a2d8c74b60ed3fa6c05e1b0d49b7ad1abc670b
2019-10-03 17:18:37 -07:00
cc964765a5 Add method add_hparams to API doc (#27344)
Summary:
Adds the method `add_hparams` to the `torch.utils.tensorboard` API docs. We will want to have this in the PyTorch 1.3 release.
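
For example (a minimal sketch; assumes tensorboard is installed):

```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    # Logs a set of hyperparameters together with the metrics they produced.
    writer.add_hparams({"lr": 0.1, "bsize": 32},
                       {"hparam/accuracy": 0.9, "hparam/loss": 0.1})
```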

cc sanekmelnikov lanpa natalialunova
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27344

Differential Revision: D17753689

Pulled By: orionr

fbshipit-source-id: cc8636e0bdcf3f434444cd29471c62105491039d
2019-10-03 17:07:45 -07:00
99c32d97fa Migrate the cpu and gpu implementations of resize nearest 3D from vision to caffe2
Summary: As title. Fix the build failures in unicorn-build-restrictions as discussed in D17330625

Test Plan:
buck test mode/opt caffe2/caffe2/quantization/server:resize_nearest_3d_dnnlowp_op_test

In vision libs, there is no need to explicitly add a dep on the resize 3d op, as the caffe2_cpu dep is added by default.

Reviewed By: stephenyan1231

Differential Revision: D17676082

fbshipit-source-id: c034ab67a9078f72077b396991ffb9e54e6ab40b
2019-10-03 16:14:00 -07:00
74572fc985 Relax restrictions on set_num_threads (#27190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27190

Allow set_num_threads to be called multiple times in the case of the TBB
parallel backend
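
A sketch of what is now allowed (assuming a TBB-enabled build, per the test
plan below):

```python
import torch

# Previously a second call could be rejected with the TBB backend;
# both calls now take effect.
torch.set_num_threads(4)
torch.set_num_threads(8)
print(torch.get_num_threads())
```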

Test Plan:
BUILD_BINARY=1 USE_TBB=1 ATEN_THREADING=TBB python setup.py develop
install  --cmake
./build/bin/test_parallel
./build/bin/thread_init_test

Reviewed By: kostmo

Differential Revision: D17704236

Pulled By: ilia-cher

fbshipit-source-id: 274380795e78ba417301c5faa18c9e9d3198bd5e
2019-10-03 15:51:03 -07:00
a444054d4b Fix build (#27318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27318

Fix TBB build
USE_TBB=1 ATEN_THREADING=TBB python setup.py develop install --cmake

Test Plan: Imported from OSS

Differential Revision: D17747449

Pulled By: ilia-cher

fbshipit-source-id: 421f362bd10f3be34bffe86ae4f26e8f1c15f1a4
2019-10-03 15:43:06 -07:00
05df6b67c6 C++ API parity: TensorTest.BackwardNonScalarOutputs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/27314

Test Plan: Imported from OSS

Differential Revision: D17746371

Pulled By: pbelevich

fbshipit-source-id: 246fae22a60ed9a6d7b9843239b4b3391cc9dc3e
2019-10-03 15:36:35 -07:00
0c4bc27539 Mention magma-cuda101 package in install instructions (#27325)
Summary:
There is a magma package for the newest CUDA version (10.1); mention it here lest someone mistakenly try to use the version for CUDA 10.0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27325

Differential Revision: D17749535

Pulled By: soumith

fbshipit-source-id: 2d34a7af1218e6157935bfd5e03f4d2c0f00f200
2019-10-03 15:21:53 -07:00
1769 changed files with 117202 additions and 57098 deletions


@ -340,12 +340,12 @@ Libtorch packages are built in the wheel build scripts: manywheel/build_*.sh for
All linux builds occur in docker images. The docker images are
* soumith/conda-cuda
* pytorch/conda-cuda
* Has ALL CUDA versions installed. The script pytorch/builder/conda/switch_cuda_version.sh sets /usr/local/cuda to a symlink to e.g. /usr/local/cuda-10.0 to enable different CUDA builds
* Also used for cpu builds
* soumith/manylinux-cuda90
* soumith/manylinux-cuda92
* soumith/manylinux-cuda100
* pytorch/manylinux-cuda90
* pytorch/manylinux-cuda92
* pytorch/manylinux-cuda100
* Also used for cpu builds
The Dockerfiles are available in pytorch/builder, but there is no circleci job or script to build these docker images, and they cannot be run locally (unless you have the correct local packages/paths). Only Soumith can build them right now.
@ -411,7 +411,7 @@ You can build Linux binaries locally easily using docker.
```
# Run the docker
# Use the correct docker image, soumith/conda-cuda used here as an example
# Use the correct docker image, pytorch/conda-cuda used here as an example
#
# -v path/to/foo:path/to/bar makes path/to/foo on your local machine (the
# machine that you're running the command on) accessible to the docker
@ -426,7 +426,7 @@ docker run \
-v your/pytorch/repo:/pytorch \
-v your/builder/repo:/builder \
-v where/you/want/packages/to/appear:/final_pkgs \
-it soumith/conda-cuda /bin/bash
-it pytorch/conda-cuda /bin/bash
# Export whatever variables are important to you. All variables that you'd
# possibly need are in .circleci/scripts/binary_populate_env.sh


@ -1,5 +1,3 @@
#!/usr/bin/env python3
"""
This module models the tree of configuration variants
for "smoketest" builds.


@ -1,11 +1,8 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.binary_build_data as binary_build_data
import cimodel.lib.conf_tree as conf_tree
import cimodel.lib.miniutils as miniutils
import cimodel.lib.visualization as visualization
class Conf(object):
@ -27,7 +24,7 @@ class Conf(object):
def gen_docker_image(self):
if self.gcc_config_variant == 'gcc5.4_cxx11-abi':
return miniutils.quote("soumith/conda-cuda-cxx11-ubuntu1604:latest")
return miniutils.quote("pytorch/conda-cuda-cxx11-ubuntu1604:latest")
docker_word_substitution = {
"manywheel": "manylinux",
@ -36,18 +33,23 @@ class Conf(object):
docker_distro_prefix = miniutils.override(self.pydistro, docker_word_substitution)
# The cpu nightlies are built on the soumith/manylinux-cuda100 docker image
# The cpu nightlies are built on the pytorch/manylinux-cuda100 docker image
alt_docker_suffix = self.cuda_version or "100"
docker_distro_suffix = "" if self.pydistro == "conda" else alt_docker_suffix
return miniutils.quote("soumith/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
if self.cuda_version == "101":
return "soumith/manylinux-cuda101@sha256:5d62be90d5b7777121180e6137c7eed73d37aaf9f669c51b783611e37e0b4916"
return miniutils.quote("pytorch/" + docker_distro_prefix + "-cuda" + docker_distro_suffix)
def get_name_prefix(self):
return "smoke" if self.smoke else "binary"
def gen_build_name(self, build_or_test):
def gen_build_name(self, build_or_test, nightly):
parts = [self.get_name_prefix(), self.os] + self.gen_build_env_parms()
if nightly:
parts.append("nightly")
if self.libtorch_variant:
parts.append(self.libtorch_variant)
@ -57,17 +59,22 @@ class Conf(object):
joined = "_".join(parts)
return joined.replace(".", "_")
def gen_workflow_job(self, phase, upload_phase_dependency=None):
def gen_workflow_job(self, phase, upload_phase_dependency=None, nightly=False):
job_def = OrderedDict()
job_def["name"] = self.gen_build_name(phase)
job_def["name"] = self.gen_build_name(phase, nightly)
job_def["build_environment"] = miniutils.quote(" ".join(self.gen_build_env_parms()))
job_def["requires"] = ["setup"]
job_def["filters"] = {"branches": {"only": "nightly"}}
if self.smoke:
job_def["requires"].append("update_s3_htmls_for_nightlies")
job_def["requires"].append("update_s3_htmls_for_nightlies_devtoolset7")
job_def["filters"] = {"branches": {"only": "postnightly"}}
else:
job_def["filters"] = {"branches": {"only": "nightly"}}
if self.libtorch_variant:
job_def["libtorch_variant"] = miniutils.quote(self.libtorch_variant)
if phase == "test":
if not self.smoke:
job_def["requires"].append(self.gen_build_name("build"))
job_def["requires"].append(self.gen_build_name("build", nightly))
if not (self.smoke and self.os == "macos"):
job_def["docker_image"] = self.gen_docker_image()
@ -82,7 +89,7 @@ class Conf(object):
job_def["resource_class"] = "gpu.medium"
if phase == "upload":
job_def["context"] = "org-member"
job_def["requires"] = ["setup", self.gen_build_name(upload_phase_dependency)]
job_def["requires"] = ["setup", self.gen_build_name(upload_phase_dependency, nightly)]
os_name = miniutils.override(self.os, {"macos": "mac"})
job_name = "_".join([self.get_name_prefix(), os_name, phase])
@ -127,7 +134,7 @@ def get_nightly_uploads():
mylist = []
for conf in configs:
phase_dependency = "test" if predicate_exclude_nonlinux_and_libtorch(conf) else "build"
mylist.append(conf.gen_workflow_job("upload", phase_dependency))
mylist.append(conf.gen_workflow_job("upload", phase_dependency, nightly=True))
return mylist
@ -138,32 +145,25 @@ def get_nightly_tests():
tests = []
for conf_options in filtered_configs:
yaml_item = conf_options.gen_workflow_job("test")
yaml_item = conf_options.gen_workflow_job("test", nightly=True)
tests.append(yaml_item)
return tests
def add_jobs_and_render(jobs_dict, toplevel_key, smoke, cron_schedule):
jobs_list = ["setup"]
def get_jobs(toplevel_key, smoke):
jobs_list = []
configs = gen_build_env_list(smoke)
phase = "build" if toplevel_key == "binarybuilds" else "test"
for build_config in configs:
jobs_list.append(build_config.gen_workflow_job(phase))
jobs_list.append(build_config.gen_workflow_job(phase, nightly=True))
jobs_dict[toplevel_key] = OrderedDict(
jobs=jobs_list,
)
graph = visualization.generate_graph(get_root(smoke, toplevel_key))
graph.draw(toplevel_key + "-config-dimensions.png", prog="twopi")
return jobs_list
def add_binary_build_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarybuilds", False, "5 5 * * *")
def get_binary_build_jobs():
return get_jobs("binarybuilds", False)
def add_binary_smoke_test_jobs(jobs_dict):
add_jobs_and_render(jobs_dict, "binarysmoketests", True, "15 16 * * *")
def get_binary_smoke_test_jobs():
return get_jobs("binarysmoketests", True)


@ -1,38 +1,11 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
from cimodel.lib.conf_tree import ConfigNode, XImportant
from cimodel.lib.conf_tree import Ver
CONFIG_TREE_DATA = [
(Ver("ubuntu", "14.04"), [
(Ver("gcc", "4.8"), [X("py2")]),
(Ver("gcc", "4.9"), [X("py2")]),
]),
(Ver("ubuntu", "16.04"), [
(Ver("cuda", "9.0"), [
# TODO make explicit that this is a "secret TensorRT build"
# (see https://github.com/pytorch/pytorch/pull/17323#discussion_r259446749)
# TODO Uh oh, were we supposed to make this one important?!
X("py2"),
XImportant("cmake"),
]),
(Ver("cuda", "10.1"), [XImportant("py3.5")]), # TensorRT 6 build
(Ver("mkl"), [XImportant("py2")]),
(Ver("gcc", "5"), [XImportant("onnx_py2")]),
(Ver("clang", "3.8"), [X("py2")]),
(Ver("clang", "3.9"), [X("py2")]),
(Ver("clang", "7"), [XImportant("py2"), XImportant("onnx_py3.6")]),
(Ver("android"), [XImportant("py2")]),
]),
(Ver("centos", "7"), [
(Ver("cuda", "9.0"), [X("py2")]),
]),
(Ver("macos", "10.13"), [
# TODO ios and system aren't related. system qualifies where the python comes
# from (use the system python instead of homebrew or anaconda)
(Ver("ios"), [X("py2")]),
(Ver("system"), [XImportant("py2")]),
([Ver("gcc", "5")], [XImportant("onnx_py2")]),
([Ver("clang", "7")], [XImportant("onnx_py3.6")]),
]),
]
@ -56,13 +29,12 @@ class TreeConfigNode(ConfigNode):
def is_build_only(self):
if str(self.find_prop("language_version")) == "onnx_py3.6":
return False
return str(self.find_prop("compiler_version")) in [
"gcc4.9",
return set(str(c) for c in self.find_prop("compiler_version")).intersection({
"clang3.8",
"clang3.9",
"clang7",
"android",
] or self.find_prop("distro_version").name == "macos"
}) or self.find_prop("distro_version").name == "macos"
class TopLevelNode(TreeConfigNode):


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict
import cimodel.data.dimensions as dimensions
@ -14,23 +12,29 @@ from dataclasses import dataclass
DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/"
DOCKER_IMAGE_VERSION = 315
DOCKER_IMAGE_VERSION = 345
@dataclass
class Conf:
language: str
distro: Ver
compiler: Ver
# There could be multiple compiler versions configured (e.g. nvcc
# for gpu files and host compiler (gcc/clang) for cpu files)
compilers: [Ver]
build_only: bool
is_important: bool
@property
def compiler_names(self):
return [c.name for c in self.compilers]
# TODO: Eventually we can probably just remove the cudnn7 everywhere.
def get_cudnn_insertion(self):
omit = self.language == "onnx_py2" \
or self.language == "onnx_py3.6" \
or self.compiler.name in ["android", "mkl", "clang"] \
or set(self.compiler_names).intersection({"android", "mkl", "clang"}) \
or str(self.distro) in ["ubuntu14.04", "macos10.13"]
return [] if omit else ["cudnn7"]
@ -42,7 +46,7 @@ class Conf:
] + self.get_build_name_middle_parts()
def get_build_name_middle_parts(self):
return [str(self.compiler)] + self.get_cudnn_insertion() + [str(self.distro)]
return [str(c) for c in self.compilers] + self.get_cudnn_insertion() + [str(self.distro)]
def construct_phase_name(self, phase):
root_parts = self.get_build_name_root_parts()
@ -82,11 +86,11 @@ class Conf:
build_env_name = "-".join(parts)
parameters["build_environment"] = miniutils.quote(build_env_name)
if self.compiler.name == "ios":
if "ios" in self.compiler_names:
parameters["build_ios"] = miniutils.quote("1")
if phase == "test":
# TODO cuda should not be considered a compiler
if self.compiler.name == "cuda":
if "cuda" in self.compiler_names:
parameters["use_cuda_docker_runtime"] = miniutils.quote("1")
if self.distro.name != "macos":
@ -94,7 +98,7 @@ class Conf:
if self.build_only:
parameters["build_only"] = miniutils.quote("1")
if phase == "test":
resource_class = "large" if self.compiler.name != "cuda" else "gpu.medium"
resource_class = "large" if "cuda" not in self.compiler_names else "gpu.medium"
parameters["resource_class"] = resource_class
return parameters
@ -127,11 +131,10 @@ def instantiate_configs():
root = get_root()
found_configs = conf_tree.dfs(root)
for fc in found_configs:
c = Conf(
language=fc.find_prop("language_version"),
distro=fc.find_prop("distro_version"),
compiler=fc.find_prop("compiler_version"),
compilers=fc.find_prop("compiler_version"),
build_only=fc.find_prop("build_only"),
is_important=fc.find_prop("important"),
)
@ -145,12 +148,8 @@ def get_workflow_jobs():
configs = instantiate_configs()
# TODO Why don't we build this config?
# See https://github.com/pytorch/pytorch/pull/17323#discussion_r259450540
filtered_configs = filter(lambda x: not (str(x.distro) == "ubuntu14.04" and str(x.compiler) == "gcc4.9"), configs)
x = []
for conf_options in filtered_configs:
for conf_options in configs:
phases = ["build"]
if not conf_options.build_only:


@ -1,6 +1,3 @@
#!/usr/bin/env python3
PHASES = ["build", "test"]
CUDA_VERSIONS = [


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from cimodel.lib.conf_tree import ConfigNode, X, XImportant
@ -12,27 +10,25 @@ CONFIG_TREE_DATA = [
X("nightly"),
]),
("gcc", [
("4.8", [X("3.6")]),
("5.4", [ # All this subtree rebases to master and then build
XImportant("3.6"),
("3.6", [
("namedtensor", [XImportant(True)]),
("parallel_tbb", [XImportant(True)]),
("parallel_native", [XImportant(True)]),
]),
]),
# TODO: bring back libtorch test
("7", [X("3.6")]),
]),
("clang", [
("5", [
XImportant("3.6"), # This is actually the ASAN build
("3.6", [
("namedtensor", [XImportant(True)]), # ASAN
]),
]),
("7", [
("3.6", [
("xla", [XImportant(True)]),
]),
]),
# ("7", [
# ("3.6", [
# ("xla", [XImportant(True)]),
# ]),
# ]),
]),
("cuda", [
("9", [
@ -43,10 +39,9 @@ CONFIG_TREE_DATA = [
# and
# https://github.com/pytorch/pytorch/blob/master/.jenkins/pytorch/build.sh#L153
# (from https://github.com/pytorch/pytorch/pull/17323#discussion_r259453144)
X("2.7"),
XImportant("3.6"),
("2.7", [
("namedtensor", [XImportant(True)]),
("3.6", [
("libtorch", [XImportant(True)])
]),
]),
("9.2", [X("3.6")]),
@ -129,7 +124,9 @@ class ExperimentalFeatureConfigNode(TreeConfigNode):
next_nodes = {
"xla": XlaConfigNode,
"namedtensor": NamedTensorConfigNode,
"parallel_tbb": ParallelTBBConfigNode,
"parallel_native": ParallelNativeConfigNode,
"libtorch": LibTorchConfigNode,
"important": ImportantConfigNode,
"android_abi": AndroidAbiConfigNode,
}
@ -146,13 +143,32 @@ class XlaConfigNode(TreeConfigNode):
def child_constructor(self):
return ImportantConfigNode
class NamedTensorConfigNode(TreeConfigNode):
class ParallelTBBConfigNode(TreeConfigNode):
def modify_label(self, label):
return "NAMEDTENSOR=" + str(label)
return "PARALLELTBB=" + str(label)
def init2(self, node_name):
self.props["is_namedtensor"] = node_name
self.props["parallel_backend"] = "paralleltbb"
def child_constructor(self):
return ImportantConfigNode
class ParallelNativeConfigNode(TreeConfigNode):
def modify_label(self, label):
return "PARALLELNATIVE=" + str(label)
def init2(self, node_name):
self.props["parallel_backend"] = "parallelnative"
def child_constructor(self):
return ImportantConfigNode
class LibTorchConfigNode(TreeConfigNode):
def modify_label(self, label):
return "BUILD_TEST_LIBTORCH=" + str(label)
def init2(self, node_name):
self.props["is_libtorch"] = node_name
def child_constructor(self):
return ImportantConfigNode


@ -1,5 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict
from cimodel.data.pytorch_build_data import TopLevelNode, CONFIG_TREE_DATA
@ -15,7 +13,7 @@ DOCKER_IMAGE_PATH_BASE = "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/"
# ARE YOU EDITING THIS NUMBER? MAKE SURE YOU READ THE GUIDANCE AT THE
# TOP OF .circleci/config.yml
DOCKER_IMAGE_VERSION = 347
DOCKER_IMAGE_VERSION = 405
@dataclass
@ -33,8 +31,9 @@ class Conf:
gpu_resource: Optional[str] = None
dependent_tests: List = field(default_factory=list)
parent_build: Optional['Conf'] = None
is_namedtensor: bool = False
is_libtorch: bool = False
is_important: bool = False
parallel_backend: Optional[str] = None
# TODO: Eliminate the special casing for docker paths
# In the short term, we *will* need to support special casing as docker images are merged for caffe2 and pytorch
@ -47,8 +46,10 @@ class Conf:
leading.append("pytorch")
if self.is_xla and not for_docker:
leading.append("xla")
if self.is_namedtensor and not for_docker:
leading.append("namedtensor")
if self.is_libtorch and not for_docker:
leading.append("libtorch")
if self.parallel_backend is not None and not for_docker:
leading.append(self.parallel_backend)
cuda_parms = []
if self.cuda_version:
@ -159,7 +160,7 @@ def gen_dependent_configs(xenial_parent_config):
configs.append(c)
for x in ["pytorch_short_perf_test_gpu", "pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
for x in ["pytorch_python_doc_push", "pytorch_cpp_doc_push"]:
configs.append(HiddenConf(x, parent_build=xenial_parent_config))
return configs
@ -209,6 +210,7 @@ def instantiate_configs():
android_abi = fc.find_prop("android_abi")
parms_list_ignored_for_docker_image.append(android_abi)
restrict_phases = ["build"]
fc.props["is_important"] = True
elif compiler_name:
gcc_version = compiler_name + (fc.find_prop("compiler_version") or "")
@ -224,8 +226,9 @@ def instantiate_configs():
# TODO The gcc version is orthogonal to CUDA version?
parms_list.append("gcc7")
is_namedtensor = fc.find_prop("is_namedtensor") or False
is_libtorch = fc.find_prop("is_libtorch") or False
is_important = fc.find_prop("is_important") or False
parallel_backend = fc.find_prop("parallel_backend") or None
gpu_resource = None
if cuda_version and cuda_version != "10":
@ -240,22 +243,24 @@ def instantiate_configs():
is_xla,
restrict_phases,
gpu_resource,
is_namedtensor=is_namedtensor,
is_libtorch=is_libtorch,
is_important=is_important,
parallel_backend=parallel_backend,
)
if cuda_version == "9" and python_version == "3.6":
if cuda_version == "9" and python_version == "3.6" and not is_libtorch:
c.dependent_tests = gen_dependent_configs(c)
if (compiler_name == "gcc"
and compiler_version == "5.4"
and not is_namedtensor):
and not is_libtorch
and parallel_backend is None):
bc_breaking_check = Conf(
"backward-compatibility-check",
[],
is_xla=False,
restrict_phases=["test"],
is_namedtensor=False,
is_libtorch=False,
is_important=True,
parent_build=c,
)


@ -1,6 +1,3 @@
#!/usr/bin/env python3
from dataclasses import dataclass, field
from typing import Optional, Dict


@ -1,6 +1,3 @@
#!/usr/bin/env python3
def quote(s):
return sandwich('"', s)


@ -1,6 +1,3 @@
#!/usr/bin/env python3
from collections import OrderedDict


@ -1,5 +1,3 @@
#!/usr/bin/env python3
"""
This module encapsulates dependencies on pygraphviz
"""

(File diff suppressed because it is too large.)


@ -0,0 +1,19 @@
# Docker images for Jenkins
This directory contains everything needed to build the Docker images
that are used in our CI.
The Dockerfiles located in subdirectories are parameterized to
conditionally run build stages depending on build arguments passed to
`docker build`. This lets us use only a few Dockerfiles for many
images. The different configurations are identified by a freeform
string that we call a _build environment_. This string is persisted in
each image as the `BUILD_ENVIRONMENT` environment variable.
See `build.sh` for valid build environments (it's the giant switch).
## Contents
* `build.sh` -- dispatch script to launch all builds
* `common` -- scripts used to execute individual Docker build stages
* `ubuntu-cuda` -- Dockerfile for Ubuntu image with CUDA support for nvidia-docker


@ -0,0 +1 @@
<manifest package="org.pytorch.deps" />


@ -0,0 +1,68 @@
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:1.8.0"
classpath "com.github.dcendents:android-maven-gradle-plugin:2.1"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
repositories {
google()
jcenter()
}
apply plugin: 'com.android.library'
android {
compileSdkVersion rootProject.compileSdkVersion
buildToolsVersion rootProject.buildToolsVersion
defaultConfig {
minSdkVersion minSdkVersion
targetSdkVersion targetSdkVersion
}
sourceSets {
main {
manifest.srcFile 'AndroidManifest.xml'
}
}
}
dependencies {
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'androidx.appcompat:appcompat:1.0.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
implementation 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion
implementation 'junit:junit:' + rootProject.junitVersion
implementation 'androidx.test:core:' + rootProject.coreVersion
implementation 'androidx.test.ext:junit:' + rootProject.extJUnitVersion
implementation 'androidx.test:rules:' + rootProject.rulesVersion
implementation 'androidx.test:runner:' + rootProject.runnerVersion
}

.circleci/docker/build.sh (new executable file, 275 lines)

@ -0,0 +1,275 @@
#!/bin/bash
set -ex
image="$1"
shift
if [ -z "${image}" ]; then
echo "Usage: $0 IMAGE"
exit 1
fi
# TODO: Generalize
OS="ubuntu"
DOCKERFILE="${OS}/Dockerfile"
if [[ "$image" == *-cuda* ]]; then
DOCKERFILE="${OS}-cuda/Dockerfile"
fi
if [[ "$image" == *-trusty* ]]; then
UBUNTU_VERSION=14.04
elif [[ "$image" == *-xenial* ]]; then
UBUNTU_VERSION=16.04
elif [[ "$image" == *-artful* ]]; then
UBUNTU_VERSION=17.10
elif [[ "$image" == *-bionic* ]]; then
UBUNTU_VERSION=18.04
fi
# It's annoying to rename jobs every time you want to rewrite a
# configuration, so we hardcode everything here rather than do it
# from scratch
case "$image" in
pytorch-linux-bionic-clang9-thrift-llvmdev)
CLANG_VERSION=9
THRIFT=yes
LLVMDEV=yes
PROTOBUF=yes
;;
pytorch-linux-xenial-py2.7.9)
TRAVIS_PYTHON_VERSION=2.7.9
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py2.7)
TRAVIS_PYTHON_VERSION=2.7
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.5)
TRAVIS_PYTHON_VERSION=3.5
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc4.8)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=4.8
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc5.4)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=5
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3.6-gcc7.2)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
# Do not install PROTOBUF, DB, and VISION as a test
;;
pytorch-linux-xenial-py3.6-gcc7)
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-pynightly)
TRAVIS_PYTHON_VERSION=nightly
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py2)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=2.7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda8-cudnn7-py3)
CUDA_VERSION=8.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9-cudnn7-py2)
CUDA_VERSION=9.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=2.7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda9-cudnn7-py3)
CUDA_VERSION=9.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
PROTOBUF=yes
DB=yes
VISION=yes
KATEX=yes
;;
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7)
CUDA_VERSION=9.2
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7)
CUDA_VERSION=10.0
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7)
CUDA_VERSION=10.1
CUDNN_VERSION=7
ANACONDA_PYTHON_VERSION=3.6
GCC_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-asan)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
PROTOBUF=yes
DB=yes
VISION=yes
;;
pytorch-linux-xenial-py3-clang5-android-ndk-r19c)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=5.0
PROTOBUF=yes
ANDROID=yes
ANDROID_NDK_VERSION=r19c
GRADLE_VERSION=4.10.3
CMAKE_VERSION=3.7.0
NINJA_VERSION=1.9.0
;;
pytorch-linux-xenial-py3.6-clang7)
ANACONDA_PYTHON_VERSION=3.6
CLANG_VERSION=7
PROTOBUF=yes
DB=yes
VISION=yes
;;
esac
# Set Jenkins UID and GID if running Jenkins
if [ -n "${JENKINS:-}" ]; then
JENKINS_UID=$(id -u jenkins)
JENKINS_GID=$(id -g jenkins)
fi
tmp_tag="tmp-$(cat /dev/urandom | tr -dc 'a-z' | fold -w 32 | head -n 1)"
# Build image
docker build \
--no-cache \
--build-arg "BUILD_ENVIRONMENT=${image}" \
--build-arg "PROTOBUF=${PROTOBUF:-}" \
--build-arg "THRIFT=${THRIFT:-}" \
--build-arg "LLVMDEV=${LLVMDEV:-}" \
--build-arg "DB=${DB:-}" \
--build-arg "VISION=${VISION:-}" \
--build-arg "EC2=${EC2:-}" \
--build-arg "JENKINS=${JENKINS:-}" \
--build-arg "JENKINS_UID=${JENKINS_UID:-}" \
--build-arg "JENKINS_GID=${JENKINS_GID:-}" \
--build-arg "UBUNTU_VERSION=${UBUNTU_VERSION}" \
--build-arg "CLANG_VERSION=${CLANG_VERSION}" \
--build-arg "ANACONDA_PYTHON_VERSION=${ANACONDA_PYTHON_VERSION}" \
--build-arg "TRAVIS_PYTHON_VERSION=${TRAVIS_PYTHON_VERSION}" \
--build-arg "GCC_VERSION=${GCC_VERSION}" \
--build-arg "CUDA_VERSION=${CUDA_VERSION}" \
--build-arg "CUDNN_VERSION=${CUDNN_VERSION}" \
--build-arg "ANDROID=${ANDROID}" \
--build-arg "ANDROID_NDK=${ANDROID_NDK_VERSION}" \
--build-arg "GRADLE_VERSION=${GRADLE_VERSION}" \
--build-arg "CMAKE_VERSION=${CMAKE_VERSION:-}" \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
.
function drun() {
docker run --rm "$tmp_tag" $*
}
if [[ "$OS" == "ubuntu" ]]; then
if !(drun lsb_release -a 2>&1 | grep -qF Ubuntu); then
echo "OS=ubuntu, but:"
drun lsb_release -a
exit 1
fi
if !(drun lsb_release -a 2>&1 | grep -qF "$UBUNTU_VERSION"); then
echo "UBUNTU_VERSION=$UBUNTU_VERSION, but:"
drun lsb_release -a
exit 1
fi
fi
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]]; then
if !(drun python --version 2>&1 | grep -qF "Python $TRAVIS_PYTHON_VERSION"); then
echo "TRAVIS_PYTHON_VERSION=$TRAVIS_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
else
echo "Please manually check nightly is OK:"
drun python --version
fi
fi
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
if !(drun python --version 2>&1 | grep -qF "Python $ANACONDA_PYTHON_VERSION"); then
echo "ANACONDA_PYTHON_VERSION=$ANACONDA_PYTHON_VERSION, but:"
drun python --version
exit 1
fi
fi
if [ -n "$GCC_VERSION" ]; then
if !(drun gcc --version 2>&1 | grep -q " $GCC_VERSION\\W"); then
echo "GCC_VERSION=$GCC_VERSION, but:"
drun gcc --version
exit 1
fi
fi
if [ -n "$CLANG_VERSION" ]; then
if !(drun clang --version 2>&1 | grep -qF "clang version $CLANG_VERSION"); then
echo "CLANG_VERSION=$CLANG_VERSION, but:"
drun clang --version
exit 1
fi
fi
if [ -n "$KATEX" ]; then
if !(drun katex --version); then
echo "KATEX=$KATEX, but:"
drun katex --version
exit 1
fi
fi


@ -0,0 +1,49 @@
#!/bin/bash
set -ex
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
# If UPSTREAM_BUILD_ID is set (see trigger job), then we can
# use it to tag this build with the same ID used to tag all other
# base image builds. Also, we can try and pull the previous
# image first, to avoid rebuilding layers that haven't changed.
#until we find a way to reliably reuse previous build, this last_tag is not in use
# last_tag="$(( CIRCLE_BUILD_NUM - 1 ))"
tag="${CIRCLE_WORKFLOW_ID}"
registry="308535385114.dkr.ecr.us-east-1.amazonaws.com"
image="${registry}/pytorch/${IMAGE_NAME}"
login() {
aws ecr get-authorization-token --region us-east-1 --output text --query 'authorizationData[].authorizationToken' |
base64 -d |
cut -d: -f2 |
docker login -u AWS --password-stdin "$1"
}
# Retry on timeouts (can happen on job stampede).
retry login "${registry}"
# Logout on exit
trap "docker logout ${registry}" EXIT
# export EC2=1
# export JENKINS=1
# Try to pull the previous image (perhaps we can reuse some layers)
# if [ -n "${last_tag}" ]; then
# docker pull "${image}:${last_tag}" || true
# fi
# Build new image
./build.sh ${IMAGE_NAME} -t "${image}:${tag}"
docker push "${image}:${tag}"
docker save -o "${IMAGE_NAME}:${tag}.tar" "${image}:${tag}"
aws s3 cp "${IMAGE_NAME}:${tag}.tar" "s3://ossci-linux-build/pytorch/base/${IMAGE_NAME}:${tag}.tar" --acl public-read


@ -0,0 +1,129 @@
#!/bin/bash
set -ex
[ -n "${ANDROID_NDK}" ]
apt-get update
apt-get install -y --no-install-recommends autotools-dev autoconf unzip
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
pushd /tmp
curl -Os https://dl.google.com/android/repository/android-ndk-${ANDROID_NDK}-linux-x86_64.zip
popd
_ndk_dir=/opt/ndk
mkdir -p "$_ndk_dir"
unzip -qo /tmp/android*.zip -d "$_ndk_dir"
_versioned_dir=$(find "$_ndk_dir/" -mindepth 1 -maxdepth 1 -type d)
mv "$_versioned_dir"/* "$_ndk_dir"/
rmdir "$_versioned_dir"
rm -rf /tmp/*
# Install OpenJDK
# https://hub.docker.com/r/picoded/ubuntu-openjdk-8-jdk/dockerfile/
sudo apt-get update && \
apt-get install -y openjdk-8-jdk && \
apt-get install -y ant && \
apt-get clean && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer;
# Fix certificate issues, found as of
# https://bugs.launchpad.net/ubuntu/+source/ca-certificates-java/+bug/983302
sudo apt-get update && \
apt-get install -y ca-certificates-java && \
apt-get clean && \
update-ca-certificates -f && \
rm -rf /var/lib/apt/lists/* && \
rm -rf /var/cache/oracle-jdk8-installer;
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
# Installing android sdk
# https://github.com/circleci/circleci-images/blob/staging/android/Dockerfile.m4
_sdk_version=sdk-tools-linux-3859397.zip
_android_home=/opt/android/sdk
rm -rf $_android_home
sudo mkdir -p $_android_home
curl --silent --show-error --location --fail --retry 3 --output /tmp/$_sdk_version https://dl.google.com/android/repository/$_sdk_version
sudo unzip -q /tmp/$_sdk_version -d $_android_home
rm /tmp/$_sdk_version
sudo chmod -R 777 $_android_home
export ANDROID_HOME=$_android_home
export ADB_INSTALL_TIMEOUT=120
export PATH="${ANDROID_HOME}/emulator:${ANDROID_HOME}/tools:${ANDROID_HOME}/tools/bin:${ANDROID_HOME}/platform-tools:${PATH}"
echo "PATH:${PATH}"
alias sdkmanager="$ANDROID_HOME/tools/bin/sdkmanager"
sudo mkdir -p ~/.android && echo '### User Sources for Android SDK Manager' | sudo tee ~/.android/repositories.cfg
sudo chmod -R 777 ~/.android
yes | sdkmanager --licenses
yes | sdkmanager --update
sdkmanager \
"tools" \
"platform-tools" \
"emulator"
sdkmanager \
"build-tools;28.0.3" \
"build-tools;29.0.2"
sdkmanager \
"platforms;android-28" \
"platforms;android-29"
sdkmanager --list
# Installing Gradle
echo "GRADLE_VERSION:${GRADLE_VERSION}"
_gradle_home=/opt/gradle
sudo rm -rf $_gradle_home
sudo mkdir -p $_gradle_home
wget --no-verbose --output-document=/tmp/gradle.zip \
"https://services.gradle.org/distributions/gradle-${GRADLE_VERSION}-bin.zip"
sudo unzip -q /tmp/gradle.zip -d $_gradle_home
rm /tmp/gradle.zip
sudo chmod -R 777 $_gradle_home
export GRADLE_HOME=$_gradle_home/gradle-$GRADLE_VERSION
alias gradle="${GRADLE_HOME}/bin/gradle"
export PATH="${GRADLE_HOME}/bin/:${PATH}"
echo "PATH:${PATH}"
gradle --version
mkdir /var/lib/jenkins/gradledeps
cp build.gradle /var/lib/jenkins/gradledeps
cp AndroidManifest.xml /var/lib/jenkins/gradledeps
pushd /var/lib/jenkins
export GRADLE_LOCAL_PROPERTIES=gradledeps/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
chown -R jenkins /var/lib/jenkins/gradledeps
chgrp -R jenkins /var/lib/jenkins/gradledeps
sudo -H -u jenkins $GRADLE_HOME/bin/gradle -p /var/lib/jenkins/gradledeps -g /var/lib/jenkins/.gradle --refresh-dependencies --debug --stacktrace assemble
chown -R jenkins /var/lib/jenkins/.gradle
chgrp -R jenkins /var/lib/jenkins/.gradle
popd
rm -rf /var/lib/jenkins/.gradle/daemon

View File

@ -0,0 +1,75 @@
#!/bin/bash
set -ex
if [[ "$UBUNTU_VERSION" == "14.04" ]]; then
# cmake 2 is too old
cmake3=cmake3
else
cmake3=cmake
fi
if [[ "$UBUNTU_VERSION" == "18.04" ]]; then
cmake3="cmake=3.10*"
else
cmake3="${cmake3}=3.5*"
fi
# Install common dependencies
apt-get update
# TODO: Some of these may not be necessary
# TODO: libiomp also gets installed by conda, aka there's a conflict
ccache_deps="asciidoc docbook-xml docbook-xsl xsltproc"
numpy_deps="gfortran"
apt-get install -y --no-install-recommends \
$ccache_deps \
$numpy_deps \
${cmake3} \
apt-transport-https \
autoconf \
automake \
build-essential \
ca-certificates \
curl \
git \
libatlas-base-dev \
libc6-dbg \
libiomp-dev \
libyaml-dev \
libz-dev \
libjpeg-dev \
libasound2-dev \
libsndfile-dev \
python \
python-dev \
python-setuptools \
python-wheel \
software-properties-common \
sudo \
wget \
vim
# Install Valgrind separately since the apt-get version is too old.
mkdir valgrind_build && cd valgrind_build
if ! wget http://valgrind.org/downloads/valgrind-3.14.0.tar.bz2
then
wget https://sourceware.org/ftp/valgrind/valgrind-3.14.0.tar.bz2
fi
tar -xjf valgrind-3.14.0.tar.bz2
cd valgrind-3.14.0
./configure --prefix=/usr/local
make
sudo make install
cd ../../
rm -rf valgrind_build
alias valgrind="/usr/local/bin/valgrind"
# TODO: THIS IS A HACK!!!
# distributed nccl(2) tests are a bit busted, see https://github.com/pytorch/pytorch/issues/5877
if dpkg -s libnccl-dev; then
apt-get remove -y libnccl-dev libnccl2 --allow-change-held-packages
fi
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

View File

@ -0,0 +1,35 @@
#!/bin/bash
set -ex
mkdir -p /opt/cache/bin
mkdir -p /opt/cache/lib
sed -e 's|PATH="\(.*\)"|PATH="/opt/cache/bin:\1"|g' -i /etc/environment
export PATH="/opt/cache/bin:$PATH"
# Setup compiler cache
curl https://s3.amazonaws.com/ossci-linux/sccache -o /opt/cache/bin/sccache
chmod a+x /opt/cache/bin/sccache
function write_sccache_stub() {
printf "#!/bin/sh\nexec sccache $(which $1) \$*" > "/opt/cache/bin/$1"
chmod a+x "/opt/cache/bin/$1"
}
write_sccache_stub cc
write_sccache_stub c++
write_sccache_stub gcc
write_sccache_stub g++
write_sccache_stub clang
write_sccache_stub clang++
if [ -n "$CUDA_VERSION" ]; then
# TODO: This is a workaround for the fact that PyTorch's FindCUDA
# implementation cannot find nvcc if it is setup this way, because it
# appears to search for the nvcc in PATH, and use its path to infer
# where CUDA is installed. Instead, we install an nvcc symlink outside
# of the PATH, and set CUDA_NVCC_EXECUTABLE so that we make use of it.
printf "#!/bin/sh\nexec sccache $(which nvcc) \"\$@\"" > /opt/cache/lib/nvcc
chmod a+x /opt/cache/lib/nvcc
fi
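The stubs work because /opt/cache/bin is prepended to PATH, while each stub captures the real compiler's absolute path at generation time via $(which $1). A quick hedged sanity check one could run after this script (expected outputs shown as comments; they depend on where the base image keeps its compilers):

command -v gcc                 # expected: /opt/cache/bin/gcc
head -2 /opt/cache/bin/gcc     # expected: exec sccache /usr/bin/gcc ...
gcc --version                  # should still report the underlying compiler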

View File

@ -0,0 +1,44 @@
#!/bin/bash
set -ex
if [ -n "$CLANG_VERSION" ]; then
if [[ $CLANG_VERSION == 7 && $UBUNTU_VERSION == 16.04 ]]; then
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-7 main"
elif [[ $CLANG_VERSION == 9 && $UBUNTU_VERSION == 18.04 ]]; then
sudo apt-get update
# gpg-agent is not available by default on 18.04
sudo apt-get install -y --no-install-recommends gpg-agent
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
apt-add-repository "deb http://apt.llvm.org/bionic/ llvm-toolchain-bionic-${CLANG_VERSION} main"
fi
sudo apt-get update
apt-get install -y --no-install-recommends clang-"$CLANG_VERSION"
apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"
# Install dev version of LLVM.
if [ -n "$LLVMDEV" ]; then
sudo apt-get install -y --no-install-recommends llvm-"$CLANG_VERSION"-dev
fi
# Use update-alternatives to make this version the default
# TODO: Decide if overriding gcc as well is a good idea
# update-alternatives --install /usr/bin/gcc gcc /usr/bin/clang-"$CLANG_VERSION" 50
# update-alternatives --install /usr/bin/g++ g++ /usr/bin/clang++-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang clang /usr/bin/clang-"$CLANG_VERSION" 50
update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-"$CLANG_VERSION" 50
# clang's packaging is a little messed up (the runtime libs aren't
# added into the linker path), so give it a little help
clang_lib=("/usr/lib/llvm-$CLANG_VERSION/lib/clang/"*"/lib/linux")
echo "$clang_lib" > /etc/ld.so.conf.d/clang.conf
ldconfig
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,16 @@
#!/bin/bash
set -ex
[ -n "$CMAKE_VERSION" ]
# Turn 3.6.3 into v3.6
path=$(echo "${CMAKE_VERSION}" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/')
file="cmake-${CMAKE_VERSION}-Linux-x86_64.tar.gz"
# Download and install specific CMake version in /usr/local
pushd /tmp
curl -Os "https://cmake.org/files/${path}/${file}"
tar -C /usr/local --strip-components 1 --no-same-owner -zxf cmake-*.tar.gz
rm -f cmake-*.tar.gz
popd
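The sed expression keeps only the major.minor prefix of CMAKE_VERSION and prepends v, matching the directory layout under cmake.org/files. Two quick checks of the transform:

echo "3.6.3"  | sed -e 's/\([0-9].[0-9]\+\).*/v\1/'    # -> v3.6
echo "3.10.2" | sed -e 's/\([0-9].[0-9]\+\).*/v\1/'    # -> v3.10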

View File

@ -0,0 +1,94 @@
#!/bin/bash
set -ex
# Optionally install conda
if [ -n "$ANACONDA_PYTHON_VERSION" ]; then
BASE_URL="https://repo.continuum.io/miniconda"
MAJOR_PYTHON_VERSION=$(echo "$ANACONDA_PYTHON_VERSION" | cut -d . -f 1)
case "$MAJOR_PYTHON_VERSION" in
2)
CONDA_FILE="Miniconda2-latest-Linux-x86_64.sh"
;;
3)
CONDA_FILE="Miniconda3-latest-Linux-x86_64.sh"
;;
*)
echo "Unsupported ANACONDA_PYTHON_VERSION: $ANACONDA_PYTHON_VERSION"
exit 1
;;
esac
mkdir /opt/conda
chown jenkins:jenkins /opt/conda
as_jenkins() {
# NB: unsetting the environment variables works around a conda bug
# https://github.com/conda/conda/issues/6576
# NB: Pass on PATH and LD_LIBRARY_PATH to sudo invocation
# NB: This must be run from a directory that jenkins has access to,
# works around https://github.com/conda/conda-package-handling/pull/34
sudo -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
pushd /tmp
wget -q "${BASE_URL}/${CONDA_FILE}"
chmod +x "${CONDA_FILE}"
as_jenkins ./"${CONDA_FILE}" -b -f -p "/opt/conda"
popd
# NB: Don't do this, rely on the rpath to get it right
#echo "/opt/conda/lib" > /etc/ld.so.conf.d/conda-python.conf
#ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/conda/bin:\1"|g' -i /etc/environment
export PATH="/opt/conda/bin:$PATH"
# Ensure we run conda in a directory that jenkins has write access to
pushd /opt/conda
# Track latest conda update
as_jenkins conda update -n base conda
# Install correct Python version
as_jenkins conda install python="$ANACONDA_PYTHON_VERSION"
conda_install() {
# Ensure that the install command doesn't upgrade/downgrade Python
# This should be called as
# conda_install pkg1 pkg2 ... [-c channel]
as_jenkins conda install -q -y python="$ANACONDA_PYTHON_VERSION" $*
}
# Install PyTorch conda deps, as per https://github.com/pytorch/pytorch README
# DO NOT install cmake here as it would install a version newer than 3.5, but
# we want to pin to version 3.5.
conda_install numpy pyyaml mkl mkl-include setuptools cffi typing future six
if [[ "$CUDA_VERSION" == 8.0* ]]; then
conda_install magma-cuda80 -c pytorch
elif [[ "$CUDA_VERSION" == 9.0* ]]; then
conda_install magma-cuda90 -c pytorch
elif [[ "$CUDA_VERSION" == 9.1* ]]; then
conda_install magma-cuda91 -c pytorch
elif [[ "$CUDA_VERSION" == 9.2* ]]; then
conda_install magma-cuda92 -c pytorch
elif [[ "$CUDA_VERSION" == 10.0* ]]; then
conda_install magma-cuda100 -c pytorch
elif [[ "$CUDA_VERSION" == 10.1* ]]; then
conda_install magma-cuda101 -c pytorch
fi
# TODO: This isn't working atm
conda_install nnpack -c killeent
# Install some other packages
# TODO: Why is scipy pinned
# numba & llvmlite is pinned because of https://github.com/numba/numba/issues/4368
# scikit-learn is pinned because of
# https://github.com/scikit-learn/scikit-learn/issues/14485 (affects gcc 5.5
# only)
as_jenkins pip install --progress-bar off pytest scipy==1.1.0 scikit-learn==0.20.3 scikit-image librosa>=0.6.2 psutil numba==0.43.1 llvmlite==0.28.0
popd
fi
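conda_install re-pins python on every invocation because conda's solver is otherwise free to bump the interpreter while resolving new packages; repeating the pin turns it into a hard constraint. Since extra arguments ride along in $*, channel flags work too, e.g.:

conda_install numpy pyyaml mkl           # plain packages
conda_install magma-cuda101 -c pytorch   # expands to: conda install -q -y python=$ANACONDA_PYTHON_VERSION magma-cuda101 -c pytorch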

View File

@ -0,0 +1,61 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libhiredis-dev \
libleveldb-dev \
liblmdb-dev \
libsnappy-dev
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
yum install -y \
hiredis-devel \
leveldb-devel \
lmdb-devel \
snappy-devel
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi
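The OS dispatch above relies on ordering: /etc/lsb-release exists on Ubuntu but not on the CentOS base images, while /etc/os-release exists on both, so the Ubuntu branch must be checked first. A sketch of a more explicit variant using the ID field of /etc/os-release (os_id is an illustrative helper, not used by these scripts):

os_id () {
  # The command-substitution subshell keeps the sourced variables
  # out of the caller's environment
  . /etc/os-release && echo "$ID"
}
case "$(os_id)" in
  ubuntu) install_ubuntu ;;
  centos) install_centos ;;
  *) echo "Unable to determine OS..."; exit 1 ;;
esac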

View File

@ -0,0 +1,19 @@
#!/bin/bash
set -ex
if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
apt-get install -y g++-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,6 @@
#!/bin/bash
set -ex
mkdir -p /usr/local/include
cp jni.h /usr/local/include

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -ex
if [ -n "$KATEX" ]; then
curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash -
sudo apt-get install -y nodejs
curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -
echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list
apt-get update
apt-get install -y --no-install-recommends yarn
yarn global add katex --prefix /usr/local
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi

View File

@ -0,0 +1,13 @@
#!/bin/bash
set -ex
[ -n "$NINJA_VERSION" ]
url="https://github.com/ninja-build/ninja/releases/download/v${NINJA_VERSION}/ninja-linux.zip"
pushd /tmp
wget --no-verbose --output-document=ninja-linux.zip "$url"
unzip ninja-linux.zip -d /usr/local/bin
rm -f ninja-linux.zip
popd

View File

@ -0,0 +1,56 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
# Ubuntu 14.04 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we install that here if on 14.04
# Ubuntu 14.04 also has cmake 2.8.12 as the default option, so we will
# install cmake3 here and use cmake3.
apt-get update
if [[ "$UBUNTU_VERSION" == 14.04 ]]; then
apt-get install -y --no-install-recommends cmake3
install_protobuf_26
else
apt-get install -y --no-install-recommends \
libprotobuf-dev \
protobuf-compiler
fi
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Centos7 ships with protobuf 2.5, but ONNX needs protobuf >= 2.6
# so we always install it here
install_protobuf_26
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

View File

@ -0,0 +1,14 @@
apt-get update
apt-get install -y sudo wget libboost-dev libboost-test-dev libboost-program-options-dev libboost-filesystem-dev libboost-thread-dev libevent-dev automake libtool flex bison pkg-config g++ libssl-dev
wget https://www-us.apache.org/dist/thrift/0.12.0/thrift-0.12.0.tar.gz
tar -xvf thrift-0.12.0.tar.gz
cd thrift-0.12.0
for file in ./compiler/cpp/Makefile*; do
sed -i 's/\-Werror//' $file
done
./bootstrap.sh
./configure --without-php --without-java --without-python --without-nodejs --without-go --without-ruby
sudo make
sudo make install
cd ..
rm thrift-0.12.0.tar.gz

View File

@ -0,0 +1,94 @@
#!/bin/bash
set -ex
as_jenkins() {
# NB: Preserve PATH and LD_LIBRARY_PATH changes
sudo -H -u jenkins env "PATH=$PATH" "LD_LIBRARY_PATH=$LD_LIBRARY_PATH" $*
}
if [ -n "$TRAVIS_PYTHON_VERSION" ]; then
mkdir -p /opt/python
chown jenkins:jenkins /opt/python
# Download Python binary from Travis
pushd tmp
as_jenkins wget --quiet https://s3.amazonaws.com/travis-python-archives/binaries/ubuntu/14.04/x86_64/python-$TRAVIS_PYTHON_VERSION.tar.bz2
# NB: The tarball also comes with /home/travis virtualenv that we
# don't care about. (Maybe we should, but we've worked around the
# "how do I install to python" issue by making this entire directory
# user-writable "lol")
# NB: Relative ordering of opt/python and flags matters
as_jenkins tar xjf python-$TRAVIS_PYTHON_VERSION.tar.bz2 --strip-components=2 --directory /opt/python opt/python
popd
echo "/opt/python/$TRAVIS_PYTHON_VERSION/lib" > /etc/ld.so.conf.d/travis-python.conf
ldconfig
sed -e 's|PATH="\(.*\)"|PATH="/opt/python/'"$TRAVIS_PYTHON_VERSION"'/bin:\1"|g' -i /etc/environment
export PATH="/opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH"
python --version
pip --version
# Install pip from source.
# The python-pip package on Ubuntu Trusty is old
# and upon install numpy doesn't use the binary
# distribution, and fails to compile it from source.
pushd tmp
as_jenkins curl -L -O https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz
as_jenkins tar zxf pip-9.0.1.tar.gz
pushd pip-9.0.1
as_jenkins python setup.py install
popd
rm -rf pip-9.0.1*
popd
# Install pip packages
as_jenkins pip install --upgrade pip
pip --version
if [[ "$TRAVIS_PYTHON_VERSION" == nightly ]]; then
# These two packages have broken Cythonizations uploaded
# to PyPi, see:
#
# - https://github.com/numpy/numpy/issues/10500
# - https://github.com/yaml/pyyaml/issues/117
#
# Furthermore, the released version of Cython does not
# have these issues fixed.
#
# While we are waiting on fixes for these, we build
# from Git for now. Feel free to delete this conditional
# branch if things start working again (you may need
# to do this if these packages regress on Git HEAD.)
as_jenkins pip install git+https://github.com/cython/cython.git
as_jenkins pip install git+https://github.com/numpy/numpy.git
as_jenkins pip install git+https://github.com/yaml/pyyaml.git
else
as_jenkins pip install numpy pyyaml
fi
as_jenkins pip install \
future \
hypothesis \
protobuf \
pytest \
pillow \
typing
as_jenkins pip install mkl mkl-devel
# SciPy does not support Python 3.7 or Python 2.7.9
if [[ "$TRAVIS_PYTHON_VERSION" != nightly ]] && [[ "$TRAVIS_PYTHON_VERSION" != "2.7.9" ]]; then
as_jenkins pip install scipy==1.1.0 scikit-image librosa>=0.6.2
fi
# Install psutil for dataloader tests
as_jenkins pip install psutil
# Cleanup package manager
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
fi
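This script, like the conda and cache installers, persists its PATH prepend by rewriting the PATH="..." line of /etc/environment in place and then exporting the same prefix for the current shell. The effect of the sed, with an illustrative Python prefix:

echo 'PATH="/usr/bin:/bin"' | sed -e 's|PATH="\(.*\)"|PATH="/opt/python/3.7/bin:\1"|g'
# -> PATH="/opt/python/3.7/bin:/usr/bin:/bin"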

View File

@ -0,0 +1,20 @@
#!/bin/bash
set -ex
# Mirror jenkins user in container
echo "jenkins:x:1014:1014::/var/lib/jenkins:" >> /etc/passwd
echo "jenkins:x:1014:" >> /etc/group
# Create $HOME
mkdir -p /var/lib/jenkins
chown jenkins:jenkins /var/lib/jenkins
mkdir -p /var/lib/jenkins/.ccache
chown jenkins:jenkins /var/lib/jenkins/.ccache
# Allow writing to /usr/local (for make install)
chown jenkins:jenkins /usr/local
# Allow sudo
# TODO: Maybe we shouldn't
echo 'jenkins ALL=(ALL) NOPASSWD:ALL' > /etc/sudoers.d/jenkins

View File

@ -0,0 +1,57 @@
#!/bin/bash
set -ex
# This function installs protobuf 2.6
install_protobuf_26() {
pb_dir="/usr/temp_pb_install_dir"
mkdir -p $pb_dir
# On the nvidia/cuda:9-cudnn7-devel-centos7 image we need this symlink or
# else it will fail with
# g++: error: ./../lib64/crti.o: No such file or directory
ln -s /usr/lib64 "$pb_dir/lib64"
curl -LO "https://github.com/google/protobuf/releases/download/v2.6.1/protobuf-2.6.1.tar.gz"
tar -xvz -C "$pb_dir" --strip-components 1 -f protobuf-2.6.1.tar.gz
pushd "$pb_dir" && ./configure && make && make check && sudo make install && sudo ldconfig
popd
rm -rf $pb_dir
}
install_ubuntu() {
apt-get update
apt-get install -y --no-install-recommends \
libopencv-dev \
libavcodec-dev
# Cleanup
apt-get autoclean && apt-get clean
rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
}
install_centos() {
# Need EPEL for many packages we depend on.
# See http://fedoraproject.org/wiki/EPEL
yum --enablerepo=extras install -y epel-release
yum install -y \
opencv-devel \
ffmpeg-devel
# Cleanup
yum clean all
rm -rf /var/cache/yum
rm -rf /var/lib/yum/yumdb
rm -rf /var/lib/yum/history
}
# Install base packages depending on the base OS
if [ -f /etc/lsb-release ]; then
install_ubuntu
elif [ -f /etc/os-release ]; then
install_centos
else
echo "Unable to determine OS..."
exit 1
fi

.circleci/docker/java/jni.h (new file, 1143 lines)

File diff suppressed because it is too large

View File

@ -0,0 +1,85 @@
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ARG CUDA_VERSION
ARG CUDNN_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
ENV CUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
# AWS specific CUDA build guidance
ENV TORCH_CUDA_ARCH_LIST Maxwell
ENV TORCH_NVCC_FLAGS "-Xfatbin -compress-all"
USER jenkins
CMD ["bash"]

View File

@ -0,0 +1,114 @@
ARG UBUNTU_VERSION
FROM ubuntu:${UBUNTU_VERSION}
ARG UBUNTU_VERSION
ENV DEBIAN_FRONTEND noninteractive
# Install common dependencies (so that this step can be cached separately)
ARG EC2
ADD ./common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh
# Install clang
ARG LLVMDEV
ARG CLANG_VERSION
ADD ./common/install_clang.sh install_clang.sh
RUN bash ./install_clang.sh && rm install_clang.sh
# (optional) Install thrift.
ARG THRIFT
ADD ./common/install_thrift.sh install_thrift.sh
RUN if [ -n "${THRIFT}" ]; then bash ./install_thrift.sh; fi
RUN rm install_thrift.sh
ENV INSTALLED_THRIFT ${THRIFT}
# Install user
ADD ./common/install_user.sh install_user.sh
RUN bash ./install_user.sh && rm install_user.sh
# Install katex
ARG KATEX
ADD ./common/install_katex.sh install_katex.sh
RUN bash ./install_katex.sh && rm install_katex.sh
# Install conda
ENV PATH /opt/conda/bin:$PATH
ARG ANACONDA_PYTHON_VERSION
ADD ./common/install_conda.sh install_conda.sh
RUN bash ./install_conda.sh && rm install_conda.sh
# Install gcc
ARG GCC_VERSION
ADD ./common/install_gcc.sh install_gcc.sh
RUN bash ./install_gcc.sh && rm install_gcc.sh
# Install non-standard Python versions (via Travis binaries)
ARG TRAVIS_PYTHON_VERSION
ENV PATH /opt/python/$TRAVIS_PYTHON_VERSION/bin:$PATH
ADD ./common/install_travis_python.sh install_travis_python.sh
RUN bash ./install_travis_python.sh && rm install_travis_python.sh
# (optional) Install protobuf for ONNX
ARG PROTOBUF
ADD ./common/install_protobuf.sh install_protobuf.sh
RUN if [ -n "${PROTOBUF}" ]; then bash ./install_protobuf.sh; fi
RUN rm install_protobuf.sh
ENV INSTALLED_PROTOBUF ${PROTOBUF}
# (optional) Install database packages like LMDB and LevelDB
ARG DB
ADD ./common/install_db.sh install_db.sh
RUN if [ -n "${DB}" ]; then bash ./install_db.sh; fi
RUN rm install_db.sh
ENV INSTALLED_DB ${DB}
# (optional) Install vision packages like OpenCV and ffmpeg
ARG VISION
ADD ./common/install_vision.sh install_vision.sh
RUN if [ -n "${VISION}" ]; then bash ./install_vision.sh; fi
RUN rm install_vision.sh
ENV INSTALLED_VISION ${VISION}
# (optional) Install Android NDK
ARG ANDROID
ARG ANDROID_NDK
ARG GRADLE_VERSION
ADD ./common/install_android.sh install_android.sh
ADD ./android/AndroidManifest.xml AndroidManifest.xml
ADD ./android/build.gradle build.gradle
RUN if [ -n "${ANDROID}" ]; then bash ./install_android.sh; fi
RUN rm install_android.sh
RUN rm AndroidManifest.xml
RUN rm build.gradle
ENV INSTALLED_ANDROID ${ANDROID}
# (optional) Install non-default CMake version
ARG CMAKE_VERSION
ADD ./common/install_cmake.sh install_cmake.sh
RUN if [ -n "${CMAKE_VERSION}" ]; then bash ./install_cmake.sh; fi
RUN rm install_cmake.sh
# (optional) Install non-default Ninja version
ARG NINJA_VERSION
ADD ./common/install_ninja.sh install_ninja.sh
RUN if [ -n "${NINJA_VERSION}" ]; then bash ./install_ninja.sh; fi
RUN rm install_ninja.sh
# Install ccache/sccache (do this last, so we get priority in PATH)
ADD ./common/install_cache.sh install_cache.sh
ENV PATH /opt/cache/bin:$PATH
RUN bash ./install_cache.sh && rm install_cache.sh
# Add jni.h for java host build
ADD ./common/install_jni.sh install_jni.sh
ADD ./java/jni.h jni.h
RUN bash ./install_jni.sh && rm install_jni.sh
# Include BUILD_ENVIRONMENT environment variable in image
ARG BUILD_ENVIRONMENT
ENV BUILD_ENVIRONMENT ${BUILD_ENVIRONMENT}
USER jenkins
CMD ["bash"]

View File

@ -88,17 +88,18 @@ YAML_SOURCES = [
File("job-specs-custom.yml"),
File("binary_update_htmls.yml"),
File("binary-build-tests.yml"),
File("docker_build_job.yml"),
File("workflows.yml"),
Listgen(pytorch_build_definitions.get_workflow_jobs, 3),
File("workflows-pytorch-macos-builds.yml"),
File("workflows-pytorch-android-gradle-build.yml"),
File("workflows-pytorch-ios-builds.yml"),
File("workflows-pytorch-mobile-builds.yml"),
File("workflows-pytorch-ge-config-tests.yml"),
Listgen(caffe2_build_definitions.get_workflow_jobs, 3),
File("workflows-binary-builds-smoke-subset.yml"),
Header("Daily smoke test trigger"),
Treegen(binary_build_definitions.add_binary_smoke_test_jobs, 1),
Header("Daily binary build trigger"),
Treegen(binary_build_definitions.add_binary_build_jobs, 1),
Listgen(binary_build_definitions.get_binary_smoke_test_jobs, 3),
Listgen(binary_build_definitions.get_binary_build_jobs, 3),
File("workflows-nightly-ios-binary-builds.yml"),
File("workflows-nightly-android-binary-builds.yml"),
Header("Nightly tests"),
@ -106,6 +107,7 @@ YAML_SOURCES = [
File("workflows-nightly-uploads-header.yml"),
Listgen(binary_build_definitions.get_nightly_uploads, 3),
File("workflows-s3-html.yml"),
File("workflows-docker-builder.yml")
]

View File

@ -1,8 +1,8 @@
#!/bin/bash
set -eux -o pipefail
set -ex -o pipefail
echo ""
echo "PWD: ${PWD}"
echo "DIR: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"

View File

@ -0,0 +1,29 @@
#!/bin/bash
set -ex -o pipefail
echo ""
echo "DIR: $(pwd)"
PROJ_ROOT=/Users/distiller/project
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo "${IOS_CERT_KEY}" >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo "${IOS_SIGN_KEY}" >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
PROFILE=TestApp_CI
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}

View File

@ -1,8 +1,8 @@
#!/bin/bash
set -eux -o pipefail
set -ex -o pipefail
echo ""
echo "PWD: $(pwd)"
echo "DIR: $(pwd)"
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
ARTIFACTS_DIR=${WORKSPACE}/ios

View File

@ -11,6 +11,8 @@ if [[ "$PACKAGE_TYPE" == conda ]]; then
source activate testenv >/dev/null
elif [[ "$DESIRED_PYTHON" == 2.7mu ]]; then
export PATH="/opt/python/cp27-cp27mu/bin:\$PATH"
elif [[ "$DESIRED_PYTHON" == 3.8m ]]; then
export PATH="/opt/python/cp38-cp38/bin:\$PATH"
elif [[ "$PACKAGE_TYPE" != libtorch ]]; then
python_nodot="\$(echo $DESIRED_PYTHON | tr -d m.u)"
export PATH="/opt/python/cp\$python_nodot-cp\${python_nodot}m/bin:\$PATH"

View File

@ -5,26 +5,30 @@ source "/Users/distiller/project/env"
export "PATH=$workdir/miniconda/bin:$PATH"
pkg="$workdir/final_pkgs/$(ls $workdir/final_pkgs)"
# Don't test libtorch
# TODO we should test libtorch
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
exit 0
fi
# Create a new test env
# TODO cut all this out into a separate test job and have an entirely different
# miniconda
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
if [[ "$PACKAGE_TYPE" != libtorch ]]; then
source deactivate || true
conda create -qyn test python="$DESIRED_PYTHON"
source activate test >/dev/null
fi
# Install the package
if [[ "$PACKAGE_TYPE" == conda ]]; then
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
pkg="$(ls $workdir/final_pkgs/*-latest.zip)"
unzip "$pkg" -d /tmp
cd /tmp/libtorch
elif [[ "$PACKAGE_TYPE" == conda ]]; then
conda install -y "$pkg" --offline
else
pip install "$pkg" --no-index --no-dependencies -v
fi
# Test
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
$workdir/builder/check_binary.sh
else
pushd "$workdir/pytorch"
$workdir/builder/run_tests.sh "$PACKAGE_TYPE" "$DESIRED_PYTHON" "$DESIRED_CUDA"
fi

View File

@ -32,11 +32,11 @@ fi
export DOCKER_IMAGE=${DOCKER_IMAGE:-}
if [[ -z "$DOCKER_IMAGE" ]]; then
if [[ "$PACKAGE_TYPE" == conda ]]; then
export DOCKER_IMAGE="soumith/conda-cuda"
export DOCKER_IMAGE="pytorch/conda-cuda"
elif [[ "$DESIRED_CUDA" == cpu ]]; then
export DOCKER_IMAGE="soumith/manylinux-cuda100"
export DOCKER_IMAGE="pytorch/manylinux-cuda100"
else
export DOCKER_IMAGE="soumith/manylinux-cuda${DESIRED_CUDA:2}"
export DOCKER_IMAGE="pytorch/manylinux-cuda${DESIRED_CUDA:2}"
fi
fi
@ -55,13 +55,34 @@ fi
# We put this here so that OVERRIDE_PACKAGE_VERSION below can read from it
export DATE="$(date -u +%Y%m%d)"
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu100" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE"
if [[ "$(uname)" == 'Darwin' ]] || [[ "$DESIRED_CUDA" == "cu101" ]] || [[ "$PACKAGE_TYPE" == conda ]]; then
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE"
else
export PYTORCH_BUILD_VERSION="1.3.0.dev$DATE+$DESIRED_CUDA"
export PYTORCH_BUILD_VERSION="1.4.0.dev$DATE+$DESIRED_CUDA"
fi
export PYTORCH_BUILD_NUMBER=1
JAVA_HOME=
BUILD_JNI=OFF
if [[ "$PACKAGE_TYPE" == libtorch ]]; then
POSSIBLE_JAVA_HOMES=()
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
echo "Found jni.h under $JH"
JAVA_HOME="$JH"
BUILD_JNI=ON
break
fi
done
if [ -z "$JAVA_HOME" ]; then
echo "Did not find jni.h"
fi
fi
cat >>"$envfile" <<EOL
# =================== The following code will be executed inside Docker container ===================
export TZ=UTC
@ -75,7 +96,7 @@ export BUILD_PYTHONLESS="${BUILD_PYTHONLESS:-}"
export DESIRED_DEVTOOLSET="$DESIRED_DEVTOOLSET"
export DATE="$DATE"
export NIGHTLIES_DATE_PREAMBLE=1.3.0.dev
export NIGHTLIES_DATE_PREAMBLE=1.4.0.dev
export PYTORCH_BUILD_VERSION="$PYTORCH_BUILD_VERSION"
export PYTORCH_BUILD_NUMBER="$PYTORCH_BUILD_NUMBER"
export OVERRIDE_PACKAGE_VERSION="$PYTORCH_BUILD_VERSION"
@ -85,6 +106,8 @@ export TORCH_PACKAGE_NAME='torch'
export TORCH_CONDA_BUILD_FOLDER='pytorch-nightly'
export USE_FBGEMM=1
export JAVA_HOME=$JAVA_HOME
export BUILD_JNI=$BUILD_JNI
export PIP_UPLOAD_FOLDER="$PIP_UPLOAD_FOLDER"
export DOCKER_IMAGE="$DOCKER_IMAGE"
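The JAVA_HOME probe above tests each candidate directory for include/jni.h rather than for a java binary, since BUILD_JNI only needs the headers. A standalone sketch of the same probe (find_java_home is an illustrative name, not part of these scripts):

find_java_home () {
  local jh
  for jh in /usr/local /usr/lib/jvm/java-8-openjdk-amd64 \
            /Library/Java/JavaVirtualMachines/*.jdk/Contents/Home; do
    if [[ -e "$jh/include/jni.h" ]]; then
      echo "$jh"
      return 0
    fi
  done
  return 1
}
if JAVA_HOME="$(find_java_home)"; then BUILD_JNI=ON; fi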

View File

@ -18,9 +18,9 @@ chmod +x /home/circleci/project/ci_test_script.sh
# Run the docker
if [ -n "${USE_CUDA_DOCKER_RUNTIME:-}" ]; then
export id=$(docker run --runtime=nvidia -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d "${DOCKER_IMAGE}")
else
export id=$(docker run -t -d "${DOCKER_IMAGE}")
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d "${DOCKER_IMAGE}")
fi
# Copy the envfile and script with all the code to run into the docker.

View File

@ -4,6 +4,8 @@ set -eux -o pipefail
export ANDROID_NDK_HOME=/opt/ndk
export ANDROID_HOME=/opt/android/sdk
# Must be in sync with GRADLE_VERSION in docker image for android
# https://github.com/pietern/pytorch-dockerfiles/blob/master/build.sh#L155
export GRADLE_VERSION=4.10.3
export GRADLE_HOME=/opt/gradle/gradle-$GRADLE_VERSION
export GRADLE_PATH=$GRADLE_HOME/bin/gradle
@ -45,15 +47,39 @@ fi
env
echo "BUILD_ENVIRONMENT:$BUILD_ENVIRONMENT"
GRADLE_PARAMS="-p android assembleRelease --debug --stacktrace"
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
GRADLE_PARAMS+=" -PABI_FILTERS=x86"
fi
if [ -n "{GRADLE_OFFLINE:-}" ]; then
GRADLE_PARAMS+=" --offline"
fi
# touch gradle cache files to prevent expiration
while IFS= read -r -d '' file
do
touch "$file" || true
done < <(find /var/lib/jenkins/.gradle -type f -print0)
env
export GRADLE_LOCAL_PROPERTIES=~/workspace/android/local.properties
rm -f $GRADLE_LOCAL_PROPERTIES
echo "sdk.dir=/opt/android/sdk" >> $GRADLE_LOCAL_PROPERTIES
echo "ndk.dir=/opt/ndk" >> $GRADLE_LOCAL_PROPERTIES
echo "cmake.dir=/usr/local" >> $GRADLE_LOCAL_PROPERTIES
if [[ "${BUILD_ENVIRONMENT}" == *-gradle-build-only-x86_32* ]]; then
$GRADLE_PATH -PABI_FILTERS=x86 -p ~/workspace/android/ assembleRelease
else
$GRADLE_PATH -p ~/workspace/android/ assembleRelease
fi
$GRADLE_PATH $GRADLE_PARAMS
find . -type f -name "*.a" -exec ls -lh {} \;
while IFS= read -r -d '' file
do
echo
echo "$file"
ls -lah "$file"
zipinfo -l "$file"
done < <(find . -type f -name '*.aar' -print0)
find . -type f -name '*.aar' -print | xargs tar cfvz ~/workspace/android/artifacts.tgz
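The while IFS= read -r -d '' ... < <(find ... -print0) loops above are the safe way to iterate file names that may contain spaces or newlines: -print0 emits NUL-delimited names and -d '' reads them back the same way, unlike the word-splitting for f in $(find ...) form. A minimal sketch of the cache-refresh loop:

while IFS= read -r -d '' f; do
  touch "$f" || true   # refresh mtimes so gradle's cache entries are not expired
done < <(find /var/lib/jenkins/.gradle -type f -print0)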

View File

@ -53,7 +53,7 @@ sudo apt-get -y install doxygen
# Generate ATen files
pushd "${pt_checkout}"
pip install -r requirements.txt
time GEN_TO_SOURCE=1 python aten/src/ATen/gen.py \
time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \

View File

@ -18,6 +18,8 @@ default_set = set([
'pytorch-linux-xenial-py3-clang5-asan',
# PyTorch DEBUG
'pytorch-linux-xenial-py3.6-gcc5.4',
# LibTorch
'pytorch-libtorch-linux-xenial-cuda9-cudnn7-py3',
# Caffe2 CPU
'caffe2-py2-mkl-ubuntu16.04',
@ -30,14 +32,17 @@ default_set = set([
'caffe2-py2-clang7-ubuntu16.04',
# Caffe2 CMake
'caffe2-cmake-cuda9.0-cudnn7-ubuntu16.04',
# Caffe2 CentOS
'caffe2-py3.6-devtoolset7-cuda9.0-cudnn7-centos7',
# Binaries
'manywheel 2.7mu cpu devtoolset7',
'libtorch 2.7m cpu devtoolset7',
'libtorch 2.7m cpu gcc5.4_cxx11-abi',
'libtorch-ios-10.2.1-nightly-x86_64-build',
'libtorch-ios-10.2.1-nightly-arm64-build',
'libtorch-ios-10.2.1-nightly-binary-build-upload',
'libtorch 2.7 cpu',
'libtorch-ios-11.2.1-nightly-x86_64-build',
'libtorch-ios-11.2.1-nightly-arm64-build',
'libtorch-ios-11.2.1-nightly-binary-build-upload',
# Caffe2 Android
'caffe2-py2-android-ubuntu16.04',
@ -48,11 +53,15 @@ default_set = set([
'pytorch-macos-10.13-cuda9.2-cudnn7-py3',
# PyTorch Android
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32-build',
'pytorch-linux-xenial-py3-clang5-android-ndk-r19',
# PyTorch Android gradle
'pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32',
# Pytorch iOS builds
'pytorch-ios-10.2.1-x86_64_build',
'pytorch-ios-10.2.1-arm64_build',
'pytorch-ios-11.2.1-x86_64_build',
'pytorch-ios-11.2.1-arm64_build',
# PyTorch Mobile builds
'pytorch-linux-xenial-py3-clang5-mobile-build',
# Pytorch backward compatibility check
'pytorch-linux-backward-compatibility-check-test',
@ -60,10 +69,9 @@ default_set = set([
# XLA
'pytorch-xla-linux-xenial-py3.6-clang7',
# Named tensor
"pytorch-namedtensor-linux-xenial-py3.6-gcc5.4",
"pytorch-namedtensor-linux-xenial-py3-clang5-asan",
"pytorch-namedtensor-linux-xenial-cuda9-cudnn7-py2",
# GraphExecutor config jobs
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test',
'pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test',
# Other checks
'pytorch-short-perf-test-gpu',

View File

@ -223,7 +223,7 @@
binary_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
- attach_workspace:
at: ~/workspace
@ -232,12 +232,18 @@
- run_brew_for_ios_build
- run:
name: Build
context: org-member
no_output_timeout: "1h"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_build.sh"
cat "$script"
source "$script"
- run:
name: Test
no_output_timeout: "30m"
command: |
script="/Users/distiller/project/.circleci/scripts/binary_ios_test.sh"
cat "$script"
source "$script"
- persist_to_workspace:
root: /Users/distiller/workspace/
paths: ios
@ -245,7 +251,7 @@
binary_ios_upload:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
- attach_workspace:
at: ~/workspace

View File

@ -41,7 +41,7 @@
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
@ -112,9 +112,9 @@
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
@ -146,12 +146,7 @@
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip -q install numpy
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
@ -164,6 +159,8 @@
source ${TMPDIR}/anaconda/bin/activate
fi
pip -q install numpy
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
@ -201,4 +198,3 @@
if which sccache > /dev/null; then
sccache --show-stats
fi

View File

@ -0,0 +1,21 @@
docker_build_job:
parameters:
image_name:
type: string
default: ""
machine:
image: ubuntu-1604:201903-01
resource_class: large
environment:
IMAGE_NAME: << parameters.image_name >>
steps:
- checkout
- run:
name: build_docker_image_<< parameters.image_name >>
no_output_timeout: "1h"
command: |
set +x
export AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_DOCKER_BUILDER_V1}
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
set -x
cd .circleci/docker && ./build_docker.sh

View File

@ -1,43 +1,8 @@
pytorch_short_perf_test_gpu:
environment:
BUILD_ENVIRONMENT: pytorch-short-perf-test-gpu
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
PYTHON_VERSION: "3.6"
USE_CUDA_DOCKER_RUNTIME: "1"
resource_class: gpu.medium
machine:
image: ubuntu-1604:201903-01
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- setup_linux_system_environment
- setup_ci_environment
- run:
name: Perf Test
no_output_timeout: "1h"
command: |
set -e
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
docker cp $id:/var/lib/jenkins/workspace/env /home/circleci/project/env
# This IAM user allows write access to S3 bucket for perf test numbers
set +x
echo "declare -x AWS_ACCESS_KEY_ID=${CIRCLECI_AWS_ACCESS_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_PERF_TEST_S3_BUCKET_V4}" >> /home/circleci/project/env
set -x
docker cp /home/circleci/project/env $id:/var/lib/jenkins/workspace/env
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/short-perf-test-gpu.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
pytorch_python_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-python-doc-push
# TODO: stop hardcoding this
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
resource_class: large
machine:
image: ubuntu-1604:201903-01
@ -54,7 +19,7 @@
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
@ -82,7 +47,7 @@
pytorch_cpp_doc_push:
environment:
BUILD_ENVIRONMENT: pytorch-cpp-doc-push
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:405"
resource_class: large
machine:
image: ubuntu-1604:201903-01
@ -99,7 +64,7 @@
export COMMIT_DOCKER_IMAGE=${DOCKER_IMAGE}-${CIRCLE_SHA1}
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
# master branch docs push
if [[ "${CIRCLE_BRANCH}" == "master" ]]; then
@ -186,6 +151,8 @@
chmod a+x .jenkins/pytorch/macos-test.sh
unbuffer .jenkins/pytorch/macos-test.sh 2>&1 | ts
- store_test_results:
path: test/test-reports
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
environment:
@ -238,7 +205,7 @@
pytorch_android_gradle_build:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -268,14 +235,14 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# arm-v7a
time docker pull ${docker_image_libtorch_android_arm_v7a} >/dev/null
export id_arm_v7a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export id_arm_v7a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v7a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v7a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -285,7 +252,7 @@
# x86_64
time docker pull ${docker_image_libtorch_android_x86_64} >/dev/null
export id_x86_64=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export id_x86_64=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_64})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_x86_64" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -295,7 +262,7 @@
# arm-v8a
time docker pull ${docker_image_libtorch_android_arm_v8a} >/dev/null
export id_arm_v8a=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export id_arm_v8a=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_arm_v8a})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace") | docker exec -u jenkins -i "$id_arm_v8a" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -308,7 +275,7 @@
docker cp ~/workspace/build_android_install_arm_v8a $id_x86_32:/var/lib/jenkins/workspace/build_android_install_arm_v8a
# run gradle buildRelease
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_artifacts
@ -324,7 +291,7 @@
pytorch_android_publish_snapshot:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-publish-snapshot
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -348,7 +315,7 @@
# x86_32
time docker pull ${docker_image_libtorch_android_x86_32_gradle} >/dev/null
export id_x86_32=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export id_x86_32=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32_gradle})
export COMMAND='((echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export SONATYPE_NEXUS_USERNAME=${SONATYPE_NEXUS_USERNAME}" && echo "export SONATYPE_NEXUS_PASSWORD=${SONATYPE_NEXUS_PASSWORD}" && echo "export ANDROID_SIGN_KEY=${ANDROID_SIGN_KEY}" && echo "export ANDROID_SIGN_PASS=${ANDROID_SIGN_PASS}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/publish_android_snapshot.sh") | docker exec -u jenkins -i "$id_x86_32" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
@ -360,7 +327,7 @@
pytorch_android_gradle_build-x86_32:
environment:
BUILD_ENVIRONMENT: pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-only-x86_32
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
PYTHON_VERSION: "3.6"
resource_class: large
machine:
@ -388,9 +355,9 @@
# x86
time docker pull ${docker_image_libtorch_android_x86_32} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${docker_image_libtorch_android_x86_32})
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "source ./workspace/env" && echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "export GRADLE_OFFLINE=1" && echo "sudo chown -R jenkins workspace && cd workspace && ./.circleci/scripts/build_android_gradle.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
mkdir -p ~/workspace/build_android_x86_32_artifacts
@ -406,12 +373,34 @@
pytorch_ios_build:
<<: *pytorch_ios_params
macos:
xcode: "10.2.1"
xcode: "11.2.1"
steps:
# See Note [Workspace for CircleCI scripts] in job-specs-setup.yml
- should_run_job
- checkout
- run_brew_for_ios_build
- run_brew_for_ios_build
- run:
name: Run Fastlane
no_output_timeout: "1h"
command: |
set -e
PROJ_ROOT=/Users/distiller/project
cd ${PROJ_ROOT}/ios/TestApp
# install fastlane
sudo gem install bundler && bundle install
# install certificates
echo ${IOS_CERT_KEY} >> cert.txt
base64 --decode cert.txt -o Certificates.p12
rm cert.txt
bundle exec fastlane install_cert
# install the provisioning profile
PROFILE=TestApp_CI.mobileprovision
PROVISIONING_PROFILES=~/Library/MobileDevice/Provisioning\ Profiles
mkdir -pv "${PROVISIONING_PROFILES}"
cd "${PROVISIONING_PROFILES}"
echo ${IOS_SIGN_KEY} >> cert.txt
base64 --decode cert.txt -o ${PROFILE}
rm cert.txt
- run:
name: Build
no_output_timeout: "1h"
@ -421,7 +410,6 @@
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
export TCLLIBPATH="/usr/local/lib"
# Install conda
curl -o ~/Downloads/conda.sh https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
chmod +x ~/Downloads/conda.sh
@ -444,3 +432,43 @@
export IOS_ARCH=${IOS_ARCH}
export IOS_PLATFORM=${IOS_PLATFORM}
unbuffer ${PROJ_ROOT}/scripts/build_ios.sh 2>&1 | ts
- run:
name: Run Build Tests
no_output_timeout: "30m"
command: |
set -e
PROJ_ROOT=/Users/distiller/project
PROFILE=TestApp_CI
# run the ruby build script
if ! [ -x "$(command -v xcodebuild)" ]; then
echo 'Error: xcodebuild is not installed.'
exit 1
fi
echo ${IOS_DEV_TEAM_ID}
ruby ${PROJ_ROOT}/scripts/xcode_build.rb -i ${PROJ_ROOT}/build_ios/install -x ${PROJ_ROOT}/ios/TestApp/TestApp.xcodeproj -p ${IOS_PLATFORM} -c ${PROFILE} -t ${IOS_DEV_TEAM_ID}
if ! [ "$?" -eq "0" ]; then
echo 'xcodebuild failed!'
exit 1
fi
- run:
name: Run Simulator Tests
no_output_timeout: "2h"
command: |
set -e
if [ ${IOS_PLATFORM} != "SIMULATOR" ]; then
echo "not SIMULATOR build, skip it."
exit 0
fi
WORKSPACE=/Users/distiller/workspace
PROJ_ROOT=/Users/distiller/project
source ~/anaconda/bin/activate
#install the latest version of PyTorch and TorchVision
pip install torch torchvision
#run unit test
cd ${PROJ_ROOT}/ios/TestApp/benchmark
python trace_model.py
ruby setup.rb
cd ${PROJ_ROOT}/ios/TestApp
instruments -s -devices
fastlane scan

View File

@ -4,10 +4,6 @@
- image: circleci/python:3.7.3
steps:
- checkout
- run:
name: Ensure config is up to date
command: ./ensure-consistency.py
working_directory: .circleci
- run:
name: Save commit message
command: git log --format='%B' -n 1 HEAD > .circleci/scripts/COMMIT_MSG

View File

@ -17,52 +17,57 @@ jobs:
# Pull Docker image and run build
echo "DOCKER_IMAGE: "${DOCKER_IMAGE}
time docker pull ${DOCKER_IMAGE} >/dev/null
export id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
# TODO We may want to move the rebase logic to a separate step after checkout
# Rebase to master only if in xenial_py3_6_gcc5_4 case
if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
set -x
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=50 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --no-edit --no-ff ${GIT_MERGE_TARGET}
set +x
else
echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
fi
# NB: Temporarily disable the rebase logic in v1.4.0, don't merge this change into master
# # TODO We may want to move the rebase logic to a separate step after checkout
# # Rebase to master only if in xenial_py3_6_gcc5_4 case
# if [[ "${CIRCLE_BRANCH}" != "master" && "${BUILD_ENVIRONMENT}" == *"gcc5"* ]]; then
# echo "Merge master branch into $CIRCLE_BRANCH before build in environment $BUILD_ENVIRONMENT"
# set -x
# git config --global user.email "circleci.ossci@gmail.com"
# git config --global user.name "CircleCI"
# git config remote.origin.url https://github.com/pytorch/pytorch.git
# git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
# git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=100 --quiet
# export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
# echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
# export GIT_COMMIT=${CIRCLE_SHA1}
# echo "GIT_COMMIT: " ${GIT_COMMIT}
# git checkout -f ${GIT_COMMIT}
# git reset --hard ${GIT_COMMIT}
# git merge --allow-unrelated-histories --no-edit --no-ff ${GIT_MERGE_TARGET}
# set +x
# else
# echo "Do NOT merge master branch into $CIRCLE_BRANCH in environment $BUILD_ENVIRONMENT"
# fi
git submodule sync && git submodule update -q --init --recursive
docker cp /home/circleci/project/. $id:/var/lib/jenkins/workspace
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
NAMED_FLAG="export BUILD_NAMEDTENSOR=1"
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
fi
echo "Parallel backend flags: "${PARALLEL_FLAGS}
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$PARALLEL_FLAGS"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo '"$NAMED_FLAG"' && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/build.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
# Push intermediate Docker image for next phase to use
if [ -z "${BUILD_ONLY}" ]; then
# Note [Special build images]
# The namedtensor and xla builds use the same docker image as
# The xla build uses the same docker image as
# pytorch-linux-trusty-py3.6-gcc5.4-build. In the push step, we have to
# distinguish between them so the test can pick up the correct image.
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-x86_64"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-x86_64
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-arm-v7a"* ]]; then
@ -94,24 +99,43 @@ jobs:
set -e
# See Note [Special build images]
output_image=${DOCKER_IMAGE}-${CIRCLE_SHA1}
if [[ ${BUILD_ENVIRONMENT} == *"namedtensor"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-namedtensor
export NAMED_FLAG="export BUILD_NAMEDTENSOR=1 && export TEST_NAMEDTENSOR=1"
elif [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
if [[ ${BUILD_ENVIRONMENT} == *"xla"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-xla
elif [[ ${BUILD_ENVIRONMENT} == *"libtorch"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-libtorch
else
export COMMIT_DOCKER_IMAGE=$output_image
fi
echo "DOCKER_IMAGE: "${COMMIT_DOCKER_IMAGE}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export id=$(docker run -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
if [[ ${BUILD_ENVIRONMENT} == *"paralleltbb"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=TBB USE_TBB=1 "
elif [[ ${BUILD_ENVIRONMENT} == *"parallelnative"* ]]; then
export PARALLEL_FLAGS="export ATEN_THREADING=NATIVE "
fi
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
echo "Parallel backend flags: "${PARALLEL_FLAGS}
time docker pull ${COMMIT_DOCKER_IMAGE} >/dev/null
if [ -n "${USE_CUDA_DOCKER_RUNTIME}" ]; then
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --runtime=nvidia -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${NAMED_FLAG}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
export id=$(docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -d -w /var/lib/jenkins ${COMMIT_DOCKER_IMAGE})
fi
retrieve_test_reports() {
echo "retrieving test reports"
docker cp $id:/var/lib/jenkins/workspace/test/test-reports ./ || echo 'No test reports found!'
}
trap "retrieve_test_reports" ERR
if [[ ${BUILD_ENVIRONMENT} == *"multigpu"* ]]; then
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/multigpu-test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
else
export COMMAND='((echo "export BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" && echo "${PARALLEL_FLAGS}" && echo "source ./workspace/env" && echo "sudo chown -R jenkins workspace && cd workspace && .jenkins/pytorch/test.sh") | docker exec -u jenkins -i "$id" bash) 2>&1'
fi
echo ${COMMAND} > ./command.sh && unbuffer bash ./command.sh | ts
retrieve_test_reports
- store_test_results:
path: test-reports
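
The `COMMAND` lines above generate a throwaway script with `echo` and pipe it into a shell inside the already running container. A standalone sketch of the same pattern (image and commands are illustrative):

```bash
# Start a detached container and remember its id.
id=$(docker run -t -d ubuntu:18.04 bash)
# Generate a script on the fly and feed it to a shell inside the container;
# 2>&1 folds stderr into stdout so the CI log captures everything.
(echo "export FOO=bar" && echo 'echo "FOO is $FOO"') | docker exec -i "$id" bash 2>&1
docker rm -f "$id" >/dev/null
```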

View File

@ -10,19 +10,19 @@
build_environment: "manywheel 2.7mu cpu devtoolset7"
requires:
- setup
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_build
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_conda_2_7_cpu_devtoolset7_build
build_environment: "conda 2.7 cpu devtoolset7"
requires:
- setup
docker_image: "soumith/conda-cuda"
docker_image: "pytorch/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_build
- binary_linux_build:
@ -31,14 +31,14 @@
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_build:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
requires:
- setup
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"
# TODO we should test a libtorch cuda build, but they take too long
# - binary_linux_libtorch_2_7m_cu90_devtoolset7_static-without-deps_build
- binary_mac_build:
@ -63,14 +63,14 @@
requires:
- setup
- binary_linux_manywheel_2_7mu_cpu_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_manywheel_3_7m_cu100_devtoolset7_test
build_environment: "manywheel 3.7m cu100 devtoolset7"
requires:
- setup
- binary_linux_manywheel_3_7m_cu100_devtoolset7_build
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
use_cuda_docker_runtime: "1"
resource_class: gpu.medium
- binary_linux_test:
@ -79,7 +79,7 @@
requires:
- setup
- binary_linux_conda_2_7_cpu_devtoolset7_build
docker_image: "soumith/conda-cuda"
docker_image: "pytorch/conda-cuda"
# This binary build is currently broken, see https://github.com/pytorch/pytorch/issues/16710
# - binary_linux_conda_3_6_cu90_devtoolset7_test:
- binary_linux_test:
@ -89,7 +89,7 @@
- setup
- binary_linux_libtorch_2_7m_cpu_devtoolset7_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "soumith/manylinux-cuda100"
docker_image: "pytorch/manylinux-cuda100"
- binary_linux_test:
name: binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_test
build_environment: "libtorch 2.7m cpu gcc5.4_cxx11-abi"
@ -97,5 +97,5 @@
- setup
- binary_linux_libtorch_2_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build
libtorch_variant: "shared-with-deps"
docker_image: "yf225/pytorch-binary-docker-image-ubuntu16.04:latest"
docker_image: "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest"

View File

@ -0,0 +1,66 @@
docker_build:
triggers:
- schedule:
cron: "0 15 * * 0"
filters:
branches:
only:
- master
jobs:
- docker_build_job:
name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
image_name: "pytorch-linux-bionic-clang9-thrift-llvmdev"
- docker_build_job:
name: "pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda10.1-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda8-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py2"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
image_name: "pytorch-linux-xenial-cuda9-cudnn7-py3"
- docker_build_job:
name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
image_name: "pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7.9"
image_name: "pytorch-linux-xenial-py2.7.9"
- docker_build_job:
name: "pytorch-linux-xenial-py2.7"
image_name: "pytorch-linux-xenial-py2.7"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
image_name: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c"
- docker_build_job:
name: "pytorch-linux-xenial-py3-clang5-asan"
image_name: "pytorch-linux-xenial-py3-clang5-asan"
- docker_build_job:
name: "pytorch-linux-xenial-py3.5"
image_name: "pytorch-linux-xenial-py3.5"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-clang7"
image_name: "pytorch-linux-xenial-py3.6-clang7"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc4.8"
image_name: "pytorch-linux-xenial-py3.6-gcc4.8"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc5.4"
image_name: "pytorch-linux-xenial-py3.6-gcc5.4"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc7.2"
image_name: "pytorch-linux-xenial-py3.6-gcc7.2"
- docker_build_job:
name: "pytorch-linux-xenial-py3.6-gcc7"
image_name: "pytorch-linux-xenial-py3.6-gcc7"
- docker_build_job:
name: "pytorch-linux-xenial-pynightly"
image_name: "pytorch-linux-xenial-pynightly"

View File

@ -3,7 +3,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_32"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -12,7 +12,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-x86_64"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -21,7 +21,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v7a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly
@ -30,7 +30,7 @@
build_environment: "pytorch-linux-xenial-py3-clang5-android-ndk-r19c-arm-v8a"
requires:
- setup
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:347"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-android-ndk-r19c:405"
filters:
branches:
only: nightly

View File

@ -1,7 +1,8 @@
# Pytorch iOS binary builds
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_x86_64_build
build_environment: "libtorch-ios-10.2.1-nightly-x86_64-build"
name: pytorch_ios_11_2_1_nightly_x86_64_build
build_environment: "libtorch-ios-11.2.1-nightly-x86_64-build"
context: org-member
ios_platform: "SIMULATOR"
ios_arch: "x86_64"
requires:
@ -10,8 +11,9 @@
branches:
only: nightly
- binary_ios_build:
name: pytorch_ios_10_2_1_nightly_arm64_build
build_environment: "libtorch-ios-10.2.1-nightly-arm64-build"
name: pytorch_ios_11_2_1_nightly_arm64_build
build_environment: "libtorch-ios-11.2.1-nightly-arm64-build"
context: org-member
ios_arch: "arm64"
ios_platform: "OS"
requires:
@ -20,12 +22,12 @@
branches:
only: nightly
- binary_ios_upload:
build_environment: "libtorch-ios-10.2.1-nightly-binary-build-upload"
build_environment: "libtorch-ios-11.2.1-nightly-binary-build-upload"
context: org-member
requires:
- setup
- pytorch_ios_10_2_1_nightly_x86_64_build
- pytorch_ios_10_2_1_nightly_arm64_build
- pytorch_ios_11_2_1_nightly_x86_64_build
- pytorch_ios_11_2_1_nightly_arm64_build
filters:
branches:
only: nightly

View File

@ -0,0 +1,16 @@
- pytorch_linux_test:
name: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test
requires:
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_legacy-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
resource_class: large
- pytorch_linux_test:
name: pytorch_linux_xenial_py3_6_gcc5_4_ge_config_simple_test
requires:
- setup
- pytorch_linux_xenial_py3_6_gcc5_4_build
build_environment: "pytorch-linux-xenial-py3.6-gcc5.4-ge_config_simple-test"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3.6-gcc5.4:405"
resource_class: large

View File

@ -1,13 +1,17 @@
# Pytorch iOS PR builds
- pytorch_ios_build:
name: pytorch_ios_10_2_1_x86_64_build
build_environment: "pytorch-ios-10.2.1-x86_64_build"
name: pytorch_ios_11_2_1_x86_64_build
context: org-member
build_environment: "pytorch-ios-11.2.1-x86_64_build"
ios_arch: "x86_64"
ios_platform: "SIMULATOR"
requires:
- setup
- pytorch_ios_build:
name: pytorch_ios_10_2_1_arm64_build
build_environment: "pytorch-ios-10.2.1-arm64_build"
name: pytorch_ios_11_2_1_arm64_build
context: org-member
build_environment: "pytorch-ios-11.2.1-arm64_build"
ios_arch: "arm64"
ios_platform: "OS"
requires:
- setup

View File

@ -0,0 +1,7 @@
# PyTorch Mobile PR builds (use linux host toolchain + mobile build options)
- pytorch_linux_build:
name: pytorch_linux_xenial_py3_clang5_mobile_build
requires:
- setup
build_environment: "pytorch-linux-xenial-py3-clang5-mobile-build"
docker_image: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:405"

View File

@ -1,16 +1,3 @@
# Scheduled to run 4 hours after the binary jobs start
# These jobs need to run after all the binary jobs run, regardless of if the
# jobs failed or not. There's no way to do this in CircleCI right now, so we
# just schedule this to run after all the binary jobs should've finished.
# These jobs are all idempotent and very lightweight; they just upload html
# files that track what binaries are available and what their sizes are.
update_s3_htmls:
jobs:
- setup:
filters:
branches:
only: postnightly
- update_s3_htmls_for_nightlies:
context: org-member
requires:

View File

@ -1,34 +1,32 @@
---
# NOTE there must be no spaces before the '-', so put the comma first.
Checks: '
-*
,bugprone-*
,-bugprone-forward-declaration-namespace
,-bugprone-macro-parentheses
,-bugprone-lambda-function-name
,cppcoreguidelines-*
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-return-braced-init-list
,-modernize-use-auto
,-modernize-use-default-member-init
,-modernize-use-using
,performance-*
,-performance-noexcept-move-constructor
# NOTE there must be no spaces before the '-', so put the comma last.
Checks: '-*,
bugprone-*,
-bugprone-forward-declaration-namespace,
-bugprone-macro-parentheses,
-bugprone-lambda-function-name,
cppcoreguidelines-*,
-cppcoreguidelines-interfaces-global-init,
-cppcoreguidelines-owning-memory,
-cppcoreguidelines-pro-bounds-array-to-pointer-decay,
-cppcoreguidelines-pro-bounds-constant-array-index,
-cppcoreguidelines-pro-bounds-pointer-arithmetic,
-cppcoreguidelines-pro-type-cstyle-cast,
-cppcoreguidelines-pro-type-reinterpret-cast,
-cppcoreguidelines-pro-type-static-cast-downcast,
-cppcoreguidelines-pro-type-union-access,
-cppcoreguidelines-pro-type-vararg,
-cppcoreguidelines-special-member-functions,
hicpp-exception-baseclass,
hicpp-avoid-goto,
modernize-*,
-modernize-return-braced-init-list,
-modernize-use-auto,
-modernize-use-default-member-init,
-modernize-use-using,
performance-*,
-performance-noexcept-move-constructor,
'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:
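
Both layouts parse identically; the rewrite only moves the commas so a check can be added or removed by editing a single line. To sanity-check the resulting list locally, one can ask clang-tidy to print the checks it would enable (the source path is illustrative; clang-tidy picks up the nearest `.clang-tidy` above it):

```bash
# List the enabled checks; the trailing -- avoids needing a compilation database.
clang-tidy --list-checks torch/csrc/jit/ir.cpp --
```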

View File

@ -5,10 +5,9 @@ max-line-length = 120
# E501 is not flexible enough, we're using B950 instead
ignore =
E203,E305,E402,E501,E721,E741,F403,F405,F821,F841,F999,W503,W504,C408,E302,W291,E303,
# EXE001 is skipped for now because some files use shebang to determine Python version.
EXE001,
# these ignores are from flake8-bugbear; please fix!
B007,B008,
# these ignores are from flake8-comprehensions; please fix!
C400,C401,C402,C403,C404,C405,C407,C411,
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,tools/amd_build/pyHIPIFY,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi
per-file-ignores = __init__.py: F401
exclude = docs/src,venv,third_party,caffe2,scripts,docs/caffe2,torch/lib/include,torch/lib/tmp_install,build,torch/include,*.pyi,.git

View File

@ -7,29 +7,60 @@ on:
pull_request:
jobs:
quick-checks:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.x
architecture: x64
- name: Checkout PyTorch
uses: actions/checkout@v1
- name: Ensure consistent CircleCI YAML config
run: |
pip install -r requirements.txt
cd .circleci && ./ensure-consistency.py
- name: Ensure Docker version is correctly deployed
run: .circleci/validate-docker-version.py
- name: Shellcheck Jenkins scripts
run: |
sudo apt-get install -y shellcheck
.jenkins/run-shellcheck.sh
- name: Ensure no tabs
run: |
(! git grep -I -l $'\t' -- . ':(exclude)*.svg' ':(exclude)**Makefile' ':(exclude)**/contrib/**' ':(exclude)third_party' ':(exclude).gitattributes' ':(exclude).gitmodules' || (echo "The above files have tabs; please convert them to spaces"; false))
- name: Ensure C++ source files are not executable
run: |
(! find . \( -path ./third_party -o -path ./.git -o -path ./torch/bin -o -path ./build \) -prune -o -type f -executable -regextype posix-egrep -not -regex '.+(\.(bash|sh|py|so)|git-pre-commit)$' -print | grep . || (echo 'The above files have executable permission; please remove their executable permission by using `chmod -x`'; false))
- name: MyPy typecheck
run: |
pip install mypy mypy-extensions
mypy @mypy-files.txt
- name: C++ docs check
run: |
sudo apt-get install -y doxygen && pip install -r requirements.txt
cd docs/cpp/source && ./check-doxygen.sh
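
The "Ensure no tabs" step above works because `git grep -l` exits zero when it finds a match; negating that fails the build. A trimmed standalone sketch of the same pattern (exclusions shortened for readability):

```bash
# Fail if any tracked text file contains a literal tab.
# -I skips binary files; $'\t' expands to a real tab character.
if git grep -I -l $'\t' -- . ':(exclude)third_party'; then
  echo "The above files have tabs; please convert them to spaces"
  exit 1
fi
```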
flake8-py3:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.7.4
python-version: 3.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@master
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [ -z "${GITHUB_HEAD_REF}" ]; then
# We are on master, just set the SHA from our current location
echo ::set-output name=commit_sha::${GITHUB_SHA}
else
# We are on a PR, so actions/checkout leaves us on merge commit.
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
PR_TIP=$(git rev-parse HEAD^2)
git checkout ${PR_TIP}
echo ::set-output name=commit_sha::${PR_TIP}
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
@ -46,3 +77,134 @@ jobs:
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
flake8-py2:
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 2.x
architecture: x64
- name: Fetch PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Run flake8
run: |
set -eux
pip install flake8
rm -rf .circleci
flake8 --exit-zero > ${GITHUB_WORKSPACE}/flake8-output.txt
cat ${GITHUB_WORKSPACE}/flake8-output.txt
- name: Add annotations
uses: pytorch/add-annotations-github-action@master
with:
check_name: 'flake8-py2'
linter_output_path: 'flake8-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorCode>\w\d+) (?<errorDesc>.*)'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
clang-tidy:
if: github.event_name == 'pull_request'
runs-on: ubuntu-latest
steps:
- name: Setup Python
uses: actions/setup-python@v1
with:
python-version: 3.x
architecture: x64
- name: Checkout PyTorch
uses: actions/checkout@v1
- name: Checkout PR tip
run: |
set -eux
if [[ "${{ github.event_name }}" == "pull_request" ]]; then
# We are on a PR, so actions/checkout leaves us on a merge commit.
# Check out the actual tip of the branch.
git checkout ${{ github.event.pull_request.head.sha }}
fi
echo ::set-output name=commit_sha::$(git rev-parse HEAD)
id: get_pr_tip
- name: Install dependencies
run: |
set -eux
# Install CUDA
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /"
sudo apt-get update
sudo apt-get --no-install-recommends -y install cuda
# Install dependencies
pip install pyyaml
wget -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
sudo apt-add-repository "deb http://apt.llvm.org/xenial/ llvm-toolchain-xenial-8 main"
sudo apt-get update
sudo apt-get install -y clang-tidy-8
sudo update-alternatives --install /usr/bin/clang-tidy clang-tidy /usr/bin/clang-tidy-8 1000
- name: Run clang-tidy
run: |
set -eux
git remote add upstream https://github.com/pytorch/pytorch
git fetch upstream "$GITHUB_BASE_REF"
BASE_SHA=${{ github.event.pull_request.base.sha }}
HEAD_SHA=${{ github.event.pull_request.head.sha }}
MERGE_BASE=$(git merge-base $BASE_SHA $HEAD_SHA)
if [[ ! -d build ]]; then
git submodule update --init --recursive
export USE_NCCL=0
# We really only need compile_commands.json, so no need to build!
time python setup.py --cmake-only build
# Generate ATen files.
time python aten/src/ATen/gen.py \
-s aten/src/ATen \
-d build/aten/src/ATen \
aten/src/ATen/Declarations.cwrap \
aten/src/THNN/generic/THNN.h \
aten/src/THCUNN/generic/THCUNN.h \
aten/src/ATen/nn.yaml \
aten/src/ATen/native/native_functions.yaml
# Generate PyTorch files.
time python tools/setup_helpers/generate_code.py \
--declarations-path build/aten/src/ATen/Declarations.yaml \
--nn-path aten/src
fi
# Run Clang-Tidy
# The negative filters below are to exclude files that include onnx_pb.h or
# caffe2_pb.h, otherwise we'd have to build protos as part of this CI job.
python tools/clang_tidy.py \
--verbose \
--paths torch/csrc/ \
--diff "$MERGE_BASE" \
-g"-torch/csrc/jit/export.cpp" \
-g"-torch/csrc/jit/import.cpp" \
-g"-torch/csrc/jit/netdef_converter.cpp" \
"$@" > ${GITHUB_WORKSPACE}/clang-tidy-output.txt
cat ${GITHUB_WORKSPACE}/clang-tidy-output.txt
- name: Add annotations
uses: suo/add-annotations-github-action@master
with:
check_name: 'clang-tidy'
linter_output_path: 'clang-tidy-output.txt'
commit_sha: ${{ steps.get_pr_tip.outputs.commit_sha }}
regex: '^(?<filename>.*?):(?<lineNumber>\d+):(?<columnNumber>\d+): (?<errorDesc>.*?) \[(?<errorCode>.*)\]'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
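
The clang-tidy job lints only the lines a PR touches: it computes the merge base between the PR head and its target branch and hands that to `tools/clang_tidy.py --diff`. The core git steps in isolation (remote and branch names are illustrative):

```bash
git remote add upstream https://github.com/pytorch/pytorch 2>/dev/null || true
git fetch upstream master
# The merge base is where the PR branched off; diffing against it yields
# exactly the changes the PR introduces, so pre-existing warnings are skipped.
MERGE_BASE=$(git merge-base upstream/master HEAD)
git diff --name-only "$MERGE_BASE" HEAD
```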

.gitignore vendored
View File

@ -42,6 +42,7 @@ dropout_model.pt
test/generated_type_hints_smoketest.py
test/htmlcov
test/cpp_extensions/install/
test/test-reports/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/

.gitmodules vendored
View File

@ -117,4 +117,4 @@
[submodule "android/libs/fbjni"]
ignore = dirty
path = android/libs/fbjni
url = https://github.com/IvanKobzarev/fbjni.git
url = https://github.com/facebookincubator/fbjni.git

View File

@ -4,14 +4,6 @@ set -ex
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# TODO: Migrate all centos jobs to use proper devtoolset
if [[ "$BUILD_ENVIRONMENT" == *py2-cuda9.0-cudnn7-centos7* ]]; then
# There is a bug in the pango package on CentOS 7 that causes undefined
# symbols; upgrading glib2 to >=2.56.1 solves the issue. See
# https://bugs.centos.org/view.php?id=15495
sudo yum install -y -q glib2-2.56.1
fi
# CMAKE_ARGS are only passed to 'cmake' and the -Dfoo=bar does not work with
# setup.py, so we build a list of foo=bars and then either convert it to
# -Dfoo=bars or export them before running setup.py
@ -169,7 +161,6 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
build_args+=("USE_ROCM=ON")
# This is needed to enable ImageInput operator in resnet50_trainer
build_args+=("USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip

View File

@ -64,7 +64,7 @@ if [[ "$BUILD_ENVIRONMENT" == *cmake* ]]; then
exit 0
fi
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on
# https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
@ -74,9 +74,9 @@ if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
sudo pip -q install attrs==18.1.0 -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip -q install coverage==4.5.1 -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip -q install hypothesis==3.44.6 -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
else
pip install --user --no-cache-dir hypothesis==3.59.0
fi
# else
# pip install --user --no-cache-dir hypothesis==3.59.0
# fi
# Collect additional tests to run (outside caffe2/python)
EXTRA_TESTS=()
@ -93,6 +93,10 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# On ROCm, RCCL (distributed) development isn't complete.
# https://github.com/ROCmSoftwarePlatform/rccl
rocm_ignore_test+=("--ignore $caffe2_pypath/python/data_parallel_model_test.py")
# This test has been flaky in ROCm CI (but note the tests are
# cpu-only so should be unrelated to ROCm)
rocm_ignore_test+=("--ignore $caffe2_pypath/python/operator_test/blobs_queue_db_test.py")
fi
# NB: Warnings are disabled because they make it harder to see what
@ -100,8 +104,13 @@ fi
echo "Running Python tests.."
if [[ "$BUILD_ENVIRONMENT" == *py3* ]]; then
# locale setting is required by click package with py3
export LC_ALL=C.UTF-8
export LANG=C.UTF-8
for loc in "en_US.utf8" "C.UTF-8"; do
if locale -a | grep "$loc" >/dev/null 2>&1; then
export LC_ALL="$loc"
export LANG="$loc"
break;
fi
done
fi
pip install --user pytest-sugar
@ -115,6 +124,7 @@ pip install --user pytest-sugar
--ignore "$caffe2_pypath/python/operator_test/matmul_op_test.py" \
--ignore "$caffe2_pypath/python/operator_test/pack_ops_test.py" \
--ignore "$caffe2_pypath/python/mkl/mkl_sbn_speed_test.py" \
--ignore "$caffe2_pypath/python/trt/test_pt_onnx_trt.py" \
${rocm_ignore_test[@]} \
"$caffe2_pypath/python" \
"${EXTRA_TESTS[@]}"
@ -123,7 +133,7 @@ pip install --user pytest-sugar
# torchvision tests #
#####################
if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
pip install -q --user git+https://github.com/pytorch/vision.git
pip install -q --user git+https://github.com/pytorch/vision.git@v0.5.0
pip install -q --user ninja
# JIT C++ extensions require ninja, so put it into PATH.
export PATH="/var/lib/jenkins/.local/bin:$PATH"
@ -131,8 +141,7 @@ if [[ "$BUILD_ENVIRONMENT" == *onnx* ]]; then
# default pip version is too old(9.0.2), unable to support tag `manylinux2010`.
# Fix the pip error: Couldn't find a version that satisfies the requirement
sudo pip install --upgrade pip
pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==0.5.0.dev905
pip install -q --user -i https://test.pypi.org/simple/ ort-nightly==1.1.0.dev1228
fi
"$ROOT_DIR/scripts/onnx/test.sh"
fi

View File

@ -36,6 +36,12 @@ if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-asan* ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" "$@"
fi
if [[ "$BUILD_ENVIRONMENT" == *-linux-xenial-py3-clang5-mobile* ]]; then
# Use linux host toolchain + mobile build options in order to build & test
# mobile libtorch without having to setup Android/iOS toolchain/simulator.
exec ./scripts/build_mobile.sh -DBUILD_BINARY=ON "$@"
fi
echo "Python version:"
python --version
@ -61,6 +67,24 @@ if ! which conda; then
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *libtorch* ]]; then
POSSIBLE_JAVA_HOMES=()
POSSIBLE_JAVA_HOMES+=(/usr/local)
POSSIBLE_JAVA_HOMES+=(/usr/lib/jvm/java-8-openjdk-amd64)
POSSIBLE_JAVA_HOMES+=(/Library/Java/JavaVirtualMachines/*.jdk/Contents/Home)
for JH in "${POSSIBLE_JAVA_HOMES[@]}" ; do
if [[ -e "$JH/include/jni.h" ]] ; then
echo "Found jni.h under $JH"
export JAVA_HOME="$JH"
export BUILD_JNI=ON
break
fi
done
if [ -z "$JAVA_HOME" ]; then
echo "Did not find jni.h"
fi
fi
# Use special scripts for Android builds
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
@ -112,9 +136,7 @@ if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
fi
python tools/amd_build/build_amd.py
# OPENCV is needed to enable ImageInput operator in caffe2 resnet5_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
python setup.py install --user
# runtime compilation of MIOpen kernels manages to crash sccache - hence undo the wrapping
bash tools/amd_build/unwrap_clang.sh
@ -137,10 +159,6 @@ if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *xenial-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
# Patch required to build xla
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
git clone --recursive https://github.com/pytorch/xla.git
@ -159,47 +177,60 @@ echo "The next three invocations are expected to fail with invalid command error
( ! get_exit_code python setup.py clean] )
( ! get_exit_code python setup.py clean bad_argument )
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
WERROR=1 python setup.py install
if [[ "$BUILD_ENVIRONMENT" != *libtorch* ]]; then
# ppc64le build fails when WERROR=1
# set only when building other architectures
# only use for "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* && "$BUILD_ENVIRONMENT" != *clang* ]]; then
WERROR=1 python setup.py install
else
python setup.py install
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
if which sccache > /dev/null; then
echo 'PyTorch Build Statistics'
sccache --show-stats
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)"
make VERBOSE=1
popd
assert_git_not_dirty
else
python setup.py install
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# TODO: I'm not sure why, but somehow we lose verbose commands
set -x
if which sccache > /dev/null; then
echo 'PyTorch Build Statistics'
sccache --show-stats
fi
assert_git_not_dirty
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip_install -r requirements.txt || true
LC_ALL=C make html
popd
assert_git_not_dirty
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda9-cudnn7-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
assert_git_not_dirty
fi
# Test no-Python build
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
# Test no-Python build
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
@ -210,18 +241,6 @@ if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
popd
fi
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
python --version
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
cmake "$CUSTOM_OP_TEST" -DCMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" -DPYTHON_EXECUTABLE="$(which python)"
make VERBOSE=1
popd
assert_git_not_dirty
# Test XLA build
if [[ "${BUILD_ENVIRONMENT}" == *xla* ]]; then
# TODO: Move this to Dockerfile.

View File

@ -158,7 +158,9 @@ fi
function pip_install() {
# retry 3 times
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@"
# old versions of pip don't have the "--progress-bar" flag
pip install --progress-bar off "$@" || pip install --progress-bar off "$@" || pip install --progress-bar off "$@" ||\
pip install "$@" || pip install "$@" || pip install "$@"
}
function pip_uninstall() {
@ -166,6 +168,10 @@ function pip_uninstall() {
pip uninstall -y "$@" || pip uninstall -y "$@"
}
retry () {
$* || (sleep 1 && $*) || (sleep 2 && $*)
}
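
One caveat with `retry` as written: unquoted `$*` re-splits any argument that contains spaces. A quoting-safe variant, shown only as a sketch (not what the script uses):

```bash
retry () {
  # "$@" preserves each original argument, spaces and all.
  "$@" || (sleep 1 && "$@") || (sleep 2 && "$@")
}
# Example usage:
retry curl -fsSL -o miniconda3.sh https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh
```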
function get_exit_code() {
set +e
"$@"

View File

@ -13,12 +13,12 @@ mkdir -p ${WORKSPACE_DIR}
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${WORKSPACE_DIR}/miniconda3" ]; then
mkdir -p ${WORKSPACE_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
retry curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${WORKSPACE_DIR}/miniconda3.sh
retry bash ${WORKSPACE_DIR}/miniconda3.sh -b -p ${WORKSPACE_DIR}/miniconda3
fi
export PATH="${WORKSPACE_DIR}/miniconda3/bin:$PATH"
source ${WORKSPACE_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
retry conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# The torch.hub tests make requests to GitHub.
#
@ -29,13 +29,13 @@ conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# > certificate verify failed: unable to get local issuer certificate
# > (_ssl.c:1056)
#
conda install -y -c conda-forge certifi
retry conda install -y -c conda-forge certifi
# Needed by torchvision, which is imported from TestHub in test_utils.py.
conda install -y pillow
retry conda install -y pillow
# Building with USE_DISTRIBUTED=1 requires libuv (for Gloo).
conda install -y libuv pkg-config
retry conda install -y libuv pkg-config
# Image commit tag is used to persist the build from the build job
# and to retrieve the build from the test job.

View File

@ -6,6 +6,9 @@ source "$(dirname "${BASH_SOURCE[0]}")/macos-common.sh"
conda install -y six
pip install -q hypothesis "librosa>=0.6.2" psutil
# TODO move this to docker
pip install unittest-xml-reporting
# faulthandler has been built in since Python 3.3
if [[ ! $(python -c "import sys; print(int(sys.version_info >= (3, 3)))") == "1" ]]; then
pip install -q faulthandler

View File

@ -10,8 +10,10 @@ COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
@ -28,7 +30,7 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" "$PWD/../cpp-build"/caffe2/build/bin/test_api
OMP_NUM_THREADS=2 TORCH_CPP_TEST_MNIST_PATH="test/cpp/api/mnist" build/bin/test_api
time python test/run_test.py --verbose -i distributed
time python test/run_test.py --verbose -i c10d
time python test/run_test.py --verbose -i c10d_spawn

View File

@ -12,6 +12,9 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [ -n "${IN_CIRCLECI}" ]; then
# TODO move this to docker
pip_install unittest-xml-reporting
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get -qq update
@ -32,6 +35,12 @@ if [ -n "${IN_CIRCLECI}" ]; then
fi
fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# TODO: Move this to Docker
sudo apt-get -qq update
sudo apt-get -qq install --no-install-recommends libsndfile1
fi
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
@ -40,7 +49,7 @@ if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip_install --user hypothesis
pip_install --user "hypothesis==4.53.2"
# TODO: move this to Docker
PYTHON_VERSION=$(python -c 'import platform; print(platform.python_version())'|cut -c1)
@ -103,8 +112,18 @@ test_python_nn() {
assert_git_not_dirty
}
test_python_ge_config_simple() {
time python test/run_test.py --include jit_simple --verbose
assert_git_not_dirty
}
test_python_ge_config_legacy() {
time python test/run_test.py --include jit_legacy jit_fuser_legacy --verbose
assert_git_not_dirty
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods quantizer
time python test/run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose --bring-to-front quantization quantized quantized_tensor quantized_nn_mods
assert_git_not_dirty
}
@ -134,7 +153,7 @@ test_aten() {
}
test_torchvision() {
pip_install --user git+https://github.com/pytorch/vision.git@2b73a4846773a670632b29fb2fc2ac57df7bce5d
pip_install --user git+https://github.com/pytorch/vision.git@44a5bae933655ed7ff798669a43452b833f9ce01
}
test_libtorch() {
@ -195,15 +214,17 @@ test_backward_compatibility() {
pushd test/backward_compatibility
python dump_all_function_schemas.py --filename new_schemas.txt
pip_uninstall torch
pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
pip_install torch==1.3.1+cpu -f https://download.pytorch.org/whl/torch_stable.html
python check_backward_compatibility.py --new-schemas new_schemas.txt
popd
set +x
assert_git_not_dirty
}
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
if ! [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
(cd test && python -c "import torch; print(torch.__config__.show())")
(cd test && python -c "import torch; print(torch.__config__.parallel_info())")
fi
if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
test_backward_compatibility
@ -211,6 +232,13 @@ if [[ "${BUILD_ENVIRONMENT}" == *backward* ]]; then
elif [[ "${BUILD_ENVIRONMENT}" == *xla* || "${JOB_BASE_NAME}" == *xla* ]]; then
test_torchvision
test_xla
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_legacy* || "${JOB_BASE_NAME}" == *ge_config_legacy* ]]; then
test_python_ge_config_legacy
elif [[ "${BUILD_ENVIRONMENT}" == *ge_config_simple* || "${JOB_BASE_NAME}" == *ge_config_simple* ]]; then
test_python_ge_config_simple
elif [[ "${BUILD_ENVIRONMENT}" == *libtorch* ]]; then
# TODO: run some C++ tests
echo "no-op at the moment"
elif [[ "${BUILD_ENVIRONMENT}" == *-test1 || "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn

View File

@ -22,7 +22,7 @@ if NOT "%BUILD_ENVIRONMENT%"=="" (
:: Numba is pinned to 0.44.0 to avoid https://github.com/numba/numba/issues/4352
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3 protobuf numba==0.44.0
)
pip install -q ninja future hypothesis "librosa>=0.6.2" psutil pillow
pip install -q ninja future "hypothesis==4.53.2" "librosa>=0.6.2" psutil pillow
:: No need to install faulthandler since we only test Python >= 3.6 on Windows
:: faulthandler is builtin since Python 3.3

View File

@ -1,3 +1,3 @@
call %SCRIPT_HELPERS_DIR%\setup_pytorch_env.bat
cd test && python run_test.py --exclude nn --verbose && cd ..
cd test && python run_test.py --exclude nn jit_simple jit_legacy jit_fuser_legacy --verbose && cd ..
if ERRORLEVEL 1 exit /b 1

View File

@ -15,6 +15,10 @@ if(NOT CMAKE_VERSION VERSION_LESS 3.15.0)
cmake_policy(SET CMP0092 NEW)
endif()
if(NOT CMAKE_VERSION VERSION_LESS 3.10)
set(FIND_CUDA_MODULE_DEPRECATED ON)
endif()
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
@ -103,7 +107,8 @@ option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
cmake_dependent_option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON
"NOT MSVC" OFF)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
option(BUILD_CAFFE2_MOBILE "Build libcaffe2 for mobile (deprecating)" ON)
option(BUILD_NAMEDTENSOR "Experimental: compile with namedtensor support" OFF)
@ -115,6 +120,7 @@ cmake_dependent_option(
CAFFE2_USE_MSVC_STATIC_RUNTIME "Using MSVC static runtime libraries" ON
"NOT BUILD_SHARED_LIBS" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
option(BUILD_JNI "Build JNI bindings" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" ON
"BUILD_TEST" OFF)
@ -140,7 +146,7 @@ option(USE_METAL "Use Metal for iOS build" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
cmake_dependent_option(
USE_NCCL "Use NCCL" ON
"USE_CUDA;UNIX;NOT APPLE" OFF)
"USE_CUDA OR USE_ROCM;UNIX;NOT APPLE" OFF)
cmake_dependent_option(
USE_STATIC_NCCL "Use static NCCL" OFF
"USE_NCCL" OFF)
@ -199,6 +205,8 @@ cmake_dependent_option(
"MSVC" OFF)
set(ONNX_NAMESPACE "onnx_torch" CACHE STRING "A namespace for ONNX; needed to build with other frameworks that share ONNX.")
set(SELECTED_OP_LIST "" CACHE STRING
"Path to the yaml file that contains the list of operators to include for custom build. Include all operators by default.")
# This is a fix for a rare build issue on Ubuntu:
# symbol lookup error: miniconda3/envs/pytorch-py3.7/lib/libmkl_intel_lp64.so: undefined symbol: mkl_blas_dsyrk
@ -312,7 +320,6 @@ if (INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE)
set(USE_FBGEMM OFF)
set(USE_PYTORCH_QNNPACK ON)
set(USE_QNNPACK OFF)
set(USE_STATIC_DISPATCH ON)
set(INTERN_DISABLE_ONNX ON)
set(INTERN_DISABLE_AUTOGRAD ON)
set(INTERN_USE_EIGEN_BLAS ON)
@ -490,7 +497,7 @@ if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
if(ANDROID)
if(ANDROID AND (NOT ANDROID_DEBUG_SYMBOLS))
if(CMAKE_COMPILER_IS_GNUCXX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s")
else()
@ -642,5 +649,12 @@ if (BUILD_BINARY)
add_subdirectory(binaries)
endif()
# ---[ JNI
if (BUILD_JNI)
set(BUILD_LIBTORCH_WITH_JNI 1)
set(FBJNI_SKIP_TESTS 1)
add_subdirectory(android/pytorch_android)
endif()
include(cmake/Summary.cmake)
caffe2_print_configuration_summary()
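
With `cmake_dependent_option`, `BUILD_CAFFE2_OPS` now defaults to ON only when the generator is not MSVC and is forced OFF (and hidden) otherwise; on other toolchains it remains an ordinary cache option. An illustrative configure line exercising the new options (flags are examples, not required settings):

```bash
# Opt out of Caffe2 operators, opt in to the new JNI bindings.
cmake -DBUILD_CAFFE2_OPS=OFF -DBUILD_JNI=ON ..
```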

View File

@ -4,10 +4,10 @@
/docs/cpp @goldsborough @ebetica @yf225
/torch/csrc/api/ @ebetica @goldsborough @yf225
/test/cpp/api/ @ebetica @goldsborough @yf225
/torch/lib/c10d/ @pietern @mrshenli
/torch/csrc/distributed/ @pietern @mrshenli
/torch/distributed/ @apaszke @pietern @mrshenli
/test/test_c10d.py @pietern @mrshenli
/torch/lib/c10d/ @pietern @mrshenli @zhaojuanmao
/torch/csrc/distributed/ @pietern @mrshenli @zhaojuanmao
/torch/distributed/ @apaszke @pietern @mrshenli @zhaojuanmao
/test/test_c10d.py @pietern @mrshenli @zhaojuanmao
/torch/utils/cpp_extension.py @goldsborough @fmassa @soumith @ezyang
# Not there to stricly require the approval, but to be tagged as a reviewer
@ -19,3 +19,9 @@
/torch/autograd/ @apaszke
/torch/jit/ @apaszke
/torch/utils/data/ @apaszke
# Distributed RPC Framework.
/torch/csrc/distributed/rpc @mrshenli @pritamdamania87 @zhaojuanmao
/torch/csrc/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/rpc @mrshenli @pritamdamania87 @zhaojuanmao
/torch/distributed/autograd @mrshenli @pritamdamania87 @zhaojuanmao

View File

@ -216,8 +216,10 @@ To build the documentation:
cd docs
pip install -r requirements.txt
# `katex` must also be available in your PATH.
# If you are using Ubuntu or Debian, you can install it with:
# sudo apt install katex
# You can either install katex globally if you have properly configured npm:
# npm install -g katex
# Or, if you prefer to keep your global executable environment clean or do not want to configure npm:
# npm install katex && export PATH="$PATH:$(pwd)/node_modules/.bin"
```
3. Generate the documentation HTML files. The generated files will be in `docs/build/html`.
@ -284,6 +286,57 @@ cd docs
make doctest
```
## Profiling with `py-spy`
Evaluating the performance impact of code changes in PyTorch can be complicated,
particularly if code changes happen in compiled code. One simple way to profile
both Python and C++ code in PyTorch is to use
[`py-spy`](https://github.com/benfred/py-spy), a sampling profiler for Python
that has the ability to profile native code and Python code in the same session.
`py-spy` can be installed via `pip`:
```bash
$ pip install py-spy
```
To use `py-spy`, first write a Python test script that exercises the
functionality you would like to profile. For example, this script profiles
`torch.add`:
```python
import torch
t1 = torch.tensor([[1, 1], [1, 1.]])
t2 = torch.tensor([[0, 0], [0, 0.]])
for _ in range(1000000):
torch.add(t1, t2)
```
Since the `torch.add` operation happens in microseconds, we repeat it a large
number of times to get good statistics. The most straightforward way to use
`py-spy` with such a script is to generate a [flame
graph](http://www.brendangregg.com/flamegraphs.html):
```bash
$ py-spy record -o profile.svg --native -- python test_tensor_tensor_add.py
```
This will output a file named `profile.svg` containing a flame graph you can
view in a web browser or SVG viewer. Individual stack frame entries in the graph
can be selected interactively with your mouse to zoom in on a particular part of
the program execution timeline. The `--native` command-line option tells
`py-spy` to record stack frame entries for PyTorch C++ code. To get line numbers
for C++ code it may be necessary to compile PyTorch in debug mode by prepending
`DEBUG=1` to your `setup.py develop` call. Depending on your operating system it
may also be necessary to run `py-spy` with root privileges.
`py-spy` can also work in an `htop`-like "live profiling" mode and can be
tweaked to adjust the stack sampling rate; see the `py-spy` readme for more
details.
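
For example, a sketch of the live mode (flag names follow the `py-spy` readme; `--native` may require root):

```bash
# htop-like live view of the hottest stacks, including C++ frames.
sudo py-spy top --native -- python test_tensor_tensor_add.py
```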
## Managing Multiple Build Trees
One downside to using `python setup.py develop` is that your development
@ -438,6 +491,38 @@ ccache -F 0
# deploy (and add to ~/.bashrc for later)
export PATH="/usr/lib/ccache:$PATH"
```
It is also possible to install `ccache` via `conda` by installing it from the
community-maintained `conda-forge` channel. Here is how to set up `ccache` this
way:
```bash
# install ccache
conda install -c conda-forge ccache
# set up ccache compiler symlinks
mkdir ~/ccache
mkdir ~/ccache/lib
mkdir ~/ccache/cuda
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/cc
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/c++
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/gcc
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/lib/g++
ln -s $CONDA_PREFIX/bin/ccache ~/ccache/cuda/nvcc
# update PATH to reflect symlink locations, consider
# adding this to your .bashrc
export PATH=~/ccache/lib:$PATH
export CUDA_NVCC_EXECUTABLE=~/ccache/cuda/nvcc
# increase ccache cache size to 25 GiB
ccache -M 25Gi
```
To check that this is working, do two clean builds of PyTorch in a row. The
second build should be substantially faster than the first.
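
A quick way to verify, sketched under the assumption that `ccache -z`/`-s` (zero and show statistics) are available in your ccache version:

```bash
ccache -z                                          # reset hit/miss counters
python setup.py develop                            # first clean build fills the cache
python setup.py clean && python setup.py develop   # second build should mostly hit
ccache -s                                          # inspect the hit rate
```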
#### Use a faster linker
If you are editing a single file and rebuilding in a tight loop, the time spent
linking will dominate. The system linker available in most Linux distributions

View File

@ -44,7 +44,7 @@ At a granular level, PyTorch is a library that consists of the following compone
| [**torch.multiprocessing**](https://pytorch.org/docs/stable/multiprocessing.html) | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| [**torch.utils**](https://pytorch.org/docs/stable/data.html) | DataLoader and other utility functions for convenience |
Usually one uses PyTorch either as:
Usually PyTorch is used either as:
- a replacement for NumPy to use the power of GPUs.
- a deep learning research platform that provides maximum flexibility and speed.
@ -88,7 +88,7 @@ You get the best of speed and flexibility for your crazy research.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use [NumPy](http://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](http://scikit-learn.org) etc.
You can use it naturally like you would use [NumPy](https://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](https://scikit-learn.org) etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
@ -124,7 +124,7 @@ You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
No wrapper code needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
## Installation
@ -145,12 +145,12 @@ Python wheels for NVIDIA's Jetson Nano, Jetson TX2, and Jetson AGX Xavier are av
- Python 2.7: https://nvidia.box.com/v/torch-weekly-cp27-jetson-jp42
- Python 3.6: https://nvidia.box.com/v/torch-weekly-cp36-jetson-jp42
They requires JetPack 4.2 and above and are maintained by @dusty-nv
They require JetPack 4.2 and above, and @dusty-nv maintains them.
### From Source
If you are installing from source, we highly recommend installing an [Anaconda](https://www.anaconda.com/distribution/#download-section) environment.
If you are installing from source, you will need a C++14 compiler. Also, we highly recommend installing an [Anaconda](https://www.anaconda.com/distribution/#download-section) environment.
You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.
Once you have [Anaconda](https://www.anaconda.com/distribution/#download-section) installed, here are the instructions.
@ -167,7 +167,7 @@ If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xa
#### Install Dependencies
Common
Common (only install `typing` for Python <3.5)
```
conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
```
@ -175,7 +175,7 @@ conda install numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing
On Linux
```bash
# Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda90 # or [magma-cuda92 | magma-cuda100 ] depending on your cuda version
conda install -c pytorch magma-cuda90 # or [magma-cuda92 | magma-cuda100 | magma-cuda101 ] depending on your cuda version
```
#### Get the PyTorch Source
@ -209,13 +209,13 @@ If the version of Visual Studio 2017 is higher than 15.4.5, installing of "VC++
<br/> There is no guarantee of the correct building with VC++ 2017 toolsets, others than version 15.4 v14.11.
<br/> "VC++ 2017 version 15.4 v14.11 toolset" might be installed onto already installed Visual Studio 2017 by running its installation once again and checking the corresponding checkbox under "Individual components"/"Compilers, build tools, and runtimes".
NVTX is a part of CUDA distributive, where it is called "Nsight Compute". For installing it onto already installed CUDA run CUDA installation once again and check the corresponding checkbox.
NVTX is a part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox.
Be sure that CUDA with Nsight Compute is installed after Visual Studio 2017.
Currently VS 2017, VS 2019 and Ninja are supported as the generator of CMake. If `ninja.exe` is detected in `PATH`, then Ninja will be used as the default generator, otherwise it will use VS 2017.
<br/> If Ninja is selected as the generator, the latest MSVC newer than VS 2015 (14.0) will get selected as the underlying toolchain if you have Python > 3.5; otherwise VS 2015 will be selected, so you'll have to activate the environment. If you use CMake <= 3.14.2 and have VS 2019 installed, then even if you specify VS 2017 as the generator, VS 2019 will get selected as the generator.
CUDA and MSVC has strong version dependencies, so even if you use VS 2017 / 2019, you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below and then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the cuda host compiler (not recommended if there are big version differences).
CUDA and MSVC have strong version dependencies, so even if you use VS 2017 / 2019, you will get build errors like `nvcc fatal : Host compiler targets unsupported OS`. For this kind of problem, please install the corresponding VS toolchain in the table below and then you can either specify the toolset during activation (recommended) or set `CUDAHOSTCXX` to override the cuda host compiler (not recommended if there are big version differences).
| CUDA version | Newest supported VS version |
| ------------ | ------------------------------------------------------- |
@ -234,7 +234,7 @@ set FORCE_PY27_BUILD=1
:: Note: This value is useless if Ninja is detected. However, you can force that by using `set USE_NINJA=OFF`.
set CMAKE_GENERATOR=Visual Studio 15 2017
:: Read the content in the previous section carefully before you preceed.
:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2017 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
@@ -331,7 +331,7 @@ Sending a PR without discussion might end up resulting in a rejected PR, because
PyTorch is a community driven project with several skillful engineers and researchers contributing to it.
PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [Sam Gross](https://github.com/colesbury), [Soumith Chintala](http://soumith.ch) and [Gregory Chanan](https://github.com/gchanan) with major contributions coming from hundreds of talented individuals in various forms and means.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.

android/.gitignore vendored

@@ -6,11 +6,5 @@ gradle/wrapper
.idea/*
.externalNativeBuild
build
pytorch_android/src/main/cpp/libtorch_include/x86/**
pytorch_android/src/main/cpp/libtorch_include/x86_64/**
pytorch_android/src/main/cpp/libtorch_include/armeabi-v7a/**
pytorch_android/src/main/cpp/libtorch_include/arm64-v8a/**
pytorch_android/src/main/jniLibs/x86/**
pytorch_android/src/main/jniLibs/x86_64/**
pytorch_android/src/main/jniLibs/armeabi-v7a/**
pytorch_android/src/main/jniLibs/arm64-v8a/**
pytorch_android/src/main/cpp/libtorch_include/**
pytorch_android/src/main/jniLibs/**

android/README.md Normal file

@@ -0,0 +1,118 @@
# Android
## Demo applications and tutorials
Demo applications with code walk-throughs can be found in [this GitHub repo](https://github.com/pytorch/android-demo-app).
## Publishing
##### Release
Release artifacts are published to jcenter:
```
repositories {
jcenter()
}
dependencies {
implementation 'org.pytorch:pytorch_android:1.3.0'
implementation 'org.pytorch:pytorch_android_torchvision:1.3.0'
}
```
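Once the dependency is added, running a model from Java needs only the `Module`, `IValue` and `Tensor` classes, the same API exercised by the tests further down this diff. A minimal sketch; the model path and input shape here are illustrative assumptions, not part of this change:
```
import org.pytorch.IValue;
import org.pytorch.Module;
import org.pytorch.Tensor;

public class InferenceSketch {
  public static void main(String[] args) {
    // Load a serialized TorchScript model; the path is an assumed example.
    Module module = Module.load("/data/local/tmp/model.pt");
    // Build a 1x3x224x224 float input tensor (shape chosen for illustration).
    long[] shape = new long[] {1, 3, 224, 224};
    float[] data = new float[(int) Tensor.numel(shape)];
    Tensor input = Tensor.fromBlob(data, shape);
    // forward() takes and returns IValue wrappers around tensors and scalars.
    IValue output = module.forward(IValue.from(input));
    float[] scores = output.toTensor().getDataAsFloatArray();
    System.out.println("output elements: " + scores.length);
  }
}
```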
##### Nightly
Nightly (snapshot) builds are published every night from the `master` branch to the [nexus sonatype snapshots repository](https://oss.sonatype.org/#nexus-search;quick~pytorch_android)
To use them, the repository must be specified explicitly:
```
repositories {
maven {
url "https://oss.sonatype.org/content/repositories/snapshots"
}
}
dependencies {
...
implementation 'org.pytorch:pytorch_android:1.4.0-SNAPSHOT'
implementation 'org.pytorch:pytorch_android_torchvision:1.4.0-SNAPSHOT'
...
}
```
The current nightly (snapshot) version is the value of `VERSION_NAME` in `gradle.properties` in the current folder; at the moment it is `1.4.0-SNAPSHOT`.
## Building PyTorch Android from Source
In some cases you might want to use a local build of PyTorch Android, for example to build a custom libtorch binary with another set of operators or to test local changes.
For this you can use the `./scripts/build_pytorch_android.sh` script.
```
git clone https://github.com/pytorch/pytorch.git
cd pytorch
sh ./scripts/build_pytorch_android.sh
```
The workflow contains several steps:
1. Build libtorch for android for all 4 android ABIs (armeabi-v7a, arm64-v8a, x86, x86_64).
2. Create symbolic links to the results of those builds:
`android/pytorch_android/src/main/jniLibs/${abi}` to the directory with the output libraries,
`android/pytorch_android/src/main/cpp/libtorch_include/${abi}` to the directory with the headers. These directories are used to build the `libpytorch.so` library that will be loaded on the android device.
3. Finally, run `gradle` in the `android/pytorch_android` directory with the task `assembleRelease`.
The script requires that the Android SDK, Android NDK and Gradle are installed.
They are specified as environment variables:
`ANDROID_HOME` - path to [Android SDK](https://developer.android.com/studio/command-line/sdkmanager.html)
`ANDROID_NDK` - path to [Android NDK](https://developer.android.com/studio/projects/install-ndk)
`GRADLE_HOME` - path to [gradle](https://gradle.org/releases/)
After a successful build you should see the resulting aar files:
```
$ find pytorch_android/build/ -type f -name "*aar"
pytorch_android/build/outputs/aar/pytorch_android.aar
pytorch_android_torchvision/build/outputs/aar/pytorch_android.aar
libs/fbjni_local/build/outputs/aar/pytorch_android_fbjni.aar
```
It can be used directly in android projects, as a gradle dependency:
```
allprojects {
repositories {
flatDir {
dirs 'libs'
}
}
}
android {
...
packagingOptions {
pickFirst "**/libfbjni.so"
}
...
}
dependencies {
implementation(name:'pytorch_android', ext:'aar')
implementation(name:'pytorch_android_torchvision', ext:'aar')
implementation(name:'pytorch_android_fbjni', ext:'aar')
}
```
At the moment, using the aar files directly requires additional configuration, due to a packaging specific: `libfbjni.so` is packaged in both `pytorch_android_fbjni.aar` and `pytorch_android.aar`.
```
packagingOptions {
pickFirst "**/libfbjni.so"
}
```
## More Details
You can find more details about the PyTorch Android API in the [Javadoc](https://pytorch.org/docs/stable/packages.html).
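Besides `forward`, named methods of a scripted module can be called through `Module.runMethod`, which is how the tests in this diff drive helpers such as `eqStr`. A hedged sketch, assuming a module that defines an `eqStr` method and an illustrative on-device path:
```
import org.pytorch.IValue;
import org.pytorch.Module;

public class RunMethodSketch {
  public static void main(String[] args) {
    // Assumes test.pt defines an eqStr method, as the test assets in this
    // diff do; the path is an illustrative assumption.
    Module module = Module.load("/data/local/tmp/test.pt");
    IValue result = module.runMethod("eqStr", IValue.from("smoketest"));
    if (result.isString()) {
      System.out.println(result.toStr()); // prints "smoketest"
    }
  }
}
```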

View File

@@ -1,33 +1,33 @@
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:${GRADLE_BINTRAY_PLUGIN_VERSION}"
classpath "com.github.dcendents:android-maven-gradle-plugin:${ANDROID_MAVEN_GRADLE_PLUGIN_VERSION}"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
allprojects {
buildscript {
ext {
minSdkVersion = 21
targetSdkVersion = 28
compileSdkVersion = 28
buildToolsVersion = '28.0.3'
coreVersion = "1.2.0"
extJUnitVersion = "1.1.1"
runnerVersion = "1.2.0"
rulesVersion = "1.2.0"
junitVersion = "4.12"
}
repositories {
google()
mavenLocal()
mavenCentral()
jcenter()
}
dependencies {
classpath 'com.android.tools.build:gradle:3.3.2'
classpath "com.jfrog.bintray.gradle:gradle-bintray-plugin:${GRADLE_BINTRAY_PLUGIN_VERSION}"
classpath "com.github.dcendents:android-maven-gradle-plugin:${ANDROID_MAVEN_GRADLE_PLUGIN_VERSION}"
classpath "org.jfrog.buildinfo:build-info-extractor-gradle:4.9.8"
}
}
repositories {
google()
jcenter()

android/build_test_app.sh Executable file

@@ -0,0 +1,100 @@
#!/bin/bash
set -eux
PYTORCH_DIR="$(cd $(dirname $0)/..; pwd -P)"
PYTORCH_ANDROID_DIR=$PYTORCH_DIR/android
WORK_DIR=$PYTORCH_DIR
echo "PYTORCH_DIR:$PYTORCH_DIR"
echo "WORK_DIR:$WORK_DIR"
echo "ANDROID_HOME:$ANDROID_HOME"
if [ -z "$ANDROID_HOME" ]; then
echo "ANDROID_HOME not set; please set it to Android sdk directory"
exit 1
fi
if [ ! -d $ANDROID_HOME ]; then
echo "ANDROID_HOME not a directory; did you install it under $ANDROID_HOME?"
exit 1
fi
GRADLE_PATH=gradle
GRADLE_NOT_FOUND_MSG="Unable to find gradle, please add it to PATH or set GRADLE_HOME"
if [ ! -x "$(command -v gradle)" ]; then
if [ -z "$GRADLE_HOME" ]; then
echo "$GRADLE_NOT_FOUND_MSG"
exit 1
fi
GRADLE_PATH=$GRADLE_HOME/bin/gradle
if [ ! -f "$GRADLE_PATH" ]; then
echo "$GRADLE_NOT_FOUND_MSG"
exit 1
fi
fi
echo "GRADLE_PATH:$GRADLE_PATH"
ABIS_LIST="armeabi-v7a,arm64-v8a,x86,x86_64"
CUSTOM_ABIS_LIST=false
if [ $# -gt 0 ]; then
ABIS_LIST=$1
CUSTOM_ABIS_LIST=true
fi
echo "ABIS_LIST:$ABIS_LIST"
LIB_DIR=$PYTORCH_ANDROID_DIR/pytorch_android/src/main/jniLibs
INCLUDE_DIR=$PYTORCH_ANDROID_DIR/pytorch_android/src/main/cpp/libtorch_include
mkdir -p $LIB_DIR
rm -f $LIB_DIR/*
mkdir -p $INCLUDE_DIR
for abi in $(echo $ABIS_LIST | tr ',' '\n')
do
echo "abi:$abi"
OUT_DIR=$WORK_DIR/build_android_$abi
rm -rf $OUT_DIR
mkdir -p $OUT_DIR
pushd $PYTORCH_DIR
python $PYTORCH_DIR/setup.py clean
ANDROID_ABI=$abi BUILD_PYTORCH_MOBILE=1 VERBOSE=1 ANDROID_DEBUG_SYMBOLS=1 $PYTORCH_DIR/scripts/build_android.sh -DANDROID_CCACHE=$(which ccache)
cp -R $PYTORCH_DIR/build_android/install/lib $OUT_DIR/
cp -R $PYTORCH_DIR/build_android/install/include $OUT_DIR/
echo "$abi build output lib,include copied to $OUT_DIR"
LIB_LINK_PATH=$LIB_DIR/$abi
INCLUDE_LINK_PATH=$INCLUDE_DIR/$abi
rm -f $LIB_LINK_PATH
rm -f $INCLUDE_LINK_PATH
ln -s $OUT_DIR/lib $LIB_LINK_PATH
ln -s $OUT_DIR/include $INCLUDE_LINK_PATH
done
# To set proxy for gradle add following lines to ./gradle/gradle.properties:
# systemProp.http.proxyHost=...
# systemProp.http.proxyPort=8080
# systemProp.https.proxyHost=...
# systemProp.https.proxyPort=8080
if [ "$CUSTOM_ABIS_LIST" = true ]; then
NDK_DEBUG=1 $GRADLE_PATH -PnativeLibsDoNotStrip=true -PABI_FILTERS=$ABIS_LIST -p $PYTORCH_ANDROID_DIR clean test_app:assembleDebug
else
NDK_DEBUG=1 $GRADLE_PATH -PnativeLibsDoNotStrip=true -p $PYTORCH_ANDROID_DIR clean test_app:assembleDebug
fi
find $PYTORCH_ANDROID_DIR -type f -name "*apk"
find $PYTORCH_ANDROID_DIR -type f -name "*apk" | xargs echo "To install apk run: $ANDROID_HOME/platform-tools/adb install -r "
popd

View File

@@ -1,6 +1,6 @@
ABI_FILTERS=armeabi-v7a,arm64-v8a,x86,x86_64
VERSION_NAME=0.0.7-SNAPSHOT
VERSION_NAME=1.4.0-SNAPSHOT
GROUP=org.pytorch
MAVEN_GROUP=org.pytorch
POM_URL=https://github.com/pytorch/pytorch/tree/master/android
@@ -22,3 +22,8 @@ ANDROID_MAVEN_GRADLE_PLUGIN_VERSION=2.1
# Gradle internals
org.gradle.internal.repository.max.retries=1
org.gradle.jvmargs=-XX:MaxMetaspaceSize=1024m
android.useAndroidX=true
android.enableJetifier=true
nativeLibsDoNotStrip=false

View File

@@ -25,6 +25,18 @@ def getRepositoryPassword() {
return hasProperty('SONATYPE_NEXUS_PASSWORD') ? SONATYPE_NEXUS_PASSWORD : ""
}
def getHttpProxyHost() {
return project.properties['systemProp.http.proxyHost']
}
def getHttpProxyPort() {
return project.properties['systemProp.http.proxyPort']
}
def needProxy() {
return (getHttpProxyHost() != null) && (getHttpProxyPort() != null)
}
afterEvaluate { project ->
uploadArchives {
repositories {
@@ -37,9 +49,15 @@ afterEvaluate { project ->
repository(url: getReleaseRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
if (needProxy()) {
proxy(host: getHttpProxyHost(), port: getHttpProxyPort() as Integer, type: 'http')
}
}
snapshotRepository(url: getSnapshotRepositoryUrl()) {
authentication(userName: getRepositoryUsername(), password: getRepositoryPassword())
if (needProxy()) {
proxy(host: getHttpProxyHost(), port: getHttpProxyPort() as Integer, type: 'http')
}
}
pom.project {

View File

@@ -11,12 +11,15 @@ android {
sourceSets {
main {
manifest.srcFile '../fbjni/ApplicationManifest.xml'
manifest.srcFile '../fbjni/java/com/facebook/jni/AndroidManifest.xml'
java {
srcDir '../fbjni/java'
}
}
}
ndk {
abiFilters ABI_FILTERS.split(",")
}
}
buildTypes {
debug {
@@ -35,6 +38,7 @@ android {
dependencies {
compileOnly 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
}
apply from: rootProject.file('gradle/release.gradle')

View File

@@ -1,63 +1,110 @@
cmake_minimum_required(VERSION 3.4.1)
project(pytorch CXX)
project(pytorch_jni CXX)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_VERBOSE_MAKEFILE ON)
set(TRACE_ENABLED OFF)
if(DEFINED ENV{TRACE_ENABLED})
if($ENV{TRACE_ENABLED} STREQUAL "1")
message(STATUS "TRACE_ENABLED ON")
set(TRACE_ENABLED ON)
endif()
endif()
if(NOT TRACE_ENABLED)
message(STATUS "TRACE_ENABLED OFF")
endif()
set(pytorch_android_DIR ${CMAKE_CURRENT_LIST_DIR}/src/main/cpp)
set(libtorch_include_DIR ${pytorch_android_DIR}/libtorch_include/${ANDROID_ABI})
if (ANDROID_ABI)
set(libtorch_include_DIR ${pytorch_android_DIR}/libtorch_include/${ANDROID_ABI})
set(BUILD_SUBDIR ${ANDROID_ABI})
elseif(BUILD_LIBTORCH_WITH_JNI)
# Don't need LIBTORCH_HOME if we're building from within PyTorch.
else()
# Building against a pre-built libtorch.
if (NOT LIBTORCH_HOME)
message(FATAL_ERROR
"pytorch_android requires LIBTORCH_HOME to be defined for non-Android builds.")
endif()
set(libtorch_include_DIR ${LIBTORCH_HOME}/include)
link_directories(${LIBTORCH_HOME}/lib)
set(BUILD_SUBDIR host)
endif()
message(STATUS "libtorch dir:${libtorch_DIR}")
configure_file(
${pytorch_android_DIR}/cmake_macros.h.in
${pytorch_android_DIR}/cmake_macros.h)
file(GLOB pytorch_android_SOURCES
${pytorch_android_DIR}/*.cpp
${pytorch_android_DIR}/pytorch_jni_jit.cpp
${pytorch_android_DIR}/pytorch_jni_common.cpp
${pytorch_android_DIR}/pytorch_jni_common.h
)
add_library(pytorch SHARED
add_library(pytorch_jni SHARED
${pytorch_android_SOURCES}
)
target_compile_options(pytorch PRIVATE
target_compile_options(pytorch_jni PRIVATE
-fexceptions
)
target_include_directories(pytorch PUBLIC
target_include_directories(pytorch_jni PUBLIC
${libtorch_include_DIR}
)
set(BUILD_DIR ${CMAKE_SOURCE_DIR}/build)
file(MAKE_DIRECTORY ${BUILD_DIR})
set(fbjni_DIR ${CMAKE_CURRENT_LIST_DIR}/../libs/fbjni/)
set(fbjni_BUILD_DIR ${BUILD_DIR}/fbjni/${ANDROID_ABI})
set(fbjni_BUILD_DIR ${CMAKE_BINARY_DIR}/fbjni/${BUILD_SUBDIR})
add_subdirectory(${fbjni_DIR} ${fbjni_BUILD_DIR})
function(import_static_lib name)
add_library(${name} STATIC IMPORTED)
set_property(
TARGET ${name}
PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/src/main/jniLibs/${ANDROID_ABI}/${name}.a)
endfunction(import_static_lib)
if (ANDROID_ABI)
import_static_lib(libtorch)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
import_static_lib(libclog)
function(import_static_lib name)
add_library(${name} STATIC IMPORTED)
set_property(
TARGET ${name}
PROPERTY IMPORTED_LOCATION
${CMAKE_CURRENT_LIST_DIR}/src/main/jniLibs/${ANDROID_ABI}/${name}.a)
endfunction(import_static_lib)
target_link_libraries(pytorch
fbjni
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
-Wl,--no-whole-archive
libc10
libnnpack
libpytorch_qnnpack
libeigen_blas
libcpuinfo
libclog
)
import_static_lib(libtorch)
import_static_lib(libc10)
import_static_lib(libnnpack)
import_static_lib(libpytorch_qnnpack)
import_static_lib(libeigen_blas)
import_static_lib(libcpuinfo)
import_static_lib(libclog)
# Link most things statically on Android.
target_link_libraries(pytorch_jni
fbjni
-Wl,--gc-sections
-Wl,--whole-archive
libtorch
-Wl,--no-whole-archive
libc10
libnnpack
libpytorch_qnnpack
libeigen_blas
libcpuinfo
libclog
)
else()
# Prefer dynamic linking on the host
target_link_libraries(pytorch_jni
fbjni
torch
c10
nnpack
pytorch_qnnpack
cpuinfo
clog
)
endif()

View File

@@ -27,6 +27,10 @@ android {
}
sourceSets {
main {
java {
exclude 'org/pytorch/LiteModuleLoader.java'
exclude 'org/pytorch/LiteNativePeer.java'
}
jniLibs.srcDirs = ['src/main/jniLibs']
}
}
@@ -42,6 +46,10 @@ android {
} else {
pickFirst '**/libfbjni.so'
}
if (nativeLibsDoNotStrip.toBoolean()) {
doNotStrip "**/*.so"
logger.warn('WARNING: nativeLibsDoNotStrip==true; debug symbols included')
}
}
useLibrary 'android.test.runner'
@@ -53,6 +61,7 @@ dependencies {
api project(':fbjni')
implementation 'com.android.support:appcompat-v7:28.0.0'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
testImplementation 'junit:junit:' + rootProject.junitVersion
testImplementation 'androidx.test:core:' + rootProject.coreVersion
@@ -72,4 +81,3 @@ task sourcesJar(type: Jar) {
}
artifacts.add('archives', sourcesJar)

View File

@@ -0,0 +1,36 @@
// Copyright (c) Facebook, Inc. and its affiliates.
//
// This source code is licensed under the Apache-2 license found in the
// LICENSE file in the root directory of this source tree.
plugins {
id 'java-library'
}
repositories {
mavenLocal()
jcenter()
}
sourceSets {
main {
java.srcDir '../src/main/java'
}
test {
java {
srcDir '../src/androidTest/java'
exclude '**/PytorchInstrumented*'
}
resources.srcDirs = ["../src/androidTest/assets"]
}
}
dependencies {
compileOnly 'com.google.code.findbugs:jsr305:3.0.1'
implementation 'com.facebook.soloader:nativeloader:0.8.0'
implementation 'com.facebook.fbjni:fbjni-java-only:0.0.3'
testImplementation 'junit:junit:4.12'
}
apply from: rootProject.file('gradle/release.gradle')

View File

@@ -0,0 +1,4 @@
POM_NAME=pytorch_java_only pytorch java api
POM_DESCRIPTION=pytorch_java_only pytorch java api
POM_ARTIFACT_ID=pytorch_java_only
POM_PACKAGING=jar

View File

@@ -0,0 +1,22 @@
package org.pytorch;
import org.junit.BeforeClass;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Objects;
public class PytorchHostTests extends PytorchTestBase {
@Override
protected String assetFilePath(String assetName) throws IOException {
Path tempFile = Files.createTempFile("test", ".pt");
try (InputStream resource = Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("test.pt"))) {
Files.copy(resource, tempFile, StandardCopyOption.REPLACE_EXISTING);
}
return tempFile.toAbsolutePath().toString();
}
}

View File

@@ -2,8 +2,6 @@ package org.pytorch;
import android.content.Context;
import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import java.io.File;
@@ -11,294 +9,15 @@ import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;
import androidx.test.ext.junit.runners.AndroidJUnit4;
import androidx.test.platform.app.InstrumentationRegistry;
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
@RunWith(AndroidJUnit4.class)
public class PytorchInstrumentedTests {
public class PytorchInstrumentedTests extends PytorchTestBase {
private static final String TEST_MODULE_ASSET_NAME = "test.pt";
@Before
public void setUp() {
System.loadLibrary("pytorch");
}
@Test
public void testForwardNull() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input =
IValue.tensor(Tensor.newInt8Tensor(new long[] {1}, Tensor.allocateByteBuffer(1)));
assertTrue(input.isTensor());
final IValue output = module.forward(input);
assertTrue(output.isNull());
}
@Test
public void testEqBool() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (boolean value : new boolean[] {false, true}) {
final IValue input = IValue.bool(value);
assertTrue(input.isBool());
assertTrue(value == input.getBool());
final IValue output = module.runMethod("eqBool", input);
assertTrue(output.isBool());
assertTrue(value == output.getBool());
}
}
@Test
public void testEqInt() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (long value : new long[] {Long.MIN_VALUE, -1024, -1, 0, 1, 1024, Long.MAX_VALUE}) {
final IValue input = IValue.long64(value);
assertTrue(input.isLong());
assertTrue(value == input.getLong());
final IValue output = module.runMethod("eqInt", input);
assertTrue(output.isLong());
assertTrue(value == output.getLong());
}
}
@Test
public void testEqFloat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
double[] values =
new double[] {
-Double.MAX_VALUE,
Double.MAX_VALUE,
-Double.MIN_VALUE,
Double.MIN_VALUE,
-Math.exp(1.d),
-Math.sqrt(2.d),
-3.1415f,
3.1415f,
-1,
0,
1,
};
for (double value : values) {
final IValue input = IValue.double64(value);
assertTrue(input.isDouble());
assertTrue(value == input.getDouble());
final IValue output = module.runMethod("eqFloat", input);
assertTrue(output.isDouble());
assertTrue(value == output.getDouble());
}
}
@Test
public void testEqTensor() throws IOException {
final long[] inputTensorShape = new long[] {1, 3, 224, 224};
final long numElements = Tensor.numel(inputTensorShape);
final float[] inputTensorData = new float[(int) numElements];
for (int i = 0; i < numElements; ++i) {
inputTensorData[i] = i;
}
final Tensor inputTensor = Tensor.newFloat32Tensor(inputTensorShape, inputTensorData);
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input = IValue.tensor(inputTensor);
assertTrue(input.isTensor());
assertTrue(inputTensor == input.getTensor());
final IValue output = module.runMethod("eqTensor", input);
assertTrue(output.isTensor());
final Tensor outputTensor = output.getTensor();
assertNotNull(outputTensor);
assertArrayEquals(inputTensorShape, outputTensor.shape);
float[] outputData = outputTensor.getDataAsFloatArray();
for (int i = 0; i < numElements; i++) {
assertTrue(inputTensorData[i] == outputData[i]);
}
}
@Test
public void testEqDictIntKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<Long, IValue> inputMap = new HashMap<>();
inputMap.put(Long.MIN_VALUE, IValue.long64(-Long.MIN_VALUE));
inputMap.put(Long.MAX_VALUE, IValue.long64(-Long.MAX_VALUE));
inputMap.put(0l, IValue.long64(0l));
inputMap.put(1l, IValue.long64(-1l));
inputMap.put(-1l, IValue.long64(1l));
final IValue input = IValue.dictLongKey(inputMap);
assertTrue(input.isDictLongKey());
final IValue output = module.runMethod("eqDictIntKeyIntValue", input);
assertTrue(output.isDictLongKey());
final Map<Long, IValue> outputMap = output.getDictLongKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<Long, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).getLong() == entry.getValue().getLong());
}
}
@Test
public void testEqDictStrKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<String, IValue> inputMap = new HashMap<>();
inputMap.put("long_min_value", IValue.long64(Long.MIN_VALUE));
inputMap.put("long_max_value", IValue.long64(Long.MAX_VALUE));
inputMap.put("long_0", IValue.long64(0l));
inputMap.put("long_1", IValue.long64(1l));
inputMap.put("long_-1", IValue.long64(-1l));
final IValue input = IValue.dictStringKey(inputMap);
assertTrue(input.isDictStringKey());
final IValue output = module.runMethod("eqDictStrKeyIntValue", input);
assertTrue(output.isDictStringKey());
final Map<String, IValue> outputMap = output.getDictStringKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<String, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).getLong() == entry.getValue().getLong());
}
}
@Test
public void testListIntSumReturnTuple() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (int n : new int[] {0, 1, 128}) {
long[] a = new long[n];
long sum = 0;
for (int i = 0; i < n; i++) {
a[i] = i;
sum += a[i];
}
final IValue input = IValue.longList(a);
assertTrue(input.isLongList());
final IValue output = module.runMethod("listIntSumReturnTuple", input);
assertTrue(output.isTuple());
assertTrue(2 == output.getTuple().length);
IValue output0 = output.getTuple()[0];
IValue output1 = output.getTuple()[1];
assertArrayEquals(a, output0.getLongList());
assertTrue(sum == output1.getLong());
}
}
@Test
public void testOptionalIntIsNone() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertFalse(module.runMethod("optionalIntIsNone", IValue.long64(1l)).getBool());
assertTrue(module.runMethod("optionalIntIsNone", IValue.optionalNull()).getBool());
}
@Test
public void testIntEq0None() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertTrue(module.runMethod("intEq0None", IValue.long64(0l)).isNull());
assertTrue(module.runMethod("intEq0None", IValue.long64(1l)).getLong() == 1l);
}
@Test(expected = IllegalArgumentException.class)
public void testRunUndefinedMethod() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
module.runMethod("test_undefined_method_throws_exception");
}
@Test
public void testTensorMethods() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
int[] ints = new int[numel];
float[] floats = new float[numel];
byte[] bytes = new byte[numel];
for (int i = 0; i < numel; i++) {
bytes[i] = (byte) ((i % 255) - 128);
ints[i] = i;
floats[i] = i / 1000.f;
}
Tensor tensorBytes = Tensor.newInt8Tensor(shape, bytes);
assertTrue(tensorBytes.dtype() == Tensor.DTYPE_INT8);
assertArrayEquals(bytes, tensorBytes.getDataAsByteArray());
Tensor tensorInts = Tensor.newInt32Tensor(shape, ints);
assertTrue(tensorInts.dtype() == Tensor.DTYPE_INT32);
assertArrayEquals(ints, tensorInts.getDataAsIntArray());
Tensor tensorFloats = Tensor.newFloat32Tensor(shape, floats);
assertTrue(tensorFloats.dtype() == Tensor.DTYPE_FLOAT32);
float[] floatsOut = tensorFloats.getDataAsFloatArray();
assertTrue(floatsOut.length == numel);
for (int i = 0; i < numel; i++) {
assertTrue(floats[i] == floatsOut[i]);
}
}
@Test(expected = IllegalStateException.class)
public void testTensorIllegalStateOnWrongType() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
float[] floats = new float[numel];
Tensor tensorFloats = Tensor.newFloat32Tensor(shape, floats);
assertTrue(tensorFloats.dtype() == Tensor.DTYPE_FLOAT32);
tensorFloats.getDataAsByteArray();
}
@Test
public void testEqString() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.string(value);
assertTrue(input.isString());
assertTrue(value.equals(input.getString()));
final IValue output = module.runMethod("eqStr", input);
assertTrue(output.isString());
assertTrue(value.equals(output.getString()));
}
}
@Test
public void testStr3Concat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.string(value);
assertTrue(input.isString());
assertTrue(value.equals(input.getString()));
final IValue output = module.runMethod("str3Concat", input);
assertTrue(output.isString());
String expectedOutput = new StringBuilder().append(value).append(value).append(value).toString();
assertTrue(expectedOutput.equals(output.getString()));
}
}
private static String assetFilePath(String assetName) throws IOException {
@Override
protected String assetFilePath(String assetName) throws IOException {
final Context appContext = InstrumentationRegistry.getInstrumentation().getTargetContext();
File file = new File(appContext.getFilesDir(), assetName);
if (file.exists() && file.length() > 0) {

View File

@@ -0,0 +1,285 @@
package org.pytorch;
import org.junit.Before;
import org.junit.Test;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import static org.junit.Assert.assertArrayEquals;
import static org.junit.Assert.assertFalse;
import static org.junit.Assert.assertNotNull;
import static org.junit.Assert.assertTrue;
public abstract class PytorchTestBase {
private static final String TEST_MODULE_ASSET_NAME = "test.pt";
@Test
public void testForwardNull() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input =
IValue.from(Tensor.fromBlob(Tensor.allocateByteBuffer(1), new long[] {1}));
assertTrue(input.isTensor());
final IValue output = module.forward(input);
assertTrue(output.isNull());
}
@Test
public void testEqBool() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (boolean value : new boolean[] {false, true}) {
final IValue input = IValue.from(value);
assertTrue(input.isBool());
assertTrue(value == input.toBool());
final IValue output = module.runMethod("eqBool", input);
assertTrue(output.isBool());
assertTrue(value == output.toBool());
}
}
@Test
public void testEqInt() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (long value : new long[] {Long.MIN_VALUE, -1024, -1, 0, 1, 1024, Long.MAX_VALUE}) {
final IValue input = IValue.from(value);
assertTrue(input.isLong());
assertTrue(value == input.toLong());
final IValue output = module.runMethod("eqInt", input);
assertTrue(output.isLong());
assertTrue(value == output.toLong());
}
}
@Test
public void testEqFloat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
double[] values =
new double[] {
-Double.MAX_VALUE,
Double.MAX_VALUE,
-Double.MIN_VALUE,
Double.MIN_VALUE,
-Math.exp(1.d),
-Math.sqrt(2.d),
-3.1415f,
3.1415f,
-1,
0,
1,
};
for (double value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isDouble());
assertTrue(value == input.toDouble());
final IValue output = module.runMethod("eqFloat", input);
assertTrue(output.isDouble());
assertTrue(value == output.toDouble());
}
}
@Test
public void testEqTensor() throws IOException {
final long[] inputTensorShape = new long[] {1, 3, 224, 224};
final long numElements = Tensor.numel(inputTensorShape);
final float[] inputTensorData = new float[(int) numElements];
for (int i = 0; i < numElements; ++i) {
inputTensorData[i] = i;
}
final Tensor inputTensor = Tensor.fromBlob(inputTensorData, inputTensorShape);
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final IValue input = IValue.from(inputTensor);
assertTrue(input.isTensor());
assertTrue(inputTensor == input.toTensor());
final IValue output = module.runMethod("eqTensor", input);
assertTrue(output.isTensor());
final Tensor outputTensor = output.toTensor();
assertNotNull(outputTensor);
assertArrayEquals(inputTensorShape, outputTensor.shape());
float[] outputData = outputTensor.getDataAsFloatArray();
for (int i = 0; i < numElements; i++) {
assertTrue(inputTensorData[i] == outputData[i]);
}
}
@Test
public void testEqDictIntKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<Long, IValue> inputMap = new HashMap<>();
inputMap.put(Long.MIN_VALUE, IValue.from(-Long.MIN_VALUE));
inputMap.put(Long.MAX_VALUE, IValue.from(-Long.MAX_VALUE));
inputMap.put(0l, IValue.from(0l));
inputMap.put(1l, IValue.from(-1l));
inputMap.put(-1l, IValue.from(1l));
final IValue input = IValue.dictLongKeyFrom(inputMap);
assertTrue(input.isDictLongKey());
final IValue output = module.runMethod("eqDictIntKeyIntValue", input);
assertTrue(output.isDictLongKey());
final Map<Long, IValue> outputMap = output.toDictLongKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<Long, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).toLong() == entry.getValue().toLong());
}
}
@Test
public void testEqDictStrKeyIntValue() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
final Map<String, IValue> inputMap = new HashMap<>();
inputMap.put("long_min_value", IValue.from(Long.MIN_VALUE));
inputMap.put("long_max_value", IValue.from(Long.MAX_VALUE));
inputMap.put("long_0", IValue.from(0l));
inputMap.put("long_1", IValue.from(1l));
inputMap.put("long_-1", IValue.from(-1l));
final IValue input = IValue.dictStringKeyFrom(inputMap);
assertTrue(input.isDictStringKey());
final IValue output = module.runMethod("eqDictStrKeyIntValue", input);
assertTrue(output.isDictStringKey());
final Map<String, IValue> outputMap = output.toDictStringKey();
assertTrue(inputMap.size() == outputMap.size());
for (Map.Entry<String, IValue> entry : inputMap.entrySet()) {
assertTrue(outputMap.get(entry.getKey()).toLong() == entry.getValue().toLong());
}
}
@Test
public void testListIntSumReturnTuple() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
for (int n : new int[] {0, 1, 128}) {
long[] a = new long[n];
long sum = 0;
for (int i = 0; i < n; i++) {
a[i] = i;
sum += a[i];
}
final IValue input = IValue.listFrom(a);
assertTrue(input.isLongList());
final IValue output = module.runMethod("listIntSumReturnTuple", input);
assertTrue(output.isTuple());
assertTrue(2 == output.toTuple().length);
IValue output0 = output.toTuple()[0];
IValue output1 = output.toTuple()[1];
assertArrayEquals(a, output0.toLongList());
assertTrue(sum == output1.toLong());
}
}
@Test
public void testOptionalIntIsNone() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertFalse(module.runMethod("optionalIntIsNone", IValue.from(1l)).toBool());
assertTrue(module.runMethod("optionalIntIsNone", IValue.optionalNull()).toBool());
}
@Test
public void testIntEq0None() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
assertTrue(module.runMethod("intEq0None", IValue.from(0l)).isNull());
assertTrue(module.runMethod("intEq0None", IValue.from(1l)).toLong() == 1l);
}
@Test(expected = IllegalArgumentException.class)
public void testRunUndefinedMethod() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
module.runMethod("test_undefined_method_throws_exception");
}
@Test
public void testTensorMethods() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
int[] ints = new int[numel];
float[] floats = new float[numel];
byte[] bytes = new byte[numel];
for (int i = 0; i < numel; i++) {
bytes[i] = (byte) ((i % 255) - 128);
ints[i] = i;
floats[i] = i / 1000.f;
}
Tensor tensorBytes = Tensor.fromBlob(bytes, shape);
assertTrue(tensorBytes.dtype() == DType.INT8);
assertArrayEquals(bytes, tensorBytes.getDataAsByteArray());
Tensor tensorInts = Tensor.fromBlob(ints, shape);
assertTrue(tensorInts.dtype() == DType.INT32);
assertArrayEquals(ints, tensorInts.getDataAsIntArray());
Tensor tensorFloats = Tensor.fromBlob(floats, shape);
assertTrue(tensorFloats.dtype() == DType.FLOAT32);
float[] floatsOut = tensorFloats.getDataAsFloatArray();
assertTrue(floatsOut.length == numel);
for (int i = 0; i < numel; i++) {
assertTrue(floats[i] == floatsOut[i]);
}
}
@Test(expected = IllegalStateException.class)
public void testTensorIllegalStateOnWrongType() {
long[] shape = new long[] {1, 3, 224, 224};
final int numel = (int) Tensor.numel(shape);
float[] floats = new float[numel];
Tensor tensorFloats = Tensor.fromBlob(floats, shape);
assertTrue(tensorFloats.dtype() == DType.FLOAT32);
tensorFloats.getDataAsByteArray();
}
@Test
public void testEqString() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isString());
assertTrue(value.equals(input.toStr()));
final IValue output = module.runMethod("eqStr", input);
assertTrue(output.isString());
assertTrue(value.equals(output.toStr()));
}
}
@Test
public void testStr3Concat() throws IOException {
final Module module = Module.load(assetFilePath(TEST_MODULE_ASSET_NAME));
String[] values =
new String[] {
"smoketest",
"проверка не латинских символов", // not latin symbols check
"#@$!@#)($*!@#$)(!@*#$"
};
for (String value : values) {
final IValue input = IValue.from(value);
assertTrue(input.isString());
assertTrue(value.equals(input.toStr()));
final IValue output = module.runMethod("str3Concat", input);
assertTrue(output.isString());
String expectedOutput = new StringBuilder().append(value).append(value).append(value).toString();
assertTrue(expectedOutput.equals(output.toStr()));
}
}
protected abstract String assetFilePath(String assetName) throws IOException;
}
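`PytorchTestBase` exercises the renamed Java API: factory methods move from type-specific names (`IValue.tensor`, `IValue.long64`, `Tensor.newFloat32Tensor`) to `IValue.from` overloads and `Tensor.fromBlob`, accessors move from `getX` to `toX`, and dtype constants move to the `DType` enum. A small sketch of the mapping, with the old names (deleted from `PytorchInstrumentedTests` above) noted in comments:
```
import org.pytorch.DType;
import org.pytorch.IValue;
import org.pytorch.Tensor;

public class ApiRenameSketch {
  public static void main(String[] args) {
    long[] shape = new long[] {2, 2};
    float[] data = new float[] {1f, 2f, 3f, 4f};
    // Was Tensor.newFloat32Tensor(shape, data); note the flipped argument order.
    Tensor t = Tensor.fromBlob(data, shape);
    // Was Tensor.DTYPE_FLOAT32; dtype codes now live on the DType enum.
    System.out.println(t.dtype() == DType.FLOAT32); // true
    // Was IValue.tensor(t) / IValue.long64(42L); factories are now from() overloads.
    IValue iv = IValue.from(t);
    // Was getTensor(); accessors are renamed getX -> toX, and shape is a method.
    System.out.println(iv.toTensor().shape()[0]); // 2
  }
}
```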

View File

@@ -0,0 +1,3 @@
#pragma once
/* #undef TRACE_ENABLED */

View File

@@ -0,0 +1,3 @@
#pragma once
#cmakedefine TRACE_ENABLED

View File

@@ -1,657 +0,0 @@
#include <cassert>
#include <iostream>
#include <memory>
#include <string>
#include <fbjni/ByteBuffer.h>
#include <fbjni/fbjni.h>
#include <torch/script.h>
namespace pytorch_jni {
constexpr static int kTensorDTypeUInt8 = 1;
constexpr static int kTensorDTypeInt8 = 2;
constexpr static int kTensorDTypeInt32 = 3;
constexpr static int kTensorDTypeFloat32 = 4;
constexpr static int kTensorDTypeInt64 = 5;
constexpr static int kTensorDTypeFloat64 = 6;
template <typename K = jobject, typename V = jobject>
struct JHashMap
: facebook::jni::JavaClass<JHashMap<K, V>, facebook::jni::JMap<K, V>> {
constexpr static auto kJavaDescriptor = "Ljava/util/HashMap;";
using Super =
facebook::jni::JavaClass<JHashMap<K, V>, facebook::jni::JMap<K, V>>;
static facebook::jni::local_ref<JHashMap<K, V>> create() {
return Super::newInstance();
}
void put(
facebook::jni::alias_ref<facebook::jni::JObject::javaobject> key,
facebook::jni::alias_ref<facebook::jni::JObject::javaobject> value) {
static auto putMethod =
Super::javaClassStatic()
->template getMethod<facebook::jni::alias_ref<
facebook::jni::JObject::javaobject>(
facebook::jni::alias_ref<facebook::jni::JObject::javaobject>,
facebook::jni::alias_ref<facebook::jni::JObject::javaobject>)>(
"put");
putMethod(Super::self(), key, value);
}
};
static at::Tensor newAtTensor(
facebook::jni::alias_ref<facebook::jni::JBuffer> jbuffer,
facebook::jni::alias_ref<jlongArray> jshape,
jint jdtype) {
const auto rank = jshape->size();
const auto shapeArr = jshape->getRegion(0, rank);
std::vector<int64_t> shapeVec{};
shapeVec.reserve(rank);
auto numel = 1;
for (auto i = 0; i < rank; ++i) {
shapeVec.push_back(shapeArr[i]);
numel *= shapeArr[i];
}
JNIEnv* jni = facebook::jni::Environment::current();
caffe2::TypeMeta typeMeta{};
int dataElementSizeBytes = 0;
if (kTensorDTypeFloat32 == jdtype) {
dataElementSizeBytes = 4;
typeMeta = caffe2::TypeMeta::Make<float>();
} else if (kTensorDTypeInt32 == jdtype) {
dataElementSizeBytes = 4;
typeMeta = caffe2::TypeMeta::Make<int32_t>();
} else if (kTensorDTypeInt8 == jdtype) {
dataElementSizeBytes = 1;
typeMeta = caffe2::TypeMeta::Make<int8_t>();
} else if (kTensorDTypeUInt8 == jdtype) {
dataElementSizeBytes = 1;
typeMeta = caffe2::TypeMeta::Make<uint8_t>();
} else if (kTensorDTypeFloat64 == jdtype) {
dataElementSizeBytes = 8;
typeMeta = caffe2::TypeMeta::Make<double>();
} else if (kTensorDTypeInt64 == jdtype) {
dataElementSizeBytes = 8;
typeMeta = caffe2::TypeMeta::Make<int64_t>();
} else {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown Tensor jdtype %d",
jdtype);
}
const auto dataCapacity = jni->GetDirectBufferCapacity(jbuffer.get());
if (dataCapacity != numel) {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Tensor dimensions(elements number:%d, element byte size:%d, total "
"bytes:%d) inconsistent with buffer capacity(%d)",
numel,
dataElementSizeBytes,
numel * dataElementSizeBytes,
dataCapacity);
}
return torch::from_blob(
jni->GetDirectBufferAddress(jbuffer.get()),
torch::IntArrayRef(shapeVec),
at::TensorOptions(typeMeta));
}
class JTensor : public facebook::jni::JavaClass<JTensor> {
public:
constexpr static const char* kJavaDescriptor = "Lorg/pytorch/Tensor;";
static facebook::jni::local_ref<JTensor> newJTensor(
facebook::jni::alias_ref<facebook::jni::JByteBuffer> jBuffer,
facebook::jni::alias_ref<jlongArray> jShape,
jint jdtype) {
static auto jMethodNewTensor =
JTensor::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JTensor>(
facebook::jni::alias_ref<facebook::jni::JByteBuffer>,
facebook::jni::alias_ref<jlongArray>,
jint)>("nativeNewTensor");
return jMethodNewTensor(
JTensor::javaClassStatic(), jBuffer, jShape, jdtype);
}
static facebook::jni::local_ref<JTensor> newJTensorFromAtTensor(
const at::Tensor& tensor) {
const auto scalarType = tensor.scalar_type();
int jdtype = 0;
if (at::kFloat == scalarType) {
jdtype = kTensorDTypeFloat32;
} else if (at::kInt == scalarType) {
jdtype = kTensorDTypeInt32;
} else if (at::kByte == scalarType) {
jdtype = kTensorDTypeUInt8;
} else if (at::kChar == scalarType) {
jdtype = kTensorDTypeInt8;
} else if (at::kLong == scalarType) {
jdtype = kTensorDTypeInt64;
} else if (at::kDouble == scalarType) {
jdtype = kTensorDTypeFloat64;
} else {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"at::Tensor scalar type is not supported on java side");
}
const auto& tensorShape = tensor.sizes();
std::vector<int64_t> tensorShapeVec;
for (const auto& s : tensorShape) {
tensorShapeVec.push_back(s);
}
facebook::jni::local_ref<jlongArray> jTensorShape =
facebook::jni::make_long_array(tensorShapeVec.size());
jTensorShape->setRegion(0, tensorShapeVec.size(), tensorShapeVec.data());
facebook::jni::local_ref<facebook::jni::JByteBuffer> jTensorBuffer =
facebook::jni::JByteBuffer::allocateDirect(tensor.nbytes());
jTensorBuffer->order(facebook::jni::JByteOrder::nativeOrder());
std::memcpy(
jTensorBuffer->getDirectBytes(),
tensor.storage().data(),
tensor.nbytes());
return JTensor::newJTensor(jTensorBuffer, jTensorShape, jdtype);
}
static at::Tensor newAtTensorFromJTensor(
facebook::jni::alias_ref<JTensor> jtensor) {
static const auto dtypeMethod =
JTensor::javaClassStatic()->getMethod<jint()>("dtype");
jint jdtype = dtypeMethod(jtensor);
static const auto shapeField =
JTensor::javaClassStatic()->getField<jlongArray>("shape");
auto jshape = jtensor->getFieldValue(shapeField);
static auto dataBufferMethod =
JTensor::javaClassStatic()
->getMethod<
facebook::jni::local_ref<facebook::jni::JBuffer::javaobject>()>(
"getRawDataBuffer");
facebook::jni::local_ref<facebook::jni::JBuffer> jbuffer =
dataBufferMethod(jtensor);
return newAtTensor(jbuffer, jshape, jdtype);
}
};
class JIValue : public facebook::jni::JavaClass<JIValue> {
public:
constexpr static const char* kJavaDescriptor = "Lorg/pytorch/IValue;";
constexpr static int kTypeCodeNull = 1;
constexpr static int kTypeCodeTensor = 2;
constexpr static int kTypeCodeBool = 3;
constexpr static int kTypeCodeLong = 4;
constexpr static int kTypeCodeDouble = 5;
constexpr static int kTypeCodeString = 6;
constexpr static int kTypeCodeTuple = 7;
constexpr static int kTypeCodeBoolList = 8;
constexpr static int kTypeCodeLongList = 9;
constexpr static int kTypeCodeDoubleList = 10;
constexpr static int kTypeCodeTensorList = 11;
constexpr static int kTypeCodeList = 12;
constexpr static int kTypeCodeDictStringKey = 13;
constexpr static int kTypeCodeDictLongKey = 14;
static facebook::jni::local_ref<JIValue> newJIValueFromAtIValue(
const at::IValue& ivalue) {
if (ivalue.isNone()) {
static auto jMethodOptionalNull =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>()>(
"optionalNull");
return jMethodOptionalNull(JIValue::javaClassStatic());
} else if (ivalue.isTensor()) {
static auto jMethodTensor =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::local_ref<JTensor>)>("tensor");
return jMethodTensor(
JIValue::javaClassStatic(),
JTensor::newJTensorFromAtTensor(ivalue.toTensor()));
} else if (ivalue.isBool()) {
static auto jMethodBool =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jboolean)>(
"bool");
return jMethodBool(JIValue::javaClassStatic(), ivalue.toBool());
} else if (ivalue.isInt()) {
static auto jMethodInt =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jlong)>(
"long64");
return jMethodInt(JIValue::javaClassStatic(), ivalue.toInt());
} else if (ivalue.isDouble()) {
static auto jMethodDouble =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(jdouble)>(
"double64");
return jMethodDouble(JIValue::javaClassStatic(), ivalue.toDouble());
} else if (ivalue.isString()) {
static auto jMethodString =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<
facebook::jni::JString::javaobject>)>("string");
return jMethodString(
JIValue::javaClassStatic(),
facebook::jni::make_jstring(ivalue.toStringRef()));
} else if (ivalue.isTuple()) {
auto elementsVec = ivalue.toTuple()->elements();
static auto jMethodTupleArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject>)>("tuple");
auto jElementsArray =
facebook::jni::JArrayClass<JIValue::javaobject>::newArray(
elementsVec.size());
auto index = 0;
for (const auto& e : elementsVec) {
(*jElementsArray)[index++] = JIValue::newJIValueFromAtIValue(e);
}
return jMethodTupleArr(JIValue::javaClassStatic(), jElementsArray);
} else if (ivalue.isBoolList()) {
auto list = ivalue.toBoolList();
static auto jMethodBoolListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jbooleanArray>)>("boolList");
size_t n = list.size();
auto jArray = facebook::jni::make_boolean_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethodBoolListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isIntList()) {
auto list = ivalue.toIntList();
static auto jMethodLongListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jlongArray>)>("longList");
size_t n = list.size();
auto jArray = facebook::jni::make_long_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethodLongListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isDoubleList()) {
auto list = ivalue.toDoubleList();
static auto jMethoDoubleListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<jdoubleArray>)>("doubleList");
size_t n = list.size();
auto jArray = facebook::jni::make_double_array(n);
auto jArrayPinned = jArray->pin();
auto index = 0;
for (const auto& e : list) {
jArrayPinned[index++] = e;
}
return jMethoDoubleListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isTensorList()) {
auto list = ivalue.toTensorList();
static auto jMethodTensorListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JTensor::javaobject>::javaobject>)>("tensorList");
auto jArray = facebook::jni::JArrayClass<JTensor::javaobject>::newArray(
list.size());
auto index = 0;
for (const auto& e : list) {
(*jArray)[index++] = JTensor::newJTensorFromAtTensor(e);
}
return jMethodTensorListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isGenericList()) {
auto list = ivalue.toGenericList();
static auto jMethodListArr =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject>)>("list");
auto jArray = facebook::jni::JArrayClass<JIValue::javaobject>::newArray(
list.size());
auto index = 0;
for (const auto& e : list) {
(*jArray)[index++] = JIValue::newJIValueFromAtIValue(e);
}
return jMethodListArr(JIValue::javaClassStatic(), jArray);
} else if (ivalue.isGenericDict()) {
auto dict = ivalue.toGenericDict();
const auto keyType = dict.keyType();
if (!keyType) {
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown IValue-Dict key type");
}
const auto keyTypeKind = keyType->kind();
if (c10::TypeKind::StringType == keyTypeKind) {
static auto jMethodDictStringKey =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JMap<
facebook::jni::alias_ref<
facebook::jni::JString::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>>)>(
"dictStringKey");
auto jmap = JHashMap<
facebook::jni::alias_ref<facebook::jni::JString::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>::create();
for (auto& pair : dict) {
jmap->put(
facebook::jni::make_jstring(pair.key().toString()->string()),
JIValue::newJIValueFromAtIValue(pair.value()));
}
return jMethodDictStringKey(JIValue::javaClassStatic(), jmap);
} else if (c10::TypeKind::IntType == keyTypeKind) {
static auto jMethodDictLongKey =
JIValue::javaClassStatic()
->getStaticMethod<facebook::jni::local_ref<JIValue>(
facebook::jni::alias_ref<facebook::jni::JMap<
facebook::jni::alias_ref<
facebook::jni::JLong::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>>)>(
"dictLongKey");
auto jmap = JHashMap<
facebook::jni::alias_ref<facebook::jni::JLong::javaobject>,
facebook::jni::alias_ref<JIValue::javaobject>>::create();
for (auto& pair : dict) {
jmap->put(
facebook::jni::JLong::valueOf(pair.key().toInt()),
JIValue::newJIValueFromAtIValue(pair.value()));
}
return jMethodDictLongKey(JIValue::javaClassStatic(), jmap);
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unsupported IValue-Dict key type");
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unsupported IValue type %s",
ivalue.tagKind().c_str());
}
static at::IValue JIValueToAtIValue(
facebook::jni::alias_ref<JIValue> jivalue) {
static const auto typeCodeField =
JIValue::javaClassStatic()->getField<jint>("mTypeCode");
const auto typeCode = jivalue->getFieldValue(typeCodeField);
if (JIValue::kTypeCodeNull == typeCode) {
return at::IValue{};
} else if (JIValue::kTypeCodeTensor == typeCode) {
static const auto jMethodGetTensor =
JIValue::javaClassStatic()
->getMethod<facebook::jni::alias_ref<JTensor::javaobject>()>(
"getTensor");
return JTensor::newAtTensorFromJTensor(jMethodGetTensor(jivalue));
} else if (JIValue::kTypeCodeBool == typeCode) {
static const auto jMethodGetBool =
JIValue::javaClassStatic()->getMethod<jboolean()>("getBool");
// explicit cast to bool as jboolean is defined as uint8_t, IValue ctor
// for int will be called for jboolean
bool b = jMethodGetBool(jivalue);
return at::IValue{b};
} else if (JIValue::kTypeCodeLong == typeCode) {
static const auto jMethodGetLong =
JIValue::javaClassStatic()->getMethod<jlong()>("getLong");
return at::IValue{jMethodGetLong(jivalue)};
} else if (JIValue::kTypeCodeDouble == typeCode) {
static const auto jMethodGetDouble =
JIValue::javaClassStatic()->getMethod<jdouble()>("getDouble");
return at::IValue{jMethodGetDouble(jivalue)};
} else if (JIValue::kTypeCodeString == typeCode) {
static const auto jMethodGetString =
JIValue::javaClassStatic()->getMethod<jstring()>("getString");
return at::IValue{jMethodGetString(jivalue)->toStdString()};
} else if (JIValue::kTypeCodeTuple == typeCode) {
static const auto jMethodGetTuple =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject()>("getTuple");
auto jarray = jMethodGetTuple(jivalue);
size_t n = jarray->size();
std::vector<at::IValue> elements;
elements.reserve(n);
for (auto i = 0; i < n; ++i) {
auto jivalue_element = jarray->getElement(i);
auto element = JIValue::JIValueToAtIValue(jivalue_element);
elements.push_back(std::move(element));
}
return c10::ivalue::Tuple::create(std::move(elements));
} else if (JIValue::kTypeCodeBoolList == typeCode) {
static const auto jMethodGetBoolList =
JIValue::javaClassStatic()->getMethod<jbooleanArray()>("getBoolList");
auto jArray = jMethodGetBoolList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<bool> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeLongList == typeCode) {
static const auto jMethodGetLongList =
JIValue::javaClassStatic()->getMethod<jlongArray()>("getLongList");
auto jArray = jMethodGetLongList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<int64_t> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeDoubleList == typeCode) {
static const auto jMethodGetDoubleList =
JIValue::javaClassStatic()->getMethod<jdoubleArray()>(
"getDoubleList");
auto jArray = jMethodGetDoubleList(jivalue);
auto jArrayPinned = jArray->pin();
size_t n = jArrayPinned.size();
c10::List<double> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(jArrayPinned[i]);
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeTensorList == typeCode) {
static const auto jMethodGetTensorList =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JTensor::javaobject>::javaobject()>("getTensorList");
auto jArray = jMethodGetTensorList(jivalue);
size_t n = jArray->size();
c10::List<at::Tensor> list{};
list.reserve(n);
for (size_t i = 0; i < n; ++i) {
list.push_back(JTensor::newAtTensorFromJTensor(jArray->getElement(i)));
}
return at::IValue{std::move(list)};
} else if (JIValue::kTypeCodeList == typeCode) {
static const auto jMethodGetList =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JArrayClass<
JIValue::javaobject>::javaobject()>("getList");
auto jarray = jMethodGetList(jivalue);
size_t n = jarray->size();
if (n == 0) {
return at::IValue{c10::impl::GenericList(c10::TensorType::get())};
}
auto jivalue_first_element = jarray->getElement(0);
auto first_element = JIValue::JIValueToAtIValue(jivalue_first_element);
c10::TypePtr typePtr = c10::attemptToRecoverType(first_element);
c10::impl::GenericList list{typePtr};
list.reserve(n);
list.push_back(first_element);
for (auto i = 1; i < n; ++i) {
auto jivalue_element = jarray->getElement(i);
auto element = JIValue::JIValueToAtIValue(jivalue_element);
list.push_back(element);
}
return at::IValue{list};
} else if (JIValue::kTypeCodeDictStringKey == typeCode) {
static const auto jMethodGetDictStringKey =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JMap<jstring, JIValue::javaobject>::
javaobject()>("getDictStringKey");
auto jmap = jMethodGetDictStringKey(jivalue);
auto it = jmap->begin();
if (it == jmap->end()) {
return at::IValue{c10::impl::GenericDict(
c10::StringType::get(), c10::TensorType::get())};
}
auto firstEntryValue = JIValue::JIValueToAtIValue(it->second);
c10::TypePtr typePtr = c10::attemptToRecoverType(firstEntryValue);
c10::impl::GenericDict dict{c10::StringType::get(), typePtr};
dict.insert(it->first->toStdString(), firstEntryValue);
it++;
for (; it != jmap->end(); it++) {
dict.insert(
it->first->toStdString(), JIValue::JIValueToAtIValue(it->second));
}
return at::IValue{dict};
} else if (JIValue::kTypeCodeDictLongKey == typeCode) {
static const auto jMethodGetDictLongKey =
JIValue::javaClassStatic()
->getMethod<facebook::jni::JMap<
facebook::jni::JLong::javaobject,
JIValue::javaobject>::javaobject()>("getDictLongKey");
auto jmap = jMethodGetDictLongKey(jivalue);
auto it = jmap->begin();
if (it == jmap->end()) {
return at::IValue{c10::impl::GenericDict(
c10::IntType::get(), c10::TensorType::get())};
}
auto firstEntryValue = JIValue::JIValueToAtIValue(it->second);
c10::TypePtr typePtr = c10::attemptToRecoverType(firstEntryValue);
c10::impl::GenericDict dict{c10::IntType::get(), typePtr};
dict.insert(it->first->longValue(), firstEntryValue);
it++;
for (; it != jmap->end(); it++) {
dict.insert(
it->first->longValue(), JIValue::JIValueToAtIValue(it->second));
}
return at::IValue{dict};
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Unknown IValue typeCode %d",
typeCode);
}
};
class PytorchJni : public facebook::jni::HybridClass<PytorchJni> {
private:
friend HybridBase;
torch::jit::script::Module module_;
public:
constexpr static auto kJavaDescriptor = "Lorg/pytorch/Module$NativePeer;";
static facebook::jni::local_ref<jhybriddata> initHybrid(
facebook::jni::alias_ref<jclass>,
facebook::jni::alias_ref<jstring> modelPath) {
return makeCxxInstance(modelPath);
}
PytorchJni(facebook::jni::alias_ref<jstring> modelPath) {
auto qengines = at::globalContext().supportedQEngines();
if (std::find(qengines.begin(), qengines.end(), at::QEngine::QNNPACK) !=
qengines.end()) {
at::globalContext().setQEngine(at::QEngine::QNNPACK);
}
module_ = torch::jit::load(std::move(modelPath->toStdString()));
module_.eval();
}
static void registerNatives() {
registerHybrid({
makeNativeMethod("initHybrid", PytorchJni::initHybrid),
makeNativeMethod("forward", PytorchJni::forward),
makeNativeMethod("runMethod", PytorchJni::runMethod),
});
}
facebook::jni::local_ref<JIValue> forward(
facebook::jni::alias_ref<
facebook::jni::JArrayClass<JIValue::javaobject>::javaobject>
jinputs) {
std::vector<at::IValue> inputs{};
size_t n = jinputs->size();
inputs.reserve(n);
for (size_t i = 0; i < n; i++) {
at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i));
inputs.push_back(std::move(atIValue));
}
auto output = [&]() {
torch::autograd::AutoGradMode guard(false);
at::AutoNonVariableTypeMode non_var_type_mode(true);
return module_.forward(std::move(inputs));
}();
return JIValue::newJIValueFromAtIValue(output);
}
facebook::jni::local_ref<JIValue> runMethod(
facebook::jni::alias_ref<facebook::jni::JString::javaobject> jmethodName,
facebook::jni::alias_ref<
facebook::jni::JArrayClass<JIValue::javaobject>::javaobject>
jinputs) {
std::string methodName = jmethodName->toStdString();
std::vector<at::IValue> inputs{};
size_t n = jinputs->size();
inputs.reserve(n);
for (size_t i = 0; i < n; i++) {
at::IValue atIValue = JIValue::JIValueToAtIValue(jinputs->getElement(i));
inputs.push_back(std::move(atIValue));
}
if (auto method = module_.find_method(methodName)) {
auto output = [&]() {
torch::autograd::AutoGradMode guard(false);
at::AutoNonVariableTypeMode non_var_type_mode(true);
return (*method)(std::move(inputs));
}();
return JIValue::newJIValueFromAtIValue(output);
}
facebook::jni::throwNewJavaException(
facebook::jni::gJavaLangIllegalArgumentException,
"Undefined method %s",
methodName.c_str());
}
};
} // namespace pytorch_jni
JNIEXPORT jint JNICALL JNI_OnLoad(JavaVM* vm, void*) {
return facebook::jni::initialize(
vm, [] { pytorch_jni::PytorchJni::registerNatives(); });
}
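The dict branches in the deleted file above (`kTypeCodeDictStringKey`, `kTypeCodeDictLongKey`) show how the JNI layer converts Java maps to `c10::Dict` values; per the CMake changes earlier in this diff, that logic now lives in `pytorch_jni_common.cpp` / `pytorch_jni_jit.cpp`. A hedged Java sketch of the corresponding round trip, mirroring `testEqDictStrKeyIntValue` and assuming an illustrative model path:
```
import java.util.HashMap;
import java.util.Map;
import org.pytorch.IValue;
import org.pytorch.Module;

public class DictRoundTripSketch {
  public static void main(String[] args) {
    // Assumes a module defining eqDictStrKeyIntValue, as the test assets do.
    Module module = Module.load("/data/local/tmp/test.pt"); // illustrative path
    Map<String, IValue> in = new HashMap<>();
    in.put("answer", IValue.from(42L));
    // The JNI layer converts this map through its string-key dict branch
    // and hands the TorchScript method a Dict[str, int].
    IValue out = module.runMethod("eqDictStrKeyIntValue", IValue.dictStringKeyFrom(in));
    Map<String, IValue> outMap = out.toDictStringKey();
    System.out.println(outMap.get("answer").toLong()); // 42
  }
}
```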

Some files were not shown because too many files have changed in this diff.